
IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 14, NO. 10, OCTOBER 2015

A Network Coding Based Energy Efficient Data Backup in Survivability-Heterogeneous Sensor Networks

Jie Tian, Tan Yan, and Guiling Wang

Abstract: Sensor nodes deployed outdoors are subject to environmental detriments and often need to cache data for an extended period of time. This paper introduces sensor nodes which are robust to environmental damage, and proposes to utilize Network Coding to back up data in the robust sensors for future data retrieval in an energy efficient way. Our goal is to help regular sensors select robust sensors to back up their data with low energy consumption, such that when needed, all the data can be retrieved by querying only a subset of robust sensors. We formally formulate this backup problem, theoretically prove its NP-Completeness, discover two novel theoretical guidelines for problem solving, and propose two algorithms accordingly to tackle this NP-C problem. The guidelines are based on random linear network coding and provide lower bounds on the number of robust sensors that each regular sensor should choose for data backup, such that the required fault tolerance is provided. A centralized algorithm and a distributed algorithm are developed based on the guidelines such that regular sensors can back up their data efficiently. Both analysis and simulations show that our algorithms are effective in achieving fault tolerance, low energy consumption, and high retrieval efficiency.

Index Terms: Heterogeneous sensor networks, data backup, network coding

1 INTRODUCTION

Due to the small size and low cost of a sensor node, a wireless sensor network composed of a large number of such nodes can be deployed close to the phenomena or events of interest, monitoring them and generating data. The generated precious data may not be able to be collected constantly and immediately considering many constraints in the physical world, especially in remote and hostile areas. For example, in Great Duck Island, a sensor network has been monitoring the habitat of wild birds [1].
The habitat data can only be collected from the sensors occasionally to minimize the interference with the birds' natural life. Letting sensors increase the transmission power and remotely send data to human operators drains the battery power quickly, or is simply infeasible if the distance between a data collector and the sensor network is too large. Therefore, a sensor network has to act as a distributed data storage before data collection. The duration that data have to be cached in a sensor network varies from minutes to months.

The environments in which a sensor network needs to cache data for an extended period of time before data collection are generally remote or less accessible. Especially in these environments, sensor nodes, which are tiny electronic devices, are subject to environmental damage, such as rain and fire. When a sensor node dies due to physical damage, the data in the node are lost. Therefore, it is important to back up the data. Simply duplicating data in multiple tiny sensor nodes cannot provide enough fault tolerance because sensors are likely to fail at the same time when the harsh environmental attributes act on them. For example, after a storm, most of these small electronic devices may fail simultaneously. Even if they are able to work again after being dried in the sunshine, it is unlikely that the lost data can be recovered.

The authors are with the Department of Computer Science, New Jersey Institute of Technology. E-mail: {jt66, ty7, gwang}@njit.edu. Manuscript received 28 Apr. 2014; revised 29 Oct. 2014; accepted 3 Nov. 2014. Date of publication 24 Nov. 2014; date of current version 31 Aug. 2015. For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TMC.2014.2374168

To deal with the problem, we propose to incorporate sensor nodes which are robust to environmental damage. We assume they are waterproof, can tolerate high temperature and withstand other environmental attributes.
Considering that such robust sensor nodes are of higher cost, we propose to construct heterogeneous sensor networks with both regular and robust sensors. The focus of the paper is to design schemes for regular sensors to back up data in robust sensors.

Our objective is to design energy-efficient data-backup schemes for the proposed heterogeneous sensor networks to achieve high fault tolerance in harsh environments. Considering that in harsh environments all regular sensors may lose data after a storm and some robust sensors may fail due to energy depletion or other reasons, a desired scheme should be able to tolerate the failure of all regular sensors and a portion of the robust sensors. In other words, by accessing any $b$ out of $n$ robust sensors ($b \le n$), all the data stored in the network can be recovered.

Existing distributed data storage systems [2], [3], [4], [5] cannot be directly applied to solve our problem, either because they cannot achieve the desired fault tolerance or because their system requirements are too high. For example, cluster based storage systems [2], [3] cannot tolerate the failure of all storage nodes in a cluster. Some coding-based data storage systems [4], [5] can tolerate that, but they are under the prerequisite that there are more storage nodes than data nodes. This means we have to budget more robust sensors than regular sensors, and the cost of a sensor network is greatly increased. Moreover, the energy consumption of communication for backup is not considered in many existing schemes.

This paper aims to develop energy-efficient data backup schemes which can provide the required level of fault tolerance. To achieve the goal, we first theoretically analyze the problem, formulate it as a weighted backup problem, and prove its NP-Complete nature. We also discover two novel theoretical guidelines based on random linear network coding; satisfying either of the two guarantees the desired fault tolerance requirement: all data can be recovered by accessing any $b$ out of the $n$ robust sensors. Based on the two guidelines, two backup schemes are designed to achieve fault tolerance and energy efficiency simultaneously. Theoretical analysis and performance evaluation show that our schemes greatly outperform comparable ones in terms of fault tolerance, energy consumption and retrieval efficiency.

The remainder of the paper is organized as follows. We formulate the problem and prove its NP-Completeness in Section 2. Section 3 presents an overview of our solution. Guideline 1 and the algorithm based on it are presented in Section 4. Section 5 presents Guideline 2 and the algorithm designed on it. The discussion about the coefficient matrix is presented in Section 6. Section 7 reports the simulation results. The related work is discussed in Section 8. Finally, we conclude the paper in Section 9.

1536-1233 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

2 PROBLEM FORMULATION

In this section, we first introduce the notations and data structures employed. Then we formally define the problem and prove it is an NP-Complete problem.

2.1 Notations and Data Structures

We consider a wireless sensor network composed of $k$ regular sensors and $n$ robust sensors, where $n \le k$. Without loss of generality, we normalize the data storage of regular sensors to be one unit, and robust sensors have $s$-unit data storage, where $s \ge 1$.
In the backup, a regular sensor backs up its one-unit data in some robust sensors. As each robust node can store $s$-unit data, a data collector has to query at least $b$ robust sensors to recover all the data, where $b = \lceil k/s \rceil \le n$ and $k$ is the number of regular sensors. To ensure the above inequality, we assume $ns \ge k$; otherwise, the problem becomes unsolvable.

We use three types of graphs to illustrate and analyze the data backup scenario: Deployment Graph, Backup Graph and Storage Graph, based on which we formulate the Weighted Backup Problem and formally prove its NP-Complete nature.

Deployment graph. Given a Bipartite Graph $G = ((V_g + V_r), E)$, which contains two sets of vertices, $V_g$ and $V_r$. $V_g\ (= \{u_1, u_2, \ldots, u_k\})$ is the set of regular sensors, and $V_r\ (= \{v_1, v_2, \ldots, v_n\})$ is the set of robust sensors. An edge $(u, v) \in E$, $u \in V_g$ and $v \in V_r$, if and only if there is a routing path between sensor $u$ and sensor $v$. $W$ is a set of weights associated to each edge $(u, v) \in E$, which is the routing cost between $u$ and $v$.

Backup graph. Given a Deployment Graph $G$, the Backup Graph $G^b = ((V_g + V_r), E^b)$ is a graph such that all the vertices are the same as those in $G$, and an edge $(u, v) \in E^b$, $u \in V_g$ and $v \in V_r$, if and only if $(u, v) \in E$ is in $G$ and regular sensor $u$ backs up its data in robust sensor $v$.

Storage graph. For a Backup Graph $G^b$, assume each robust sensor in $V_r$ has $s$ units of storage. The Storage Graph $G^s$ is a graph constructed in such a way that, for each vertex in $V_r$, the vertex and its connected edges are duplicated $s$ times.

Fig. 1. Definition of graphs.

Fig. 1a is an example of a Deployment Graph with four regular sensors and three robust sensors, where $w_{(u,v)}$ is the routing cost (weight) between vertices $u$ and $v$. In the network represented by such a Deployment Graph, if regular sensors $\{1, 2, 3, 4\}$ back up their data to robust sensors $\{1, 2\}$, $\{1, 3\}$, $\{1, 3\}$, and $\{2, 3\}$, respectively, the constructed Backup Graph is as shown in Fig. 1b. Fig. 1c is the corresponding Storage Graph with each robust sensor having two units of storage.

2.2 Problem Definition

The formal definition of the Weighted Backup Problem is presented as follows.
Definition 2.1 (Weighted Backup Problem). Given a network represented by Deployment Graph $G = ((V_g + V_r), E)$, we assume each robust sensor in $V_r$ has $s$ ($s \ge |V_g|/|V_r|$) storage units. Let $b = \lceil |V_g|/s \rceil \le |V_r|$. Our problem is: each regular sensor in $V_g$ forwards its data to some robust sensors in $V_r$ for backup, such that: (1) by picking any arbitrary $b$ robust sensors from $V_r$, one can recover the data from all the sensors in $V_g$, and (2) the total forwarding cost is minimized.

The Weighted Backup Problem is NP-Complete. Before we present the proof, we first present Lemma 2.1 and its corollary, Corollary 2.1. Then we prove the NP-Completeness of the problem.

Lemma 2.1. Consider a network that can be represented by a Deployment Graph $G$. The data from all the regular sensors in $G$ can be recovered if and only if the degree of each vertex in $V_g$ in the constructed Storage Graph $G^s$ is no less than $s$.

Corollary 2.1. The data from all the regular sensors in $V_g$ in a Deployment Graph $G$ can be recovered if and only if the constructed Backup Graph $G^b = ((V_g + V_r), E^b)$ has a Hitting Set with size no larger than $|V_r|$. That means, in $G^b$, every vertex in $V_g$ connects to at least one vertex from $V_r$.
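The graph definitions above can be illustrated with a small sketch. It encodes the Backup Graph of Fig. 1b as an adjacency map, expands it into a Storage Graph by duplicating every robust sensor and its edges, and checks the degree condition stated in Lemma 2.1. The names `backup`, `storage_graph` and `degrees_ok`, and the use of the letter `s` for the number of storage units per robust sensor, are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Backup Graph of Fig. 1b: regular sensor -> robust sensors holding its data.
backup: Dict[int, List[int]] = {1: [1, 2], 2: [1, 3], 3: [1, 3], 4: [2, 3]}


def storage_graph(backup: Dict[int, List[int]], s: int) -> Dict[int, List[Tuple[int, int]]]:
    """Duplicate every robust sensor (and its edges) s times.

    A duplicated robust vertex is represented as (robust_id, copy_index).
    """
    return {
        u: [(v, copy) for v in robust for copy in range(s)]
        for u, robust in backup.items()
    }


def degrees_ok(backup: Dict[int, List[int]], s: int) -> bool:
    """Degree condition of Lemma 2.1: every regular sensor has degree >= s."""
    graph = storage_graph(backup, s)
    return all(len(neighbours) >= s for neighbours in graph.values())


if __name__ == "__main__":
    print(storage_graph(backup, 2)[1])
    print(degrees_ok(backup, 2))
```

With two storage units per robust sensor, every regular sensor of Fig. 1b ends up with four edges in the Storage Graph, while a regular sensor with no backup edge would fail the check.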

Fig. 2. Proof of Theorem 2.1.

Lemma 2.1 and Corollary 2.1 can be proved by contradiction. In Lemma 2.1, if in the Storage Graph there exists a vertex in $V_g$ that has degree less than $s$ (the number of storage units of a robust sensor), then this vertex does not have any edge in the corresponding Backup Graph, because every edge in the Backup Graph is duplicated $s$ times in the Storage Graph. That means this regular sensor does not back up its data to any robust sensor, and thus its data cannot be recovered. The same holds for Corollary 2.1: the data of a regular sensor cannot be recovered if it does not connect to at least one vertex from $V_r$.

Assume each regular sensor backs up its data to at least one robust sensor. From Corollary 2.1 we can see that, among all the robust sensors, if we pick a set of $b$ sensors, the picked sensors can recover the data of all the regular sensors if and only if the size of the Hitting Set of the Backup Graph constructed from the picked robust sensors and all the regular sensors is no larger than $b$. Thus, the objective of the Weighted Backup Problem in Definition 2.1 can be rewritten as follows:

Given a Deployment Graph $G = ((V_g + V_r), E)$, select a set of edges $E' \subseteq E$, subject to:

Subj. (1). Constructing a graph $G^b = ((V_g + V_r'), E^b)$ with: (1) all the vertices from $V_g$ in $G$, (2) an arbitrary set of $b$ vertices $V_r' \subseteq V_r$, and (3) $E^b \subseteq E'$ such that $E^b$ connects $V_g$ and $V_r'$ in $G$, one should have the size of the Hitting Set of $G^b$ no larger than $b$.

Subj. (2). $\sum_{(u,v) \in E'} w_{(u,v)}$ is minimized.

2.3 NPC Proof

Theorem 2.1. The Weighted Backup Problem is NP-Complete.

Proof of Theorem 2.1. To facilitate the proof, we formulate the decision version of the Weighted Backup Problem as: given a positive value $W_0$, is there a set of edges $E' \subseteq E$ that solves the problem with $\sum_{(u,v) \in E'} w_{(u,v)} \le W_0$?

Prove to be in NP. It is easy to see that the Weighted Backup Problem $\in$ NP, since a nondeterministic algorithm only needs to guess a subset of edges and check in polynomial time whether that subset satisfies both Subj. (1) and Subj. (2) with the sum of the weights less than or equal to $W_0$.

Component construction. We reduce the Weighted Hitting Set Problem [6] to the Weighted Backup Problem by applying component construction to the problem. Let an arbitrary instance of the Weighted Backup Problem be given by the graph $G = (V_g + V_r, E)$ as defined in Definition 2.1. We construct a graph instance $\hat{G} = (\hat{V}_g + \hat{V}_r, \hat{E})$ through the following steps:

1) Duplicate $V_g$ $\binom{|V_r|}{b}$ times and assign them to $\hat{V}_g$. Thus, $|\hat{V}_g| = \binom{|V_r|}{b} |V_g|$.
2) Among all the vertices in $V_r$, select $\binom{|V_r|}{b}$ combinations and assign them to $\hat{V}_r$.
3) For each combination in $\hat{V}_r$, associate it, one by one, to a set of vertices in $\hat{V}_g$. Each combination is considered as a subgraph of $\hat{G}$.
4) In each subgraph generated in Step 3, for each vertex $u$ in $\hat{V}_g$ and $v$ in $\hat{V}_r$, connect them and associate them to the corresponding weight if there is an edge $(u, v) \in E$ in the original graph $G$.
5) If there is a solution with a subset of edges $E' \subseteq E$ for the Weighted Backup Problem in the original graph $G$, repeat Step 3; change $E$ to $E'$ in the setting of Step 4 and then repeat Step 4.
6) For each vertex $u$ in $\hat{G}$, its weight $\hat{w}_u$ is the sum of all its edge weights.

Fig. 2 is an example of this construction with four regular sensors, three robust sensors, and $b = 2$. The constructed graph $\hat{G}$ is divided into $\binom{|V_r|}{b} = \binom{3}{2} = 3$ subgraphs, where the red rectangles and lines are the vertices and edges added in Step 5.

We now claim the Weighted Backup Problem has a solution with $E' \subseteq E$ and $\sum_{(u,v) \in E'} w_{(u,v)} \le W_0$, if and only if the Weighted Hitting Set Problem in the constructed graph $\hat{G}$ has a solution $\hat{V} \subseteq (\hat{V}_g + \hat{V}_r)$ and $\sum_{u \in \hat{V}} \hat{w}_u \le (b-1) W_0$.

Reduction. If there is a solution $S$ with $E' \subseteq E$ and $\sum_{(u,v) \in S} w_{u,v} \le W_0$ that satisfies Subj. (1), according to Step 5, we add each combination of $V_r$ and the corresponding edges in $E'$ to $\hat{G}$. Obviously, each combination of $V_r$ is the Hitting Set of each subgraph of $\hat{G}$. For example, in Fig. 2, the vertices $\{1', 2'\}$ are the Hitting Set for vertex set $\{1, 2, 3, 4\}$ on the left in subgraph 1, and $\{1', 3'\}$ and $\{2', 3'\}$ are for subgraphs 2 and 3, respectively. Furthermore, all vertices added in Step 5 together are the Hitting Set of the entire graph $\hat{G}$.

Now we check the weight. In the original graph $G$, to satisfy Subj. (1), the sum of all the weights in $S$ is no larger than $W_0$. In the constructed graph $\hat{G}$, according to Step 5, by adding $\binom{|V_r|}{b}$ combinations of sensors from $V_r$, each vertex in $V_r$ is added exactly $(b-1)$ times in this step, and thus each edge in the solution $E'$ and the corresponding weight are added exactly $(b-1)$ times. Since all vertices added in Step 5 together are the Hitting Set of $\hat{G}$ and we duplicate $E'$ exactly $(b-1)$ times, in total the sum of the weights of all the vertices is no larger than $(b-1) W_0$.

Conversely, if there is a solution $\hat{V} \subseteq (\hat{V}_g + \hat{V}_r)$ to the constructed Weighted Hitting Set Problem with $\sum_{u \in \hat{V}} \hat{w}_u \le (b-1) W_0$, simply pick all the unique vertices from $\hat{V}_r$ and the corresponding edges. It is easy to see that the picked edges are the solution to the Weighted Backup Problem. Moreover, since every picked edge appears exactly $(b-1)$ times for $\binom{|V_r|}{b}$ combinations, the sum of the weights of all the picked edges is at most $(b-1) W_0 / (b-1) = W_0$.

Conclusion. Therefore, the Weighted Hitting Set Problem is reducible to the Weighted Backup Problem. Since all the component construction and reduction are done in polynomial time, the original Weighted Backup Problem is NP-Complete. □

Take Fig. 2 as an example. If $W_0 = 26$, the figure is a yes instance of the Weighted Backup Problem, as the sum of weights of all the red edges is $25 \le W_0$.

3 SOLUTION OVERVIEW

The objectives of the addressed problem are to meet the required fault tolerance and to minimize energy consumption. The problem is NP-Complete and, unless P = NP, no polynomial-time algorithm can provide an optimal solution. Our realistic goal is to design algorithms which can meet the fault tolerance requirement and have a low energy consumption, even though the consumption is not minimized.

We adopt the random linear network coding framework to provide the required fault tolerance. Based on network coding, regular sensors back up data on robust sensors in an encoded format; encoded data on robust sensors are retrieved and the original data can be recovered.
Inside this framework, our algorithms specify how a regular sensor determines at which robust sensors its data is backed up, such that by querying $b$ out of $n$ robust sensors, all the data generated in the network can be recovered. Our strategy is to first discover conditions under which the fault tolerance requirement can be met, and then design centralized and distributed algorithms which satisfy those conditions and have low energy consumption.

In this section, we first introduce the background of random linear network coding. Then we present the assumptions and energy model in data backup and recovery. After that, we present how to use the network coding technique to do data backup and recovery. Finally, we present Lemma 3.1, which is the foundation of our discovered conditions. We name the conditions Guideline 1 and Guideline 2. The two guidelines and the designed algorithms based on them are presented in Sections 4 and 5, respectively.

3.1 Background on Random Linear Network Coding

Random linear network coding is widely used in data storage systems [7]. In network coding, each one-unit data $d_i$ is viewed as an element over the finite field $GF(2^q)$. The $m$-unit original data $D\ (= \{d_1, \ldots, d_m\})$ of a source node can be encoded into $s$-unit data $X\ (= \{x_1, \ldots, x_s\})$ and stored in a storage node with at least $s$-unit space. Here $m$ can be equal to, less than, or more than $s$. To perform the encoding, an $m \times s$ coefficient matrix $G$ is chosen, each element of which is uniformly and independently generated on $GF(2^q)$. The encoded data are $X = D \cdot G$. Thus, in addition to storing $X$, each storage node also needs to store the coefficient matrix $G$ for encoding and decoding. The coefficient matrix occupies $msq$ bits.

To recover, with a high probability, $n$-unit data which originated from one or more source data nodes, a data collector needs to retrieve $n$-unit encoded data along with their coefficients from one or multiple storage nodes. A linear system of $n$ linear equations and $n$ variables is generated and then solved to retrieve the original $n$-unit data.
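The encoding $X = D \cdot G$ and its inversion can be illustrated with a short sketch. To keep the arithmetic compact it works over the prime field GF(257) instead of $GF(2^q)$; this substitution, together with the names `encode`, `solve` and `random_matrix`, is an assumption of the example and not part of the scheme described in the paper.

```python
import random

P = 257  # prime modulus; a stand-in for GF(2^q) (assumption of this sketch)


def random_matrix(m, s, rng):
    """m x s coefficient matrix with entries drawn uniformly from GF(P)."""
    return [[rng.randrange(P) for _ in range(s)] for _ in range(m)]


def encode(data, coeffs):
    """Row vector times matrix: X = D * G over GF(P)."""
    cols = len(coeffs[0])
    return [sum(d * row[j] for d, row in zip(data, coeffs)) % P
            for j in range(cols)]


def solve(coeffs, encoded):
    """Recover D from X = D * G for a square G.

    Uses Gauss-Jordan elimination on the system G^T * D^T = X^T.
    Returns None when G is singular, i.e. decoding is impossible.
    """
    n = len(coeffs)
    # Row j of the augmented matrix is equation j: sum_i D_i * G[i][j] = X_j.
    a = [[coeffs[i][j] for i in range(n)] + [encoded[j]] for j in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return None
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], P - 2, P)  # inverse via Fermat's little theorem
        a[col] = [x * inv % P for x in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % P for x, y in zip(a[r], a[col])]
    return [row[n] for row in a]


if __name__ == "__main__":
    rng = random.Random(7)
    data = [5, 7, 11, 13]
    g = random_matrix(4, 4, rng)
    print(solve(g, encode(data, g)))  # the original data, unless g is singular
```

Decoding fails only when the randomly drawn matrix is singular, which is why the size of the field matters for the success probability.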
A necessary condition to encode and decode successfully is that the coefficient vectors must be linearly independent. As shown in [8], in a large enough field, the probability of linear independence of the coefficients is close to 1, and thus the success ratio of decoding is close to 1. For example, the probability is over 99.6 percent when $q = 8$. Our work is based on the above results.

3.2 Assumptions and Energy Model

In the paper, we make the following assumptions: In the network, robust sensors are assumed to be synchronized, since a regular sensor backs up data in multiple robust sensors and version consistency is an issue. The synchronization can be achieved by many mature techniques with low overhead [9], [10], [11]. Synchronization between regular sensors is not required. Synchronization between a regular sensor and a robust sensor is not required either.

In terms of energy consumption, we adopt the energy model proposed in [12]. According to [12], the energy consumed in transmitting and receiving an $l$-bit message over a distance $d$ is denoted by $E_{Tx}(l, d)$ and $E_{Rx}(l)$, respectively. The distance $d$ is called the transmission distance. The formulas to calculate $E_{Tx}(l, d)$ and $E_{Rx}(l)$ are as follows:

$$E_{Tx}(l, d) = l \cdot (E_{elec} + \varepsilon d^{\alpha}), \qquad E_{Rx}(l) = l \cdot E_{elec},$$

where $E_{elec}$ is the electronics energy depending on factors such as the digital coding, modulation, filtering, and spreading of the signal, $\varepsilon \in \{\varepsilon_{fs}, \varepsilon_{mp}\}$ is the transmitter amplifier in the free-space ($\varepsilon_{fs}$) model or the multipath ($\varepsilon_{mp}$) model, and $\alpha$ is the path-loss exponent, with $2 \le \alpha \le 4$. The energy model has been widely adopted in many applications and protocol designs [13], [14], [15].

3.3 Data Storing and Recovery

In our heterogeneous sensor network, the data produced by all the regular sensors are to be backed up in the $n$ robust sensors in an encoded and redundant way, such that by querying any $b$ robust sensors, the original data can be recovered. For simplicity, we normalize the data generated in each regular sensor between two consecutive backups to be one unit. In the backup, each regular sensor sends this one-unit data to a number of robust sensors.¹ Since all the backup processes are the same, in the following we only focus on a one-time backup.

In each data backup, a robust sensor stores $s$-unit encoded data, where $s$ is calculated by $\lceil k/b \rceil$ and $k$ is the number of regular sensors. When a robust sensor receives data from $m$ regular sensors, it encodes the $m$-unit data into $s$-unit data using random linear network coding. It is obvious that $m \le k$. In addition, a robust sensor also stores a coefficient matrix, which is used for every encoding and decoding.

In the data recovery, a data collector retrieves all the coefficients from $b$ robust sensors, and $kf$-unit encoded data. Here $f$ is the backup frequency between two data retrievals.² Then a linear system involving $k$ equations and $k$ variables is built and used to decode the original $kf$-unit data.

¹ The size of data generated in each regular sensor can vary. The backup in each robust sensor is only performed on a predefined unit of data size, which is $q$ bits as mentioned in Section 3.1. When the data size is larger than the predefined data size, the data are separated into several blocks, each of which is encoded in the robust sensors separately.

² The backup in every robust sensor is performed at a frequency predefined by the application, such as once an hour. The frequency can be dynamically adjusted based on data generating speed, weather conditions, and the failure model of the regular sensors. If a regular sensor fails before the next backup, its data are lost and cannot be recovered. However, since the weather information is known beforehand, this frequency can be adjusted such that data can be backed up before the next storm and will not be lost.

Fig. 3. A data backup example.

We use the data backup in Fig. 1b as an example to illustrate the data backup and recovery process. In this example, regular sensor $u_1$ backs its data up at robust sensors $v_1$ and $v_2$; $u_2$ backs its data up at $v_1$ and $v_3$; $u_3$ backs its data up at $v_1$ and $v_3$; $u_4$ backs its data up at $v_2$ and $v_3$. Therefore, robust sensor $v_1$ receives three-unit data in total from three regular sensors, $u_1$, $u_2$, and $u_3$; $v_2$ receives two-unit data from $u_1$ and $u_4$; $v_3$ receives three-unit data from $u_2$, $u_3$, and $u_4$. (Why the regular sensors choose these robust sensors for backup is determined by our algorithms, which will be presented in the next two sections.) Given this backup structure, robust sensors $v_1$ and $v_3$ will generate six coefficients (a $3 \times 2$ coefficient matrix), while $v_2$ generates four coefficients (a $2 \times 2$ coefficient matrix), since $v_1$ and $v_3$ need to encode three-unit data into two-unit data while $v_2$ needs to encode two-unit data into data of the same size. Once the coefficient matrices are determined in the network initialization phase, they are used throughout the network lifetime for each data backup.

The above backup plan calculated by our algorithm guarantees that by querying any two robust sensors, all the four-unit data can be recovered. Without loss of generality, we assume $v_1$ and $v_2$ are queried. In the following, we show the data encoding and decoding process at $v_1$ and $v_2$.

Let $d_j$ denote the data from regular sensor $u_j$, and $x_i^{(l)}$ denote the $l$th unit of encoded data in robust sensor $v_i$. Let $g_{i,j}^{(l)}$ denote the coefficient of robust sensor $v_i$ for encoding the data received from regular sensor $u_j$ to generate the $l$th unit of encoded data. The encoding process at $v_1$ and $v_2$ is illustrated in Figs. 3a and 3b, respectively. After receiving data from regular sensors, the robust sensors multiply them by their coefficient matrices and generate the encoded data.

The encoding process can be viewed from a different perspective. All the data generated by this simple network ($d_1$, $d_2$, $d_3$ and $d_4$) form a row of data. Based on the coefficient matrices of $v_1$ and $v_2$, a global $4 \times 4$ matrix $M$ can be generated, as shown in Fig. 3c. In matrix $M$, certain coefficients are zero because the corresponding regular sensors and robust sensors do not have a backup relationship. Multiplying the row of data with the coefficient matrix results in the encoded data. Thus, to recover the data, we can simply multiply the encoded data with the inverse of the coefficient matrix.

Fig. 4. A data recovery example.

When a data collector needs to retrieve data, it requests the encoded data and coefficient matrices from robust sensors $v_1$ and $v_2$, as illustrated in Fig. 4. Note that it only needs to request the coefficient matrices once and store them to avoid unnecessary communication overhead. Then the collector can build the $4 \times 4$ coefficient matrix $M$ out of the received matrices from $v_1$ and $v_2$ and calculate its inverse. Multiplying the four-unit row of encoded data by the inverse recovers the original four-unit data.

Note that the matrix inverse must exist in order to recover the original data. Our algorithms calculate the backup schedule; following the schedule guarantees the existence of the inverse of $M$ with a high probability.

In the paper, the detailed process of data collection from $b$ robust sensors to a certain data collector to recover original data is not considered. There are several existing methods [16], [17] on how to collect the data in an energy efficient way.

In the following, we provide a preliminary analysis. Then in the next two sections, we present our algorithms and prove they can provide such a guarantee.

3.4 Preliminary Analysis

Our objective is that a data collector can recover all the original $k$-unit data after it queries $b$ arbitrary robust sensors. By querying $b$ arbitrary robust sensors, the coefficient matrix $M$ can be obtained from the constructed system of linear equations. Obviously, whether there is a solution to the linear system, so that the original $k$-unit data can be recovered, depends on whether coefficient matrix $M$ is nonsingular, which means the determinant of matrix $M$ is not 0.
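The dependence of recoverability on the determinant of $M$ can be checked numerically for the backup plan of Fig. 1b. The sketch below assembles the global matrix for a chosen set of queried robust sensors, placing random coefficients where a backup relationship exists and zeros elsewhere, and evaluates its determinant over the prime field GF(257). The prime field, the helper names `global_matrix`, `det_mod` and `recoverable`, and the parameter `s` for the storage units per robust sensor are assumptions made for this illustration.

```python
import random
from itertools import combinations

P = 257  # prime modulus standing in for GF(2^q); an assumption of this sketch

# Backup plan of Fig. 1b: regular sensor -> robust sensors storing its data.
FIG_1B = {1: [1, 2], 2: [1, 3], 3: [1, 3], 4: [2, 3]}


def global_matrix(plan, queried, s, rng):
    """One row per regular sensor, s columns per queried robust sensor.

    An entry is a random field element if the regular sensor backs up at
    that robust sensor, and 0 otherwise.
    """
    rows = []
    for u in sorted(plan):
        row = []
        for v in queried:
            for _ in range(s):
                row.append(rng.randrange(P) if v in plan[u] else 0)
        rows.append(row)
    return rows


def det_mod(m):
    """Determinant of a square matrix over GF(P) by Gaussian elimination."""
    a = [row[:] for row in m]
    n, det = len(a), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c]), None)
        if pivot is None:
            return 0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det = det * a[c][c] % P
        inv = pow(a[c][c], P - 2, P)
        for r in range(c + 1, n):
            f = a[r][c] * inv % P
            a[r] = [(x - f * y) % P for x, y in zip(a[r], a[c])]
    return det % P


def recoverable(plan, queried, s, rng):
    """True when the global matrix for the queried sensors is nonsingular."""
    return det_mod(global_matrix(plan, queried, s, rng)) != 0


if __name__ == "__main__":
    rng = random.Random(1)
    for pair in combinations([1, 2, 3], 2):
        print(pair, recoverable(FIG_1B, pair, 2, rng))
```

A plan in which some regular sensor has no backup at any queried robust sensor produces an all-zero row, so the determinant is 0 no matter which coefficients are drawn.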
Before presenting the necessary and sufficient conditions we have derived to guarantee that $M$ is nonsingular, we first define two graphs. Throughout, $k = |V_g|$ is the number of regular sensors and $s$ is the number of storage units of a robust sensor.

Backup SubGraph. Given a Backup Graph $G^b = ((V_g + V_r), E^b)$, a corresponding Backup SubGraph $G^b_{sub} = ((V_g + V_r'), E^b_{sub})$ is a graph in which $V_r'$ is an arbitrary subset of $V_r$ containing $b$ vertices, and $E^b_{sub}$ is the corresponding subset of $E^b$ connecting $V_r'$ and $V_g$.

Given a data backup scheme expressed by a Backup Graph, querying $b$ arbitrary robust sensors can be expressed by the correspondingly generated Backup SubGraph.

Storage SubGraph. The Storage SubGraph $G^s_{sub} = ((V_g + V_r^+), E^s_{sub})$ is a bipartite graph constructed from a Backup SubGraph $G^b_{sub} = ((V_g + V_r'), E^b_{sub})$ by the following two operations: (1) for each vertex in $V_r'$, duplicate it and its connected edges $s$ times to construct $V_r^+$ and $E^s_{sub}$; (2) identify the smallest-degree vertex in $V_r'$ and remove $bs - k$ vertices in $V_r^+$ which are duplicated from this vertex, together with their corresponding edges.

Note that after the first operation in the Storage SubGraph construction, $V_r^+$ has $bs$ vertices, which can be greater than or equal to $k$. If $bs > k$, $bs - k$ vertices need to be removed from $V_r^+$ to construct a balanced bipartite graph. Since $s > bs - k$, we can choose one vertex in $V_r'$ and remove $bs - k$ vertices which are duplicated from this vertex. To maintain most of the graph information, we choose the vertex with the smallest degree in $V_r'$ and remove $bs - k$ vertices duplicated from this vertex. Among all vertices in $V_r^+$, these $bs - k$ vertices must have the smallest number of edges as well, and removing them results in the least information loss. After the second operation, both $V_r^+$ and $V_g$ have $k$ vertices in the graph. So in $G^s_{sub} = ((V_g + V_r^+), E^s_{sub})$, $|V_r^+| = |V_g| = k$.

Considering a Backup SubGraph representing a data collector querying $b$ arbitrary robust sensors, a coefficient matrix $M$ can be obtained from the corresponding Storage SubGraph. We discover that checking the singularity of $M$ is equivalent to checking whether there exists a perfect matching in the Storage SubGraph, motivated by Edmonds' Theorem [18], which states a connection between the determinant of a matrix and graph matching in a bipartite graph. We express the discovery in Lemma 3.1.

Lemma 3.1. The coefficient matrix $M$ is nonsingular with a high probability if and only if there exists a perfect matching in $G^s_{sub}$.

Lemma 3.1 can be easily proved based on Edmonds' Theorem. The key notion in Lemma 3.1 is perfect matching. In a bipartite graph, a perfect matching is a set of edges such that no two edges share a common vertex and no vertex is isolated. Based on Lemma 3.1, developing a data backup scheme satisfying our objective reduces to selecting edges from Deployment Graph $G$ such that there always exists a perfect matching in any arbitrarily generated $G^s_{sub}$. Based on this theoretical foundation, we derive Guideline 1 and Guideline 2.

4 GUIDELINE 1 AND ASSOCIATE ALGORITHM

4.1 Guideline 1

Guideline 1 specifies the minimum number of robust sensors that a regular sensor should randomly choose to back up its data such that the required level of fault tolerance can be provided without imposing any other conditions.

Guideline 1. It is sufficient to guarantee that a data collector can decode all the data with a high probability by querying any $b$ arbitrary robust sensors, if every regular sensor randomly chooses at least $\lceil 5 \frac{sn}{k} \ln(k) \rceil$ robust sensors to back up its data.

Guideline 1 is derived from the following two theorems: Theorem 4.1 and Theorem 4.2. Theorem 4.1 indicates that $\Omega(\ln(k))$ is the minimum magnitude of the number of robust sensors at which a regular sensor should back up its data to achieve the required level of fault tolerance. We use $c_1 \ln(k)$ to denote the lower bound. Theorem 4.2 presents the value of $c_1$.

Theorem 4.1. If every vertex in $V_g$ (a regular sensor) selects vertices in $V_r$ (robust sensors) independently and randomly, it must select $\Omega(\ln(k))$ robust sensors to ensure $\det(M) \ne 0$ with a high probability.

Proof of Theorem 4.1.
To ensure $\det(M) \ne 0$ with a high probability, every vertex in $V_r^+$ must be covered by at least one vertex in $V_g$ in $G^s_{sub}$. Otherwise, if there is even one vertex in $V_r^+$ with no edge, there is a column of zeros in the constructed coefficient matrix $M$, resulting in a singular $M$. In other words, all vertices in $V_g$ can be viewed as a big vertex, and the big vertex needs to cover every vertex in $V_r^+$. Then it becomes a classic problem: the coupon collector problem.³ The big vertex acts as the collector, which collects different vertices randomly. According to [18], the big vertex needs to randomly connect to vertices in $V_r^+$ at least $\beta k \ln(k)$ times to cover all vertices with a high probability under some $\beta$, where $k = |V_g| = |V_r^+|$. (More details are presented in the proof of Theorem 4.2.) Then for each vertex in $V_g$, the number of random connections is $\beta k \ln(k) / k = \beta \ln(k)$, which is at the magnitude of $\ln(k)$. Since $G^s_{sub}$ is linearly transformed from $G^b_{sub}$ and $G^b_{sub}$ is a subset of $G^b$, the number of random connections for every regular sensor is at the magnitude of $\ln(k)$ in $G^b$. Therefore, each regular sensor node must connect to $\Omega(\ln(k))$ robust sensors when the connections are made independently and uniformly. □

³ In the coupon collector problem, there are $n$ types of coupons and at each trial a coupon is chosen randomly. Each randomly selected coupon can be one of the $n$ types with equal probability. The random selections are mutually independent. Let $m$ be the number of trials. The goal is to study the relationship between $m$ and the probability of collecting at least one coupon of each of the $n$ types.

Theorem 4.2. When $c_1 \ge 5 \frac{sn}{k}$, where $s$ is the number of storage units of a robust sensor, there exists a perfect matching in any arbitrarily generated $G^s_{sub}$ with a high probability.

Before theoretically deriving the value of $c_1$, we first present a lemma about perfect matching in bipartite graphs [19].

Lemma 4.1. Let $G_{bi}$ be a bipartite graph with vertex classes $V_g$ and $V_r^+$, where $|V_g| = |V_r^+| = k$. Suppose $G_{bi}$ has no isolated vertices and it does not have a perfect matching.
Then there is a set $A \subseteq V_g$ or $A \subseteq V_r^{+}$ such that: i) $\Gamma(A) = \{v_j : (v_i, v_j) \in E(G_{bi}) \text{ for some } v_i \in A\}$ has $|A| - 1$ elements, ii) the subgraph spanned by $A \cup \Gamma(A)$ is connected, and iii) $2 \leq |A| \leq (k+1)/2$.

Lemma 4.1 is used to analyze the scenario in which a perfect matching exists in $G_{sub}$ with no isolated vertices in $V_g$ and $V_r^{+}$.

Proof of Theorem 4.2. Based on Lemma 4.1, $G_{sub}$ has no perfect matching only in the following two cases:

Case I: there exists a set $A$ satisfying Lemma 4.1.
Case II: $G_{sub}$ has one or more isolated vertices, denoted as $I$.

Thus, the probability that $G_{sub}$ has no perfect matching is $P(\exists I \cup \exists A) \leq P(\exists I) + P(\exists A)$. We first analyze $P(\exists A)$ in Case I. From Lemma 4.1, the size of $A$ varies from 2 to $(k+1)/2$. Therefore,

\[
P(\exists A) = P\Big(\bigcup_{a=2}^{(k+1)/2} (\exists A, |A| = a)\Big) \leq \sum_{a=2}^{(k+1)/2} P(\exists A, |A| = a). \tag{1}
\]

Furthermore, set $A$ can be a subset of $V_g$ (Case I.a) or a subset of $V_r^{+}$ (Case I.b). Then we have:

\[
P(\exists A) \leq \sum_{a=2}^{(k+1)/2} \big( P(\exists A \subseteq V_g, |A| = a) + P(\exists A \subseteq V_r^{+}, |A| = a) \big). \tag{2}
\]

In the following, we calculate $P(\exists A \subseteq V_g, |A| = a)$ and $P(\exists A \subseteq V_r^{+}, |A| = a)$, respectively.

Case I.a. $A \subseteq V_g$. When set $A$ exists in $G_{sub}$, there must be a set $A'$ in $G^{b}_{sub}$ which contains the same elements as set $A$. One example is shown in Fig. 5, in which $k = 6$, $s = 2$ and $b = 3$. Set $A = \{1, 2, 3\}$ in $G_{sub}$ is illustrated in Fig. 5a, and set $A' = \{1, 2, 3\}$ in $G^{b}_{sub}$ is illustrated in Fig. 5b. Since $G_{sub}$ is equivalently converted from $G^{b}_{sub}$, we have $P(\exists A' \subseteq V_g, |A'| = a) = P(\exists A \subseteq V_g, |A| = a)$.

Fig. 5. One example of Case I.a.

Now, let us consider a general case. Suppose there are $a$ nodes in a set $A_1 \subseteq V_g$ and $a - 1$ nodes in a set $A_2 \subseteq V_r^{+}$ in $G_{sub}$, where $\Gamma(A_1) = A_2$. Then there must be $\lceil (a-1)/s \rceil$ nodes in the corresponding set $A_2' \subseteq V_r'$ in $G^{b}_{sub}$. This value can be proved by contradiction.

Proof by contradiction. Let $Q$ denote the quotient and $R$ denote the remainder of $(a-1)/s$, respectively. Then we have $\lceil (a-1)/s \rceil = Q$ if $R = 0$ and $\lceil (a-1)/s \rceil = Q + 1$ if $R > 0$. Assume $|A_2'| = D < \lceil (a-1)/s \rceil$ in $G^{b}_{sub}$. Then after constructing a Storage Subgraph $G_{sub}$, there must be $sD$ nodes in $A_2$ in $G_{sub}$. If $R = 0$, $|A_2'| = D < Q$ and then $|A_2| = sD < sQ = a - 1$. If $R > 0$, $|A_2'| = D < Q + 1$ and then $D \leq Q$. Then we have $|A_2| = sD \leq sQ$. Since $R > 0$, $sQ < sQ + R = a - 1$ and thus $|A_2| < a - 1$. Therefore both cases conflict with the assumption that $|A_2| = a - 1$. Similarly, we can prove that $|A_2'|$ cannot be greater than $\lceil (a-1)/s \rceil$. Thus, the number of nodes in set $A_2'$ in $G^{b}_{sub}$ is $\lceil (a-1)/s \rceil$.

The probability that set $A = A_1$ with $\Gamma(A) = A_2$ satisfies Lemma 4.1 in $G_{sub}$ is equal to the probability that all the edges starting from $A_1$ connect to $A_2'$ in $G^{b}_{sub}$. Note that every node in $V_g$ picks $c_1\ln(k)$ neighbors from the set $V_r$ in $G$ according to Theorem 4.1. We calculate the probability $P(\exists A' \subseteq V_g, |A'| = a)$ by allowing the $c_1 a \ln(k)$ edges starting from $A_1$ to land in $A_2' \cup (V_r - V_r')$, as shown in Fig. 5b. Since $|A_2'| = \lceil (a-1)/s \rceil$ and $|V_r - V_r'| = n - \lceil k/s \rceil$, we have $|A_2' \cup (V_r - V_r')| = n - \lceil k/s \rceil + \lceil (a-1)/s \rceil$. Then the probability that an edge starting from $A_1$ lands in $A_2' \cup (V_r - V_r')$ is $\big(n - \lceil k/s \rceil + \lceil (a-1)/s \rceil\big)/n$. There are $\binom{k}{a}$ choices for $A_1$ and $\binom{\lceil k/s \rceil}{\lceil (a-1)/s \rceil}$ choices for $A_2'$. Then we have:

\[
P(\exists A \subseteq V_g) = P(\exists A' \subseteq V_g)
\leq \sum_{a=2}^{(k+1)/2} \binom{k}{a} \binom{\lceil k/s \rceil}{\lceil (a-1)/s \rceil} \Big(1 - \frac{\lceil k/s \rceil - \lceil (a-1)/s \rceil}{n}\Big)^{c_1 a \ln(k)}
\leq \sum_{a=2}^{(k+1)/2} \binom{k}{a} \binom{k}{a-1} \Big(1 - \frac{k - a + 1}{sn}\Big)^{c_1 a \ln(k)}. \tag{3}
\]

We can always bound the above summation by $k$ times the maximum value of $\binom{k}{a}\binom{k}{a-1}\big(1 - \frac{k-a+1}{sn}\big)^{c_1 a \ln(k)}$. Therefore, for the probability to be close to 0, we need to show that

\[
P(\exists A \subseteq V_g, |A| = a) = o(1), \quad \forall a \in [2, (k+1)/2], \tag{4}
\]

as $k \to \infty$. From Stirling's approximation, we obtain the bound $\binom{k}{a} \leq \big(\frac{ke}{a}\big)^{a}$. Let $Y = 1 - \frac{k-a+1}{sn}$; we have

\[
P(\exists A \subseteq V_g) \leq k \Big(\frac{ke}{a}\Big)^{a} \Big(\frac{ke}{a-1}\Big)^{a-1} Y^{c_1 a \ln(k)} = e^{F}, \quad
F = \ln(k) + a\ln\frac{ke}{a} + (a-1)\ln\frac{ke}{a-1} + c_1 a \ln(k)\ln(Y). \tag{5}
\]

For $e^{F} = o(1)$, it is sufficient to have $F < 0$, and thus the coefficient of $\ln(k)$ must be negative. Since $k$ tends to $+\infty$, we need to have:

\[
a + a - 1 + a c_1 \ln(Y) + 1 < 0, \tag{6}
\]

which gives us a bound for $c_1$:

\[
c_1 > -\frac{1 + a + a - 1}{a\ln(Y)}. \tag{7}
\]

Notice that $Y < 1$, since $k \leq sn$ and $k > a$. It is therefore possible to satisfy this inequality with a positive $c_1$. This bound should be true for every $a \in [2, (k+1)/2]$. So we have:

\[
\frac{1 + a + a - 1}{a} \leq \frac{5}{2} \tag{8}
\]

and

\[
Y \leq 1 - \frac{k+1}{2sn}. \tag{9}
\]

Then according to the Taylor approximation, we have

\[
-\frac{1}{\ln(Y)} < \frac{2sn}{k+1} \tag{10}
\]

and

\[
-\frac{1 + a + a - 1}{a\ln(Y)} < \frac{2sn}{k+1} \cdot \frac{5}{2} < \frac{5sn}{k}. \tag{11}
\]

Therefore, a sufficient condition for $P(\exists A \subseteq V_g) = o(1)$ is

\[
c_1 \geq \frac{5sn}{k}. \tag{12}
\]

Case I.b. $A \subseteq V_r^{+}$. With similar analysis, we can obtain a bound if $A \subseteq V_r^{+}$. When set $A$ exists in $G_{sub}$, there must be a set $A'$ in $G^{b}_{sub}$. Set $A'$ contains $\lceil |A|/s \rceil$ elements, which can also be proved by a contradiction similar to the one presented in Case I.a. In the example shown in Fig. 6, in which $k = 6$, $s = 2$ and $b = 3$, set $A = \{1, 2, 3\}$ in $G_{sub}$ is illustrated in Fig. 6a, and set $A' = \{1', 2'\}$ in $G^{b}_{sub}$ is illustrated in Fig. 6b. Then we have $P(\exists A' \subseteq V_r', |A'| = \lceil a/s \rceil) = P(\exists A \subseteq V_r^{+}, |A| = a)$.

Fig. 6. One example of Case I.b.

We assume a set $A_2 \subseteq V_r^{+}$ with $a$ nodes and a set $A_1 \subseteq V_g$ with $a - 1$ nodes in $G_{sub}$. Then we have $\lceil a/s \rceil$ nodes in set $A_2' \subseteq V_r'$ in $G^{b}_{sub}$. To satisfy Lemma 4.1 with $A = A_2$ in $G_{sub}$, we require that all edges that link to $A_2$ land in set $A_1$. To have $\Gamma(A_2) = A_1$ in $G_{sub}$, all the edges starting from $V_g - A_1$ must land outside $A_2'$ of $G^{b}$, that is, in $V_r - A_2'$, as shown in Fig. 6b. There are $c_1 k \ln(k) - c_1 (a-1)\ln(k)$ such edges and every edge has a probability $1 - \frac{\lceil a/s \rceil}{n}$ to land outside $A_2'$ of $G^{b}$. Thus we have:

\[
P(\exists A \subseteq V_r^{+}) = P(\exists A' \subseteq V_r') \leq \sum_{a=2}^{(k+1)/2} \binom{k}{a-1} \binom{\lceil k/s \rceil}{\lceil a/s \rceil} \Big(1 - \frac{\lceil a/s \rceil}{n}\Big)^{c_1 k \ln(k) - c_1 (a-1)\ln(k)}. \tag{13}
\]

After similar calculation, it can be seen that when $c_1 > \frac{sn + n}{k - a + 1}$, $P(\exists A \subseteq V_r^{+}) = o(1)$ as $k \to \infty$. Notice that $\frac{sn + n}{k - a + 1}$ is an increasing function of $a$. So the maximum value of $\binom{k}{a-1}\binom{\lceil k/s \rceil}{\lceil a/s \rceil}\big(1 - \frac{\lceil a/s \rceil}{n}\big)^{c_1 k \ln(k) - c_1 (a-1)\ln(k)}$ is obtained at $a = 2$ or $a = \frac{k+1}{2}$, whichever is larger. After we examine the extreme cases, it can be shown that $c_1 \geq \frac{5sn}{k}$ is required to satisfy both extreme cases.

Case II. There exist isolated vertices in $G_{sub}$. A vertex in $G_{sub}$ is isolated only when it has no neighbor. If there are isolated nodes in $G_{sub}$, they can only be in $V_r^{+}$, which also means that isolated nodes can only be in $V_r$ of $G$. In other words, we need to show that each vertex in $V_r$ is covered by at least one vertex in $V_g$ with a high probability. The problem again becomes a coupon collector problem, as discussed in Theorem 4.1. All regular sensors together act as a collector that randomly collects $n$ different robust sensors. Let $C$ denote the total number of edges required to cover all $n$ vertices in $V_r$. According to the analysis of the coupon collector problem in [18], we have:

\[
P[C > bn\ln(n)] \leq n^{-(b-1)}. \tag{14}
\]

To satisfy this condition, we need to have $b \geq 2$ such that the probability tends to 0. From the above calculation for $c_1$, it suffices to show that

\[
\frac{c_1 k \ln(k)}{bn\ln(n)} > 1. \tag{15}
\]

Thus,

\[
b < \frac{5s\ln(k)}{\ln(n)}. \tag{16}
\]

For a reasonable deployment, $k$ is always larger than $n$, which means the number of robust sensors is less than the number of regular sensors, and $s$ is always larger than 1. Thus we can always find some $b \geq 2$ with $c_1 \geq \frac{5sn}{k}$ to satisfy the condition that each vertex in $V_r$ is covered by at least one vertex in $V_g$ with a high probability.

Conclusion. Since $c_1 \geq \frac{5sn}{k}$ is sufficient for a perfect matching to exist in any arbitrary $G_{sub}$ with a high probability, we prove the theorem. □

As $k \to \infty$, the probability of the existence of a perfect matching approaches 1. Since a sensor network is generally composed of a large number of regular sensors, the probability is almost 100 percent.

4.2 A Robust Randomized Algorithm

Following Guideline 1, a robust randomized algorithm is developed: every regular sensor randomly chooses $\lceil 5\frac{sn}{k}\ln(k) \rceil$ robust sensors to back up its data. The algorithm is simple and robust. In the algorithm, the random selection of robust sensors throughout the network is a must. If we instead choose the $\lceil 5\frac{sn}{k}\ln(k) \rceil$ closest robust sensors to back up data, the randomness is destroyed and the conditions of the previous theorems are violated. (In future work, we will study whether we can sacrifice certain randomness to incorporate selection rules based on backup cost.)

After the sensor deployment, energy-efficient routes between any two sensors can be established by existing routing protocols [13], [14], [15], which construct routes based on sensor locations and the same energy model presented in Section 3.2. When the regular sensors start the data backup process, they select robust sensors $\lceil 5\frac{sn}{k}\ln(k) \rceil$ times randomly and independently. All robust sensor IDs are preloaded in regular sensors before network deployment. The randomness of the selection can be easily achieved by randomly selecting robust sensor IDs. Then every regular sensor decides the robust sensors at which it backs up its data and sends its data to those robust sensors via the established routes. When a robust sensor is selected more than once by the same regular sensor, the algorithm still works effectively according to the previous analysis.
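The randomized selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the symbols k (number of regular sensors), n (number of robust sensors) and s (storage units per robust sensor) are read from the surrounding analysis, the per-sensor count is taken to be ⌈5(sn/k) ln k⌉, and the function names are hypothetical rather than part of the original scheme. Routing and the actual data transfer are not modeled.

```python
import math
import random


def guideline1_count(n, k, s):
    """Picks per regular sensor, assuming the bound ceil(5 * (s*n/k) * ln k)."""
    return math.ceil(5 * s * n / k * math.log(k))


def randomized_backup(n, k, s, rng=None):
    """For each of the k regular sensors, return the set of robust-sensor IDs chosen.

    Picks are independent and uniform over the n preloaded robust-sensor IDs,
    drawn with replacement, so a regular sensor may end up with fewer distinct
    targets than picks (duplicates are tolerated by the analysis).
    """
    rng = rng or random.Random()
    picks = guideline1_count(n, k, s)
    robust_ids = range(n)
    return [set(rng.choices(robust_ids, k=picks)) for _ in range(k)]
```

With the simulation setting used later (k = 200, n = 50, s = 6), `guideline1_count(50, 200, 6)` evaluates to 40 picks per regular sensor.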
5 GUIDELINE 2 AND ASSOCIATE ALGORITHM

5.1 Guideline 2

Guideline 2 specifies two conditions that must be satisfied simultaneously such that the required level of fault tolerance can be achieved.

Guideline 2. It is sufficient to guarantee that a data collector can decode all the data with a high probability by querying any arbitrary $b$ robust sensors, if the following two conditions are satisfied simultaneously:
i) Every regular sensor backs up its data at at least $c_2 = n - b + 1$ different robust sensors.
ii) Every robust sensor receives data from at least $s$ different regular sensors.

Guideline 2 is derived from the following theorem.

Theorem 5.1. Given a Deployment Graph $G = ((V_g, V_r), E)$, in which the degree of each vertex in $V_g$ is no less than $c_2 = n - b + 1$ and the degree of each vertex in $V_r$ is no less than $s$, it is sufficient to guarantee the existence of a perfect matching in any $G_{sub}$ generated from $G$.

Proof of Theorem 5.1. We prove the theorem by contradiction. Assume a vertex $u_i$ in $V_g$ is connected to $c'$ vertices in $V_r$, where $c' \leq n - b < n - b + 1$. There are always at least $b$ $(= n - c')$ vertices in $V_r$ which are not connected to $u_i$ in $V_g$. When $G_{sub}$ is constructed from these $b$ vertices, there is always a row of zeros in matrix $M$. Thus $M$ is singular. According to Lemma 3.1, there does not exist a perfect matching. Assume a vertex $v_j$ in $V_r$ is connected to $c''$ vertices in $V_g$, where $c'' \leq s - 1 < s$. When $G_{sub}$ is constructed from such a vertex $v_j$, the $s$ vertices duplicated from $v_j$ in $V_r^{+}$ must connect to $c'' < s$ vertices in $V_g$. Then we can always find a subset $A$ of the set containing those vertices that satisfies Lemma 4.1. Thus, there does not exist a perfect matching. Therefore, we prove the theorem. □

5.2 A Centralized Algorithm

Based on Guideline 2, we develop a centralized algorithm to determine which regular sensor backs up its data in which robust sensors such that the energy consumption in communication is minimized. In the algorithm, a central server first performs a one-time network discovery to obtain the location of each sensor and then determines routes and calculates energy costs between regular sensors and robust sensors based on the location information. Then it calculates the backup schedule at the network initialization phase.
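The two conditions of Guideline 2 are plain degree constraints on the backup relation, so they can be checked mechanically before a schedule is deployed. The following Python sketch is illustrative only: the 0/1 matrix layout `B[i][j]` (regular sensor i backs up at robust sensor j), the symbol s for the per-robust-sensor storage units, and both function names are assumptions made for this example. The second function brute-forces the first failure case used in the proof of Theorem 5.1, namely a set of b robust sensors that some regular sensor never reaches, which would yield a row of zeros in the coefficient matrix.

```python
from itertools import combinations


def satisfies_guideline2(B, s, b):
    """Check both degree conditions on a k x n 0/1 backup matrix B."""
    n = len(B[0])
    c2 = n - b + 1
    # Condition i): every regular sensor uses at least c2 robust sensors
    rows_ok = all(sum(row) >= c2 for row in B)
    # Condition ii): every robust sensor hears from at least s regular sensors
    cols_ok = all(sum(row[j] for row in B) >= s for j in range(n))
    return rows_ok and cols_ok


def has_uncovered_sensor(B, b):
    """Brute force: does some set of b robust sensors miss a regular sensor entirely?"""
    n = len(B[0])
    return any(
        all(row[j] == 0 for j in subset)
        for subset in combinations(range(n), b)
        for row in B
    )
```

The brute-force check is exponential in b and is meant only for small sanity tests; the degree test is what a deployment would actually evaluate.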
Any arbitrary selection of robust sensors satisfying the conditions of Guideline 2 can guarantee the successful decoding of all the data. Thus we aim to select the robust sensors that minimize energy consumption in data backup. Note that in Section 2 we proved the NP-Completeness of the Weighted Backup Problem, so we cannot expect to compute an optimal solution efficiently. Based on the random linear network coding technique and Guideline 2, a centralized algorithm can be designed to tackle the problem. Its general idea is to let each robust sensor choose the $s$ regular sensors with the least energy cost to receive data from, and to let each regular sensor choose the $c_2$ $(= n - b + 1)$ robust sensors with the least energy cost to send data to.

In the following, the process of the centralized algorithm is presented in detail. As in the randomized algorithm, after the sensor deployment, energy-efficient routes are constructed in the network by existing routing protocols. The central server establishes a cost matrix to store the energy cost of the routes between every regular sensor and every robust sensor. Then the algorithm runs the following three steps to choose robust sensors for each regular sensor to back up its data.

Step 1: each robust sensor selects the $s$ regular sensors with the least energy cost from the cost matrix to receive backup data.
Step 2: each regular sensor selects the $c_2$ robust sensors with the least energy cost from the cost matrix to back up its data.
Step 3: remove the redundant edges, the removal of which will not violate the two conditions in Guideline 2.

After the first two steps, both conditions in Guideline 2 are definitely satisfied. Note that the minimum number of data units that have to be sent by all regular sensors, which is $kc_2$, is generally larger than the minimum number of data units that have to be received by robust sensors, which is $sn$. Thus in the Backup Graph, only $kc_2$ edges are required ideally. However, since the backup selections made by regular sensors and by robust sensors may not overlap, the number of edges selected after the first two steps can be as large as $kc_2 + sn$. This means there are redundant selections.
The final step is to remove the redundant edges in the graph. The algorithm examines every edge from highest cost to lowest cost. If the removal of an edge does not violate the two conditions in Guideline 2, then this edge is a redundant backup edge and is removed from the Backup Graph. The algorithm is formally presented in Algorithm 1.

Algorithm 1. A Centralized Algorithm

Output: $B$: a $k \times n$ backup matrix. $b_{ij} = 1$ indicates that regular sensor $u_i$ backs up data in robust sensor $v_j$, and 0 indicates otherwise. $b_i$ and $b_j$ represent the corresponding row/column vectors of $B$.
$W$: a $k \times n$ energy cost matrix. $w_{ij}$ indicates the energy cost from regular sensor $u_i$ to robust sensor $v_j$. $w_i$ and $w_j$ represent the corresponding row/column vectors of $W$.

Calculate routes between any two sensors.
for all $v_j \in V_r$ do
    for all $u_i \in V_g$ do
        Put the route cost from $u_i$ to $v_j$ in $w_{ij}$.
    end for
end for
for all $v_j \in V_r$ do
    Choose the $s$ regular sensors with the least energy cost in $W$ for backup.
    Update $b_j$ and $w_j$.
end for
for all $u_i \in V_g$ do
    Choose the $c_2$ robust sensors with the least energy cost in $W$.
    Update $b_i$ and $w_i$.
end for
Find a $w_{ij}$ with the largest value in $W$.
while $w_{ij}$ is nonzero do
    Remove $b_{ij}$ from $B$ and $w_{ij}$ from $W$ if the removal will not make regular sensor $u_i$ back up data at fewer than $c_2$ robust sensors or make robust sensor $v_j$ receive fewer than $s$ units of data.
    Go to find the $w_{ij}$ with the next highest value in $W$.
end while
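The three steps can be sketched compactly in Python. This is a hedged illustration of the greedy procedure described above, not the authors' implementation: the cost matrix `W` is assumed to be a k × n list of lists, s denotes the storage units per robust sensor, the function name is hypothetical, and route computation is assumed to have already produced the costs. The sketch also assumes k ≥ s and n ≥ c_2 so that the first two steps can always pick enough candidates.

```python
def centralized_backup(W, s, b):
    """Greedy backup schedule from a k x n cost matrix W (W[i][j]: cost from u_i to v_j)."""
    k, n = len(W), len(W[0])
    c2 = n - b + 1
    B = [[0] * n for _ in range(k)]

    # Step 1: each robust sensor picks its s cheapest regular sensors
    for j in range(n):
        for i in sorted(range(k), key=lambda i: W[i][j])[:s]:
            B[i][j] = 1

    # Step 2: each regular sensor picks its c2 cheapest robust sensors
    for i in range(k):
        for j in sorted(range(n), key=lambda j: W[i][j])[:c2]:
            B[i][j] = 1

    # Step 3: scan edges from most to least expensive and drop an edge
    # only if both degree conditions of Guideline 2 still hold afterwards
    row_deg = [sum(row) for row in B]
    col_deg = [sum(B[i][j] for i in range(k)) for j in range(n)]
    edges = sorted(
        ((W[i][j], i, j) for i in range(k) for j in range(n) if B[i][j]),
        reverse=True,
    )
    for _, i, j in edges:
        if row_deg[i] > c2 and col_deg[j] > s:
            B[i][j] = 0
            row_deg[i] -= 1
            col_deg[j] -= 1
    return B
```

Tracking row and column degrees incrementally keeps the pruning pass linear in the number of selected edges after the initial sort, which is one reasonable design choice among several.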

6 DISCUSSION ABOUT COEFFICIENT MATRIX

The coefficient matrix is dynamically determined after the deployment. The determination process is as follows. Before the deployment, each robust sensor is preloaded with a large matrix which works for all the regular sensors. After the deployment, once the backup relationship is determined by our algorithms, only a subset of this large matrix is needed, and thus the unused columns/rows are deleted to save space, resulting in the coefficient matrix that we need.

7 PERFORMANCE EVALUATION

7.1 Simulation Methodology and Settings

In this section, we evaluate the randomized algorithm based on Guideline 1 and the centralized algorithm based on Guideline 2. Our objective in conducting the evaluation study is two-fold: (1) evaluating the efficiency of our algorithms in minimizing energy consumption while providing the required fault tolerance; (2) testing the performance of our algorithms under different system parameters. To evaluate our algorithms, four metrics are employed: (1) total energy consumption per backup, (2) energy consumption per node, (3) number of storage units needed for data backup per robust node and (4) whether the required level of fault tolerance is achieved. Since the centralized algorithm requires cost information about the communication between a regular sensor and a robust sensor, we evaluate the centralized algorithm both in the situation where the cost can be obtained from other applications running in the sensor network, and in the situation where the cost is not available and needs to be discovered by flooding.

We compare our algorithms with the DISC protocol [3], which has the same objective as ours. In the DISC protocol, the network is composed of data nodes and storage nodes, and the network is divided into clusters. In each cluster, one of the storage nodes is randomly selected as the cluster head for data backup. In one backup, all data nodes back up their data in their cluster, and every cluster head then backs up its data in a neighbor cluster.
In our implementation, every robust sensor is viewed as a storage node and every regular sensor is viewed as a data node in DISC. To have a fair comparison, we let the cluster heads in DISC back up data at neighboring clusters multiple times, following their selection algorithm, to achieve fault tolerance comparable to that of our designed algorithms. If we back up data only once, strictly following DISC, one cluster's data are stored in only two robust sensors in different clusters. Then the data recovery will fail with a high probability, since a large number of robust sensors do not have any data. According to Theorem 5.1, at least $n - b + 1$ different robust sensors need to store data. Then at least $(n - b + 1)/2$ different robust sensors should be selected in both the current cluster and the neighbor cluster to achieve the required fault tolerance. Since the robust sensors for backup are randomly selected, this turns out to be a version of the coupon collector problem. According to [18], the expected number of backups that need to be performed to cover $m$ different robust sensors is $m\ln(m)$. Thus, $m\ln(m)$ backups (with $m = (n - b + 1)/2$) are performed to achieve the required fault tolerance, which is recovering all data by accessing arbitrary $b$ storage sensors in DISC. We divide the network into four clusters and ensure $n/4 \geq m$ in the evaluation such that enough robust sensors for backup are provided in each cluster.

Fig. 7. Total energy consumption under different $n$ ($s = 6$).

We evaluate the performance through simulations. In the simulation, 200 regular sensors are randomly deployed in a 40 m × 40 m area. Each sensor's communication range is 10 m. The energy model is the free space model presented in Section 3.2. According to [12], the communication energy parameters are set as: $E_{elec} = 50$ nJ/bit, $\epsilon_{fs} = 10$ pJ/bit/m$^2$ and $\alpha = 2$. The central server broadcasts its location information, which is 64 bits including longitude and latitude in the float data type. Sensing data include a timestamp and a measurement, which are 64 and 32 bits respectively. We adopt the routing algorithm proposed in [15] to establish routes.
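To make the parameter values above concrete, the per-hop communication cost can be sketched as follows. Section 3.2 is not reproduced here, so this sketch assumes the standard first-order radio model associated with [12], in which transmitting l bits over distance d costs l·E_elec + l·ε_fs·d^α and receiving l bits costs l·E_elec; the function names are hypothetical.

```python
E_ELEC = 50e-9   # J/bit, electronics energy (value from the text)
EPS_FS = 10e-12  # J/bit/m^2, free-space amplifier energy (value from the text)
ALPHA = 2        # path-loss exponent (value from the text)


def tx_energy(bits, distance_m):
    """Energy to transmit `bits` over `distance_m` (assumed first-order radio model)."""
    return bits * (E_ELEC + EPS_FS * distance_m ** ALPHA)


def rx_energy(bits):
    """Energy to receive `bits` (assumed first-order radio model)."""
    return bits * E_ELEC


def hop_energy(bits, distance_m):
    """Total sender plus receiver energy for one hop."""
    return tx_energy(bits, distance_m) + rx_energy(bits)
```

Under these assumptions, sending one 96-bit reading (64-bit timestamp plus 32-bit measurement) over a single 10 m hop costs about 4.9 µJ at the sender and 4.8 µJ at the receiver.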
The communication overhead in route construction is not considered. We set $s$ to be 6 unless we evaluate the impact of robust sensor storage. We also set $sn/k = 1.5$. Correspondingly, the number of robust sensors is 50, which is $k/4$, unless we evaluate the impact of the number of robust sensors. In each simulation, the network topology is different because all sensor nodes in the network are randomly redeployed. All simulation results are averaged over 100 simulation runs.

7.2 Evaluation of Total Energy Consumption

We evaluate total energy consumption under different $s$ and $n$. The number of robust sensors $n$ ranges from 34 to 64, which is from 1/6 to 1/3 of the number of regular sensors. Fig. 7 shows the total energy consumption of the three algorithms. The green line is the energy consumption of the centralized algorithm with cost discovery overhead. From the figure, we can see that our schemes outperform DISC. The cost discovery does not introduce a big overhead. When $n$ is 34, the total energy consumption under the randomized algorithm is 0.2485 J. It is 34.2 percent less than that under DISC, which is 0.3776 J. The total energy consumption under the centralized algorithm is 0.0067 J, which is about 98.2 percent less than that under DISC. As $n$ increases, the total energy consumption of all the algorithms increases. The energy consumption under DISC increases much faster than that under our algorithms. The reason is that the number of backup times increases as $n$ increases. Of our two algorithms, the centralized algorithm performs better than the randomized algorithm. This shows that the centralized algorithm performs better with a smaller $n$.

Fig. 8. Total energy consumption under different $s$ ($sn/k = 1.5$).

We vary the number of storage units $s$ in a robust sensor from 1 to 10. The results are shown in Fig. 8. From the figure, we can see that our schemes outperform DISC under different $s$. When $s$ is 1, the total energy consumption under the randomized algorithm is 0.3547 J. It is 93.9 percent less than that under DISC, which is 5.8425 J. The total energy consumption under the centralized algorithm is 0.7512 J, which is 87.1 percent less than that under DISC. As $s$ increases, the randomized algorithm keeps an almost constant energy consumption, but the centralized algorithm consumes less. The reason is that under given $sn/k$ and $k$, $c_1$ does not change while $c_2$ decreases with a smaller number of robust sensors, which reduces the number of backup messages. This shows that the randomized algorithm performs better with a smaller $s$.

7.3 Energy Consumption per Node

We evaluate the energy consumption per node under different $n$ and $s$. $n$ ranges from 34 to 64. Fig. 9 shows the energy consumption per node under the three algorithms. From the figure, we can see that our algorithms outperform DISC. For example, when $n$ is 34, the energy consumption per node under the randomized algorithm is $1.06 \times 10^{-3}$ J, which is 35.0 percent less than that under DISC, which is $1.63 \times 10^{-3}$ J. The energy consumption per node under the centralized algorithm is $2.85 \times 10^{-5}$ J, which is 98.3 percent less than that under DISC. The reason is that the data backup between cluster heads consumes a lot of energy in DISC. It can also be observed that as $n$ increases, energy consumption per node increases under all three algorithms. The reason is that as $n$ increases, the total number of messages transmitted from every regular sensor increases faster than the number of robust sensors, and thus the energy consumption per node increases.

Fig. 10. Energy consumption per node under different $s$ ($sn/k = 1.5$).

Fig. 10 shows the energy consumption per node under different $s$.
From the results, our algorithms have better performance than DISC. For example, when $s$ is 1, the energy consumption per node under the randomized algorithm and the centralized algorithm is $7.09 \times 10^{-4}$ J and $1.50 \times 10^{-3}$ J, which are 93.9 and 87.2 percent less than that under DISC, which is $1.17 \times 10^{-2}$ J. From the results, we can also see that energy consumption per node under both the centralized algorithm and DISC decreases as $s$ increases, and the energy consumption per node under the randomized algorithm increases as $s$ increases. The reason is that as $s$ increases, the number of robust sensors decreases with given $sn/k$ and $k$. In the centralized algorithm, the total number of messages transmitted from every regular sensor decreases faster than the total number of sensors. Thus the energy consumption per node decreases under the centralized algorithm. In the randomized algorithm, the total number of messages transmitted from every regular sensor is a constant value under given $sn/k$. Thus the energy consumption per node increases under the randomized algorithm as the total number of sensors decreases.

Fig. 9. Energy consumption per node under different $n$ ($s = 6$).

7.4 Storage Size Required for One Backup in a Robust Node

We evaluate the average size of storage units required for one backup in a robust sensor under different $s$ and $n$. A small variance in this metric indicates that the storage requirement of the algorithm is stable. $n$ ranges from 34 to 64. Fig. 11 shows the size of storage units used in a robust sensor under the three algorithms. From the results, we can see that the size of storage units taken on a robust sensor is much lower under our algorithms than that under DISC. For example, when $n = 34$, the size of storage units required on a robust sensor under our algorithms is 576 bits on average, which is 87.0 percent less than that under DISC, which is 4439.1 bits on average. Moreover, the storage units required in a robust sensor under our algorithms are constant, whereas the storage units required under DISC are not stable. The reason is that by using network coding, our algorithms only need to store $s$ units of data in each robust sensor. In DISC, since each cluster head may need to store original data from other cluster heads, the difference in the number of storage units between robust sensors is large.

Fig. 11. Storage units under different $n$ ($s = 6$).

Fig. 12. Storage units under different $s$ ($sn/k = 2$).

Fig. 12 shows the size of storage units required in a robust sensor under different $s$. From the results, we can see that the value increases under our algorithms as $s$ increases. The value decreases and becomes more unstable under DISC as $s$ increases. Our algorithms only need $s$ units of storage in a robust sensor, so as $s$ increases, the number of storage units on every robust sensor increases. Nevertheless, the storage requirement under our algorithms is still much lower than that under DISC. DISC takes a lot of storage in robust sensors to meet the required level of backup fault tolerance.

7.5 Evaluation of Fault Tolerance

In all the above simulation settings, both our centralized and randomized algorithms can achieve the required fault tolerance, that is, querying any $b$ robust nodes can recover all the data generated by the sensor network. For DISC, if a cluster head backs up data multiple times, the required fault tolerance can also be achieved; however, the energy consumption on data backup is high, as shown in the previous sections. If we strictly follow the DISC protocol, in which a cluster head backs up data on one neighbor storage node, the required fault tolerance is achieved only 6.06 percent of the time.

8 RELATED WORK

Networked data storage and backup have been intensively studied in the past.
Depending on methodology, existing works can be classified into two categories: schemes using network coding and schemes not using it.

Mechanisms based on network coding are studied in [4], [5], [8], [20], [21], [22], [23], [24], [25], [26], [27], [28]. Albano and Chessa provide an abstract model of in-network storage by using erasure codes in [4]. A decentralized erasure code for data storage is proposed in [5], in which data sources are distributed in the network. Wang et al. in [8] present a partial network coding (PNC) scheme for collecting recent data in wireless sensor networks. Storing large files in a distributed manner is studied in [20]. Dimakis et al. investigate the problem of constructing fountain codes for distributed storage in sensor networks in [21]. Regenerating codes that use existing available nodes to repair failed nodes in distributed storage systems are studied in [22]. Liu et al. in [23] use Slepian-Wolf codes to minimize the communication cost in a network with a single sink. Hu et al. propose a mutually cooperative recovery (MCR) mechanism for multiple node failures in distributed storage systems in [24]. A network coding scheme based on the sociality of wireless sensor networks is proposed in [25]. Yang et al. propose a compressed network coding based distributed data storage scheme based on compressed sensing and network coding theories in [26]. To guarantee data integrity and availability, Zeng et al. in [27] propose a distributed fault/intrusion-tolerant data storage scheme based on network coding and homomorphic fingerprinting. In [28], Wu studies the problem of constructing network codes to achieve an optimal tradeoff between storage efficiency and network bandwidth.

Data storage schemes that use techniques other than network coding are presented in [2], [3], [29], [30], [31], [32], [33]. A Geographic Hash Table system for data-centric storage is proposed by Ratnasamy et al. in [29], which hashes keys into geographic coordinates, and stores a key-value pair at the sensor node geographically nearest the hash of its key. Tanushetty et al.
present a concept of multiple hash locations for storing sensed data to provide efficient resiliency for data centric storage in [30]. A distributed data storage protocol for large-scale heterogeneous WSNs with mobile sinks is proposed in [31], which guarantees robustness in data collection by intelligently managing data replication among selected storage nodes in the network. Liao and Yang propose a power-saving data storage scheme for WSNs based on a grid architecture in [32]. A distributed data storage algorithm that generates redundant data based on Luby transform coding is proposed in [34]. Sheng et al. in [33] study the deployment problem of storage nodes to minimize the total energy cost in data storage systems, and present an optimal algorithm based on dynamic programming. A line-based data dissemination protocol in wireless sensor networks with mobile sinks is studied in [35]. DISC in [3] randomly chooses cluster heads in neighboring clusters to conduct data backup, and uses a Bloom filter based search engine to retrieve data and minimize energy consumption. Hashmi et al. in [2] propose to rotate cluster heads to do backup to balance network energy consumption. None of the existing works focuses on designing a data backup scheme in heterogeneous sensor networks that maximizes fault tolerance and minimizes communication cost, so they cannot be directly applied to solve the problem in this paper.

9 CONCLUSION

In this paper, a weighted backup problem in heterogeneous wireless sensor networks is formulated and studied. We formally prove its NP-completeness. Leveraging random linear network coding, we derive two design guidelines,

satisfying which, the required fault tolerance can be provided. Based on the two guidelines, we design two data backup schemes, which outperform existing solutions.

ACKNOWLEDGMENTS

This work is supported by the US National Science Foundation grant NSF-1128369. G. Wang is the corresponding author.

REFERENCES

[1] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, "Wireless sensor networks for habitat monitoring," in Proc. 1st ACM Int. Workshop Wireless Sens. Netw. Appl., 2002, pp. 88–97.
[2] S. Hashmi, H. Mouftah, and N. D. Georganas, "Achieving reliability over cluster-based wireless sensor networks using backup cluster heads," in Proc. IEEE Global Telecommun. Conf., 2007, pp. 1149–1153.
[3] C. Jardak, E. Osipov, and P. Mahonen, "Distributed information storage and collection for WSNs," in Proc. IEEE Int. Conf. Mobile Adhoc Sens. Syst., 2007, pp. 1–10.
[4] M. Albano and S. Chessa, "Distributed erasure coding in data centric storage for wireless sensor networks," in Proc. IEEE Symp. Comput. Commun., 2009, pp. 22–27.
[5] A. Dimakis, V. Prabhakaran, and K. Ramchandran, "Decentralized erasure codes for distributed networked storage," IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2809–2816, Jun. 2006.
[6] D. West, Introduction to Graph Theory, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2001.
[7] R. Koetter and M. Medard, "An algebraic approach to network coding," IEEE/ACM Trans. Netw., vol. 11, no. 5, pp. 782–795, Oct. 2003.
[8] D. Wang, Q. Zhang, and J. Liu, "Partial network coding: Concept, performance, and application for continuous data collection in sensor networks," ACM Trans. Sens. Netw., vol. 4, no. 3, pp. 14:1–14:22, May 2008.
[9] Z. Yang, L. Cai, Y. Liu, and J. Pan, "Environment-aware clock skew estimation and synchronization for wireless sensor networks," in Proc. IEEE INFOCOM, 2012, pp. 1017–1025.
[10] Z. Zhong, P. Chen, and T. He, "On-demand time synchronization with predictable accuracy," in Proc. IEEE INFOCOM, 2011, pp. 2480–2488.
[11] Y. Chen, Q. Wang, M. Chang, and A. Terzis, "Ultra-low power time synchronization using passive radio receivers," in Proc. 10th Int. Conf. Inform. Process. Sens. Netw., 2011, pp. 235-245.
[12] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "An application-specific protocol architecture for wireless microsensor networks," IEEE Trans. Wireless Commun., vol. 1, no. 4, pp. 660-670, Oct. 2002.
[13] A. E. Abdulla, H. Nishiyama, and N. Kato, "Extending the lifetime of wireless sensor networks: A hybrid routing algorithm," Comput. Commun., vol. 35, no. 9, pp. 1056-1063, May 2012.
[14] A. Papadopoulos, A. Navarra, J. A. McCann, and C. M. Pinotti, "Vibe: An energy efficient routing protocol for dense and mobile sensor networks," J. Netw. Comput. Appl., vol. 35, no. 4, pp. 1177-1190, Jul. 2012.
[15] D. Zhang, G. Li, K. Zheng, X. Ming, and Z.-H. Pan, "An energy-balanced routing method based on forward-aware factor for wireless sensor networks," IEEE Trans. Ind. Inform., vol. 10, no. 1, pp. 766-773, Feb. 2014.
[16] C. Konstantopoulos, G. Pantziou, D. Gavalas, A. Mpitziopoulos, and B. Mamalis, "A rendezvous-based approach enabling energy-efficient sensory data collection with mobile sinks," IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 5, pp. 809-817, May 2012.
[17] R. Sugihara and R. Gupta, "Optimal speed control of mobile node for data collection in sensor networks," IEEE Trans. Mobile Comput., vol. 9, no. 1, pp. 127-139, Jan. 2010.
[18] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge, U.K.: Cambridge Univ. Press, 1995.
[19] B. Bollobas, Random Graphs, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2001.
[20] S. Acedanski, S. Deb, M. Medard, and R. Koetter, "How good is random linear coding based distributed networked storage," in Workshop Netw. Coding, Theory Appl., 2005, pp. 1-6.
[21] A. G. Dimakis, V. Prabhakaran, and K. Ramchandran, "Distributed fountain codes for networked storage," in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process., vol. 5, 2006, pp. V-V.
[22] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," IEEE Trans. Inform. Theory, vol. 56, no. 9, pp. 4539-4551, Sep. 2010.
[23] J. Liu, M. Adler, D. Towsley, and C. Zhang, "On optimal communication cost for gathering correlated data through wireless sensor networks," in Proc. 12th Annu. Int. Conf. Mobile Comput. Netw., 2006, pp. 310-321.
[24] Y. Hu, Y. Xu, X. Wang, C. Zhan, and P. Li, "Cooperative recovery of distributed storage systems from multiple losses with network coding," IEEE J. Sel. Areas Commun., vol. 28, no. 2, pp. 268-276, Feb. 2010.
[25] W. Ou, Z. Yang, L. Tang, and G. Zhongyang, "Design of wireless sensor network based on random linear network coding," in Proc. Int. Conf. Comput. Sci. Serv. Syst., 2012, pp. 986-990.
[26] X. Yang, X. Tao, E. Dutkiewicz, X. Huang, Y. Guo, and Q. Cui, "Energy-efficient distributed data storage for wireless sensor networks based on compressed sensing and network coding," IEEE Trans. Wireless Commun., vol. 12, no. 10, pp. 5087-5099, Oct. 2013.
[27] R. Zeng, Y. Jiang, C. Lin, Y. Fan, and X. Shen, "A distributed fault/intrusion-tolerant sensor data storage scheme based on network coding and homomorphic fingerprinting," IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 10, pp. 1819-1830, Oct. 2012.
[28] Y. Wu, "Existence and construction of capacity-achieving network codes for distributed storage," IEEE J. Sel. Areas Commun., vol. 28, no. 2, pp. 277-288, Feb. 2010.
[29] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, and S. Shenker, "GHT: A geographic hash table for data-centric storage," in Proc. 1st ACM Int. Workshop Wireless Sens. Netw. Appl., 2002, pp. 78-87.
[30] R. Tanushetty, L. H. Ngoh, and P. H. Keng, "An efficient resiliency scheme for data centric storage in wireless sensor networks," in Proc. IEEE 60th Veh. Technol. Conf., vol. 4, 2004, pp. 2936-2940.
[31] G. Maia, D. L. Guidoni, A. C. Viana, A. L. Aquino, R. A. Mini, and A. A. Loureiro, "A distributed data storage protocol for heterogeneous wireless sensor networks with mobile sinks," Ad Hoc Netw., vol. 11, no. 5, pp. 1588-1602, Jul. 2013.
[32] W.-H. Liao and H.-C. Yang, "A power-saving data storage scheme for wireless sensor networks," J. Netw. Comput. Appl., vol. 35, no. 2, pp. 818-825, Mar. 2012.
[33] B. Sheng, Q. Li, and W. Mao, "Data storage placement in sensor networks," in Proc. 7th ACM Int. Symp. Mobile ad hoc Netw. Comput., 2006, pp. 344-355.
[34] S. Jafarizadeh and A. Jamalipour, "Data persistency in wireless sensor networks using distributed Luby transform codes," Sensors J., IEEE, vol. 13, no. 12, pp. 4880-4890, Dec. 2013.
[35] E. Hamida and G. Chelius, "A line-based data dissemination protocol for wireless sensor networks with mobile sink," in Proc. IEEE Int. Conf. Commun., 2008, pp. 2201-2205.

Jie Tian received the BS degree in computer science from Tianjin University, Tianjin, China, in 2005, and the MS degree in computer science at Nankai University, Tianjin, China, in 2008. He is working toward the PhD degree in the Department of Computer Science at the New Jersey Institute of Technology. His research includes wireless networks, ad hoc/sensor networks, and mobile computing.

Tan Yan received the BE degree in 2007 from the School of Information Science and Technology, Southeast University, Nanjing, China, the MS degree from the Department of Electrical & Computer Engineering, NJIT, in 2009, and the PhD degree, under the supervision of Dr. Guiling Wang, from the Department of Computer Science, New Jersey Institute of Technology. He joined NEC Laboratories America in 2014, after completing his PhD. His research includes network analytics, time series mining, mobile ad hoc data management and dissemination, and graph theory.

Guiling Wang received the BS degree in software from Nankai University, Tianjin, China, and the PhD degree in computer science and engineering with a minor in statistics from The Pennsylvania State University, State College, PA, in 2006. She joined the New Jersey Institute of Technology, Newark, NJ, in the fall of 2006 and was promoted to an associate professor with tenure in June 2011.