Provable Possession and Replication of Data over Cloud Servers

Ayad F. Barsoum and M. Anwar Hasan
Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada.
fekry@engmail.uwaterloo.ca, hasan@sisr.uwaterloo.ca

Abstract. Cloud Computing (CC) is an emerging computing paradigm that can potentially offer a number of important advantages. One of the fundamental advantages of CC is the pay-as-you-go pricing model, where customers pay only according to their usage of the services. Currently, data generation is outpacing users' storage availability, thus there is an increasing need to outsource such huge amounts of data. Outsourcing data to a remote Cloud Service Provider (CSP) is a growing trend for numerous customers and organizations, alleviating the burden of local data storage and maintenance. Moreover, customers rely on the data replication provided by the CSP to guarantee the availability and durability of their data. Therefore, Cloud Service Providers (CSPs) provide storage infrastructure and a web services interface that can be used to store and retrieve an unlimited amount of data with fees metered in GB/month. The mechanisms used for data replication vary according to the nature of the data; more copies are needed for critical data that cannot easily be reproduced. This critical data should be replicated on multiple servers across multiple data centers. On the other hand, non-critical, reproducible data are stored at reduced levels of redundancy. The pricing model is related to the replication strategy. Therefore, it is of crucial importance to customers to have strong evidence that they actually get the service they pay for. Moreover, they need to verify that all their data copies are not being tampered with or partially deleted over time. Consequently, the problem of Provable Data Possession (PDP) has been considered in many research papers. Unfortunately, previous PDP schemes focus on a single copy of the data and provide no guarantee that the CSP stores multiple copies of customers' data.
In this paper we address this challenging issue and propose Efficient Multi-Copy Provable Data Possession (EMC-PDP) protocols. We prove the security of our protocols against colluding servers. Through extensive performance analysis and experimental results, we demonstrate the efficiency of our protocols.

Keywords: Cloud computing, outsourcing data storage, data integrity, cryptographic protocols

1 Introduction

Cloud Computing (CC) is an emerging computing paradigm that can be viewed as a virtualized pool of computing resources (e.g., storage, processing power, memory, applications, services, and network bandwidth), where customers are provisioned and de-provisioned resources as they need. CC represents the vision of providing computing services as public utilities like water and electricity. CC services can be categorized into [1]: Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS). The widely used model of CC services is the SaaS model, in which the customers have access to the applications running on the cloud provider's infrastructure. Google Docs, Google Calendar, and Zoho Writer are publicly known examples of this model. In the PaaS model, the customers can deploy their applications on the
provider infrastructure under the condition that these applications are created using tools supported by the provider. The IaaS model enables customers to rent and use the provider's resources (storage, processing, and network). Hence, the customers can deploy any applications including operating systems.

The considerable attention paid to the Cloud Computing paradigm is due to a number of key advantages which make it a challenging research area in both academia and industry. This paradigm of Information Technology (IT) architecture supplies a cost-effective means of computing over a shared pool of resources, where users can avoid capital expenditure on hardware, software, and services as they pay only for what they use [1]. Moreover, the CC model provides low management overhead and immediate access to a broad range of applications. In addition, maintenance cost is reduced as a third party is responsible for everything from running the cloud to storing data. It is not only the economic benefits that customers can gain from the CC model, but also the flexibility to scale IT capacity up and down over time according to business needs. Furthermore, CC offers more mobility, where customers can access information wherever they are, rather than having to remain at their desks.

CC allows organizations to store more data on remote servers than on private computer systems. Organizations will no longer be worried about constant server updates and other computing issues; they will be free to concentrate on innovations [2]. Outsourcing data storage to a remote Cloud Service Provider (CSP) is a growing trend for more and more customers and organizations, alleviating the burden of local data storage and maintenance. In addition, storing data remotely allows many authorized users to access the data from various geographic locations, making it more convenient for them. Also, some organizations may create large data files that must be archived for many years but are rarely accessed, and so there is no need to store such files on the local storage of the organizations.
According to a recent survey, IT outsourcing has grown by a staggering 79% as companies seek to reduce costs and focus on their core competencies [3].

However, the fact that data owners no longer physically possess their sensitive data raises new formidable and challenging tasks related to data security and integrity protection in Cloud Computing. Data security can be achieved by encrypting sensitive data before outsourcing to remote servers. As such, it is a crucial demand of customers to have strong evidence that the cloud servers still possess their data and that it is not being tampered with or partially deleted over time, especially because the internal operation details of the CSP may not be known by cloud customers.

It is unarguable that the completeness and correctness of customers' data in the cloud is being put at risk for the following reasons. First, the CSP, whose goal is to make a profit and maintain a reputation, has an incentive to hide data loss or to reclaim storage by discarding data that has not been or is rarely accessed. Second, a greedy CSP might delete some of the data or might not store all data in the fast storage required by the contract with certain customers, i.e., place it on CDs or other offline media, thus using less fast storage. Third, the cloud infrastructures are subject to a wide range of internal and external security threats. Examples of security breaches of cloud services appear from time to time [4, 5]. In short, although outsourcing data into the cloud is economically attractive for the cost and complexity of long-term large-scale data storage, it does not offer any guarantee on data completeness and correctness. This problem, if not properly handled, may hinder the successful deployment of cloud architecture.

Since customers' data has been outsourced to remote servers, efficient verification of the completeness and correctness of the outsourced data becomes a formidable challenge for data security in CC.
The traditional cryptographic primitives for data integrity and availability based on hashing and signature schemes are not applicable to the outsourced data without having a local copy of the data. Of course, it is impractical for the clients to download all stored data in order to validate its integrity; this would require an expensive I/O cost and immense
communication overheads across the network. Therefore, clients need efficient techniques to verify the integrity of their outsourced data with minimum computation, communication, and storage overhead. Consequently, many researchers have focused on the problem of Provable Data Possession (PDP) and proposed different schemes to audit the data stored on remote servers.

Simply, Provable Data Possession (PDP) is a technique for validating data integrity over remote servers. Ateniese et al. [6] have formalized a PDP model. In that model, the data owner pre-processes the data file to generate some metadata that will be used later for verification purposes through a challenge-response protocol with the remote/cloud server. The file is then sent to be stored on an untrusted server, and the owner may delete the local copy of the file. Later, the server demonstrates that the data file has not been deleted or tampered with by responding to challenges sent from the verifier, who can be the original data owner or another trusted entity that shares some information with the owner. Researchers have proposed different variations of PDP schemes under different cryptographic assumptions [7-14]. We will present various PDP schemes in the literature survey section, and will elaborate how they vary from different perspectives.

Unfortunately, previous PDP schemes focus on a single copy of the file and provide no proof that the CSP stores multiple copies of the owner's file. In this paper we address this challenging problem, propose two Efficient Multi-Copy Provable Data Possession (EMC-PDP) protocols, and prove the security (correctness and soundness) of our protocols against colluding servers. Extensive performance analysis, which is validated through implementation and experimental results, illustrates the efficiency of our protocols.

Curtmola et al. [15] proposed the Multiple-Replica PDP (MR-PDP) scheme, which is the only attempt in the literature that creates multiple replicas of the owner's file and audits them.
Through extensive investigation, we will detail the various features and limitations of the MR-PDP model. Unfortunately, Curtmola et al. [15] did not address how the authorized users of the data owner can access the file copies from the cloud servers, noting that the internal operations of the CSP are opaque. This issue is not handled in their protocol, and thus we consider the protocol to be incomplete. We will demonstrate that our protocols are complete and outperform the MR-PDP model [15] in both computation and communication cost. The MR-PDP is investigated in more detail in Section 5.2.

The remainder of the paper is organized as follows: in Section 2 we provide our research problem definitions, motivations, challenges, and our main contributions. Section 3 contains an extensive literature survey of the previous schemes for remote data integrity, the features of these schemes, and their limitations. Our system model and design goals are presented in Section 4. In Section 5 we present and analyze the basic natural scheme for multi-copy provable data possession and the MR-PDP scheme due to Curtmola et al. [15]. Our EMC-PDP schemes are elaborated in Section 6. In Section 7 we prove the security of our proposed schemes. The performance analysis and experimental results are presented in Section 8. Our concluding remarks are given in Section 9.

2 Problem Definition, Motivation, and Main Contributions

Users resort to data replication to ensure the availability and durability of their sensitive data, especially if it cannot easily be reproduced. As a simple example, when we are writing a research paper, we are very careful to keep multiple copies of our paper to be able to recover it in case of any failure or physical damage. Likewise, organizations, governments, and universities replicate their financial, personal, and general data to guarantee its availability and durability over time.
In the Cloud Computing paradigm, customers rely on the CSP to undertake the data replication task, relieving the burden of local data storage and maintenance, but they have to pay for their usage of the CSP's storage infrastructure. On the other side, cloud customers should be securely and efficiently convinced that the CSP is actually possessing all data copies that are agreed
upon, these data copies are complete and intact, and thus customers are getting the service they are paying for. Therefore, in this paper we address the problem of creating multiple copies of the owner's data file over an untrusted CSP and auditing all these copies to verify their completeness and correctness.

2.1 Motivation and Challenges

The mechanisms used for data replication vary according to the nature of the data; more copies are needed for critical data that cannot easily be reproduced. The pricing model of the CSPs is related to the replication strategy. For example, the Amazon S3 standard storage strategy [16] maintains copies of customers' data on multiple servers across multiple data centers, while with the Amazon Reduced Redundancy Storage (RRS) strategy, which enables customers to reduce their costs, noncritical, reproducible data is stored at a reduced level of redundancy. As a consequence, the pricing for the Amazon S3 standard storage is approximately 50% higher than that of the RRS. Cloud servers can collude to cheat the customers by showing that they are storing all copies, while in reality they are storing a single copy. Therefore, cloud customers need secure and efficient techniques to ensure that the CSP is actually keeping all data copies that are agreed upon, that these copies are not corrupted, and thus that they pay for real services.

2.2 Contributions

Our contributions can be summarized as follows:

1. We propose two Efficient Multi-Copy Provable Data Possession (EMC-PDP) protocols. These protocols efficiently provide the cloud customers with strong evidence that the CSP is in reality possessing all data copies that are agreed upon and that these copies are intact.
2. We prove the security (correctness and soundness) of our protocols against colluding servers. Cloud servers can provide valid responses to the verifier's challenges only if they actually have all data copies in an uncorrupted state.
3. We justify the performance of our proposed protocols through concrete analysis and comparison with the state of the art.
4. We implement our proposed protocols using cryptographic libraries. The experimental results validate our performance analysis.

To the best of our knowledge, the proposed protocols are the first complete and efficient protocols that address the storage integrity of multiple data copies over Cloud Computing.

3 Literature Survey

3.1 Provable Data Possession (PDP)

Provable data possession (PDP) is a methodology for validating the integrity of data in an outsourced storage service. The fundamental goal of a PDP scheme is to allow a verifier to efficiently, periodically, and securely validate that a remote server, which supposedly stores the owner's potentially very large amount of data, is not cheating the verifier. The problem of data integrity over remote servers has been addressed for many years, and there is a simple solution to tackle this problem, as follows. First, the data owner computes a message authentication code (MAC) of the whole file before outsourcing it to a remote server. Then, the owner keeps only the computed MAC on his local storage, sends the file to the remote server, and deletes the local copy of the file. Later, whenever a verifier needs to check the data integrity, he sends a request
to retrieve the file from the archive service provider, re-computes the MAC of the whole file, and compares the re-computed MAC with the previously stored value. Alternatively, instead of computing and storing the MAC of the whole file, the data owner divides the file $F$ into blocks $\{b_1, b_2, \ldots, b_m\}$, computes a MAC $\sigma_i$ for each block $b_i$: $\sigma_i = MAC_{sk}(i \,\|\, b_i)$, $1 \le i \le m$, sends both the data file $F$ and the MACs $\{\sigma_i\}_{1 \le i \le m}$ to the remote/cloud server, deletes the local copy of the file, and stores only the secret key $sk$. During the verification process, the verifier requests a set of randomly selected blocks and their corresponding MACs, re-computes the MAC of each retrieved block using $sk$, and compares the re-computed MACs with the values received from the remote server [7]. The rationale behind the second approach is that checking part of the file is much easier than checking the whole of it. However, both approaches suffer from a severe drawback: the communication complexity is linear in the queried data size, which is impractical especially when the available bandwidth is limited.

PDP Schemes of Deswarte et al. Deswarte et al. [8] thought of a better solution by using two functions $f$ and $H'$. $H'$ is a one-way function and $f$ is another function such that $f(C, H'(File)) = H(C \,\|\, File)$, where $H$ is any secure hash function and $C$ is a random challenge number sent from the verifier to the remote server. Thus, the data owner has to compute $H'(File)$ and store it on his local storage. To audit the file, the verifier generates a random challenge $C$, computes $V = f(C, H'(File))$, and sends $C$ to the remote server. Upon receiving the challenge $C$, the server computes $R = H(C \,\|\, File)$ and sends the response $R$ to the verifier. To validate the file integrity, the verifier checks $V \stackrel{?}{=} R$.
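The block-wise MAC audit described earlier in this subsection can be sketched as follows. This is a minimal illustration rather than the construction of [7]: HMAC-SHA256 stands in for the unspecified MAC, the index is encoded as 8 big-endian bytes, and the names `split_blocks`, `tag_block`, `audit`, and the block size are assumptions made only for the example.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 16  # bytes; illustrative value, not taken from the paper


def split_blocks(data, size=BLOCK_SIZE):
    """Divide the file F into blocks b_1, ..., b_m (0-based here)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def tag_block(sk, index, block):
    """sigma_i = MAC_sk(i || b_i), with HMAC-SHA256 as an assumed MAC."""
    message = index.to_bytes(8, "big") + block
    return hmac.new(sk, message, hashlib.sha256).digest()


def audit(sk, server_blocks, server_tags, sample_size):
    """Verifier side: fetch randomly selected blocks and tags, then re-check them.

    The server must ship every sampled block back, which is why the
    communication cost is linear in the queried data size.
    """
    rng = secrets.SystemRandom()
    indices = rng.sample(range(len(server_blocks)), sample_size)
    return all(
        hmac.compare_digest(tag_block(sk, i, server_blocks[i]), server_tags[i])
        for i in indices
    )
```

Binding the index $i$ into each MAC prevents the server from answering a query for one block with a different, validly tagged block. The owner would keep only `sk` and call `audit` with whatever sample size the available bandwidth allows.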
At least one of the two functions $f$ and $H'$ must be kept secret, because if both were public it would be easy for a malicious server to compute and store only $H'(File)$, which is not the entire file, and then dynamically respond with a valid value $f(C, H'(File))$ instead of computing the expected $H(C \,\|\, File)$ over the stored file. Unfortunately, Deswarte et al. [8] have not found such functions $f$, $H$, and $H'$ satisfying the desired verification rule. To work around this problem, a finite number $\tilde{N}$ of random challenges are generated offline for the file to be checked, the corresponding responses $H(C_i \,\|\, File)$, $1 \le i \le \tilde{N}$, are pre-computed offline as well, and then the pre-computed responses are stored on the verifier's local storage. To audit the file, one of the $\tilde{N}$ challenges is sent to the remote server and the response received from the server is compared with the pre-computed response previously stored on the verifier side. However, this solution limits the number of times a particular data file can be checked to the number of random challenges $\tilde{N}$. Once all random challenges $\{C_i\}_{1 \le i \le \tilde{N}}$ are consumed, the verifier has to retrieve the data file $F$ from the storage server in order to compute new responses $H(C_i \,\|\, File)$, but this is unworkable.

Deswarte et al. [8] proposed another protocol to overcome the problem of a limited number of audits per file. In their protocol the data file is represented as an integer $m$. Figure 1 illustrates the scheme proposed in [8]. There are two main limitations in the protocol of Deswarte et al. [8]:

- In each verification, the remote server has to do the exponentiation over the entire file. Thus, if we are dealing with huge files, e.g., on the order of terabytes (as most practical applications require), this exponentiation will be heavy.
- Storage overhead on the verifier side; it has to store some metadata for each file to be checked. This could be a challenge for the verifier if it uses small devices, e.g., a PDA or a cell phone with limited storage capacity.

More RSA-Based PDP Schemes. Filho et al. [9] proposed a scheme to verify data integrity using an RSA-based homomorphic hash function.
A function $H$ is homomorphic if, given two operations $+$ and $\times$, we have $H(d + d') = H(d) \times H(d')$. The protocol proposed in [9] is illustrated in figure 2. Note that the response $R = H(d)$ is a homomorphic function in the data file $d$: $H(d + d') \equiv r^{d + d'} \equiv r^{d} r^{d'} \equiv H(d) H(d') \bmod N$. To find a collision for this hash function, one has to find two messages $d, d'$ such that $r^{d} \equiv r^{d'}$, i.e., $r^{d - d'} \equiv 1 \bmod N$. Thus, $d - d'$ must be a multiple of $\phi(N)$. Finding two such messages $d, d'$ is believed to be difficult since the factorization of $N$ is unknown. The limitations of the protocol proposed in [9] are similar to those of the protocol in [8]: the archive service provider has to exponentiate the entire data file, and there is storage overhead on the verifier side.

Data owner:
- Represents the data file as an integer $m$
- Generates an RSA modulus $N = pq$ ($p$ and $q$ are prime numbers)
- Pre-computes and stores $M = a^{m} \bmod N$ ($a \in_R \mathbb{Z}_N$)
- Sends the file value $m$ to the remote server
Challenge-response:
1. Verifier: picks $r \in_R \mathbb{Z}_N$
2. Verifier: computes the challenge $A = a^{r} \bmod N$ and sends $A$ to the remote server
3. Remote server: computes the response $B = A^{m} \bmod N$ and sends $B$ to the verifier
4. Verifier: computes $C = M^{r} \bmod N$
5. Verifier: checks $C \stackrel{?}{=} B$
Fig. 1. The PDP protocol by Deswarte et al. [8].

To circumvent the problem of exponentiating the entire file, Sebé et al. [10] proposed to verify data integrity by first fragmenting the file into blocks, fingerprinting each block, and then using an RSA-based hash function on the blocks. Thus, the file $F$ is divided into a set of $m$ blocks, $F = \{b_1, b_2, \ldots, b_m\}$, and $m$ fingerprints $\{M_i\}_{1 \le i \le m}$ are generated for the file and stored on the verifier's local storage. Their proposal does not require the exponentiation of the entire file. Figure 3 demonstrates the protocol proposed by Sebé et al. [10].

Limitations. Although the protocol proposed by Sebé et al. [10] does not require exponentiation of the entire file, a local copy of the fingerprints, whose size is linear in the number of file blocks, must be stored on the verifier side.
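The block-based check of Sebé et al. [10] can be sketched as below, following the steps of Fig. 3. This is a toy illustration under loud assumptions: the modulus is built from two small, publicly known Mersenne primes (so it offers no security), the coefficient range is arbitrary, the $l$ coefficients are applied to the first $l$ blocks as the indexing in the figure suggests, and all function names are hypothetical.

```python
import math
import secrets

# Toy RSA modulus from two known primes (assumption; insecure by design).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
PHI = (P - 1) * (Q - 1)  # phi(N), known to the owner/verifier only


def split_blocks(data, block_size):
    """F = {b_1, ..., b_m}, each block read as a non-negative integer."""
    return [int.from_bytes(data[i:i + block_size], "big")
            for i in range(0, len(data), block_size)]


def fingerprints(blocks):
    """Owner: M_i = b_i mod phi(N); one stored value per block."""
    return [b % PHI for b in blocks]


def make_challenge(l):
    """Verifier: pick a in Z_N (coprime to N) and l random coefficients c_i."""
    while True:
        a = secrets.randbelow(N - 2) + 2
        if math.gcd(a, N) == 1:
            break
    coeffs = [secrets.randbelow(2**32) + 1 for _ in range(l)]
    return a, coeffs


def server_response(a, coeffs, blocks):
    """Server: r = sum c_i * b_i over the integers, then R = a^r mod N."""
    r = sum(c * b for c, b in zip(coeffs, blocks))
    return pow(a, r, N)


def verify(a, coeffs, fps, response):
    """Verifier: r' = sum c_i * M_i mod phi(N), then check a^r' mod N == R."""
    r_prime = sum(c * m for c, m in zip(coeffs, fps)) % PHI
    return pow(a, r_prime, N) == response
```

The check succeeds for an honest server because $a^{r} \equiv a^{r \bmod \phi(N)} \bmod N$ when $a$ is coprime to $N$. The sketch also makes the limitation visible: `fingerprints` returns one value per block, so verifier storage grows linearly with $m$.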
The verifier has to store the fingerprints $\{M_i\}_{1 \le i \le m}$, each of size $|N|$ bits, consuming $m|N|$ bits of the verifier's local storage, which may impede the verification process when using small devices like PDAs or cell phones.

Data Storage Commitment Schemes. Golle et al. [11] proposed a scheme to verify data storage commitment, a concept that is weaker than integrity. They investigated a storage-enforcing commitment scheme. Through their scheme a storage server demonstrates that it is making use of storage space as large as the client's data, but not necessarily the same exact
Data owner:
- Generates an RSA modulus $N = pq$ ($p$ and $q$ are prime numbers)
- Computes $\phi(N) = (p - 1)(q - 1)$
- Pre-computes and stores $h(d) = d \bmod \phi(N)$ ($d$ is the data file)
- Sends the data file $d$ to the remote server
Challenge-response:
1. Verifier: picks $r \in_R \mathbb{Z}_N$ and sends $r$ to the remote server
2. Remote server: computes the response $R = H(d) = r^{d} \bmod N$ and sends $R$ to the verifier
3. Verifier: computes $R' = r^{h(d)} \bmod N$
4. Verifier: checks $R' \stackrel{?}{=} R$
Fig. 2. The PDP protocol by Filho et al. [9].

Data owner:
- Generates an RSA modulus $N = pq$ ($p$ and $q$ are prime numbers)
- Computes $\phi(N) = (p - 1)(q - 1)$
- Splits the data file $F$ into $m$ blocks: $F = \{b_1, b_2, \ldots, b_m\}$
- Pre-computes and stores $M_i = b_i \bmod \phi(N)$ ($1 \le i \le m$)
- Sends the data file $F$ to the remote server
Challenge-response:
1. Verifier: picks $a \in_R \mathbb{Z}_N$
2. Verifier: generates $l$ ($\le m$) random values $\{c_i\}_{1 \le i \le l}$ and sends $a, \{c_i\}_{1 \le i \le l}$ to the remote server
3. Remote server: computes $r = \sum_{i=1}^{l} c_i b_i$
4. Remote server: computes $R = a^{r} \bmod N$ and sends $R$ to the verifier
5. Verifier: computes $r' = \sum_{i=1}^{l} c_i M_i \bmod \phi(N)$
6. Verifier: computes $R' = a^{r'} \bmod N$
7. Verifier: checks $R' \stackrel{?}{=} R$
Fig. 3. The protocol by Sebé et al. [10].
data. The storage server does not directly prove that it is storing a file $F$, but proves that it has committed sufficient resources to do so. Their scheme is based on the n-Power Computational Diffie-Hellman (n-PCDH) assumption: for a group $\mathbb{Z}_p$ with a generator $g$, there is no known probabilistic polynomial time algorithm $A$ that can compute $g^{x^n}$ given $g^{x}, g^{x^2}, \ldots, g^{x^{n-1}}$ with non-negligible probability. The main scheme proposed by Golle et al. [11] is illustrated in figure 4.

Setup:
- File $F = \{b_1, b_2, \ldots, b_m\}$, $b_i \in \mathbb{Z}_p$
- Let $n = 2m + 1$
- Secret key $sk = x \in_R \mathbb{Z}_p$
- Public key $pk = (g^{x}, g^{x^2}, \ldots, g^{x^n}) = (g_1, g_2, \ldots, g_n)$
- Data owner computes and stores $f_0 = \prod_{i=1}^{m} g_i^{b_i} \bmod p$
Challenge-response:
1. Verifier: picks a random $k \in [0, m]$
2. Verifier: sends $k$ to the remote server
3. Remote server: computes $f_k = \prod_{i=1}^{m} g_{i+k}^{b_i}$ and sends $f_k$ to the verifier
4. Verifier: checks $f_0^{x^k} \stackrel{?}{=} f_k$
Fig. 4. The protocol by Golle et al. [11].

Since each file block $b_i \in \mathbb{Z}_p$ can be represented by $\log_2 p$ bits, the total number of bits needed to store the file $F$ is $m \log_2 p$. For the storage server to cheat by storing all the possible values of $f_k$ ($m + 1$ values), it needs $(m + 1) \log_2 p$ bits, which is slightly larger than the size of the original file.

Limitations. The guarantee provided by the protocol proposed in [11] is weaker than data integrity, since it only ensures that the server is storing something at least as large as the original data file but not necessarily the file itself. In addition, the verifier's public key is about twice as large as the data file.

Privacy-Preserving PDP Schemes. Shah et al. [12, 13] proposed privacy-preserving PDP protocols. Using their schemes, an external Third Party Auditor (TPA) can verify the integrity of files stored by a remote server without knowing any of the file contents. The data owner first encrypts the file, then sends the encrypted file along with the encryption key to the remote server. Moreover, the data owner sends the encrypted file to the TPA along with a key-commitment that fixes a value for the key without revealing the key to the TPA.
The primary purposes of the schemes proposed in [12, 13] are to ensure that the remote server is correctly possessing the client's data along with the encryption key, and to prevent any information leakage to the TPA, which is responsible for the auditing task. Thus, clients, especially those with constrained computing resources and capabilities, can resort to an external audit party to check the integrity of outsourced data, and this third-party auditing process should bring in no new vulnerabilities towards the privacy of the client's data. In addition to the auditing task, the TPA has another primary task, which is the extraction of digital contents. For the auditing task, the TPA interacts with the remote server to check that the stored data is intact. For the extraction task, the TPA interacts with both the remote server and the data owner to first check that the data is intact and then deliver it to the owner. The protocols proposed by Shah et al. [12, 13] are illustrated in figure 5. These protocols have the following key limitations:

- The number of times a particular data item can be verified is limited and must be fixed beforehand.
- Storage overhead on the TPA; it has to store $\tilde{N}$ hash values for each file to be audited.
- Lack of support for stateless verification; the TPA has to update its state (the list $L$) between audits to prevent using the same random number or the same HMAC twice.
- Very high communication complexity to retrieve $E_K(F)$ if the TPA wants to regenerate a new list of hash values to achieve an unbounded number of audits.

PDP in Database Context. In a database outsourcing scenario, the database owner stores data at a storage service provider, and the database users send queries to the service provider to retrieve some tuples/records that match the issued query. Data integrity is an imperative concern in the database outsourcing paradigm; when a user receives a query result from the service provider, he wants to be convinced that the received tuples have not been tampered with by a malicious service provider. Mykletun et al. [14] investigated the notion of signature aggregation to validate the integrity of the query result.
Signature aggregation enables bandwidth- and computation-efficient integrity verification of query replies. In the scheme presented in [14], each database record is signed before outsourcing the database to a remote service provider. Mykletun et al. [14] provided two aggregation mechanisms: one is based on RSA [17] and the other is based on the BLS signature [18]. For the scheme based on the RSA signature, each record in the database is signed as $\sigma_i = h(b_i)^{d} \bmod N$, where $h$ is a one-way hash function, $b_i$ is the data record, $d$ is the RSA private key, and $N$ is the RSA modulus. A user issues a query to be executed over the outsourced database; the server processes the query and computes an aggregated signature $\sigma = \prod_{i=1}^{t} \sigma_i \bmod N$, where $t$ is the number of records in the query result. The server sends the query result along with the aggregated signature to the user. To verify the integrity of the received records, the user checks $\sigma^{e} \stackrel{?}{=} \prod_{i=1}^{t} h(b_i) \bmod N$, where $e$ is the RSA public key.

The second scheme proposed by Mykletun et al. [14], which is based on the BLS signature [18], is similar to the first scheme, but the record signature is $\sigma_i = h(b_i)^{x}$, where $x \in_R \mathbb{Z}_p$ is a secret key. To verify the integrity of the received records, the user checks $\hat{e}(\sigma, g) \stackrel{?}{=} \hat{e}(\prod_{i=1}^{t} h(b_i), y)$, where $g$ is a generator of the group $\mathbb{Z}_p$, $y = g^{x}$ (public key), and $\hat{e}$ is a computable bilinear map that will be explained later.

Although data integrity (correctness) is an imperative concern in the database outsourcing paradigm, completeness is another crucial demand for database users. Completeness means that the service provider should send all records that satisfy the query criteria, not just a subset of them. The schemes proposed by Mykletun et al. [14] did not fulfill the completeness requirement. The completeness problem has been addressed by other researchers (see for example [3, 19]).

We emphasize that the techniques based on aggregated signatures [14] would fail to provide blockless verification, which is needed by any efficient PDP scheme. Indeed, the verifier has to
Setup:
- Data owner sends a key $K$ and the encrypted file $E_K(F)$ to the remote server
- Data owner sends a key-commitment value $g^{K}$ and the encrypted file $E_K(F)$ to the TPA ($g$ is a generator for $\mathbb{Z}_p$)
- The TPA generates a list $L$ of random values and HMACs: $L = \{(R_i, H_i)\}_{1 \le i \le \tilde{N}}$, where $H_i = HMAC(R_i, E_K(F))$ and $R_i$ is a random number
- The TPA keeps $L$, $H(E_K(F))$, and $g^{K}$, and can discard $E_K(F)$ ($H$ is a secure hash function)
Checking data integrity:
1. TPA: picks any $(R_i, H_i)$ from $L$, sets $L = L \setminus \{(R_i, H_i)\}$, and sends $R_i$ to the remote server
2. Remote server: computes $H_s = HMAC(R_i, E_K(F))$ and sends $H_s$ to the TPA
3. TPA: checks $H_i \stackrel{?}{=} H_s$
Checking key integrity:
1. TPA: generates $\beta \in_R \mathbb{Z}_p$ and sends $g^{\beta}$ to the remote server
2. Remote server: computes $W_s = (g^{\beta})^{K}$ and sends $W_s$ to the TPA
3. TPA: checks $(g^{K})^{\beta} \stackrel{?}{=} W_s$
Data extraction:
- Remote server sends $D_s = E_K(F)$ to the TPA
- TPA checks against the hash of its local cached copy: $H(E_K(F)) \stackrel{?}{=} H(D_s)$. If valid, it sends $E_K(F)$ to the owner
Key extraction:
- Assume that the owner and the server agree on a shared random secret $X$
- Remote server sends $K + X$ and $g^{X}$ to the TPA
- TPA checks $g^{K + X} \stackrel{?}{=} g^{K} g^{X}$. If valid, it sends $K + X$ to the owner
- Owner gets $K = (K + X) - X$
Fig. 5. The protocols by Shah et al. [12, 13].
have the ability to verify data integrity even though he does not possess any of the file blocks. The schemes proposed in [14] depend on the retrieved records of the query result to verify the integrity of the outsourced database. Blockless verification is a main concern in minimizing the required communication cost over the network.

Ateniese et al. [6] proposed a model to overcome some of the limitations of the previous protocols: a limited number of audits per file determined by fixed challenges that must be specified in advance, expensive server computation due to exponentiation over the entire file, storage overhead on the verifier side from keeping metadata to be used later in the auditing task, high communication complexity, and lack of support for blockless verification. In this PDP model, the data owner fragments the file $F$ into blocks $\{b_1, b_2, \ldots, b_m\}$ and generates metadata (a tag) for each block to be used for verification. The file is then sent to be stored on a remote/cloud server, which may be untrusted, and the data owner may delete the local copy of the file. The remote server provides a proof that the data has not been tampered with or partially deleted by responding to challenges sent from the verifier. The scheme proposed by Ateniese et al. [6] provides a probabilistic guarantee of data possession, where the verifier checks a random subset of the stored file blocks with each challenge (spot checking).

PDP Schemes Based on Homomorphic Verifiable Tags. Ateniese et al. [6] proposed using Homomorphic Verifiable Tags (HVTs)/Homomorphic Linear Authenticators (HLAs) as the basic building blocks of their scheme. In short, the HVTs/HLAs are unforgeable verification metadata constructed from the file blocks in such a way that the verifier can be convinced that a linear combination of the file blocks is accurately computed by verifying only the aggregated tag/authenticator. In their work, Ateniese et al. [6] differentiate between the concepts of public verifiability and private verifiability.
In public verifiability, anyone who knows the owner's public key, not necessarily the data owner, can challenge the remote server and verify that the server is still possessing the owner's files. On the other side, private verifiability allows only the original owner (or a verifier with whom the original owner shares a secret key) to perform the auditing task. Ateniese et al. [6] proposed two main PDP schemes: Sampling PDP (S-PDP) and Efficient PDP (E-PDP). In fact, there is only a slight difference between the S-PDP scheme and the E-PDP scheme, but the E-PDP scheme provides a weaker guarantee of data possession. The E-PDP scheme only guarantees possession of the sum of the file blocks and not necessarily possession of each one of the blocks being challenged. Both protocols proposed in [6] are illustrated in figure 6.

Although the models proposed by Ateniese et al. [6] have addressed many drawbacks of the previous protocols, they still have some limitations:

- HVTs in [6] are based on RSA and thus are relatively long; the HVT for each file block is on the order of $|N|$ bits. Therefore, to achieve an 80-bit security level, the generated tag should be of size 1024 bits.
- The time it takes to generate the tags is too long [7].
- Since there is no indicator for the file identifier in the block tag, a malicious server can cheat by using blocks from different files if the data owner uses the same secret keys $d$ and $v$ for all his files.
- Ateniese et al. [6] calculated that for a malicious server to attack their E-PDP scheme, it has to store $10^{140}$ values to be able to cheat with probability 100% [footnote 1]. Shacham and Waters [20] presented a simple attack against the E-PDP scheme which enables a malicious server to cheat with probability 91% while requiring no more storage than an honest server needs to store the file.

Footnote 1. $10^{140}$ values if the number of file blocks is 1000 and the number of challenged blocks is 101.
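The order of magnitude quoted in footnote 1 can be checked with a short calculation. The sketch below assumes that the figure counts the distinct subsets of 101 challenged blocks out of 1000, i.e., a binomial coefficient; that reading is an assumption made here and is not spelled out in the text, and `subset_count` is a hypothetical helper name.

```python
import math


def subset_count(num_blocks, num_challenged):
    """Number of ways to choose the challenged blocks (assumed interpretation)."""
    return math.comb(num_blocks, num_challenged)


count = subset_count(1000, 101)
# Report only the order of magnitude; the exact integer has well over 100 digits.
print(f"about 10^{math.log10(count):.1f} possible challenge subsets")
```

Under this reading, the number of precomputed sums a cheating server would need is astronomically beyond any storage budget, which is why the attack of Shacham and Waters [20], needing no extra storage, is the more relevant threat.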
I. S-PDP scheme

Setup
- $N = pq$ is the RSA modulus ($p$ and $q$ are prime numbers)
- $g$ is a generator of $QR_N$ ($QR_N$ is the set of quadratic residues modulo $N$)
- Public key $pk = (N, g, e)$, secret key $sk = (d, v)$, $v \xleftarrow{R} Z_N$, and $ed \equiv 1 \bmod (p-1)(q-1)$
- $\pi$ is a pseudo-random permutation, $f$ is a pseudo-random function, and $H$ is a hash function
- File $F = \{b_1, b_2, \ldots, b_m\}$
- Data owner generates a tag $T_i$ for each block $b_i$: $T_i = (H(v \| i) \cdot g^{b_i})^d \bmod N$
- Data owner sends the data file $F = \{b_i\}$ and the tags $\{T_i\}$ ($1 \le i \le m$) to the remote server

Challenge Response
1. Verifier: picks two keys $k_1$ (key for $\pi$) and $k_2$ (key for $f$), $c$ (number of blocks to be challenged), and $g_s = g^s \bmod N$ ($s \xleftarrow{R} Z_N$)
   Verifier sends $c, k_1, k_2, g_s$ to the remote server
2. Remote server: computes the challenged block indices $i_j = \pi_{k_1}(j)$, $1 \le j \le c$
3. Remote server: computes the random values $a_j = f_{k_2}(j)$, $1 \le j \le c$
4. Remote server: computes $T = \prod_{j=1}^{c} T_{i_j}^{a_j} \bmod N$
5. Remote server: computes $\rho = H(g_s^{\sum_{j=1}^{c} b_{i_j} a_j} \bmod N)$
   Remote server sends $T, \rho$ to the verifier
6. Verifier: computes $i_j = \pi_{k_1}(j)$, $a_j = f_{k_2}(j)$ ($1 \le j \le c$)
7. Verifier: computes $\tau = \dfrac{T^e}{\prod_{j=1}^{c} H(v \| i_j)^{a_j}}$
8. Verifier: checks $H(\tau^s \bmod N) \stackrel{?}{=} \rho$

II. E-PDP scheme

The only difference between the E-PDP and the S-PDP is that $\{a_j\}_{1 \le j \le c} = 1$, and thus
- Step 4: $T = \prod_{j=1}^{c} T_{i_j} \bmod N$
- Step 5: $\rho = H(g_s^{\sum_{j=1}^{c} b_{i_j}} \bmod N)$
- Step 7: $\tau = \dfrac{T^e}{\prod_{j=1}^{c} H(v \| i_j)}$

Fig. 6. The S-PDP and E-PDP protocols by Ateniese et al. [6].
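The S-PDP steps of figure 6 can be traced end to end with toy numbers. The sketch below is only an illustration under loud assumptions: the primes are tiny (1019 and 1187, both safe primes, so that 4 generates $QR_N$), SHA-256 and HMAC stand in for $H$ and $f$, Python's seeded `random.Random(...).sample` stands in for the permutation $\pi$, and the coefficients $a_j$ are kept below 500. None of these choices come from [6], and the code offers no security.

```python
import hashlib
import hmac
import math
import random

# Toy parameters (assumption): p = 2*509 + 1 and q = 2*593 + 1 are small safe primes.
P, Q = 1019, 1187
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # e*d = 1 mod (p-1)(q-1)
G = 4  # a quadratic residue; generates QR_N because p and q are safe primes


def h_index(v: bytes, i: int) -> int:
    """Toy H(v || i): hash into Z_N, retrying until the value is invertible."""
    counter = 0
    while True:
        data = v + i.to_bytes(8, "big") + counter.to_bytes(4, "big")
        h = int.from_bytes(hashlib.sha256(data).digest(), "big") % N
        if h > 1 and math.gcd(h, N) == 1:
            return h
        counter += 1


def h_value(x: int) -> bytes:
    """Toy outer hash H applied to a group element."""
    return hashlib.sha256(str(x).encode()).digest()


def tag_block(v: bytes, i: int, block: int) -> int:
    """T_i = (H(v || i) * g^{b_i})^d mod N."""
    return pow(h_index(v, i) * pow(G, block, N) % N, D, N)


def expand_challenge(k1: bytes, k2: bytes, m: int, c: int):
    """Derive the challenged indices i_j and the coefficients a_j."""
    indices = random.Random(k1).sample(range(m), c)  # stand-in for pi
    coeffs = [
        1 + int.from_bytes(
            hmac.new(k2, j.to_bytes(8, "big"), hashlib.sha256).digest(), "big"
        ) % 500  # stand-in for f, kept small for the toy modulus
        for j in range(c)
    ]
    return indices, coeffs


def prove(blocks, tags, k1, k2, c, g_s):
    """Server side: steps 2 to 5 of the S-PDP protocol."""
    indices, coeffs = expand_challenge(k1, k2, len(blocks), c)
    T = 1
    for i, a in zip(indices, coeffs):
        T = T * pow(tags[i], a, N) % N
    exponent = sum(a * blocks[i] for i, a in zip(indices, coeffs))
    return T, h_value(pow(g_s, exponent, N))


def verify(v, m, k1, k2, c, s, T, rho) -> bool:
    """Verifier side: steps 6 to 8, using only v, e and the secret s."""
    indices, coeffs = expand_challenge(k1, k2, m, c)
    tau = pow(T, E, N)
    for i, a in zip(indices, coeffs):
        tau = tau * pow(h_index(v, i), -a, N) % N  # divide by H(v || i_j)^{a_j}
    return h_value(pow(tau, s, N)) == rho
```

Note that `verify` never touches the blocks themselves, which is the blockless verification property discussed earlier; setting every coefficient to 1 would turn the sketch into the E-PDP variant.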
Recently, Ateniese et al. [21] showed that HLAs can be constructed from homomorphic identification protocols. They provided a compiler-like transformation to build HLAs from homomorphic identification protocols and showed how to turn the HLA into a PDP scheme. As a concrete example, they applied their transformation to a variant of an identification protocol proposed by Shoup [22], yielding a factoring-based PDP scheme.

Comparison. We present a comparison between the previous PDP schemes in table 1. The comparison is based on the following:
- Owner pre-computation: the operations performed by the data owner to process the file before it is outsourced to a remote server.
- Verifier storage overhead: the extra storage required to store some metadata on the verifier side to be used later during the verification process.
- Server storage overhead: the extra storage on the server side required to store some metadata, not including the original file sent from the owner.
- Server computation: the operations performed by the server to provide the data possession guarantee.
- Verifier computation: the operations performed by the verifier to validate the server's response.
- Communication cost: the bandwidth required during the challenge response phase.
- Unbounded challenges: indicates whether the scheme allows an unlimited number of audits of the data file.
- Fragmentation: indicates whether the file is treated as one chunk or divided into smaller blocks.
- Type of guarantee: indicates whether the guarantee provided by the remote server is a deterministic guarantee, which requires access to all file blocks, or a probabilistic guarantee, which depends on spot checking.
- Prove data possession: indicates whether the scheme proves the possession of the file itself or proves that the server is storing something at least as large as the original file.

We will use the notation EXF to indicate the EXponentiation of the entire File, DET to indicate a deterministic guarantee, and PRO to indicate a probabilistic guarantee.
For simplicity, the security parameter is not included as a factor for the relevant costs.

Scheme | [8] | [9] | [10] | [11] | [12, 13] | [6]
Owner pre-computation | EXF | O(1) | O(m) | O(m) | O(1) | O(m)
Verifier storage overhead | O(1) | O(1) | O(m) | O(1) | O(Ñ) | -
Server storage overhead | - | - | - | - | - | O(m)
Server computation | EXF | EXF | O(c) | O(m) | O(1) | O(c)
Verifier computation | O(1) | O(1) | O(c) | O(1) | O(1) | O(c)
Communication cost | O(1) | O(1) | O(1) | O(1) | O(1) | O(1)
Unbounded challenges | (marks not recoverable from the extracted text)
Fragmentation | (marks not recoverable from the extracted text)
Type of guarantee | DET | DET | DET/PRO | DET | DET | PRO
Prove data possession | (marks not recoverable from the extracted text)

Table 1. Comparison of PDP schemes for a file consisting of m blocks; c is the number of challenged blocks, and Ñ is a finite number of random challenges.
Table notes: Verifier pre-computation is O(Ñ) to generate a list L of HMACs. [6] can be easily modified to support a deterministic guarantee.

3.2 Proof of Retrievability (POR)

A Proof of Retrievability (POR) scheme is an orthogonal/complementary approach to a Provable Data Possession (PDP) system. A POR scheme is a challenge-response protocol which enables a remote server to provide evidence that a verifier can retrieve or reconstruct the entire data file from the responses that are reliably transmitted from the server. The main idea of the POR schemes is to apply an erasure code to data files before outsourcing to allow more error resiliency. Thus, if it is a critical demand to detect any modification or deletion of tiny parts of the data file, then an erasure code could be used before outsourcing.

The work done by Juels and Kaliski [23] was among the first papers to consider formal models for POR schemes. In their model, the data is first encrypted, then disguised blocks (called sentinels) are embedded into the ciphertext. The sentinels are hidden among the regular file blocks in order to detect data modification by the server. In the auditing phase, the verifier requests randomly picked sentinels and checks whether they are corrupted or not. If the server corrupts or deletes parts of the data, then sentinels would also be affected with a certain probability. The main limitation of the scheme in [23] is that it allows only a limited number of challenges on the data files, which is specified by the number of sentinels embedded into the data file. This limited number of challenges is due to the fact that sentinels and their positions within the file must be revealed to the server at each challenge, and the verifier cannot reuse the revealed sentinels.

Schwartz and Miller [24] proposed using algebraic signatures to verify data integrity across multiple servers using error-correcting codes. Through keyed algebraic encoding and stream cipher encryption, they are able to detect file corruptions. The main limitation of their proposal is that the communication complexity is linear with respect to the queried data size. Moreover, the security of their proposal is not proven and remains in question [7].

Shacham and Waters [20] proposed a compact proof of retrievability model that enables the verifier to challenge the server an unbounded number of times, solving the limitation of Juels and Kaliski [23]. The main contribution of Shacham and Waters [20] is the construction of HLAs that enable the server to aggregate the tags of individual file blocks and to generate a single short tag as a response to the verifier's challenge. Shacham and Waters [20] proposed two HLAs: one is based on the Pseudo-Random Function (PRF), and the other is based on the BLS signature [18].
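The way an HLA lets the server collapse many block tags into one short response can be illustrated with a small sketch. It assumes the PRF-based authenticator takes the commonly cited form $\sigma_i = f_k(i) + \alpha \cdot b_i \bmod p$, with secret values $k$ and $\alpha$; the modulus `P`, the HMAC-based PRF, and all function names are hypothetical choices for illustration, not details taken from [20].

```python
import hashlib
import hmac

P = 2**127 - 1  # a prime modulus for the block field (assumption)


def prf(key: bytes, i: int) -> int:
    """HMAC-SHA256 standing in for the pseudo-random function f_k."""
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def tag_blocks(key: bytes, alpha: int, blocks):
    """Owner side: one authenticator per block, sigma_i = f_k(i) + alpha * b_i."""
    return [(prf(key, i) + alpha * b) % P for i, b in enumerate(blocks)]


def respond(blocks, tags, challenge):
    """Server side: aggregate the challenged blocks and tags into (mu, sigma)."""
    mu = sum(nu * blocks[i] for i, nu in challenge) % P
    sigma = sum(nu * tags[i] for i, nu in challenge) % P
    return mu, sigma


def verify(key: bytes, alpha: int, challenge, mu: int, sigma: int) -> bool:
    """Verifier side: check the single aggregated tag against mu."""
    expected = (alpha * mu + sum(nu * prf(key, i) for i, nu in challenge)) % P
    return sigma == expected
```

A challenge here is a list of `(index, coefficient)` pairs; whatever its length, the response stays at two field elements, which is the "single short tag" property described above. Because the check needs the secret `key` and `alpha`, this variant is privately verifiable only.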
Various other POR schemes can be found in the literature (see for example [25-28]).

3.3 Dynamic Provable Data Possession (DPDP)

The PDP and POR schemes focus on static or warehoused data, which is essential in numerous applications such as libraries, archives, and astronomical/medical/scientific/legal repositories. Dynamic Provable Data Possession (DPDP) schemes, on the other hand, investigate dynamic file operations such as update, delete, append, and insert. There are some DPDP constructions in the literature satisfying different system requirements. Ateniese et al. [29] proposed a DPDP model based on a cryptographic hash function and symmetric key encryption. The main drawback of their scheme is that the number of updates and challenges is limited and fixed in advance. Moreover, their model does not support the block insertion operation. Erway et al. [30] extended the work of Ateniese et al. [29] to support data dynamics using an authenticated skip list. However, the efficiency of their scheme remains in question. Wang et al. [31] presented a DPDP scheme by integrating the compact proof of retrievability model due to Shacham and Waters [20] and the Merkle Hash Tree (MHT). Zhu et al. [32] addressed the construction of DPDP schemes on hybrid clouds to support scalability of service and data migration. A hybrid cloud is
a CC deployment model in which an organization provides and handles some internal and external resources. For example, an organization can use a public cloud service such as Amazon EC2 [33] to perform the general computation, while the data files are stored within the organization's local data center in a private cloud. Zhu et al. [32] proposed two DPDP schemes: the Interactive Provable Data Possession (IPDP) scheme and the Cooperative Provable Data Possession (CPDP) scheme.

It is important to note that none of the above approaches (PDP, POR, DPDP) addresses the problem of creating multiple copies of a data file over cloud servers, auditing all these copies to verify their completeness and correctness, and providing strong evidence to the owners that they are getting what they are paying for. There is some previous work on maintaining file copies throughout distributed systems to achieve availability and durability, but none of it guarantees that multiple copies of the data file are actually maintained [34, 35].

4 System Model and Design Goals

4.1 System Model

In our work we consider a model of a cloud computing data storage system consisting of three main components, as illustrated in figure 7: (i) a data owner: a customer or an organization originally possessing a large amount of data to be stored in the cloud; (ii) Cloud Servers (CSs): managed by a CSP which provides paid storage space on its infrastructure to store the owner's files; and (iii) authorized users: a set of the owner's clients who are allowed to use the owner's files and who share some keying material with the data owner. The authorized users receive the data from the cloud servers in an encrypted form, and using the secret key shared with the owner they obtain the plain data. The authorization between the data owner and the legal users is out of our scope, and we assume that it is appropriately done. Throughout the paper we do not differentiate between a cloud server and a cloud service provider.

Fig. 7. Cloud Computing Data Storage System Model

There are many applications that can be envisioned to adopt this model of outsourced data storage system.
For e-Health applications, a database containing a large amount of sensitive information about patients' medical history is to be stored on the cloud servers. We can consider the e-Health organization to be the data owner and the physicians to be the authorized users with appropriate access rights to the database. Financial applications, scientific applications, and educational applications containing sensitive information about students' transcripts can also be envisioned in similar settings.
4.2 Design Goals

From the extensive literature survey of the previous PDP schemes, we have explored and investigated the various features and limitations of such schemes. Now, we aim at designing efficient and secure protocols that overcome the identified limitations of the previous models and at the same time address the problem of creating multiple copies of data files over cloud servers, auditing all these copies to verify their completeness and correctness, and providing strong evidence to the data owner that he is getting what he is actually paying for. Our design goals can be summarized in the following points:

1. Designing Efficient Multi-Copy Provable Data Possession (EMC-PDP) protocols. These protocols should efficiently and securely provide the owner with strong evidence that the CSP actually possesses all the data copies that are agreed upon and that these copies are intact.
2. Allowing the authorized users of the data owner to seamlessly access the file copy received from the CSP.
3. Scaling down the storage overhead on both the server and the verifier sides.
4. Minimizing the computational complexity on both the server and the verifier sides.
5. Limiting the bandwidth required by the protocols during the auditing phase.
6. Enabling public verifiability, where anyone (not necessarily the data owner) who knows the owner's public key can challenge the remote servers and verify that the CSP still possesses the owner's files.
7. Supporting blockless verification, where the verifier is able to verify the data integrity even though he neither possesses nor retrieves the file blocks from the server.
8. Allowing an unbounded number of audits rather than imposing a fixed limit on the number of interactions between the verifier and the CSP.
9. Facilitating stateless verification, where the verifier does not need to hold and update state between audits. Maintaining such state is unmanageable in case of physical damage to the verifier's machine or if the auditing task is delegated to a TPA.
10. Enabling both probabilistic and deterministic guarantees. In a probabilistic guarantee the verifier checks a random subset of the stored file blocks with each challenge (spot checking), while in a deterministic guarantee the verifier checks all the stored file blocks.

Remark 1. We are considering economically motivated CSPs that may attempt to use less storage than required by the contract through deletion of a few copies of the file. The CSPs have almost no financial benefit from deleting only a small portion of a copy of the file. As a result, in our work we do not encode the data file before outsourcing. Such encoding, for example using erasure codes, mainly serves to reconstruct the file from a limited amount of damage. In our proposed schemes, multiple copies of the file are generated and stored on a number of servers; hence a server's copy can be reconstructed even from a complete loss using duplicated copies on other servers. More importantly, unlike erasure codes, duplicated files enable scalability, a vital issue in the Cloud Computing system. Also, a file that is duplicated and stored strategically on multiple servers located at various geographic locations can help reduce access time and communication cost for users.

Remark 2. In this work we focus on sensitive archival and warehoused data, which is essential in numerous applications such as digital libraries and astronomical/medical/scientific/legal repositories. Such data are subject to infrequent change, so we treat them as static. Since we are dealing with sensitive data like patients' medical history or financial data, this data must be encrypted before outsourcing to cloud servers. In our schemes we utilize the BLS authenticators [20] to build a publicly verifiable model.
5 Multi-Copy Provable Data Possession (MC-PDP) schemes

Suppose that a CSP offers to store $n$ copies of an owner's file on $n$ different servers, to prevent simultaneous failure of all copies and to achieve availability, in exchange for pre-specified fees metered in GB/month. The data owner then needs strong evidence that the CSP is actually storing no fewer than $n$ copies, that all these copies are complete and correct, and that the owner is not paying for a service that he does not get. A naïve solution to this problem is to use any of the previous PDP schemes to separately challenge and verify the integrity of each copy on each server. This is certainly not a workable solution; cloud servers can conspire to convince the data owner that they store $n$ copies of the file while in fact they store only one copy. Whenever a request for a PDP scheme execution is made to one of the $n$ servers, it is forwarded to the server which is actually storing the single copy. The CSP can use another trick to prove data availability by generating the file copies upon a verifier's challenge; however, there is no evidence that the actual copies are stored all the time. The core of this cheating is that the $n$ copies are identical, making it trivial for the servers to deceive the owner. Therefore, one step towards the solution is to leave the control of the file copying operation in the owner's hands, to create unique distinguishable/differentiable copies.

Before presenting our main protocols, we start with a warmup scheme, the Basic Multi-Copy Provable Data Possession (BMC-PDP) scheme, which leads to severe computational, communication, and management overhead. We then present the Multiple-Replica Provable Data Possession (MR-PDP) scheme due to Curtmola et al. [15]. To the best of our knowledge, this is the only scheme in the literature that addresses the auditing of multiple copies of a data file. Through extensive analysis, we will elaborate the various features and limitations of the MR-PDP model, especially from the authorized users' side. Curtmola et al.
[15] did not consider how the authorized users of the data owner can access the file copies from the cloud servers, noting that the internal operations of the CSP are opaque. Moreover, we will demonstrate the efficiency of our protocols from the storage, computation, and communication aspects. We believe that the investigation of both the BMC-PDP and MR-PDP models will lead us to our main schemes.

5.1 Basic Multi-Copy Provable Data Possession (BMC-PDP) scheme

As explained previously, the cloud servers can conspire to convince the data owner that they store $n$ copies of the file while in fact they store fewer than $n$ copies, and this is due to the fact that the $n$ copies are identical. Thus, the BMC-PDP scheme tries to solve the problem by leaving the control of the file copying operation in the owner's hands to generate unique distinguishable/differentiable copies of the data file. To this end, the data owner creates $n$ distinct copies by encrypting the file under $n$ different keys, keeping these keys secret from the CSP. Hence, the cloud servers cannot conspire by using one copy to answer the challenges for another. This natural solution enables the verifier to separately challenge each copy on each server using any of the PDP schemes, and to ensure that the CSP possesses no fewer than $n$ copies. Although the BMC-PDP scheme is a workable solution, it is impractical and has the following critical drawbacks:
- The computation and communication complexities of the verification task grow linearly with the number of copies. Essentially, the BMC-PDP scheme is equivalent to applying any PDP scheme to $n$ different files.
- Key management is a severe problem with the BMC-PDP scheme. Since the data file is encrypted under $n$ different keys, the data owner has to keep these keys secret from the CSP and at the same time share these $n$ keys with each authorized user. Moreover, when an authorized user interacts with the CSP to retrieve the data file, the user does not necessarily receive the same copy each time. According to the load balancing mechanism used by the
CSP to organize the work of the servers, the authorized user's request is directed to the server with the lowest congestion. Consequently, each copy should contain some indicator of the key used in the encryption to enable the authorized users to properly decrypt the received copy.

5.2 Multiple-Replica Provable Data Possession (MR-PDP) scheme

Curtmola et al. [15] proposed the Multiple-Replica PDP (MR-PDP) scheme, where a data owner can verify that several copies of a file are stored by a storage service provider. The MR-PDP scheme is an extension of the PDP models proposed by Ateniese et al. [6]. Curtmola et al. [15] proposed creating distinct replicas/copies of the data file by first encrypting the file and then masking the encrypted version with some randomness generated from a Pseudo-Random Function (PRF). The MR-PDP scheme of creating and auditing $n$ distinct copies is illustrated in figure 8. Key limitations of the MR-PDP scheme of Curtmola et al. [15] are as follows:
- Since the MR-PDP scheme is an extension of the PDP models proposed by Ateniese et al. [6], it inherits all the limitations of these models identified earlier in the literature survey section: long tags (1024 bits to achieve an 80-bit security level), computation overhead on both the verifier and server sides, and the ability of the CSP to cheat by using blocks from different files if the data owner uses the same secret key $(d, v)$ for all his files.
- A slightly modified version of the critical key management problem of the BMC-PDP scheme is another concern in the MR-PDP scheme. The authorized users have to know which copy has been retrieved from the CSP in order to properly unmask it before decryption. Because the internal operations of the CSP are opaque, the server on which a specific copy is stored is unknown to cloud customers; the MR-PDP scheme does not address how the authorized users of the data owner can access the file copies from the cloud servers.
- The MR-PDP supports only private verifiability, where just the data owner (or a verifier with whom the original owner shares a secret key) can do the auditing task.
- Essentially, the MR-PDP is an extension of the E-PDP version proposed by Ateniese et al. [6], and thus it only proves possession of the sum of the challenged blocks, not the blocks themselves. Moreover, the CSP can corrupt the data blocks while the summation remains valid. For the CSP to prove possession of the blocks, it should multiply each of the challenged blocks by a random value.

6 The Proposed Efficient Multi-Copy Provable Data Possession (EMC-PDP) schemes

To achieve the design goals outlined in section 4.2, we propose Efficient Multi-Copy Provable Data Possession (EMC-PDP) schemes utilizing the BLS Homomorphic Linear Authenticators (HLAs) [20]. In short, the HLAs enable a data owner to fingerprint each block $b_i$ of a file $F$ in such a way that for any challenge vector $C = \{c_1, c_2, \ldots, c_r\}$ the server can homomorphically construct a tag authenticating the value $\sum_{i=1}^{r} c_i \cdot b_i$. The direct adoption of the HLAs proposed in [20] is not suitable and leads to two main attacks. The first attack arises when the data owner delegates the auditing task to a TPA: by collecting a sufficient number of linear combinations of the same blocks, the TPA can obtain the data blocks by solving a system of linear equations, breaking the privacy of the owner's data. This attack can be prevented by encrypting the data file before outsourcing it to the CSP, so that the TPA cannot get any information about the collected blocks. The second attack arises if the file identifier is not included in the block tag and
Setup
- $N = pq$ is the RSA modulus ($p$ and $q$ are prime numbers)
- $g$ is a generator of $QR_N$ ($QR_N$ is the set of quadratic residues modulo $N$)
- Public key $pk = (N, g, e)$, secret key $sk = (d, v, x)$, $v, x \xleftarrow{R} Z_N$, and $ed \equiv 1 \bmod (p-1)(q-1)$
- $\pi$ is a pseudo-random permutation, $f_x$ is a pseudo-random function keyed with the secret key $x$, and $H$ is a hash function
- File $F = \{b_1, b_2, \ldots, b_m\}$
- $E_K$ is an encryption algorithm under a key $K$

Data Owner
- Obtains an encrypted file $\tilde{F}$ by encrypting the original file $F$ using the encryption key $K$: $\tilde{F} = \{\tilde{b}_1, \tilde{b}_2, \ldots, \tilde{b}_m\}$, where $\tilde{b}_j = E_K(b_j)$, $1 \le j \le m$.
- Uses the encrypted version of the file $\tilde{F}$ to create a set of tags $\{T_j\}_{1 \le j \le m}$ for all copies to be used in the verification process: $T_j = (H(v \| j) \cdot g^{\tilde{b}_j})^d \bmod N$
- Generates $n$ distinct replicas $\{\hat{F}_i\}_{1 \le i \le n}$, $\hat{F}_i = \{\hat{b}_{i,1}, \hat{b}_{i,2}, \ldots, \hat{b}_{i,m}\}$, using random masking as follows.
  for $i = 1$ to $n$ do
    for $j = 1$ to $m$ do
      1. Computes a random value $r_{i,j} = f_x(i \| j)$
      2. Computes the replica's block $\hat{b}_{i,j} = \tilde{b}_j + r_{i,j}$ (added as large integers in $Z$)
- Data owner sends the specific replica $\hat{F}_i$ to a specific server $S_i$, $1 \le i \le n$

Checking possession of replica $\hat{F}_z$
To check the possession of the replica $\hat{F}_z = \{\hat{b}_{z,1}, \hat{b}_{z,2}, \ldots, \hat{b}_{z,m}\}$, the owner challenges the specific server $S_z$ as follows:
1. Owner: picks a key $k$ for the $\pi$ function, $c$ (number of blocks to be challenged), and $g_s = g^s \bmod N$ ($s \xleftarrow{R} Z_N$)
   Owner sends $c, k, g_s$ to the remote server $S_z$
2. Remote server $S_z$: computes the challenged block indices $j_u = \pi_k(u)$, $1 \le u \le c$
3. Remote server $S_z$: computes $T = \prod_{u=1}^{c} T_{j_u} \bmod N$
4. Remote server $S_z$: computes $\rho = g_s^{\sum_{u=1}^{c} \hat{b}_{z,j_u}}$
   Remote server $S_z$ sends $T, \rho$ to the owner
5. Owner: computes $j_u = \pi_k(u)$, $1 \le u \le c$
6. Owner: checks $\left( \dfrac{T^e}{\prod_{u=1}^{c} H(v \| j_u)} \cdot g^{r_{chal}} \right)^s \stackrel{?}{=} \rho$, where $r_{chal} = \sum_{u=1}^{c} r_{z,j_u}$ is the sum of the random values used to obtain the blocks in the replica $\hat{F}_z$

Fig. 8. The MR-PDP protocol by Curtmola et al. [15].
the same owner's secret key is used with all the owner's files. This allows the CSP to cheat by using blocks from different files during verification.

Since the core of designing a multi-copy provable data possession model is to generate unique distinguishable/differentiable copies of the data file, we use a simple yet efficient way to generate these distinct copies. In our EMC-PDP models we resort to the diffusion property of any secure encryption scheme. Diffusion means that the output bits of the ciphertext should depend on the input bits of the plaintext in a very complex way. In an encryption scheme with a strong diffusion property, if a single bit of the plaintext changes, then the ciphertext should change completely and in an unpredictable way [36]. Our methodology of generating distinct copies is not only efficient, but also successful in solving the authorized users' problem of the MR-PDP scheme [15], namely accessing the file copy received from the CSP. In our schemes, the authorized users need only keep a single secret key, shared with the data owner, to decrypt the file copy; they do not need to recognize the specific server from which the copy is received.

In our work, the data owner has a file $F$ and the CSP offers to store $n$ copies, $\{F_1, F_2, \ldots, F_n\}$, of the owner's file in exchange for pre-specified fees metered in GB/month. We provide two versions of our EMC-PDP scheme: Deterministic EMC-PDP (DEMC-PDP) and Probabilistic EMC-PDP (PEMC-PDP). In the DEMC-PDP version, the CSP has to access all the blocks of the data file, while in the PEMC-PDP we depend on spot checking by validating a random subset of the file blocks. This is a trade-off between the performance of the system and the strength of the guarantee provided by the CSP.

6.1 Preliminaries and Notations

- $F$ is a data file to be outsourced. It is composed of a sequence of $m$ blocks, i.e., $F = \{b_1, b_2, \ldots, b_m\}$, where each block $b_i \in Z_p$ for some large prime $p$.
- $\pi_{key}(\cdot)$ is a Pseudo-Random Permutation (PRP): $key \times \{0,1\}^{\log(m)} \rightarrow \{0,1\}^{\log(m)}$
- $\psi_{key}(\cdot)$ is a Pseudo-Random Function (PRF): $key \times \{0,1\}^{*} \rightarrow Z_p$
- Bilinear Map/Pairing. The bilinear map is one of the essential building blocks of our proposed schemes. Let $G_1$, $G_2$, and $G_T$ be cyclic groups of prime order $p$. Let $g_1$ and $g_2$ be generators of $G_1$ and $G_2$, respectively. A bilinear pairing is a map $\hat{e} : G_1 \times G_2 \rightarrow G_T$ with the following properties [37]:
  1. Bilinear: $\hat{e}(u_1 u_2, v_1) = \hat{e}(u_1, v_1) \cdot \hat{e}(u_2, v_1)$ and $\hat{e}(u_1, v_1 v_2) = \hat{e}(u_1, v_1) \cdot \hat{e}(u_1, v_2)$ $\forall\, u_1, u_2 \in G_1$ and $v_1, v_2 \in G_2$
  2. Non-degenerate: $\hat{e}(g_1, g_2) \neq 1$
  3. Computable: there exists an efficient algorithm for computing $\hat{e}$
  4. $\hat{e}(u^a, v^b) = \hat{e}(u, v)^{ab}$ $\forall\, u \in G_1$, $v \in G_2$, and $a, b \in Z_p$
- $H(\cdot)$ is a map-to-point hash function: $\{0,1\}^{*} \rightarrow G_1$
- $E_K$ is an encryption algorithm with a strong diffusion property, e.g., AES

6.2 Deterministic EMC-PDP (DEMC-PDP) scheme

The DEMC-PDP consists of five polynomial time algorithms: KeyGen, CopyGen, TagGen, Proof, and Verify.
- $(pk, sk) \leftarrow \text{KeyGen}()$. This algorithm is run by the data owner to generate a public key $pk$ and a private key $sk$.
- $\mathbb{F} = \{F_i\}_{1 \le i \le n} \leftarrow \text{CopyGen}(CN_i, F)_{1 \le i \le n}$. This algorithm is run by the data owner. It takes as input a copy number $CN_i$ and a file $F$ and generates unique differentiable copies $\mathbb{F} = \{F_i\}_{1 \le i \le n}$, where each copy $F_i$ is an ordered collection of blocks $\{b_{ij}\}$ ($1 \le i \le n$, $1 \le j \le m$).
- $\Phi \leftarrow \text{TagGen}(sk, \mathbb{F})$. This algorithm is run by the data owner. It takes as input the private key $sk$ and the unique file copies $\mathbb{F}$, and outputs the tags/authenticators set $\Phi$, which is an ordered collection of tags/authenticators $\{\sigma_{ij}\}$ of the blocks $\{b_{ij}\}$ ($1 \le i \le n$, $1 \le j \le m$).
- $P \leftarrow \text{Proof}(\mathbb{F}, \Phi)$. This algorithm is run by the CSP. It takes as input the file copies $\mathbb{F}$ and the tags set $\Phi$, and returns a proof $P$ which guarantees that the CSP is actually storing $n$ copies and that all these copies are intact.
- $\{1, 0\} \leftarrow \text{Verify}(pk, P)$. Since our schemes support public verifiability, this algorithm can be run by any verifier (the data owner or any other auditor). It takes as input the public key $pk$ and the proof $P$ returned from the CSP, and outputs 1 if the integrity of all file copies is correctly verified or 0 otherwise.

DEMC-PDP Construction

The procedures of our DEMC-PDP protocol execution are as follows:

Key Generation. Let $\hat{e} : G_1 \times G_2 \rightarrow G_T$ be a bilinear map, $u$ a generator of $G_1$, and $g$ a generator of $G_2$. The data owner runs the KeyGen algorithm to generate a private key $x \in Z_p$ and a public key $y = g^x \in G_2$.

Distinct Copies Generation. The data owner runs the CopyGen algorithm to generate $n$ differentiable copies. The CopyGen algorithm creates a unique copy $F_i$ by concatenating a copy number $i$ with the file $F$, then encrypting the result using an encryption scheme with a strong diffusion property such as AES, i.e., $F_i = E_K(i \| F)$, $1 \le i \le n$. The authorized users need only keep a single secret key $K$ to decrypt the file copy at a later time.

Tag Generation. Given the distinct file copies $\mathbb{F} = \{F_i\}_{1 \le i \le n}$, where each copy $F_i = \{b_{i1}, b_{i2}, \ldots, b_{im}\}$, the data owner runs the TagGen algorithm to create a tag $\sigma_{ij}$ for each block $b_{ij}$ ($1 \le i \le n$, $1 \le j \le m$) as $\sigma_{ij} = (H(F_{id}) \cdot u^{b_{ij}})^x \in G_1$, where $F_{id}$ is a unique file identifier for each of the owner's files. The $F_{id}$ is constructed as $Filename \| n \| m \| u$, i.e., the $F_{id}$ is a unique fingerprint for each file comprising the file name, the number of copies of this file, the number of blocks in the file, and the generator $u$.
Embedding the $F_{id}$ into the block tag prevents the CSP from cheating by using blocks from different files, as is the case in the MR-PDP [15]. Denote the set of tags $\Phi = \{\sigma_{ij}\}$ ($1 \le i \le n$, $1 \le j \le m$). The data owner sends $\{\mathbb{F}, \Phi, F_{id}\}$ to the CSP and deletes the copies and the tags from its local storage.

Challenge. The verifier can check the integrity of all outsourced copies by challenging the CSP to ensure that the CSP is storing no fewer than $n$ copies and that these copies are not corrupted. At each challenge, the verifier sends a fresh PRF ($\psi$) key $k$ to the CSP. Both the verifier and the CSP use the PRF ($\psi$) keyed with $k$ to generate $n \times m$ random values $\{r_{ij}\} = \{\psi_k(l)\}_{1 \le l \le n \times m}$ ($1 \le i \le n$, $1 \le j \le m$).

Response. After generating the set of random values $\{r_{ij}\}$ ($1 \le i \le n$, $1 \le j \le m$), the CSP runs the Proof algorithm to generate evidence that it still correctly possesses the $n$ copies. The CSP responds with a proof $P = \{\sigma, \mu\}$, where

$\sigma = \prod_{i=1}^{n} \prod_{j=1}^{m} \sigma_{ij}^{r_{ij}} \in G_1, \qquad \mu = \sum_{i=1}^{n} \sum_{j=1}^{m} r_{ij} \cdot b_{ij} \in Z_p$

Verify Response. Upon receiving the proof $P = \{\sigma, \mu\}$ from the CSP, the verifier runs the Verify algorithm to check the following verification equation:

$\hat{e}(\sigma, g) \stackrel{?}{=} \hat{e}(H(F_{id})^{R} \cdot u^{\mu}, y), \qquad R = \sum_{i=1}^{n} \sum_{j=1}^{m} r_{ij}$    (1)
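The challenge, response, and verification steps above can be sanity-checked with a toy model that involves no real pairing. In the sketch below every group element is represented by its discrete logarithm (so $u^a$ is stored as `a`, $g^b$ as `b`, and the pairing multiplies exponents); this is an assumption made purely to exercise the algebra of equation (1). It is completely insecure, because in this model the public key $y$ reveals the private key $x$. The modulus `P`, the hash, the PRF, and all function names are hypothetical stand-ins.

```python
import hashlib
import hmac

P = 2**127 - 1  # a prime standing in for the group order p (assumption)


def prf(key: bytes, l: int) -> int:
    """HMAC-SHA256 standing in for the PRF psi_k(l)."""
    digest = hmac.new(key, l.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def hash_to_exponent(data: bytes) -> int:
    """Toy map-to-point hash: returns the exponent of H(data) with respect to u."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % P


def pairing(a: int, b: int) -> int:
    """Exponent model of e(u^a, g^b) = e(u, g)^(a*b)."""
    return a * b % P


def tag_gen(x: int, fid: bytes, copies):
    """sigma_ij = (H(F_id) * u^{b_ij})^x, stored as the exponent x * (h + b_ij)."""
    h = hash_to_exponent(fid)
    return [[x * (h + b) % P for b in copy] for copy in copies]


def challenge_values(k: bytes, n: int, m: int):
    """The n*m values r_ij = psi_k(l), 1 <= l <= n*m."""
    return [[prf(k, i * m + j + 1) for j in range(m)] for i in range(n)]


def proof(copies, tags, k: bytes):
    """CSP side: sigma = prod sigma_ij^{r_ij}, mu = sum r_ij * b_ij."""
    n, m = len(copies), len(copies[0])
    r = challenge_values(k, n, m)
    sigma = sum(r[i][j] * tags[i][j] for i in range(n) for j in range(m)) % P
    mu = sum(r[i][j] * copies[i][j] for i in range(n) for j in range(m)) % P
    return sigma, mu


def verify(y: int, fid: bytes, n: int, m: int, k: bytes, sigma: int, mu: int) -> bool:
    """Verifier side: e(sigma, g) == e(H(F_id)^R * u^mu, y)."""
    r = challenge_values(k, n, m)
    R = sum(sum(row) for row in r) % P
    lhs = pairing(sigma, 1)  # g is stored as exponent 1
    rhs = pairing((hash_to_exponent(fid) * R + mu) % P, y)
    return lhs == rhs
```

In this model the public key is simply `y = x`. A CSP that answers from a single copy instead of $n$ distinct copies produces a different `mu` and fails the check, which mirrors the multi-copy guarantee the scheme aims for.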
If the verification equation holds, the Verify algorithm returns 1, otherwise 0. The correctness of the above verification equation can be illustrated as follows:

$\hat{e}(\sigma, g) = \hat{e}\Big(\prod_{i=1}^{n} \prod_{j=1}^{m} \sigma_{ij}^{r_{ij}},\ g\Big)$
$= \hat{e}\Big(\prod_{i=1}^{n} \prod_{j=1}^{m} [H(F_{id}) \cdot u^{b_{ij}}]^{x\, r_{ij}},\ g\Big)$
$= \hat{e}\Big(\prod_{i=1}^{n} \prod_{j=1}^{m} [H(F_{id})]^{r_{ij}} \cdot u^{r_{ij} b_{ij}},\ y\Big)$
$= \hat{e}\Big(H(F_{id})^{\sum_{i=1}^{n} \sum_{j=1}^{m} r_{ij}} \cdot u^{\sum_{i=1}^{n} \sum_{j=1}^{m} r_{ij} b_{ij}},\ y\Big)$
$= \hat{e}(H(F_{id})^{R} \cdot u^{\mu},\ y)$

Remark 3. It is important to divide the file $F$ into blocks in the proposed DEMC-PDP. If the file $F$ is treated as one chunk and the data owner generates one tag $\sigma_{F_i} = (H(F_{id}) \cdot u^{F_i})^x \in G_1$ for each file copy $F_i$ ($1 \le i \le n$), then the CSP can simply cheat the verifier as follows. To challenge the CSP in this situation, both the verifier and the CSP generate $n$ random values $\{r_{F_i}\} = \{\psi_k(l)\}_{1 \le l \le n}$. Then the CSP responds by computing $\sigma = \prod_{i=1}^{n} \sigma_{F_i}^{r_{F_i}} \in G_1$ and $\mu = \sum_{i=1}^{n} r_{F_i} \cdot F_i \in Z_p$. But this scenario allows the CSP to cheat by storing only $F \bmod p$ and not the entire file (the file size is much larger than $p$). Therefore, we divide the file $F$ into blocks as a countermeasure against this cheating. The DEMC-PDP scheme is presented in figure 9.

6.3 Probabilistic EMC-PDP (PEMC-PDP) scheme

As mentioned previously, the PEMC-PDP depends on spot checking: it validates a random subset of the file blocks instead of validating all the blocks, to achieve better performance in terms of computation overhead. In the PEMC-PDP scheme, we use the same indices for the challenged blocks across all copies. The rationale behind the PEMC-PDP scheme is that checking part of the file is much easier than checking the whole of it, thus reducing the computation and storage overhead on the servers' side. The PEMC-PDP scheme also consists of five algorithms: KeyGen, CopyGen, TagGen, Proof, and Verify.

PEMC-PDP Construction

The procedures of our PEMC-PDP protocol execution are as follows:

Key Generation. The same as in the DEMC-PDP.

Distinct Copies Generation. The same as in the DEMC-PDP.

Tag Generation.
Since the PEMC-PDP checks only a random subset of the file blocks, the block index needs to be embedded into the block tag to prevent the CSP from cheating by using blocks at different indices. Given the distinct file copies $\mathbb{F} = \{F_i\}_{1 \le i \le n}$, where each copy $F_i = \{b_{i1}, b_{i2}, \ldots, b_{im}\}$, the data owner runs the TagGen algorithm to create a tag $\sigma_{ij}$ for each block $b_{ij}$ ($1 \le i \le n$, $1 \le j \le m$) as $\sigma_{ij} = (H(F_{id} \| j) \cdot u^{b_{ij}})^x \in G_1$, where
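Setup
- $\hat{e} : G_1 \times G_2 \rightarrow G_T$ is a bilinear map, $u$ and $g$ are two generators for $G_1$ and $G_2$ respectively, $x \in Z_p$ is a private key, and $y = g^x \in G_2$ is a public key.
- File $F = \{b_1, b_2, \ldots, b_m\}$, where each block $b_i \in Z_p$

Data Owner
- Creates distinct file copies $\mathbb{F} = \{F_i\}_{1 \le i \le n}$, where $F_i = E_K(i \| F)$, $1 \le i \le n$. Each copy $F_i$ is an ordered collection of blocks $\{b_{ij}\}_{1 \le j \le m}$
- Calculates the block tags $\Phi = \{\sigma_{ij}\}$ ($1 \le i \le n$, $1 \le j \le m$), where $\sigma_{ij} = (H(F_{id}) \cdot u^{b_{ij}})^x \in G_1$
- Sends $\{\mathbb{F}, \Phi, F_{id}\}$ to the CSP and deletes the copies and the tags from its local storage.

Challenge Response
1. Verifier: picks a fresh key $k$
2. Verifier: generates $n \times m$ random values $\{r_{ij}\} = \{\psi_k(l)\}_{1 \le l \le n \times m}$ ($1 \le i \le n$, $1 \le j \le m$)
   Verifier sends $k$ to the CSP
3. CSP: generates $n \times m$ random values as the verifier did
4. CSP: computes $\sigma = \prod_{i=1}^{n} \prod_{j=1}^{m} \sigma_{ij}^{r_{ij}} \in G_1$
5. CSP: computes $\mu = \sum_{i=1}^{n} \sum_{j=1}^{m} r_{ij} \cdot b_{ij} \in Z_p$
   CSP sends $\sigma, \mu$ to the verifier
6. Verifier: checks $\hat{e}(\sigma, g) \stackrel{?}{=} \hat{e}(H(F_{id})^{R} \cdot u^{\mu}, y)$, where $R = \sum_{i=1}^{n} \sum_{j=1}^{m} r_{ij}$

Fig. 9. The proposed DEMC-PDP scheme.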
24 Ayd F.Brsoum nd M.Anwr Hsn F id = F ilenme n m u is unique file identifier for ech owner s file to prevent the CSP from cheting using blocks from different files s the cse in the MR-PDP [15]. The dt owner then genertes n ggregted tg σ j for the blocks t the sme indices in ech copy F i s σ j = n i=1 σ ij G 1. Denote the set of ggregted tgs Φ = {σ j } 1 j m. The dt owner sends {F, Φ, F id } to the CSP nd deletes the copies nd the tgs from its locl storge. Chllenge. For chllenging the CSP nd vlidting the integrity of ll copies, the verifier sends c (# of blocks to be chllenged) nd two fresh keys t ech chllenge: PRP(π) key k 1 nd PRF(ψ) key k 2. Both the verifier nd the CSP use the PRP(π) keyed with k 1 nd the PRF(ψ) keyed with k 2 to generte set Q = {(j, r j )} of pirs of rndom indices nd rndom vlues, where {j} = π k1 (l) 1 l c nd {r j } = ψ k2 (l) 1 l c Response. After generting the set Q = {(j, r j )} of rndom indices nd vlues, the CSP runs the Proof lgorithm to generte n evidence tht it is still correctly possessing the n copies. The CSP responds with proof P = {σ, µ}, where σ = σ rj j G 1, µ i = r j b ij Z p, nd µ = {µ i } 1 i n (j,r j ) Q (j,r j ) Q Verify Response. Upon receiving the proof P = {σ, µ} from the CSP, the verifier runs the Verify lgorithm to check the following verifiction eqution: ê(σ, g)? = ê([ (j,r j ) Q H(F id j) rj ] n u ξ, y), where ξ = If the verifiction eqution pssed, the Verify lgorithm returns 1, otherwise 0. 
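The tag generation, challenge, response, and verify steps above can be sketched end to end in a toy model. This is only a check of the algebra and not the authors' implementation: group elements are represented by their exponents modulo a small prime, so "pairing" two elements simply multiplies their exponents, SHA-256 stands in for $H$, the PRF, and the PRP, and every name here (`tag_gen`, `challenge_set`, `proof`, `verify`, the 61-bit modulus) is a hypothetical choice made for illustration. Such a model offers no security, because the verifier needs $x$ itself rather than $y = g^x$.

```python
import hashlib
import secrets

P = 2**61 - 1  # toy prime group order (assumption; the paper uses a 160-bit order)


def h(file_id: str, j: int) -> int:
    """Exponent of H(F_id || j); SHA-256 is a stand-in for the hash into G1."""
    digest = hashlib.sha256(f"{file_id}||{j}".encode()).digest()
    return int.from_bytes(digest, "big") % P


def prf(key: bytes, l: int) -> int:
    """Stand-in for the PRF psi_key(l), mapping a counter to Z_p."""
    digest = hashlib.sha256(key + l.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") % P


def challenge_set(k1: bytes, k2: bytes, m: int, c: int):
    """Q = {(j, r_j)}: a keyed shuffle plays the role of the PRP pi_k1."""
    indices = sorted(range(m), key=lambda j: prf(k1, j))[:c]
    return [(j, prf(k2, l)) for l, j in enumerate(indices)]


def tag_gen(copies, file_id, x, a_u):
    """Aggregated tags sigma_j = prod_i (H(F_id||j) * u^b_ij)^x, as exponents."""
    m = len(copies[0])
    return [sum(x * (h(file_id, j) + a_u * copy[j]) for copy in copies) % P
            for j in range(m)]


def proof(copies, agg_tags, Q):
    """CSP side: sigma = prod sigma_j^r_j and mu_i = sum r_j * b_ij."""
    sigma = sum(r * agg_tags[j] for j, r in Q) % P
    mu = [sum(r * copy[j] for j, r in Q) % P for copy in copies]
    return sigma, mu


def verify(sigma, mu, Q, file_id, x, a_u):
    """Check equation (2) on exponents; xi is summed on the verifier side."""
    n = len(mu)
    xi = sum(mu) % P
    rhs = (n * sum(r * h(file_id, j) for j, r in Q) + a_u * xi) * x % P
    return sigma == rhs


n, m, c = 3, 16, 5
x = secrets.randbelow(P - 1) + 1      # private key
a_u = secrets.randbelow(P - 1) + 1    # u = g1^a_u
copies = [[secrets.randbelow(P) for _ in range(m)] for _ in range(n)]
tags = tag_gen(copies, "file-1", x, a_u)

Q = challenge_set(b"k1", b"k2", m, c)
sigma, mu = proof(copies, tags, Q)
honest_ok = verify(sigma, mu, Q, "file-1", x, a_u)

# Corrupt one challenged block of copy 0 and answer the same challenge again.
j0 = Q[0][0]
copies[0][j0] = (copies[0][j0] + 1) % P
sigma_bad, mu_bad = proof(copies, tags, Q)
tampered_ok = verify(sigma_bad, mu_bad, Q, "file-1", x, a_u)

print(honest_ok, tampered_ok)
```

The honest response satisfies the check, while the response computed from the corrupted copy does not, since the stored aggregated tag no longer matches the modified block.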
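The correctness of the above verification equation can be illustrated as follows:

$$
\begin{aligned}
\hat{e}(\sigma, g) &= \hat{e}\Big(\prod_{(j,r_j)\in Q} \sigma_j^{r_j},\; g\Big) \\
&= \hat{e}\Big(\prod_{(j,r_j)\in Q}\prod_{i=1}^{n} \sigma_{ij}^{r_j},\; g\Big) \\
&= \hat{e}\Big(\prod_{(j,r_j)\in Q}\prod_{i=1}^{n} [H(F_{id}\|j)\cdot u^{b_{ij}}]^{x\, r_j},\; g\Big) \\
&= \hat{e}\Big(\prod_{(j,r_j)\in Q}\prod_{i=1}^{n} [H(F_{id}\|j)]^{r_j}\cdot u^{r_j b_{ij}},\; y\Big) \\
&= \hat{e}\Big(\Big[\prod_{(j,r_j)\in Q} H(F_{id}\|j)^{r_j}\Big]^{n}\cdot \prod_{i=1}^{n} u^{\sum_{(j,r_j)\in Q} r_j b_{ij}},\; y\Big) \\
&= \hat{e}\Big(\Big[\prod_{(j,r_j)\in Q} H(F_{id}\|j)^{r_j}\Big]^{n}\cdot u^{\sum_{i=1}^{n}\mu_i},\; y\Big) \\
&= \hat{e}\Big(\Big[\prod_{(j,r_j)\in Q} H(F_{id}\|j)^{r_j}\Big]^{n}\cdot u^{\xi},\; y\Big)
\end{aligned}
$$

The last line is the right-hand side of verification equation (2), with $\xi = \sum_{i=1}^{n}\mu_i$.

A question may arise: why have we not let the CSP compute and send $\xi = \sum_{i=1}^{n}\mu_i$, thereby minimizing the communication overhead? If we allow the CSP to compute $\xi$, then the CSP can simply cheat the verifier as follows: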
$$
\xi = \sum_{i=1}^{n}\mu_i = \sum_{i=1}^{n}\sum_{(j,r_j)\in Q} r_j b_{ij} = \sum_{(j,r_j)\in Q} r_j \sum_{i=1}^{n} b_{ij}
$$

Thus the CSP can keep just the block summations $\sum_{i=1}^{n} b_{ij}$ and not the blocks themselves. Moreover, the CSP can corrupt the data blocks while the summation remains valid. Therefore, we force the CSP to send $\mu = \{\mu_i\}_{1 \le i \le n}$, and the value $\xi$ is computed at the verifier side. The PEMC-PDP scheme is presented in Figure 10.

Remark 4. A slightly modified version of the proposed PEMC-PDP scheme can reduce the communication overhead during the response phase by allowing the CSP to compute and send only one $\mu$ instead of $\mu = \{\mu_i\}_{1 \le i \le n}$. In this modification the block tags $\Phi = \{\sigma_{ij}\}_{1 \le i \le n,\, 1 \le j \le m}$ are not aggregated into a set $\{\sigma_j\}_{1 \le j \le m}$, and the data owner sends $\{\mathbb{F}, \Phi, F_{id}\}$ to the CSP. During the challenge phase, both the verifier and the CSP generate two sets $Q_1$ and $Q_2$ of $c$ random indices and $cn$ random values, respectively. The set $Q_1 = \{j\} = \pi_{k_1}(l)_{1 \le l \le c}$, and the set $Q_2 = \{r_{ji}\}_{j \in Q_1,\, 1 \le i \le n}$ is constructed as follows: for each $j \in Q_1$, generate $n$ random values $\{r_{ji}\}_{1 \le i \le n}$, so that $Q_2 = \psi_{k_2}(l)_{1 \le l \le cn}$. Then the CSP responds by computing $\sigma$ and $\mu$, where

$$
\sigma = \prod_{j\in Q_1}\prod_{i=1}^{n} \sigma_{ij}^{r_{ji}} \in G_1, \qquad \mu = \sum_{j\in Q_1}\sum_{i=1}^{n} r_{ji} b_{ij} \in \mathbb{Z}_p, \qquad r_{ji} \in Q_2.
$$

The verification equation (2) is modified to

$$
\hat{e}(\sigma, g) \stackrel{?}{=} \hat{e}\Big(\prod_{j\in Q_1} H(F_{id}\|j)^{\sum_{i=1}^{n} r_{ji}}\cdot u^{\mu},\; y\Big), \qquad r_{ji} \in Q_2.
$$

This modification lowers the communication overhead by sending only one $\mu$, reducing the communication cost by a factor of $n$. Since the challenge-response phase can occur many times during the lifetime of the data file, such a reduced communication overhead may be useful for resource-constrained applications. On the other hand, this reduced communication comes at the cost of the computations done on the CSP side; the number of cryptographic operations to compute $\sigma$ will be $cn$ exponentiations and $cn$ multiplications versus $c$ exponentiations and $c$ multiplications in the original PEMC-PDP scheme. Therefore, it is a trade-off between the computation and communication cost.
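The modified check of Remark 4 can be exercised in the same kind of toy exponent model. The sketch below is an illustration under stated assumptions rather than the scheme's implementation: group elements are replaced by exponents modulo a small prime, pairing is modeled as multiplication of exponents (so the check uses $x$ directly), and the helper names and the SHA-256-based stand-ins for $H$, $\pi$, and $\psi$ are hypothetical.

```python
import hashlib
import secrets

P = 2**61 - 1  # toy prime group order (assumption)


def h(file_id: str, j: int) -> int:
    """Exponent of H(F_id || j); SHA-256 stands in for the hash into G1."""
    digest = hashlib.sha256(f"{file_id}||{j}".encode()).digest()
    return int.from_bytes(digest, "big") % P


def prf(key: bytes, l: int) -> int:
    """Stand-in for the PRF psi_key(l)."""
    digest = hashlib.sha256(key + l.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") % P


n, m, c = 3, 12, 4
x = secrets.randbelow(P - 1) + 1      # private key
a_u = secrets.randbelow(P - 1) + 1    # u = g1^a_u
copies = [[secrets.randbelow(P) for _ in range(m)] for _ in range(n)]

# Non-aggregated tags sigma_ij = (H(F_id||j) * u^b_ij)^x, stored per copy.
tags = [[x * (h("file-1", j) + a_u * b) % P for j, b in enumerate(copy)]
        for copy in copies]

# Q1: c block indices; Q2: c*n random values r_ji, one per (index, copy) pair.
Q1 = sorted(range(m), key=lambda j: prf(b"k1", j))[:c]
Q2 = {(j, i): prf(b"k2", l * n + i) for l, j in enumerate(Q1) for i in range(n)}


def respond(current_copies):
    """CSP side: one sigma and a single mu for all copies."""
    sigma = sum(r * tags[i][j] for (j, i), r in Q2.items()) % P
    mu = sum(r * current_copies[i][j] for (j, i), r in Q2.items()) % P
    return sigma, mu


def verify(sigma, mu):
    """e(sigma, g) == e(prod_j H(F_id||j)^(sum_i r_ji) * u^mu, y), on exponents."""
    hashed = sum(h("file-1", j) * sum(Q2[(j, i)] for i in range(n)) for j in Q1)
    return sigma == (hashed + a_u * mu) * x % P


honest_ok = verify(*respond(copies))

# Corrupt a challenged block in the last copy only.
copies[n - 1][Q1[0]] = (copies[n - 1][Q1[0]] + 1) % P
tampered_ok = verify(*respond(copies))

response_elements_modified = 2        # sigma and a single mu
response_elements_original = n + 1    # sigma and mu_1..mu_n
print(honest_ok, tampered_ok, len(Q2))
```

The response shrinks from $n + 1$ values to two, while the CSP now combines $cn$ tags instead of $c$ aggregated ones, which is the trade-off the remark describes.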
7 Security Analysis of the Proposed EMC-PDP schemes

In this section we present a formal security analysis for our proposed EMC-PDP schemes, depending on the hardness of the Computational Diffie-Hellman (CDH) problem and the Discrete Logarithm (DL) problem.

Definitions.
1. CDH problem: given $g, g^x, h \in G$ for some group $G$ and $x \in \mathbb{Z}_p$, compute $h^x$.
2. DL problem: given $g, h \in G$ for some group $G$, find $x$ such that $h = g^x$.

7.1 DEMC-PDP Security Analysis

We present a proof showing that the CSP can never give a forged response back to the verifier, i.e., the output of the verification equation (1) will always be 0 except when the CSP honestly computes $\sigma$ and $\mu$.

Theorem 1. If the CDH problem is hard in bilinear groups, then there is no CSP that can pass the verification equation (1) of our DEMC-PDP scheme except by responding with a correctly computed proof $P = \{\sigma, \mu\}$.
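Fig. 10. The proposed PEMC-PDP scheme.

Setup
  $\hat{e} : G_1 \times G_2 \to G_T$ is a bilinear map; $u$ and $g$ are two generators for $G_1$ and $G_2$, respectively; $x \in \mathbb{Z}_p$ is a private key, and $y = g^x \in G_2$ is a public key.
  File $F = \{b_1, b_2, \ldots, b_m\}$, where each block $b_i \in \mathbb{Z}_p$.
Data Owner
  Creates distinct file copies $\mathbb{F} = \{F_i\}_{1 \le i \le n}$, where $F_i = E_K(i\|F)$. Each copy $F_i$ is an ordered collection of blocks $\{b_{ij}\}_{1 \le j \le m}$.
  Calculates the block tag $\sigma_{ij} = (H(F_{id}\|j)\cdot u^{b_{ij}})^x \in G_1$.
  Computes a set of aggregated tags $\Phi = \{\sigma_j\}_{1 \le j \le m}$ for the blocks at the same indices in each copy $F_i$, where $\sigma_j = \prod_{i=1}^{n} \sigma_{ij} \in G_1$.
  Sends $\{\mathbb{F}, \Phi, F_{id}\}$ to the CSP and deletes the copies and the tags from its local storage.
Challenge-Response
  1. Verifier: picks $c$ (the number of blocks to be challenged) and two fresh keys $k_1$ and $k_2$.
  2. Verifier: generates a set $Q = \{(j, r_j)\}$, with $\{j\} = \pi_{k_1}(l)_{1 \le l \le c}$ and $\{r_j\} = \psi_{k_2}(l)_{1 \le l \le c}$.
     Verifier -> CSP: $c, k_1, k_2$
  3. CSP: generates the set $Q$ as the verifier did.
  4. CSP: computes $\sigma = \prod_{(j,r_j)\in Q} \sigma_j^{r_j} \in G_1$.
  5. CSP: computes $\{\mu_i\}_{1 \le i \le n}$, where $\mu_i = \sum_{(j,r_j)\in Q} r_j b_{ij} \in \mathbb{Z}_p$.
     CSP -> Verifier: $\sigma,\; \mu = \{\mu_i\}_{1 \le i \le n}$
  6. Verifier: checks $\hat{e}(\sigma, g) \stackrel{?}{=} \hat{e}\big(\big[\prod_{(j,r_j)\in Q} H(F_{id}\|j)^{r_j}\big]^{n}\cdot u^{\xi}, y\big)$, where $\xi = \sum_{i=1}^{n}\mu_i$.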
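In Figure 10 the value $\xi$ is summed by the verifier rather than sent by the CSP. The observation behind that choice, namely that $\xi$ depends on the blocks only through the per-index sums $\sum_i b_{ij}$, can be illustrated numerically. The sketch below is a toy check under loud assumptions: exponents modulo a small prime replace group elements, pairing is modeled as multiplying exponents (so the check uses $x$ directly), SHA-256 stands in for $H$, $\pi$, and $\psi$, and all helper names are hypothetical.

```python
import hashlib
import secrets

P = 2**61 - 1  # toy prime group order (assumption)


def h(file_id: str, j: int) -> int:
    """Exponent of H(F_id || j); SHA-256 stands in for the hash into G1."""
    digest = hashlib.sha256(f"{file_id}||{j}".encode()).digest()
    return int.from_bytes(digest, "big") % P


def prf(key: bytes, l: int) -> int:
    """Stand-in for the PRF psi_key(l)."""
    digest = hashlib.sha256(key + l.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") % P


n, m, c = 4, 10, 3
x = secrets.randbelow(P - 1) + 1      # private key
a_u = secrets.randbelow(P - 1) + 1    # u = g1^a_u
copies = [[secrets.randbelow(P) for _ in range(m)] for _ in range(n)]

# Aggregated tags sigma_j, as exponents.
agg_tags = [sum(x * (h("file-1", j) + a_u * copy[j]) for copy in copies) % P
            for j in range(m)]

indices = sorted(range(m), key=lambda j: prf(b"k1", j))[:c]
Q = [(j, prf(b"k2", l)) for l, j in enumerate(indices)]

# What an honest CSP sends, and the xi the verifier derives from it.
sigma = sum(r * agg_tags[j] for j, r in Q) % P
mu = [sum(r * copy[j] for j, r in Q) % P for copy in copies]
xi_honest = sum(mu) % P

# A CSP that kept only the per-index sums s_j = sum_i b_ij (n times less data).
column_sums = [sum(copy[j] for copy in copies) % P for j in range(m)]
xi_from_sums = sum(r * column_sums[j] for j, r in Q) % P


def passes(sigma_value: int, xi: int) -> bool:
    """Right-hand side of equation (2) on exponents, for a given xi."""
    rhs = (n * sum(r * h("file-1", j) for j, r in Q) + a_u * xi) * x % P
    return sigma_value == rhs


same_xi = xi_from_sums == xi_honest
accepted_with_sums_only = passes(sigma, xi_from_sums)
print(same_xi, accepted_with_sums_only)
```

Both values agree, which is the identity $\xi = \sum_{(j,r_j)\in Q} r_j \sum_{i} b_{ij}$ stated earlier; the sketch says nothing beyond that identity.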
Proof. First, since the $F_{id}$ is embedded into the block tag, it is infeasible for the CSP to use blocks from different files and still pass the auditing procedures, even if the owner uses the same secret key $x$ with all his files. Second, the goal of a malicious CSP is to generate a proof that is not correctly computed and yet passes the verification process. Let $P' = \{\sigma', \mu'\}$ be the malicious CSP's response. Let $P = \{\sigma, \mu\}$ be the expected response from an honest CSP, where $\sigma = \prod_{i=1}^{n}\prod_{j=1}^{m} \sigma_{ij}^{r_{ij}}$ and $\mu = \sum_{i=1}^{n}\sum_{j=1}^{m} r_{ij} b_{ij}$. If the malicious CSP's response $P' = \{\sigma', \mu'\}$ passes the verification equation (1), then we can find a solution to the CDH problem. According to the correctness of our DEMC-PDP scheme, the expected proof $P = \{\sigma, \mu\}$ satisfies the verification equation, i.e.,

$$
\hat{e}(\sigma, g) = \hat{e}(H(F_{id})^{R}\cdot u^{\mu}, y).
$$

Assume that $\sigma' \ne \sigma$ and that $\sigma'$ passes the verification equation; then we have

$$
\hat{e}(\sigma', g) = \hat{e}(H(F_{id})^{R}\cdot u^{\mu'}, y).
$$

It is clear that $\mu' \ne \mu$, otherwise $\sigma' = \sigma$, which contradicts our assumption. Define $\Delta\mu = \mu' - \mu \ne 0$. Dividing the verification equation for the malicious response by the verification equation for the expected response, we obtain

$$
\begin{aligned}
\hat{e}(\sigma'\cdot\sigma^{-1}, g) &= \hat{e}(u^{\Delta\mu}, y) \\
\hat{e}(\sigma'\cdot\sigma^{-1}, g) &= \hat{e}(u^{x\Delta\mu}, g) \\
\sigma'\cdot\sigma^{-1} &= u^{x\Delta\mu} \\
u^{x} &= (\sigma'\cdot\sigma^{-1})^{1/\Delta\mu}
\end{aligned}
$$

Set $u = h$; then $h^{x} = (\sigma'\cdot\sigma^{-1})^{1/\Delta\mu}$.

Hence, we have found a solution to the CDH problem unless evaluating the exponent causes a division by zero, but we have $\Delta\mu \ne 0$. Therefore, if $\sigma' \ne \sigma$ we can use the malicious CSP to break the CDH problem, and thus we guarantee that $\sigma'$ must be equal to $\sigma$. It is only the values $\mu'$ and $\mu$ that can differ. Assume that the malicious CSP responds with $\sigma' = \sigma$ and $\mu' \ne \mu$. Now we have

$$
\begin{aligned}
\hat{e}(\sigma, g) = \hat{e}(H(F_{id})^{R}\cdot u^{\mu}, y) &= \hat{e}(\sigma', g) = \hat{e}(H(F_{id})^{R}\cdot u^{\mu'}, y) \\
\hat{e}(H(F_{id})^{R}\cdot u^{\mu}, y) &= \hat{e}(H(F_{id})^{R}\cdot u^{\mu'}, y) \\
1 &= \hat{e}(u^{\Delta\mu}, y) \\
1 &= u^{\Delta\mu} \;\Rightarrow\; \Delta\mu = 0 \;\Rightarrow\; \mu' = \mu,
\end{aligned}
$$

which contradicts the assumption that $\mu' \ne \mu$.

7.2 PEMC-PDP Security Analysis

The essence of the PEMC-PDP security analysis is similar to that of the DEMC-PDP, but here we depend on the hardness of both the CDH problem and the DL problem.

Theorem 2.
If both the CDH problem and the DL problem are hard in bilinear groups, then there is no CSP that can pass the verification equation (2) of our PEMC-PDP scheme except by responding with a correctly computed proof $P = \{\sigma, \mu\}$, where $\mu = \{\mu_i\}_{1 \le i \le n}$.

Proof. First, since the $F_{id}$ and the block index are embedded into the block tag, it is infeasible for the CSP to pass the auditing procedures using blocks from different files or using blocks at different indices, even if the owner uses the same secret key $x$ with all his files. Second, the goal of a malicious CSP is to generate a proof that is not correctly computed and to pass the
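verification process. Let $P' = \{\sigma', \mu'\}$, where $\mu' = \{\mu'_i\}_{1 \le i \le n}$, be the malicious CSP's response. Let $P = \{\sigma, \mu\}$ be the expected response from an honest CSP, where $\sigma = \prod_{(j,r_j)\in Q} \sigma_j^{r_j}$, $\mu = \{\mu_i\}_{1 \le i \le n}$, and $\mu_i = \sum_{(j,r_j)\in Q} r_j b_{ij}$. If the malicious CSP's response $P' = \{\sigma', \mu'\}$ passes the verification equation (2), then we can find a solution to both the CDH problem and the DL problem. According to the correctness of our PEMC-PDP scheme, the expected proof $P = \{\sigma, \mu\}$ satisfies the verification equation, i.e.,

$$
\hat{e}(\sigma, g) = \hat{e}\Big(\Big[\prod_{(j,r_j)\in Q} H(F_{id}\|j)^{r_j}\Big]^{n}\cdot u^{\xi},\; y\Big), \qquad \xi = \sum_{i=1}^{n}\mu_i.
$$

Assume that $\sigma' \ne \sigma$ and that $\sigma'$ passes the verification equation; then we have

$$
\hat{e}(\sigma', g) = \hat{e}\Big(\Big[\prod_{(j,r_j)\in Q} H(F_{id}\|j)^{r_j}\Big]^{n}\cdot u^{\xi'},\; y\Big), \qquad \xi' = \sum_{i=1}^{n}\mu'_i.
$$

Obviously, if $\mu'_i = \mu_i\ \forall i$, it follows from the verification equation that $\sigma' = \sigma$, which contradicts our assumption. Define $\Delta\xi = \xi' - \xi = \sum_{i=1}^{n}\Delta\mu_i$; it must be the case that at least one of $\{\Delta\mu_i\}_{1 \le i \le n}$ is nonzero. Dividing the verification equation for the malicious response by the verification equation for the expected response, and setting $u = g^{\alpha}h^{\beta}$ for $\alpha, \beta \in \mathbb{Z}_p$, we obtain

$$
\begin{aligned}
\hat{e}(\sigma'\cdot\sigma^{-1}, g) &= \hat{e}(u^{\Delta\xi}, y) \\
\hat{e}(\sigma'\cdot\sigma^{-1}, g) &= \hat{e}(u^{x\Delta\xi}, g) \\
\sigma'\cdot\sigma^{-1} &= u^{x\Delta\xi} \\
(g^{\alpha}h^{\beta})^{x\Delta\xi} &= \sigma'\cdot\sigma^{-1} \\
h^{\beta x\Delta\xi} &= \sigma'\cdot\sigma^{-1}\cdot y^{-\alpha\Delta\xi} \\
h^{x} &= (\sigma'\cdot\sigma^{-1}\cdot y^{-\alpha\Delta\xi})^{\frac{1}{\beta\Delta\xi}}
\end{aligned}
$$

Hence, we have found a solution to the CDH problem unless evaluating the exponent causes a division by zero. However, we noted that not all of $\{\Delta\mu_i\}_{1 \le i \le n}$ can be zero, and the probability that $\beta = 0$ is $1/p$, which is negligible. Therefore, if $\sigma' \ne \sigma$ we can use the malicious CSP to break the CDH problem, and thus we guarantee that $\sigma'$ must be equal to $\sigma$. It is only the values $\mu' = \{\mu'_i\}$ and $\mu = \{\mu_i\}$ ($1 \le i \le n$) that can differ. Assume the malicious CSP responds with $\sigma' = \sigma$ and $\mu' \ne \mu$. Now we have

$$
\hat{e}(\sigma, g) = \hat{e}\Big(\Big[\prod_{(j,r_j)\in Q} H(F_{id}\|j)^{r_j}\Big]^{n}\cdot u^{\xi},\; y\Big) = \hat{e}(\sigma', g) = \hat{e}\Big(\Big[\prod_{(j,r_j)\in Q} H(F_{id}\|j)^{r_j}\Big]^{n}\cdot u^{\xi'},\; y\Big),
$$

from which we conclude that

$$
\begin{aligned}
\hat{e}\Big(\Big[\prod_{(j,r_j)\in Q} H(F_{id}\|j)^{r_j}\Big]^{n}\cdot u^{\xi},\; y\Big) &= \hat{e}\Big(\Big[\prod_{(j,r_j)\in Q} H(F_{id}\|j)^{r_j}\Big]^{n}\cdot u^{\xi'},\; y\Big) \\
1 &= \hat{e}(u^{\Delta\xi}, y) \\
1 &= u^{\Delta\xi} \\
1 &= (g^{\alpha}h^{\beta})^{\Delta\xi} \\
h^{\beta\Delta\xi} &= g^{-\alpha\Delta\xi} \\
h &= g^{-\frac{\alpha\Delta\xi}{\beta\Delta\xi}}
\end{aligned}
$$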
Hence, we have found a solution to the DL problem unless evaluating the exponent causes a division by zero. However, we noted that not all of $\{\Delta\mu_i\}_{1 \le i \le n}$ can be zero, and the probability that $\beta = 0$ is $1/p$, which is negligible. Therefore, if there is at least one difference between $\{\mu'_i\}$ and $\{\mu_i\}$ ($1 \le i \le n$), we can use the malicious CSP to break the DL problem, and thus we guarantee that $\{\mu'_i\}$ must be equal to $\{\mu_i\}\ \forall i$.

8 Performance Analysis and Experimental Results

8.1 Performance Analysis

In this section we evaluate the performance of our proposed EMC-PDP schemes and the MR-PDP scheme proposed in [15]. The computation cost of our schemes and the scheme in [15] is estimated in terms of the cryptographic operations listed in Table 2.

  Symbol   Operation
  H        Hashing into group G
  A        Addition in group G
  M        Multiplication in group G
  D        Division in group G
  E        Exponentiation in group G
  P        Pairing in group G

Table 2. Notation of cryptographic operations

Without loss of generality, assume the desired security level is 80 bits; then the elliptic curve group we work on has a 160-bit group order and the size of the RSA modulus $N$ is 1024 bits. Let the key used with the PRP and the PRF be of size 128 bits. Table 3 presents a comparison between our proposed schemes (DEMC-PDP and PEMC-PDP) and the MR-PDP scheme [15]. The comparison is made from these perspectives: the storage and generation cost of the block tags, the communication cost for the challenge and response phases, and the computation cost at both the verifier and the CSP side. To establish a fair comparison between our schemes and the MR-PDP scheme [15], we assume two small modifications to the original protocol proposed in [15]. First, we assume that the indices of the challenged blocks are the same across all replicas (this assumption is an optimization for the verification computations of the original MR-PDP [15]).
Second, for the CSP to prove possession of the blocks, each one of the challenged blocks should be multiplied by a random value, i.e., we modify the MR-PDP scheme to be an extension of the S-PDP version, not the E-PDP version, due to Ateniese et al. [6]. The second modification guarantees that the CSP possesses each of the challenged blocks and not just their sum. Let $n$ and $m$ denote the number of copies and the number of blocks per copy, respectively, and let $c$ denote the number of challenged blocks in both the PEMC-PDP and the MR-PDP.
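The size-related rows of Table 3 are simple closed forms in $n$, $m$, and $c$, so they can be evaluated directly. The sketch below is a hedged helper, not part of the paper: the function and constant names are hypothetical, while the formulas and the 160-bit, 1024-bit, and 128-bit sizes are the ones stated in this section and in Table 3.

```python
import math

GROUP_BITS = 160   # size of a G1 element / Z_p value at the 80-bit security level
RSA_BITS = 1024    # size of the RSA modulus N used by the MR-PDP
KEY_BITS = 128     # size of a PRP/PRF key


def tag_size_bits(n: int, m: int) -> dict:
    """Total tag storage (Table 3, row 'Size')."""
    return {
        "DEMC-PDP": GROUP_BITS * n * m,
        "PEMC-PDP": GROUP_BITS * m,
        "MR-PDP": RSA_BITS * m,
    }


def communication_bits(n: int, c: int) -> dict:
    """Challenge plus response size (Table 3, rows 'Challenge' and 'Response')."""
    log_c = math.log2(c)
    return {
        "DEMC-PDP": KEY_BITS + 2 * GROUP_BITS,                    # 128 + 320
        "PEMC-PDP": (2 * KEY_BITS + log_c) + GROUP_BITS * (n + 1),
        "MR-PDP": (1280 + log_c) + RSA_BITS * (n + 1),
    }


# Example: a 64-MB file split into 4-KB blocks, 10 copies, 460 challenged blocks.
n, c = 10, 460
m = (64 * 2**20) // (4 * 2**10)

tag_mb = {name: bits / 8 / 2**20 for name, bits in tag_size_bits(n, m).items()}
comm_bytes = {name: round(bits / 8) for name, bits in communication_bits(n, c).items()}

print(tag_mb)
print(comm_bytes)
```

For these parameters the helper gives the tag sizes and per-challenge communication costs quoted in the discussion and experiments of this section.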
                                DEMC-PDP             PEMC-PDP                    MR-PDP [15]
  Tag            Size           160nm bits           160m bits                   1024m bits
                 Generation     2nmE + nmM + nmH     2nmE + 2nmM + nmH           2mE + mM + mH
  Communication  Challenge      128 bits             256 + log2(c) bits          1280 + log2(c) bits
  Overhead       Response       320 bits             160(n + 1) bits             1024(n + 1) bits (*)
  Computation    Proof          nmE + 2nmM + nmA     cE + c(n + 1)M + cnA        (c + n)E + c(n + 1)M + cnA
  Overhead       Verify         2P + 2E + 1M         2P + (c + 2)E + (c + 1)M    (2n + c + 1)E + (cn + c + n)M
                                + nmA + 1H           + nA + cH                   + cnA + cH + 1D

Table 3. Storage, communication, and computation costs for the three schemes. The symbols used in the comparison are defined in Table 2.
(*) There is an optimization for this response to be 1024 + 160n bits using hashing.

The main contribution of [15] is the reduced computation of generating the block tags. This contribution is due to the fact that the tags are generated from the encrypted version of the file before masking with some unique randomness to generate the distinguishable copies. Unfortunately, this contribution resulted in various limitations that have been explained earlier; one of the critical ones is the inability of the authorized users to access the file copies because of the opaque nature of the CSP. Moreover, this reduced computation is unlikely to have a significant impact on the overall system performance, because the task of generating file tags is done only once during the lifetime of the file, which may be tens of years. On the other hand, our schemes are complete and much more efficient than the MR-PDP [15] from the following perspectives:

By completeness, we mean that our schemes handle all entities (data owners, CSPs, and authorized users) that comprise almost all practical applications, while the protocol in [15] missed an essential entity in the outsourced data storage system, namely the authorized users.

The storage overhead (total tags size) of our PEMC-PDP is about one-sixth that of the MR-PDP. In general our tag size is 1/6 the tag size of the scheme due to Curtmola et al.
[15], and this mitigates the extra storage cost of our DEMC-PDP. For a 64-MB file with 4-KB blocks and 10 copies, the total tags sizes of the MR-PDP, DEMC-PDP, and PEMC-PDP are 2 MB, 3.125 MB, and 0.3125 MB, respectively.

The communication overhead of our schemes is much less than that of the MR-PDP; the way we construct our schemes enables us to aggregate the responses from the servers on which the copies are stored. For the challenge phase, the communication cost of our DEMC-PDP is about one-tenth that of the MR-PDP, and that of our PEMC-PDP is about one-fifth. Moreover, for only 5 copies we compress the response of our DEMC-PDP to about 1/19 of the MR-PDP response, and the compression ratio between the response of our PEMC-PDP and that of the MR-PDP is about 1:6. Hence, our schemes are efficient and much more practical, especially when the available bandwidth is limited.

Our PEMC-PDP is the most efficient scheme from the computation cost perspective. The experimental results demonstrate the performance efficiency of our schemes even with the pairing operations, which consume more time than other cryptographic operations (in the verification phase we use only two pairing operations).

8.2 Experimental Results

In this section, we present and discuss the experimental results of our research. The experiments are conducted using C++ on a system with an Intel(R) Xeon(R) 2-GHz processor and 3 GB
RAM running Windows XP. In our implementation we use the MIRACL library version 5.4.2 and a 64-MB file. To achieve an 80-bit security level, the elliptic curve group we work on has a 160-bit group order and the size of the RSA modulus $N$ is 1024 bits. In our implementation we do not consider the time to access the file blocks, as the state-of-the-art hard drive technology allows as much as 1 MB to be read in just a few nanoseconds [10]. Hence, the total access time is unlikely to have a substantial impact on the overall system performance. We compare our schemes (DEMC-PDP and PEMC-PDP) with the MR-PDP scheme from different perspectives: total tags size, tags generation cost, communication cost (including both challenge and response), CSP computation cost, and verifier computation cost.

It has been reported in [6] that if the remote server is missing a fraction of the data, then the number of blocks that needs to be checked in order to detect server misbehavior with high probability is a constant, independent of the total number of file blocks. For example, if the server deletes 1% of the data file, the verifier only needs to check c = 460 randomly chosen blocks of the file to detect this misbehavior with probability larger than 99%. Therefore, in our experiments, we use c = 460 to achieve a high probability of assurance.

Figure 11 shows the total tag size (TS) for our proposed schemes and for the MR-PDP model using different numbers of copies. For the PEMC-PDP and MR-PDP schemes, the TS is independent of the number of copies, while TS is linear in the number of copies $n$ for the DEMC-PDP. Our PEMC-PDP scheme has the lowest TS; its TS is about one-sixth that of the MR-PDP. The way we aggregate the tags gives our PEMC-PDP scheme the lowest storage overhead on the server side. Our DEMC-PDP has the strongest guarantee that all blocks of all copies are actually being stored by the CSP and are intact, but this strongest guarantee comes at the expense of the storage overhead.
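The choice c = 460 quoted above from [6] can be sanity-checked with a short calculation. The sketch below assumes, as a simplification, that the challenged blocks are drawn independently, which gives the bound $1 - (1 - f)^c$ for a corrupted fraction $f$; the function names are hypothetical.

```python
import math


def detection_probability(corrupted_fraction: float, c: int) -> float:
    """Probability that at least one of c independently sampled blocks is corrupted."""
    return 1.0 - (1.0 - corrupted_fraction) ** c


def blocks_needed(corrupted_fraction: float, target: float) -> int:
    """Smallest c whose detection probability reaches the target under this bound."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - corrupted_fraction))


p_460 = detection_probability(0.01, 460)
c_min = blocks_needed(0.01, 0.99)

# The total number of blocks m never enters either formula.
print(f"{p_460:.4f}", c_min)
```

Under this simplified bound, c = 460 exceeds the 99% target with a small margin, and the file size does not appear in the computation, which matches the statement that the required number of challenged blocks is independent of the total number of blocks.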
Fig. 11. TS (in MB) of our proposed schemes and the MR-PDP model (x-axis: number of copies; y-axis: tag size in MB).

Figure 12 presents the tags generation time (in seconds) for our proposed schemes and for the MR-PDP model using different numbers of copies. The tags generation time of the MR-PDP is the lowest, because it generates a single set of tags for all copies. But as mentioned earlier, this reduced computation of tags generation resulted in precluding the authorized users
from seamlessly accessing the owner's files. Moreover, the tags generation time is unlikely to have a significant impact on the overall system performance; the tags generation task is done only once during the file's lifetime, which may be tens of years. As noted from Figure 12, the time our PEMC-PDP scheme takes to generate tags is slightly higher than that of our DEMC-PDP model. This slight difference is due to the additional $nmM$ operations that are used to aggregate the tags of the blocks at the same indices.

Fig. 12. Tags generation times (in sec) of our proposed schemes and the MR-PDP model (x-axis: number of copies; y-axis: tags generation time in seconds).

The communication cost of the three schemes is illustrated in Figure 13. The MR-PDP scheme has the highest communication overhead. On the other hand, our DEMC-PDP has a constant communication cost (56 bytes), independent of both the number of copies and the number of blocks per copy, and its communication cost is the lowest among the three schemes. As presented in Figure 13, the communication cost of our PEMC-PDP is significantly less than that of the MR-PDP model. For 10 copies, the communication cost of our PEMC-PDP scheme is about 253 bytes compared to 1569 bytes for the MR-PDP (about one-sixth). Hence, our schemes are much more practical, especially when the available bandwidth is limited and there are millions of verifiers who need to audit their data files over the CSPs.

Table 4 presents the CSP computation times (in ms) to provide evidence that it actually possesses the file copies in an uncorrupted state. Of course, the computation cost of our DEMC-PDP scheme is the largest; it computes the proof using all the blocks of all copies, and thus provides the strongest guarantee. The speed-up $= \frac{Time_{MR\text{-}PDP} - Time_{PEMC\text{-}PDP}}{Time_{MR\text{-}PDP}} \times 100$ of our PEMC-PDP with respect to the MR-PDP [15] is also presented.
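The speed-up formula above can be applied directly to the measured times. In the sketch below the numbers are the PEMC-PDP and MR-PDP CSP computation times reported in Table 4; the function and variable names are hypothetical.

```python
def speed_up(time_mr_pdp: float, time_pemc_pdp: float) -> float:
    """Relative reduction of PEMC-PDP time with respect to MR-PDP, in percent."""
    return (time_mr_pdp - time_pemc_pdp) / time_mr_pdp * 100.0


# number of copies -> (PEMC-PDP ms, MR-PDP ms), CSP computation times from Table 4
csp_times_ms = {
    1: (294.4, 394.57),
    5: (323.76, 440.13),
    10: (360.43, 497.08),
    15: (395.9, 554.03),
    20: (430.19, 610.98),
}

csp_speed_up = {n: round(speed_up(mr, pemc), 2) for n, (pemc, mr) in csp_times_ms.items()}
for n, value in csp_speed_up.items():
    print(f"{n:>2} copies: {value:.2f}%")
```

The computed percentages agree with the speed-up column of Table 4 and grow with the number of copies.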
The proposed PEMC-PDP scheme outperforms the MR-PDP model; it reduces the server computation time by up to 29%, making our scheme much more efficient for practical applications where a large number of verifiers are connected to the CSP, causing a huge computation overhead on the servers. Table 5 presents the verifier computation times (in ms) to check the responses received from the CSP. The proposed DEMC-PDP scheme has the shortest verification time among the three schemes; its verification process costs only two pairings, two exponentiations, and one multiplication (other operations can be neglected). In addition, the PEMC-PDP model reduces the
Fig. 13. Communication cost (in bytes) of our proposed schemes and the MR-PDP model (x-axis: number of copies; y-axis: communication cost in bytes).

  # of Copies   DEMC-PDP    PEMC-PDP   MR-PDP [15]   Speed-up
  1             10485.76    294.4      394.57        25.39%
  5             52428.8     323.76     440.13        26.44%
  10            104857.6    360.43     497.08        27.49%
  15            157286.4    395.9      554.03        28.54%
  20            209715.2    430.19     610.98        29.59%

Table 4. CSP computation times (ms) of the three schemes for different numbers of copies
verifier computation cost by up to 53% with respect to the MR-PDP scheme. As illustrated in Table 5, both of our schemes show only a very small increase in the verifier computation time with an increasing number of copies; this is due to the additional $nmA$ and $nA$ operations, which can be neglected, for the DEMC-PDP and the PEMC-PDP models, respectively. On the other hand, the verifier computation cost of the MR-PDP scheme grows linearly with the number of copies through the expensive $E$ and $M$ operations.

  # of Copies   DEMC-PDP   PEMC-PDP   MR-PDP [15]   Speed-up
  1             10.21      294.04     396.21        25.79%
  5             10.27      294.042    445.11        33.94%
  10            10.32      294.046    506.22        41.91%
  15            10.37      294.049    567.34        48.17%
  20            10.42      294.52     628.45        53.14%

Table 5. Verifier computation times (ms) of the three schemes for different numbers of copies

To sum up, the PEMC-PDP has the best overall performance among the three schemes considered here; it surpasses the MR-PDP model from a number of perspectives: storage overhead, communication cost, and computation cost. The only advantage of the MR-PDP scheme is the reduction of the tag generation cost, which is incurred only once during the lifetime of the outsourced storage system, while it causes high overhead for other tasks that are done more frequently: communication, proof, and verify. Besides, if the computation cost on the server side is less important (the CSP has unlimited computational resources), the owner needs the strongest guarantee that all blocks of all copies are intact, and the verification process is done in a constrained environment with limited bandwidth and limited verifier's computational power (e.g., a PDA or cell phone), then our DEMC-PDP scheme is the best choice to be applied in such circumstances.

9 Conclusion

In this work we have studied the problem of creating multiple copies of a data file over the cloud servers and auditing all these copies to verify their correctness and completeness.
We presented two secure, complete, and efficient protocols, the DEMC-PDP and PEMC-PDP schemes, that achieve our design goals outlined in section 4.2. We carried out an extensive comparative analysis between our schemes and the MR-PDP [15], which is the only scheme in the literature that previously addressed the same problem, and showed that our proposed schemes alleviate key limitations of the MR-PDP model that preclude the authorized users from seamlessly accessing the owner's files. Through extensive security analysis and experimental results, we have shown that our proposed schemes are provably secure and much more efficient than the MR-PDP for practical applications. To the best of our knowledge, the proposed protocols are the first secure, complete, and efficient protocols that address the storage integrity of multiple data copies over Cloud Computing. By using the proposed schemes, the data owners can have strong evidence that they actually get the service they pay for.

References

1. P. Mell and T. Grance, Draft NIST working definition of cloud computing, Online at http://csrc.nist.gov/groups/SNS/cloud-computing/index.html, 2009.
2. W. Wang, Z. Li, R. Owens, and B. Bhargava, Secure and efficient access to outsourced data, in CCSW '09: Proceedings of the 2009 ACM Workshop on Cloud Computing Security, New York, NY, USA, 2009, pp. 55-66.
3. M. Xie, H. Wang, J. Yin, and X. Meng, Integrity auditing of outsourced data, in VLDB '07: Proceedings of the 33rd International Conference on Very Large Databases, 2007, pp. 782-793.
4. N. Gohring, Amazon's S3 down for several hours, Online at http://www.pcworld.com/businesscenter/article/142549/amazons_s3_down_for_severalhours.html, 2008.
5. B. Krebs, Payment Processor Breach May Be Largest Ever, Online at http://voices.washingtonpost.com/securityfix/2009/01/payment_processor_breach_may_b.html, Jan. 2009.
6. G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, Provable data possession at untrusted stores, in CCS '07: Proceedings of the 14th ACM Conference on Computer and Communications Security, New York, NY, USA, 2007, pp. 598-609.
7. K. Zeng, Publicly verifiable remote data integrity, in ICICS, 2008, pp. 419-434.
8. Y. Deswarte, J.-J. Quisquater, and A. Saïdane, Remote integrity checking, in 6th Working Conference on Integrity and Internal Control in Information Systems (IICIS), S. J. L. Strous, Ed., 2003, pp. 1-11.
9. D. L. G. Filho and P. S. L. M. Barreto, Demonstrating data possession and uncheatable data transfer, Cryptology ePrint Archive, Report 2006/150, 2006.
10. F. Sebé, J. Domingo-Ferrer, A. Martinez-Balleste, Y. Deswarte, and J.-J. Quisquater, Efficient remote data possession checking in critical information infrastructures, IEEE Trans. on Knowl. and Data Eng., vol. 20, no. 8, 2008.
11. P. Golle, S. Jarecki, and I. Mironov, Cryptographic primitives enforcing communication and storage complexity, in FC '02: Proceedings of the 6th International Conference on Financial Cryptography, Berlin, Heidelberg, 2003, pp. 120-135.
12. M. A. Shah, M. Baker, J. C. Mogul, and R. Swaminathan, Auditing to keep online storage services honest, in HOTOS '07: Proceedings of the 11th USENIX Workshop on Hot Topics in Operating Systems, Berkeley, CA, USA, 2007, pp. 1-6.
13. M. A. Shah, R. Swaminathan, and M. Baker, Privacy-preserving audit and extraction of digital contents, Cryptology ePrint Archive, Report 2008/186, 2008.
14. E. Mykletun, M. Narasimha, and G. Tsudik, Authentication and integrity in outsourced databases, Trans. Storage, vol. 2, no. 2, 2006.
15. R. Curtmola, O. Khan, R. Burns, and G. Ateniese, MR-PDP: Multiple-Replica Provable Data Possession, in 28th IEEE ICDCS, 2008, pp. 411-420.
16. Amazon Simple Storage Service (Amazon S3), http://aws.amazon.com/s3/.
17. R. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Commun. ACM, vol. 26, no. 1, 1983.
18. D. Boneh, B. Lynn, and H. Shacham, Short signatures from the Weil pairing, in ASIACRYPT '01: Proceedings of the 7th International Conference on the Theory and Application of Cryptology and Information Security, London, UK, 2001, pp. 514-532.
19. F. Li, M. Hadjieleftheriou, G. Kollios, and L. Reyzin, Dynamic authenticated index structures for outsourced databases, in SIGMOD '06: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2006, pp. 121-132.
20. H. Shacham and B. Waters, Compact proofs of retrievability, Cryptology ePrint Archive, Report 2008/073, 2008, http://eprint.iacr.org/.
21. G. Ateniese, S. Kamara, and J. Katz, Proofs of storage from homomorphic identification protocols, in ASIACRYPT '09: Proceedings of the 15th International Conference on the Theory and Application of Cryptology and Information Security, Berlin, Heidelberg, 2009, pp. 319-333.
22. V. Shoup, On the security of a practical identification scheme, in EUROCRYPT '96: Proceedings of the 15th Annual International Conference on Theory and Application of Cryptographic Techniques, Berlin, Heidelberg, 1996, pp. 344-353.
23. A. Juels and B. S. Kaliski, PORs: Proofs of Retrievability for large files, in CCS '07: Proceedings of the 14th ACM Conference on Computer and Communications Security, 2007, pp. 584-597.
24. T. S. J. Schwarz and E. L. Miller, Store, forget, and check: Using algebraic signatures to check remotely administered storage, in ICDCS '06: Proceedings of the 26th IEEE International Conference on Distributed Computing Systems, Washington, DC, USA, 2006.
25. R. Curtmola, O. Khan, and R. Burns, Robust remote data checking, in StorageSS '08: Proceedings of the 4th ACM International Workshop on Storage Security and Survivability, New York, NY, USA, 2008, pp. 63-68.
26. K. D. Bowers, A. Juels, and A. Oprea, Proofs of retrievability: theory and implementation, in CCSW '09: Proceedings of the 2009 ACM Workshop on Cloud Computing Security, New York, NY, USA, 2009, pp. 43-54.
27. K. D. Bowers, A. Juels, and A. Oprea, HAIL: a high-availability and integrity layer for cloud storage, in CCS '09: Proceedings of the 16th ACM Conference on Computer and Communications Security, New York, NY, USA, 2009, pp. 187-198.
28. Y. Dodis, S. Vadhan, and D. Wichs, Proofs of retrievability via hardness amplification, in TCC '09: Proceedings of the 6th Theory of Cryptography Conference on Theory of Cryptography, Berlin, Heidelberg, 2009, pp. 109-127.
29. G. Ateniese, R. Di Pietro, L. V. Mancini, and G. Tsudik, Scalable and efficient provable data possession, in SecureComm '08: Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, New York, NY, USA, 2008, pp. 1-10.
30. C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, Dynamic provable data possession, in CCS '09: Proceedings of the 16th ACM Conference on Computer and Communications Security, New York, NY, USA, 2009, pp. 213-222.
31. Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, Enabling public verifiability and data dynamics for storage security in cloud computing, in ESORICS '09: Proceedings of the 14th European Conference on Research in Computer Security, Berlin, Heidelberg, 2009, pp. 355-370.
32. Y. Zhu, H. Wang, Z. Hu, G.-J. Ahn, H. Hu, and S. S. Yau, Cooperative provable data possession, Cryptology ePrint Archive, Report 2010/234, 2010.
33. Amazon Elastic Compute Cloud (Amazon EC2), http://aws.amazon.com/ec2/.
34. B.-G. Chun, F. Dabek, A. Haeberlen, E. Sit, H. Weatherspoon, M. F. Kaashoek, J. Kubiatowicz, and R. Morris, Efficient replica maintenance for distributed storage systems, in NSDI '06: Proceedings of the 3rd Conference on Networked Systems Design & Implementation, Berkeley, CA, USA, 2006.
35. P. Maniatis, M. Roussopoulos, T. J. Giuli, D. S. H. Rosenthal, and M. Baker, The LOCKSS peer-to-peer digital preservation system, ACM Trans. Comput. Syst., vol. 23, no. 1, pp. 2-50, 2005.
36. C. E. Shannon, Communication theory of secrecy systems, Bell Syst. Tech. J., vol. 28, no. 4, 1949.
37. A. Menezes, An introduction to pairing-based cryptography, Lecture Notes 2005, Online at http://www.math.uwaterloo.ca/~ajmeneze/publications/pairings.pdf.