Secure and Efficient Proof of Storage with Deduplication




Qingji Zheng
Department of Computer Science, University of Texas at San Antonio
qzheng@cs.utsa.edu

Shouhuai Xu
Department of Computer Science, University of Texas at San Antonio
shxu@cs.utsa.edu

ABSTRACT
Both security and efficiency are crucial to the success of cloud storage. So far, security and efficiency of cloud storage have been investigated separately. On one hand, security notions such as Proof of Data Possession (PDP) and Proof of Retrievability (POR) have been introduced for detecting whether the data stored in the cloud has been tampered with. On the other hand, the notion of Proof of Ownership (POW) has been proposed to relieve the cloud server of storing multiple copies of the same data, which could substantially reduce the consumption of both network bandwidth and server storage space. These two aspects seem to be quite the opposite of each other. In this paper, we show, somewhat surprisingly, that the two aspects can actually co-exist within the same framework. This is possible fundamentally because of the following insight: the public verifiability offered by PDP/POR schemes can be naturally exploited to achieve POW. This "one stone, two birds" phenomenon not only inspired us to propose the novel notion of Proof of Storage with Deduplication (POSD), but also guided us to design a concrete scheme that is provably secure in the Random Oracle model based on the Computational Diffie-Hellman (CDH) assumption.
Categories and Subject Descriptors
C.2.4 [Communication Networks]: Distributed Systems; H.3.4 [Information Storage and Retrieval]: Systems and Software

General Terms
Security

Keywords
cloud storage, outsourced storage, proof of storage, deduplication, integrity checking, proof of ownership, proof of data possession, proof of retrievability

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CODASPY'12, February 7-9, 2012, San Antonio, Texas, USA. Copyright 2012 ACM 978-1-4503-1091-8/12/02 ...$10.00.

1. INTRODUCTION
Cloud computing is getting increasingly popular because it can provide low-cost and on-demand use of vast storage and processing resources. The present paper focuses on the security and efficiency of cloud storage, namely that clients outsource their data to cloud storage servers. While cloud storage offers compelling scalability and availability advantages over the current paradigm of one storing and maintaining its own IT systems and data, it does not come without security concerns. This has led to studies on cloud storage security and efficiency, which are, however, addressed separately as we discuss below.

From the perspective of cloud storage security, there have been two notable notions:

- Proof of Data Possession (PDP): This notion was introduced by Ateniese et al. []. It allows a cloud client to verify the integrity of its data outsourced to the cloud in a very efficient way (i.e., far more efficient than the straightforward solution of downloading the data to the client-end for verification). This notion has been enhanced in various ways [8, 3, 15].

- Proof of Retrievability (POR): This notion was introduced by Juels and Kaliski [10].
Compared with PDP, POR offers an extra property that the client can actually recover the data outsourced to the cloud (in the flavor of knowledge extraction in zero-knowledge proofs). This notion has been enhanced and extended in multiple aspects [1, 6, 5, 16].

From the perspective of cloud storage efficiency, deduplication has become a common practice of many cloud vendors. This is reasonable especially when there are many duplications in the data outsourced to the cloud (e.g., only 5% of data may be unique according to a survey [1]). As such, the cloud vendor can substantially save storage space by storing a single copy of each data file regardless of the number of clients that outsource it. This explains the term deduplication. This issue was first introduced to the research community by [9]. Because straightforward deduplication is vulnerable to attacks (e.g., a dishonest client can claim that it has certain data while it does not), Halevi et al. [13] proposed the notion called Proof of Ownership (POW) as well as concrete constructions.

Our contributions. Both the security and efficiency perspectives mentioned above are important and should be offered by a single cloud storage solution, which is a new problem that has not been addressed. In this paper, we tackle this problem by proposing a 2-in-1 notion we call Proof of Data Storage with Deduplication (POSD). Specifically, we introduce the novel concept of POSD, and formalize its functional and security definitions. Moreover, we propose the first efficient POSD scheme and prove its security in the Random Oracle model based on the Computational Diffie-Hellman (CDH) assumption. We also analyze and compare the performance of our scheme with that of some relevant PDP/POR/POW schemes, which suggests that our POSD scheme is as efficient as the PDP/POR/POW schemes.

Organization. The rest of the paper is organized as follows. Section 2 briefly reviews the related prior work. Section 3 discusses the notations and cryptographic settings. Section 4 presents the definitions of POSD. Section 5 describes our POSD scheme and its security as well as performance analysis. Section 6 concludes the paper.

2. RELATED WORK
Cloud storage security was not systematically studied until very recently, despite previous investigations of similar problems (cf. []). Ateniese et al. [] introduced the concept of PDP, and Juels et al. [10] proposed the concept of POR, which was improved significantly by Shacham and Waters [1]. The main difference between the two notions is that POR uses Error Correction/Erasure Codes to tolerate damage to portions of the outsourced data. These solutions were later enhanced in various ways [8, 4, 5, 6, 16].

Deduplication of ciphertext data in the pre-cloud era was studied in [14, 7]. Data deduplication in the context of cloud computing was recently introduced [9]. Halevi et al. [13] coined the term POW and presented the first systematic study of deduplication in the cloud, including several alternative solutions that offer different trade-offs between security and performance.

3. PRELIMINARIES

3.1 Notations
Let $l$ be a security parameter. A function $\varepsilon(l)$ is negligible if it is smaller than $l^{-const}$ for any constant $const$ and sufficiently large $l$. Let $q$ be an $l$-bit prime and $p$ a prime such that $q \mid (p - 1)$.
Let $F$ be a data file consisting of $n$ blocks, where the $i$-th block $F_i$ is composed of $m$ symbols in $\mathbb{Z}_q$, i.e., $F_i = (F_{i1}, \ldots, F_{im})$, where $F_i \in \mathbb{Z}_q^m$. Let $fid$ be the identity that uniquely identifies data file $F$. Let each file be associated with some auxiliary information (i.e., cryptographic tags), denoted by $Tag$. We consider two variants of $Tag$: $Tag_{int}$ is the cryptographic information for auditing data integrity, and $Tag_{dup}$ is the cryptographic information for duplication checking. Let $[\,]$ denote the optional arguments of a function or algorithm; for example, Alg$(a, b[, c])$ means that algorithm Alg has two arguments $a$ and $b$, and optionally a third argument $c$.

3.2 Cryptographic Setting and Assumptions
Let $G$ and $G_T$ be cyclic groups of prime order $q$ and $g$ be a generator of $G$. Let $e : G \times G \to G_T$ be a bilinear map with the following properties: (i) $e$ can be computed efficiently; (ii) for all $(u, v) \in G \times G$ and $a, b \in \mathbb{Z}_q$, $e(u^a, v^b) = e(u, v)^{ab}$; (iii) $e(g, g) \neq 1$.

The standard Computational Diffie-Hellman (CDH) Problem is the following: Given $(g, g^w, h) \in G^3$, where $g, g^w, h$ are selected uniformly at random from $G$, compute $h^w$. The CDH Assumption says that no probabilistic polynomial-time (ppt) algorithm can solve the CDH Problem with a non-negligible probability (in $l$).

The Discrete Log (DLOG) Problem is the following: Given any cyclic group $G$ of prime order $q$ and two random elements $g$ and $h$, find $w$ such that $g^w = h$. The DLOG Assumption says that no ppt algorithm can solve the DLOG Problem with a non-negligible probability (in $l$). The DLOG Assumption is weaker than the CDH Assumption.

Let $H_1 : \{0, 1\}^* \to G$ and $H_2 : \{0, 1\}^* \to \mathbb{Z}_q$ be randomly chosen from the respective families of hash functions. Both $H_1$ and $H_2$ are modeled as random oracles. Let $PRF : \{0, 1\}^l \times \{0, 1\}^* \to \{0, 1\}^l$ be a family of secure pseudorandom functions.

4. REQUIREMENTS, MODEL AND DEFINITIONS OF POSD
Requirements. Built on top of [, 10, 13], we summarize the performance requirements of POSD as:

- A solution should use common functions (e.g., hash functions) so as to allow cross-client data deduplication and cross-client cloud data integrity auditing.
- A solution should consume bandwidth that is substantially less than the size of the data file in question. This rules out the trivial solutions discussed below.

- A solution should not force the cloud server, when determining whether to conduct a deduplication operation, to retrieve any significant portion of the data file in question. This is plausible because it could be very resource-consuming to load a large data file from secondary storage to memory.

- A solution should only require the client to make a single pass over its data file, while using an amount of memory that is substantially smaller than the size of the data file in question.

As in the cases of PDP/POR and POW, there are trivial solutions to fulfill the functions of POSD. Specifically, a client can download the whole data from the cloud to verify the integrity of its data outsourced to the cloud, and the server can ask the client to upload a data file to show that the client indeed has a copy of the data file before conducting the deduplication operation. However, this trivial solution is not practical because it incurs prohibitive communication overhead. On the other hand, it was also noted in [, 10, 13] that simple heuristics will not solve the respective problems without shortcomings.

Model participants. We consider a cloud storage service model that involves the following three participants.

(i) Cloud storage server, denoted by S: It provides storage service with relevant assurance procedures, by which the cloud storage clients can check the integrity of their data stored in the cloud and the server can save storage space via data deduplication in a secure fashion.

(ii) Cloud storage clients, denoted by C: A client outsources its data to the cloud in a secure fashion, while allowing the cloud storage server to conduct data deduplication operations. (If a client does not want the server to conduct this operation, this can be achieved via an appropriate contract-level agreement that is out of the scope of the present paper.)

(iii) Third party, denoted by Auditor: A client may allow a third party to check the integrity of its data outsourced to the cloud. Moreover, any client who possesses a data file that is duplicated (i.e., the same data file has been uploaded to the server by another client) can act as an Auditor of that specific data file.

Communication channels. If the data file outsourced to the cloud is not confidential, there is no need for private channels. (In this case, secure deduplication can still be relevant because it may be very expensive to eavesdrop on the communication during the transfer of a large data file.) In the case the outsourced data files are confidential, we can assume the availability of private communication channels for the execution of certain protocols. This is common to PDP/POR/POW [, 1, 13] and avoids unnecessary complications in describing the protocols (given that private channels can be implemented using standard techniques in a modular fashion). Note that in order to facilitate deduplication, the data will be stored in plaintext in the cloud, which is the same as in POW [13].

Functional definition. The following definition of POSD is built on the definitions of PDP/POR [, 1] and POW [13].

Definition 1. (functional definition) A POSD scheme, denoted by $\Lambda$, consists of the following tuple of polynomial-time algorithms (Keygen, Upload, AuditInt, Dedup).

Keygen: This is the key generation algorithm. It takes as input a security parameter $l$, and outputs two pairs of public/private keys $(pk_{int}, sk_{int})$ and $(pk_{dup}, sk_{dup})$, where $pk_{int}$ is made public and $sk_{int}$ is the corresponding private key of a client (this pair of keys may be used for integrity protection/verification purposes), and $pk_{dup}$ is made public and $sk_{dup}$ is the private key of the server (this pair of keys may be used for secure deduplication purposes).
Upload: This is the data uploading protocol run by a client C and a server S over a private channel so that secrecy of the data is assured. Suppose C wants to upload a new data file $F$ to the cloud, where S can easily determine that $F$ has not been outsourced to the cloud by any client (e.g., by comparing the hash value provided by C against the list of hash values stored by the server). For preprocessing, client C takes as input a new data file $F$ with a unique identifier $fid$ and the secret key $sk_{int}$, and outputs some auxiliary information $Tag_{int}$ that can be used to audit the integrity of $F$ in the cloud. At the end of the execution, S stores $(fid, F, Tag_{int})$ received from C as well as possibly some deduplication information $Tag_{dup}$, which may be produced by the server using $sk_{dup}$. The server may also keep a hash value of each $F$ so as to facilitate the detection of data duplications and thus the need for deduplication.

AuditInt: This is the data integrity auditing protocol. It is executed between server S and Auditor so that S convinces Auditor that the integrity of some data file stored in the cloud is assured. The Auditor's input includes the data file identifier $fid$ and the corresponding client's $pk_{int}$. The server's input includes the data file $F$ corresponding to $fid$ and the auxiliary information $Tag_{int}$ associated with $F$. Essentially, the protocol is of challenge-response type, where Auditor sends a challenge $chal$ to the server and the server computes and sends back a response $resp$. If $resp$ is valid with respect to $chal$ as well as the other relevant information, Auditor outputs 1, meaning that the integrity of $F$ is assured, and 0 otherwise. Formally, we can write it as:

$$b \leftarrow (\mathrm{Auditor}(fid, pk_{int}) \Longleftrightarrow S(fid, F, Tag_{int})), \quad b \in \{0, 1\}.$$

Dedup: This is the deduplication checking protocol. It is executed between server S and client C, who claims to possess a data file $F$ (the detection of the need to deduplicate can be fulfilled by C sending the hash value of its data file to S, which can determine whether or not the data file is already in the cloud). This protocol is also essentially of challenge-response type.
Basically, S sends a challenge $chal$ to C, which returns a response $resp$ that is produced using data file $F$ and possibly other information. S verifies the validity of $resp$ using possibly $Tag_{dup}$ and $pk_{dup}$, and outputs 1 if the verification is successful (meaning that the client indeed has data file $F$) and 0 otherwise. Formally, we can write it as:

$$b \leftarrow (S(fid, Tag_{dup}, [sk_{dup},]\, pk_{dup}) \Longleftrightarrow C(fid, F)), \quad b \in \{0, 1\}.$$

Correctness definition. We require a POSD scheme $\Lambda$ = (Keygen, Upload, AuditInt, Dedup) to be correct if, for honest client and server, the execution of the AuditInt protocol will always output 1 and the execution of the Dedup protocol will always output 1.

Security definition. We define security of POSD using games, which specify both the adversary's behavior (i.e., what the adversary is allowed to do) and the winning condition (i.e., under what circumstance we say the attack is successful). At a high level, we require a POSD scheme to be server unforgeable, which is similar to the security defined by the data possession game in [], and $(\kappa, \theta)$-uncheatable, which is similar to the security definition in [13].

Intuitively, we say a POSD scheme is server unforgeable if no cheating server can successfully execute the AuditInt protocol with an honest Auditor with a non-negligible probability. Formally, we have:

Definition 2. (server unforgeability) For POSD scheme $\Lambda$ = (Keygen, Upload, AuditInt, Dedup), consider the following game between an adversary $\mathcal{A}$ and a challenger, where $\mathcal{A}$ plays the role of the cloud server S while possibly controlling many compromised clients, and the challenger acts as an honest client.

Setup Stage:

Run algorithm Keygen to generate $(pk_{int}, sk_{int})$ and $(pk_{dup}, sk_{dup})$. Make $pk_{int}$ and $pk_{dup}$ public, including giving $sk_{int}$ to the respective client and $sk_{dup}$ to $\mathcal{A}$. Note that the $sk_{int}$ of the challenger is not given to $\mathcal{A}$. For any other client, the corresponding $sk_{int}$ may be given to $\mathcal{A}$ as long as it requests (i.e., these clients are compromised by, or collude with, $\mathcal{A}$).

Challenge Stage: At this stage, $\mathcal{A}$ can do anything with respect to the clients other than the challenger. With respect to the challenger, $\mathcal{A}$ does the following.

- $\mathcal{A}$ adaptively chooses a data file $F \in \{0, 1\}^*$ for the challenger. The challenger picks a unique identifier $fid$ and runs the Upload protocol with $\mathcal{A}$. At the end of the execution, $\mathcal{A}$ obtains $(fid, F, Tag_{int})$. The above process may be repeated polynomially many times. Denote by $Q = \{(fid, F, Tag_{int})\}$ the set of tuples $\mathcal{A}$ received from the challenger when executing the Upload protocol. Note that the challenger keeps a record of $Q_{fid} = \{fid\}$, namely the projection of $Q$ on attribute $fid$.

- $\mathcal{A}$ can execute AuditInt with the challenger with respect to any $fid \in Q_{fid}$, and execute Dedup with the challenger with respect to some data file (possibly chosen by $\mathcal{A}$). This process can be executed a polynomial (in $l$) number of times.

Forgery Stage: The adversary outputs an $fid \in Q_{fid}$ corresponding to $F$ that was outsourced to the cloud. The adversary wins the game if for any $F' \neq F$,

$$1 \leftarrow (\mathrm{Auditor}(fid, pk_{int}) \Longleftrightarrow \mathcal{A}(fid, F', \cdot)).$$

We say $\Lambda$ is server unforgeable if the winning probability for any ppt algorithm $\mathcal{A}$ is negligible in $l$.

Intuitively, we say a POSD scheme is $(\kappa, \theta)$-uncheatable if, given a file $F$ with min-entropy $\kappa$, no cheating client who can find $F'$ containing $\theta$-bit Shannon entropy of $F$ convinces the server that it has $F$ with a probability non-negligibly more than $2^{-(\kappa - \theta)}$. Formally, we have:

Definition 3. ($(\kappa, \theta)$-uncheatability) For a POSD scheme $\Lambda$ = (Keygen, Upload, AuditInt, Dedup), consider the following game between the adversary $\mathcal{A}$ (who plays the role of the compromised clients) and the challenger (who plays the role of the server and an honest client).
Setup Stage:

- Run algorithm Keygen to generate $(pk_{int}, sk_{int})$ as well as $(pk_{dup}, sk_{dup})$. Make $pk_{int}$ and $pk_{dup}$ public, including giving $pk_{int}$ and $pk_{dup}$ to the adversary $\mathcal{A}$. For a compromised client, the corresponding $sk_{int}$ is given to $\mathcal{A}$. However, both $sk_{dup}$ and the $sk_{int}$ corresponding to the honest client are only given to the challenger. (Note that if the $sk_{int}$ corresponding to the honest client were given to $\mathcal{A}$, then $\mathcal{A}$ could use it to authenticate to the server and download any of the client's data outsourced to the server.)

- The challenger chooses a data file $F$ of $\kappa$-bit min-entropy, and a unique identifier $fid$. The challenger honestly executes the Upload protocol by playing the roles of both the client and the server, and gives the publicly observable information to the adversary $\mathcal{A}$.

Challenge Stage: At this stage, $\mathcal{A}$ seeks to infer the content of $F$ by running the Upload, AuditInt and Dedup protocols with the challenger. In particular, $\mathcal{A}$ may penetrate into the cloud server to learn some portions of $F$. This is reasonable because stealing the whole $F$, or any form of its compressed version (because $F$ has enough min-entropy and thus Shannon entropy), could alert the defender about the compromise because of the abnormal use of network bandwidth and/or CPU resources. Moreover, subliminal channels normally do not offer bandwidth comparable to the magnitude of $\kappa$. Note that the above adversarial model also accommodates that $\mathcal{A}$ can command the compromised clients to launch some type of guessing attacks with respect to $F$, which is possible for example when $F$ has a public structure (e.g., Word document or movie file). In any case, suppose $\mathcal{A}$ learned up to $\theta$-bit Shannon entropy of $F$.

Forgery Stage: $\mathcal{A}$ eventually outputs some $F'$. $\mathcal{A}$ wins the game if

$$1 \leftarrow (S(fid, Tag_{dup}, [sk_{dup},]\, pk_{dup}) \Longleftrightarrow \mathcal{A}(fid, F')).$$

We say $\Lambda$ is $(\kappa, \theta)$-uncheatable if the winning probability for any ppt algorithm $\mathcal{A}$ is at most negligibly (in $l$) more than $2^{-(\kappa - \theta)}$. Note that $\kappa - \theta \ge l$ would be the most common case because we mainly deal with large data files, and thus $\mathcal{A}$'s winning probability is effectively required to be negligible in $l$.

Discussion.
In the above security definitions, we did not consider the notion of fairness, which was very recently introduced to prevent a dishonest client from legitimately accusing an honest cloud server of tampering with its data in the setting of dynamic POR [16]. This is because POSD in this paper deals with static data, rather than dynamic data where fairness can be reasonably involved [16]. For static data, fairness can be easily achieved by letting a client sign $F$ in the Upload protocol or after a successful execution of the Dedup protocol.

5. POSD CONSTRUCTION AND ANALYSIS

5.1 Basic Ideas
As discussed above, it is conceptually convenient to think of POSD = PDP/POR + POW because POSD aims to fulfill the functionalities of both integrity audit and deduplication. In this paper, we focus on the scenario of POSD = PDP + POW for the following reasons. First, POR is more costly than PDP due to its use of Error Correcting/Erasure Codes for fulfilling retrievability. Second, in the Dedup protocol of POSD (and POW), retrievability is actually not needed because the server already knows $F$. Nevertheless, the basic ideas are equally applicable to the scenario of POSD = POR + POW. In what follows we first elaborate the insight that led us to our design.

Relationship between PDP/POR and POW, revisited. From the definition of POSD, we see some similarity between the AuditInt protocol and the Dedup protocol. Specifically, both protocols are in a sense for verifying integrity, except that one is for data at the cloud end (i.e., the cloud server attests to the client) and the other is for data at the client end (i.e., the client attests to the cloud server). Because the AuditInt protocol is the core of PDP/POR, there is some similarity between PDP/POR and POW (as noted in [13]) as well as POSD. However, it is stated in [13] that PDP/POR protocols are not applicable in the setting of POW (and thus POSD). We call this the Deduplication Gap between PDP/POR and POW/POSD, which arises because of the following: In PDP/POR, there is a preprocessing step that allows the client to later verify the integrity of its data in the cloud, whereas in the setting of POW, a new client possesses a secret data file $F$ but no other secrets.

Halevi et al. [13] correctly excluded the possibility of using PDP/POR based on symmetric key cryptosystems [, 1] for the purpose of POW (because the new client does not, and should not, know the secret keys). We observe, however, that publicly verifiable PDP/POR protocols are actually sufficient for the purpose of POW. As we will see, this is made possible because the client can compute the needed information from $F$ on the fly and without using any secret information.

Why is public verifiability sufficient to bridge the above Deduplication Gap? Conceptually, public verifiability of PDP/POR means that a third party, who may be given some non-secret information by a client, can verify the integrity of the client's data in the cloud. Putting this into the setting of POW/POSD, we can let the Dedup protocol be essentially the same as the core PDP/POR protocol by reversing the roles of the client and the server. More specifically, if the AuditInt protocol is publicly verifiable, it is possible that only the public key $pk_{int}$ is needed by the cloud server and only the data file $F$ (as well as the public parameters, of course) is needed by the client.

5.2 Construction
Recall that $q$ is an $l$-bit prime and $p$ is another prime such that $q \mid (p - 1)$.
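As an illustration of these parameter constraints, the following Python sketch generates an l-bit prime q, a prime p with q | (p - 1), and an element of order q in Z_p^* (the kind of element Keygen uses for v_1 and v_2). The function names, the Miller-Rabin test and the search strategy p = kq + 1 are assumptions made for illustration and are not taken from the paper; the bit length in the usage line is far too small for real use.

```python
import secrets

def is_probable_prime(n, rounds=32):
    """Miller-Rabin probabilistic primality test (never rejects a prime)."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % sp == 0:
            return n == sp
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = 2 + secrets.randbelow(n - 3)      # random base in [2, n-2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                       # a witnesses that n is composite
    return True

def gen_params(l):
    """Return (q, p): q is an l-bit prime and p = k*q + 1 is prime."""
    while True:
        q = secrets.randbits(l) | (1 << (l - 1)) | 1   # odd, exactly l bits
        if is_probable_prime(q):
            break
    k = 2
    while not is_probable_prime(k * q + 1):    # even k keeps p odd
        k += 2
    return q, k * q + 1

def element_of_order_q(q, p):
    """Return v in Z_p^* of order exactly q (q prime, q | p - 1)."""
    while True:
        h = 2 + secrets.randbelow(p - 3)
        v = pow(h, (p - 1) // q, p)            # order of v divides q
        if v != 1:                             # q is prime, so the order is q
            return v

q, p = gen_params(16)                          # toy size only
v1 = element_of_order_q(q, p)
```

Raising a random h to the power (p - 1)/q is one common way to land in the order-q subgroup; if v_2 were instead derived as a known power of v_1, that exponent would have to be erased, as the Keygen description requires.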
$G$ and $G_T$ are cyclic groups of prime order $q$, and $e : G \times G \to G_T$ is a bilinear map. We use two hash functions (random oracles) $H_1 : \{0, 1\}^* \to G$ and $H_2 : \{0, 1\}^* \to \mathbb{Z}_q$. $F$ is a data file consisting of $n$ blocks of $m$ symbols in $\mathbb{Z}_q$, namely $F_i = (F_{i1}, \ldots, F_{im})$, where $F_i \in \mathbb{Z}_q^m$ for $1 \le i \le n$. $F$ is uniquely identified by $fid$. The POSD scheme is described as follows (at the end of this subsection we will explain some design decisions to help understand our scheme):

Keygen: This algorithm generates cryptographic keys as follows:

- Select $v_1$ and $v_2$ uniformly at random from $\mathbb{Z}_p^*$ such that the orders of $v_1$ and $v_2$ are $q$ (if $v_2$ is generated from $v_1$, then the DLOG of $v_2$ to base $v_1$ should be erased afterwards). Select $s_{j1}, s_{j2}$ uniformly at random from $\mathbb{Z}_q$ for $1 \le j \le m$. Set $z_j = v_1^{-s_{j1}} v_2^{-s_{j2}} \bmod p$ for $1 \le j \le m$.

- Let $g$ be a generator of $G$. Select $u$ uniformly at random from $G$. Select $w$ uniformly at random from $\mathbb{Z}_q$, and set $z_g = g^w$.

- Set $pk_{int} = \{q, p, g, u, v_1, v_2, z_1, \ldots, z_m, z_g\}$ and the client's private key $sk_{int} = \{(s_{11}, s_{12}), \ldots, (s_{m1}, s_{m2}), w\}$. Note that using an appropriate Pseudorandom Function (PRF), we can further reduce the storage at the client end to constant (i.e., using a single key to the PRF for generating $s_{11}, s_{12}, \ldots, s_{m1}, s_{m2}, w$).

- Set $pk_{dup} = pk_{int}$ and $sk_{dup} = null$, where $pk_{dup}$ is also made public.

Upload: This protocol is performed between a client, who is to outsource a data file $F$ to the cloud, and the cloud server as follows:

- For each data block $F_i$, where $1 \le i \le n$, the client selects $r_{i1}, r_{i2}$ uniformly at random from $\mathbb{Z}_q$ and computes:

$$x_i = v_1^{r_{i1}} v_2^{r_{i2}} \bmod p,$$
$$y_{i1} = r_{i1} + \sum_{j=1}^{m} F_{ij} s_{j1} \bmod q,$$
$$y_{i2} = r_{i2} + \sum_{j=1}^{m} F_{ij} s_{j2} \bmod q,$$
$$t_i = \left(H_1(fid \,\|\, i)\, u^{H_2(x_i)}\right)^{w} \quad (\text{in } G).$$

- The client sends $(fid, F, Tag_{int})$ to the server, where $Tag_{int} = \{(x_i, y_{i1}, y_{i2}, t_i)\}_{1 \le i \le n}$.

- Upon receiving $(fid, F, Tag_{int})$, the server sets $Tag_{dup} = Tag_{int}$.

AuditInt: This protocol is executed between an auditor, which can be the client itself, and the cloud server to verify the integrity of the client's data file $F$ stored in the cloud. Note that the client does not need to give any information to the auditor except the public keys $pk_{int}$ and the data file identifier $fid$.
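- The auditor chooses a set of $c$ elements $I = \{\alpha_1, \ldots, \alpha_c\}$, where each $\alpha_i$ is selected uniformly at random from $\{1, \ldots, n\}$, and chooses a set of coefficients $\beta = \{\beta_1, \ldots, \beta_c\}$, where each $\beta_i$ is selected uniformly at random from $\mathbb{Z}_q$. The auditor sends $chal = (I, \beta)$ to the server.

- The server computes

$$\mu_j = \sum_{i \in I} \beta_i F_{ij} \bmod q \quad \text{for } 1 \le j \le m,$$

and

$$Y_1 = \sum_{i \in I} \beta_i y_{i1} \bmod q, \qquad Y_2 = \sum_{i \in I} \beta_i y_{i2} \bmod q, \qquad T = \prod_{i \in I} t_i^{\beta_i} \quad (\text{in } G).$$

- The server sends $resp = (\{\mu_j\}_{1 \le j \le m}, \{x_i\}_{i \in I}, Y_1, Y_2, T)$ to the auditor, where $x_i = v_1^{r_{i1}} v_2^{r_{i2}} \bmod p$ for $1 \le i \le n$ were generated by the client in the execution of the Upload protocol.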

Upon receiving $resp$, the auditor parses it as $(\{\mu_j\}_{1 \le j \le m}, \{x_i\}_{i \in I}, Y_1, Y_2, T)$, computes
$$X = \prod_{i \in I} x_i^{\beta_i} \bmod p, \qquad W = \prod_{i \in I} H_1(fid \| i)^{\beta_i},$$
and verifies
$$X \stackrel{?}{=} v_1^{Y_1} v_2^{Y_2} \prod_{j=1}^{m} z_j^{\mu_j} \bmod p,$$
$$e(T, g) \stackrel{?}{=} e\Big(W \, u^{\sum_{i \in I} \beta_i H_2(x_i)}, \, z_g\Big) \ (\text{in } G_T).$$
If both hold, return 1; otherwise, return 0.

Dedup: This protocol is executed between the client, who claims to have a data file $F$ with identifier $fid$ that was already outsourced to the cloud by another client, and the server. It is a simple variant of the above AuditInt protocol, in which the auditor only needs to know the public key $pk_{int}$ and the data file identifier $fid$. Here the cloud server, which naturally knows $pk_{int}$ and $fid$, plays the role of the auditor (with some minor adaptations, because some of the information used in producing the response is not known to the client).

The server chooses a set of $c$ elements $I = \{\alpha_1, \ldots, \alpha_c\}$, where each $\alpha_i$ is selected uniformly at random from $\{1, \ldots, n\}$, and a set of coefficients $\beta = \{\beta_1, \ldots, \beta_c\}$, where each $\beta_i$ is selected uniformly at random from $\mathbb{Z}_q$. The server sends $chal = (I, \beta)$ to the client.

The client computes
$$\mu_j = \sum_{i \in I} \beta_i F_{ij} \bmod q \quad \text{for } 1 \le j \le m,$$
and sends $resp = (\{\mu_j\}_{1 \le j \le m})$ to the server.

The server computes from $Tag_{dup} = \{(x_i, y_{i1}, y_{i2}, t_i)\}_{1 \le i \le n}$:
$$Y_1 = \sum_{i \in I} \beta_i y_{i1} \bmod q, \qquad Y_2 = \sum_{i \in I} \beta_i y_{i2} \bmod q,$$
$$W = \prod_{i \in I} H_1(fid \| i)^{\beta_i}, \qquad X = \prod_{i \in I} x_i^{\beta_i} \bmod p, \qquad T = \prod_{i \in I} t_i^{\beta_i} \ (\text{in } G).$$
The server verifies
$$X \stackrel{?}{=} v_1^{Y_1} v_2^{Y_2} \prod_{j=1}^{m} z_j^{\mu_j} \bmod p,$$
$$e(T, g) \stackrel{?}{=} e\Big(W \, u^{\sum_{i \in I} \beta_i H_2(x_i)}, \, z_g\Big) \ (\text{in } G_T).$$
If both hold, return 1; otherwise, return 0.

Discussion on some design decisions. To help understand our scheme, we now discuss some of the design decisions we made to satisfy both the above performance design requirements and the security definitions.

First, our Upload and AuditInt protocols are new. Compared with existing protocols for similar purposes [2, 12, 8, 5, 6, 16], they have the following significant advantage during the execution of the Upload protocol (a thorough comparison is presented in Section 5.5). Our scheme only requires the client to perform $O(n)$ exponentiation operations plus $O(mn)$ multiplication operations, where $n$ is the number of data blocks and $m$ is the number of symbols in each block (i.e., $mn$ is the number of symbols in a data file). In contrast, the referred schemes require the client to perform $O(mn)$ exponentiation operations plus $O(mn)$ multiplication operations. As such, our AuditInt protocol is of independent value, as it could also be used as the core of PDP/POR protocols.

Second, in our POSD scheme we use $z_j = v_1^{-s_{j1}} v_2^{-s_{j2}}$ for verification, which is reminiscent of the signature scheme in [11]. However, our scheme is not a digital signature scheme because we actually allow a sort of manipulation. On the other hand, we use $z_j = v_1^{-s_{j1}} v_2^{-s_{j2}}$ rather than, for example, $z_j = v_1^{-s_{j1}}$. This is because the security of our construction partially relies on the DLOG problem, or more precisely the DLOG of $v_2$ with respect to base $v_1$.

Third, in the Upload protocol, the purpose of $t_i$ is to prevent the server from forging a new legitimate tuple $(x_i', y_{i1}', y_{i2}')$ from a legitimate $(x_i, y_{i1}, y_{i2})$ and $F_i$. To see this, consider the case without $t_i$. Note that $(x_i, y_{i1}, y_{i2})$ with respect to block $F_i$ satisfies
$$x_i = v_1^{y_{i1}} v_2^{y_{i2}} \prod_{j=1}^{m} z_j^{F_{ij}} \bmod p.$$
Without $t_i$, the server can choose $r_1'$ and $r_2'$ from $\mathbb{Z}_q$ and set
$$x_i' = x_i \, v_1^{r_1'} v_2^{r_2'} \bmod p, \qquad y_{i1}' = y_{i1} + r_1' \bmod q, \qquad y_{i2}' = y_{i2} + r_2' \bmod q,$$
so that $(x_i', y_{i1}', y_{i2}')$ also satisfies
$$x_i' = v_1^{y_{i1}'} v_2^{y_{i2}'} \prod_{j=1}^{m} z_j^{F_{ij}} \bmod p.$$
As another example of attacks, the server can generate $(x_i', y_{i1}', y_{i2}')$ as follows: let $x_i' = x_i z_1 \bmod p$, $y_{i1}' = y_{i1}$ and $y_{i2}' = y_{i2}$. During the execution of the Upload protocol, the server may return $F_i' = \{F_{i1} + 1, F_{i2}, \ldots, F_{im}\}$ by adding one to the first symbol of block $F_i$. As a consequence, $(x_i', y_{i1}', y_{i2}')$ and $F_i'$ also satisfy
$$x_i' = v_1^{y_{i1}'} v_2^{y_{i2}'} \, z_1^{F_{i1}+1} \prod_{j=2}^{m} z_j^{F_{ij}} \bmod p.$$

5.3 Correctness Analysis

With respect to the correctness definition, the correctness of

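the POSD scheme can be verified as follows:
$$v_1^{Y_1} v_2^{Y_2} \prod_{j=1}^{m} z_j^{\mu_j} = v_1^{\sum_{i \in I} \beta_i y_{i1}} \, v_2^{\sum_{i \in I} \beta_i y_{i2}} \prod_{j=1}^{m} \big(v_1^{-s_{j1}} v_2^{-s_{j2}}\big)^{\mu_j}$$
$$= v_1^{\sum_{i \in I} \beta_i y_{i1}} \, v_2^{\sum_{i \in I} \beta_i y_{i2}} \prod_{j=1}^{m} \big(v_1^{-s_{j1}} v_2^{-s_{j2}}\big)^{\sum_{i \in I} \beta_i F_{ij}}$$
$$= v_1^{\sum_{i \in I} \beta_i (r_{i1} + \sum_{j=1}^{m} F_{ij} s_{j1})} \, v_2^{\sum_{i \in I} \beta_i (r_{i2} + \sum_{j=1}^{m} F_{ij} s_{j2})} \prod_{j=1}^{m} \big(v_1^{-s_{j1}} v_2^{-s_{j2}}\big)^{\sum_{i \in I} \beta_i F_{ij}}$$
$$= v_1^{\sum_{i \in I} \beta_i r_{i1}} \, v_2^{\sum_{i \in I} \beta_i r_{i2}} = \prod_{i \in I} \big(v_1^{r_{i1}} v_2^{r_{i2}}\big)^{\beta_i} = \prod_{i \in I} x_i^{\beta_i} = X,$$
and
$$e(T, g) = e\Big(\prod_{i \in I} t_i^{\beta_i}, g\Big) = e\Big(\prod_{i \in I} \big(H_1(fid \| i) \, u^{H_2(x_i)}\big)^{w \beta_i}, g\Big) = e\Big(\prod_{i \in I} \big(H_1(fid \| i) \, u^{H_2(x_i)}\big)^{\beta_i}, g^w\Big)$$
$$= e\Big(\prod_{i \in I} H_1(fid \| i)^{\beta_i} \prod_{i \in I} u^{\beta_i H_2(x_i)}, z_g\Big) = e\Big(W \, u^{\sum_{i \in I} \beta_i H_2(x_i)}, z_g\Big).$$

5.4 Security Analysis

Now we prove that the POSD scheme satisfies Definitions 2 and 3.

Theorem 1. Assume $H_1$ and $H_2$ are hash functions modeled as random oracles, and the CDH problem is hard. The POSD scheme is server unforgeable.

Proof. We show our proof through a sequence of games between a challenger, who plays the role of an honest client, and an adversary $A$, who acts as the malicious server. The overall proof strategy is: given $fid$ corresponding to $(F, Tag_{int})$ stored in the server and a challenge randomly selected by the challenger, if the adversary can pass the verification using $(F', Tag_{int}') \ne (F, Tag_{int})$, then there is an algorithm that can solve the CDH problem.

Game 0: Game 0 is defined as in Definition 2, where the challenger only keeps the relevant public and private keys, and $Q_{fid}$, which is the list of the data file identifiers $fid$ it has used (as mentioned before, a PRF can reduce the storage of the identifiers to constant).

Game 1: Game 1 is the same as Game 0 except that the challenger keeps $Q = \{(fid, F, Tag_{int})\}$, the list of $(fid, F, Tag_{int})$ involved in the execution of the Upload protocol. In this case, we prove that if the adversary $A$ can produce a forgery $(fid, F', Tag_{int}') \notin Q$ that can pass the test in the AuditInt protocol with respect to the challenger's challenge $(I, \beta)$, then there is an efficient algorithm that can solve the CDH problem. The simulator is constructed as follows.

For generating the keys, the simulator works as follows. Select $v_1$ and $v_2$ uniformly at random from $\mathbb{Z}_p^*$ such that the order of $v_1$ and $v_2$ is $q$. Select $s_{j1}$ and $s_{j2}$ uniformly at random from $\mathbb{Z}_q$ for $1 \le j \le m$. Set $z_j = v_1^{-s_{j1}} v_2^{-s_{j2}} \bmod p$ for $1 \le j \le m$.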
Let $g$ be a generator of group $G$, and select $h$ from $G$ at random. Set $u = g^{\gamma} h^{\eta}$, where $\gamma$ and $\eta$ are chosen uniformly at random from $\mathbb{Z}_q$. Select $z_g$ uniformly at random from group $G$, which means that the simulator does not know the corresponding $w$ with $z_g = g^w$. Set $pk_{int} = \{p, q, g, u, h, v_1, v_2, z_1, \ldots, z_m, z_g\}$ and $pk_{dup} = pk_{int}$. However, the simulator only knows the secrets $sk = \{(s_{11}, s_{12}), \ldots, (s_{m1}, s_{m2})\}$ but not $w$.

The simulator models $H_2(\cdot)$ as a random oracle. Given $x_i$, if $x_i$ has been queried, return $H_2(x_i)$. Otherwise, select $\eta_i$ uniformly at random from $\mathbb{Z}_q$ and return $\eta_i$. The simulator keeps the list of $(x_i, H_2(x_i))$.

When the simulator is asked to compute $Tag_{int}$ for data file $F$, it executes the following: for each data block $F_i$ where $1 \le i \le n$, select $r_{i1}$ and $r_{i2}$ uniformly at random from $\mathbb{Z}_q$ and compute
$$x_i = v_1^{r_{i1}} v_2^{r_{i2}} \bmod p, \qquad y_{i1} = r_{i1} + \sum_{j=1}^{m} F_{ij} s_{j1} \bmod q, \qquad y_{i2} = r_{i2} + \sum_{j=1}^{m} F_{ij} s_{j2} \bmod q.$$
Select $\lambda_i$ uniformly at random from $\mathbb{Z}_q$ and set
$$H_1(fid \| i) = g^{\lambda_i} / u^{H_2(x_i)}.$$
Thus, we have
$$t_i = \big(H_1(fid \| i) \, u^{H_2(x_i)}\big)^w = (g^w)^{\lambda_i} = (z_g)^{\lambda_i}.$$
Set the cryptographic tag for block $F_i$ as $(x_i, y_{i1}, y_{i2}, t_i)$, and thus $Tag_{int} = \{(x_i, y_{i1}, y_{i2}, t_i)\}_{1 \le i \le n}$. The simulator keeps the list of $(fid, H_1(fid \| i))$. Note that $\lambda_i$ is unknown to $A$.

When $A$ queries $H_1(fid \| i)$ separately, the simulator operates as follows. If $fid$ has been queried, return $H_1(fid \| i)$. Otherwise, select $\lambda_i$ uniformly at random from $\mathbb{Z}_q$ and return $h^{\lambda_i}$. Note that $\lambda_i$ is unknown to $A$.

The simulator interacts with $A$ until $A$ outputs a forgery $(fid, (I, \beta), \{x_i'\}_{i \in I}, Y_1', Y_2', T', \{\mu_j'\}_{1 \le j \le m})$ at the forgery stage and wins the game, where $(I, \beta)$ is chosen at random by the simulator.
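The way the simulator produces $t_i$ without knowing $w$ can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it replaces the pairing group $G$ by the order-$q$ subgroup of $\mathbb{Z}_p^*$ with toy primes $p = 23$ and $q = 11$, the function names are hypothetical, and an exponent $w$ is passed to `real_tag` only so that the simulated tag can be compared with the honest tag formula. It relies on `pow(x, -1, p)` for modular inverses, which is available in Python 3.8 and later.

```python
import random

# Toy stand-in for the group G (assumption): the order-q subgroup of Z_p^*.
P, Q = 23, 11
G = 2           # generator of that subgroup (2 has order 11 modulo 23)

def simulate_tag(z_g, u, h2_x, rng):
    """Program H1(fid||i) = g^lambda / u^{H2(x_i)} and output t_i = z_g^lambda.

    Only public values are used; the exponent w behind z_g = g^w is never needed.
    """
    lam = rng.randrange(Q)
    u_pow = pow(u, h2_x, P)
    h1 = pow(G, lam, P) * pow(u_pow, -1, P) % P
    t = pow(z_g, lam, P)
    return h1, t

def real_tag(h1, u, h2_x, w):
    """Honest tag formula t_i = (H1(fid||i) * u^{H2(x_i)})^w, which needs w."""
    return pow(h1 * pow(u, h2_x, P) % P, w, P)
```

For any choice of $\gamma$, $\eta$, $h$ and $H_2(x_i)$, the pair returned by `simulate_tag` satisfies `t == real_tag(h1, u, h2_x, w)`, which is the identity $t_i = (z_g)^{\lambda_i}$ used by the simulator.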

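Suppose $A$ produces $(fid, (I, \beta), \{x_i'\}_{i \in I}, Y_1', Y_2', T', \{\mu_j'\}_{1 \le j \le m})$ to win Game 1. This means that $fid \in Q_{fid}$, but
$$(\{\mu_j'\}_{1 \le j \le m}, \{x_i'\}_{i \in I}, Y_1', Y_2', T') \ne (\{\mu_j\}_{1 \le j \le m}, \{x_i\}_{i \in I}, Y_1, Y_2, T), \qquad (1)$$
where $\mu_j = \sum_{i \in I} \beta_i F_{ij} \bmod q$, $\mu_j' = \sum_{i \in I} \beta_i F_{ij}' \bmod q$, and $(fid, F, Tag_{int}) \in Q$, from which $\{x_i\}_{i \in I}, Y_1, Y_2, T$ are computed. The correctness of the scheme implies
$$e(T, g) = e\Big(\prod_{i \in I} H_1(fid \| i)^{\beta_i} \, u^{\sum_{i \in I} \beta_i H_2(x_i)}, z_g\Big). \qquad (2)$$
Since $A$ wins in Game 1, we have
$$e(T', g) = e\Big(\prod_{i \in I} H_1(fid \| i)^{\beta_i} \, u^{\sum_{i \in I} \beta_i H_2(x_i')}, z_g\Big). \qquad (3)$$
In what follows, we consider three cases of Eq. (1):

Case 1: $T' \ne T$.
Case 2: $T' = T$, but $x_i' \ne x_i$ for some $i \in I$.
Case 3: $T' = T$, $x_i' = x_i$ for all $i \in I$, but $(Y_1', Y_2', \{\mu_j'\}_{1 \le j \le m}) \ne (Y_1, Y_2, \{\mu_j\}_{1 \le j \le m})$.

In each case, we utilize, among other things, Eqs. (2) and (3) to show that the simulator can solve the CDH problem, which means that $A$ cannot win Game 1 with a non-negligible probability. This will complete the proof.

Case 1: $T' \ne T$. In this case, we have
$$e(T'/T, g) = e\Big(u^{\sum_{i \in I} \beta_i (H_2(x_i') - H_2(x_i))}, z_g\Big).$$
By substituting $u$ with $g^{\gamma} h^{\eta}$, we have
$$e(T'/T, g) = e\Big((g^{\gamma} h^{\eta})^{\sum_{i \in I} \beta_i (H_2(x_i') - H_2(x_i))}, z_g\Big).$$
Rearranging the terms, we get
$$T'/T = (g^{w\gamma} h^{w\eta})^{\sum_{i \in I} \beta_i (H_2(x_i') - H_2(x_i))}.$$
We claim that $\sum_{i \in I} \beta_i (H_2(x_i') - H_2(x_i)) \ne 0 \bmod q$. Otherwise, we get $T'/T = 1$, which contradicts the assumption $T' \ne T$. Together with the fact that $z_g = g^w$, we get
$$h^w = \Big((T'/T) \cdot z_g^{-\gamma \sum_{i \in I} \beta_i (H_2(x_i') - H_2(x_i))}\Big)^{\frac{1}{\eta \sum_{i \in I} \beta_i (H_2(x_i') - H_2(x_i))}},$$
which means that if $T' \ne T$, the simulator can solve the CDH problem by computing $h^w$ with respect to given $g$ and $z_g = g^w$ for unknown $w$.

Case 2: $T' = T$, but $x_i' \ne x_i$ for some $i \in I$. Because $T' = T$, we have
$$\prod_{i \in I} H_1(fid \| i)^{\beta_i} \, u^{\sum_{i \in I} \beta_i H_2(x_i')} = \prod_{i \in I} H_1(fid \| i)^{\beta_i} \, u^{\sum_{i \in I} \beta_i H_2(x_i)}.$$
By arranging the terms, we have
$$u^{\sum_{i \in I} \beta_i (H_2(x_i') - H_2(x_i))} = 1.$$
As the probability that $\sum_{i \in I} \beta_i (H_2(x_i') - H_2(x_i)) = 0$ is negligible and $u = g^{\gamma} h^{\eta}$, we have
$$h = g^{-\frac{\gamma \sum_{i \in I} \beta_i (H_2(x_i') - H_2(x_i))}{\eta \sum_{i \in I} \beta_i (H_2(x_i') - H_2(x_i))} \bmod q} = g^{-\gamma \eta^{-1}}.$$
This means that the simulator can solve the DLOG of random $h$ with respect to base $g$, which immediately breaks the CDH assumption.

Case 3: $T' = T$, $x_i' = x_i$ for all $i \in I$, but $(Y_1', Y_2', \{\mu_j'\}_{1 \le j \le m}) \ne (Y_1, Y_2, \{\mu_j\}_{1 \le j \le m})$. Note that
$$\prod_{i \in I} x_i^{\beta_i} = v_1^{Y_1} v_2^{Y_2} \prod_{j=1}^{m} z_j^{\mu_j} \bmod p$$
and
$$\prod_{i \in I} x_i'^{\beta_i} = v_1^{Y_1'} v_2^{Y_2'} \prod_{j=1}^{m} z_j^{\mu_j'} \bmod p.$$
Because $x_i = x_i'$ for all $i \in I$, we have
$$v_1^{Y_1} v_2^{Y_2} \prod_{j=1}^{m} z_j^{\mu_j} = v_1^{Y_1'} v_2^{Y_2'} \prod_{j=1}^{m} z_j^{\mu_j'} \bmod p.$$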
By replacing $z_j$ with $v_1^{-s_{j1}} v_2^{-s_{j2}}$ in the above equation, we get
$$1 = v_1^{Y_1 - Y_1' - \sum_{j=1}^{m} s_{j1} (\mu_j - \mu_j')} \, v_2^{Y_2 - Y_2' - \sum_{j=1}^{m} s_{j2} (\mu_j - \mu_j')} \bmod p.$$
Thus, if $(Y_1', Y_2', \{\mu_j'\}_{1 \le j \le m}) \ne (Y_1, Y_2, \{\mu_j\}_{1 \le j \le m})$, then the simulator can compute the DLOG of random $v_2$ with respect to base $v_1$, which immediately breaks the CDH assumption.

Theorem 2. Assume $H_1$ and $H_2$ are hash functions modeled as random oracles, and the CDH problem is hard. The POSD scheme is $(\kappa, \theta)$-uncheatable with respect to a challenge of $c = \frac{n}{\theta - \kappa} \log(2^{\theta - \kappa} + \epsilon)$ blocks in the Dedup protocol, where $\kappa$ is the min-entropy of the file $F$ in question, $\theta$ is the amount of entropy leaked to or stolen by the adversary, and $\epsilon$ is negligible in the security parameter $l$.

Proof. According to Theorem 1, given that $H_1$ and $H_2$ are hash functions modeled as random oracles and the CDH problem is hard, our scheme is server unforgeable. That is, given the challenge $(I, \beta)$ with file identifier $fid$, the response $(\mu_1, \ldots, \mu_m, \{x_i\}_{i \in I}, Y_1, Y_2, T)$ must be computed honestly from $(fid, F, Tag_{int})$, so that
$$\prod_{i \in I} x_i^{\beta_i} = v_1^{Y_1} v_2^{Y_2} \prod_{j=1}^{m} z_j^{\mu_j} \bmod p,$$
$$e(T, g) = e\Big(\prod_{i \in I} H_1(fid \| i)^{\beta_i} \, u^{\sum_{i \in I} \beta_i H_2(x_i)}, z_g\Big).$$
Without loss of generality, let $(I, \beta)$ be the challenge with corresponding file identifier $fid$ in the execution of Dedup. Let $(\mu_1', \ldots, \mu_m')$ be the response from the malicious client, and let $\{x_i\}_{i \in I}, Y_1, Y_2, T$ be computed from $Tag_{dup}$ by the

cloud server. Recall that $Tag_{dup} = Tag_{int}$, so $\{x_i\}_{i \in I}$, $Y_1$, $Y_2$ and $T$ are exactly the values that would be computed from $Tag_{int}$. Therefore, in order to satisfy
$$\prod_{i \in I} x_i^{\beta_i} = v_1^{Y_1} v_2^{Y_2} \prod_{j=1}^{m} z_j^{\mu_j'} \bmod p,$$
$$e(T, g) = e\Big(\prod_{i \in I} H_1(fid \| i)^{\beta_i} \, u^{\sum_{i \in I} \beta_i H_2(x_i)}, z_g\Big),$$
we have $\mu_j' = \mu_j$ for $1 \le j \le m$; otherwise we can solve the DLOG problem of random $v_2$ with respect to base $v_1$ (see Case 3 in the proof of Theorem 1). That is, the malicious client must compute $(\mu_1, \ldots, \mu_m)$ honestly from $F$. In other words, the malicious client can win the game only if it can figure out the unknown bits of entropy in the data blocks specified by the set $I$.

Let Evnt denote the event that there are $c$ data blocks with unknown bits of entropy and the adversary tries to guess the unknown bits of entropy in order to cheat successfully. To simplify the model, assume that the unknown bits of entropy are distributed uniformly over the data blocks of $F$; this is reasonable because the challenged data blocks are chosen uniformly at random. Therefore, with $c = \frac{n}{\theta - \kappa} \log(2^{\theta - \kappa} + \epsilon)$, the probability is
$$\Pr[\text{Evnt}] = 2^{-c(\kappa - \theta)/n} = 2^{c(\theta - \kappa)/n} = 2^{\log(2^{\theta - \kappa} + \epsilon)} = 2^{\theta - \kappa} + \epsilon = \frac{1}{2^{\kappa - \theta}} + \epsilon.$$
This completes the proof.

5.5 Performance Analysis

On the size of the challenges. The size of the challenges in the AuditInt protocol is an important performance parameter. Let Err be the probability of a block being corrupted (i.e., the portion of the data modified by the server). Figure 1 shows the required size of challenges in order to achieve integrity assurance in the interval [0.991, 0.999] under three circumstances: Err = 0.1, Err = 0.05 and Err = 0.01. Consider, for example, the case of Err = 0.01. Fewer than 500 challenges need to be sent in order to achieve 99.1% integrity assurance, regardless of the size of the data blocks. This also explains the advantage of POR and PDP.

Figure 1: The impact of Err and integrity assurance on challenge size $c$ (x-axis: integrity assurance from 0.991 to 0.999; y-axis: $c$; curves for Err = 0.1, Err = 0.05 and Err = 0.01)

Theorem 2 shows a lower bound on the number of challenged data blocks in order to fulfill $(\kappa, \theta)$-uncheatability. In order to illustrate the impact of the parameters $\kappa$, $\theta$, $n$ and $\epsilon$ on the lower bound of $c$, we consider two cases: one file with small min-entropy (1M bits) and the other with large min-entropy (128M bits).

Figure 2: The impact of $\theta$ on the challenge size $c$: $n = 2^{15}$ and $\kappa$ = 1M bits (x-axis: $\theta$ from 512K to 1M; y-axis: $\log(c)$; curves for $\epsilon = 2^{-60}$, $2^{-70}$ and $2^{-80}$)

Figure 3: The impact of $\theta$ on the challenge size $c$: $n = 2^{27}$ and $\kappa$ = 128M bits (x-axis: $\theta$ from 64M to 128M; y-axis: $\log(c)$; curves for $\epsilon = 2^{-60}$, $2^{-70}$ and $2^{-80}$)

With respect to negligible probability $2^{-80}$, Figure 2 shows that given a file of $2^{15}$ data blocks with 1M bits min-entropy, our scheme can fulfill (1M, 960K)-uncheatability by challenging about $2^{6}$ data blocks (or 0.1% portions of the data file). Figure 3 shows that given a file of $2^{27}$ data blocks with 128M bits min-entropy, fulfilling (128M, 120M)-uncheatability requires challenging about $2^{9}$ data blocks (or $2^{-18}$ portions

of the data file). This shows that even if the adversary has obtained 93.75% of the data file (e.g., by penetrating into the cloud server in a stealthy fashion and without being detected), the attacker cannot cheat against reasonably small challenges.

Comparison with some relevant schemes. Because POSD is the first scheme that simultaneously allows proof of storage integrity and deduplication, in Table 1 we compare its efficiency to the most efficient PDP scheme in [2], the most efficient POR scheme in [12], and the only existing POW scheme in [13], respectively. The two particular PDP and POR schemes are chosen also because they offer the aforementioned public verifiability, namely that a third party can examine the storage integrity on behalf of a client, which is exploited to construct POSD. The POW scheme is the one based on Merkle trees in [13]; it is chosen because its security is compatible with our POSD scheme (there are more efficient but less secure solutions in [13]). Note that for the client storage, we consider a single file $F$. In principle, each client can outsource polynomially many data files to the cloud. In this case, storage of the identifiers $fid$ can still be made constant by letting each client use a pseudorandom function (PRF) to generate the identifiers from its secret key while maintaining a counter.

From the perspective of assuring cloud data storage integrity, we draw the following observations from Table 1. First, our POSD scheme requires $O(n)$ exponentiations for a client to preprocess a data file before uploading it to the cloud. This complexity is substantially smaller than the preprocessing complexity of $O(mn)$ exponentiations of the schemes in [2, 12]. Second, our POSD scheme incurs $O((m + c)l)$ communication overhead in the audit process, which is higher than the $O(ml)$ communication overhead of the PDP and POR schemes. To demonstrate that the extra communication is not significant, especially when we deal with large files, let us consider the following realistic example. Suppose a data file consists of 7 blocks of 8 symbols (.5-GB file if $l = 160$).
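The integrity-assurance expression in Table 1, $1 - (1 - Err)^c$, can be inverted to estimate how many blocks must be challenged, and hence how large the extra $c \cdot l$ term in the audit communication is. The sketch below is illustrative only: the function names are hypothetical, and the extra overhead is taken to be exactly $c \cdot l$ bits, which ignores the constants hidden in the $O$-notation.

```python
import math

def challenge_size(err, assurance):
    """Smallest integer c such that 1 - (1 - err)**c >= assurance.

    err is the per-block corruption probability, assurance the target
    detection probability; both are assumed to lie strictly between 0 and 1.
    """
    return math.ceil(math.log(1.0 - assurance) / math.log(1.0 - err))

def extra_audit_bits(c, l):
    """Extra audit communication of the O((m + c) l) bound relative to O(m l),
    taken here as exactly c * l bits (an assumption about the hidden constant)."""
    return c * l
```

For Err = 0.01 and 99.1% integrity assurance, `challenge_size(0.01, 0.991)` returns 469, in line with the observation in Section 5.5 that fewer than 500 challenges suffice in that setting.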
Assume that the probability of block corruption is Err = 0.01. That is, roughly 1 data blocks are corrupted. Suppose we want to achieve 99.5% integrity assurance (i.e., with probability 99.5% the tampering of a data file is detected); then the extra communication overhead in our POSD scheme is only $2^{16}$ bits (8KB). Moreover, it should be noted that the PDP and POR schemes cannot fulfill deduplication.

From the perspective of secure data deduplication, we draw the following observations from Table 1. First, our POSD scheme is slightly less efficient than the POW scheme. However, the POW scheme cannot fulfill the auditability of cloud storage security (note that it is well known that a Merkle tree is not sufficient to fulfill PDP/POR [2, 12, 16]). Second, our POSD scheme incurs smaller communication overhead because $O(ml)$ is often much smaller than $O(c \log(n) ml)$. Third, the POW scheme is indeed secure in the standard model based on the existence of Collision-Resistant Hash (CRH) functions. However, it cannot fulfill auditability of cloud storage integrity.

| | PDP [2] | POR [12] | POSD (this paper) | POW [13] |
|---|---|---|---|---|
| total key size | O(m) | O(m) | O(m) | 0 (no keys) |
| use Random Oracle? | yes | yes | yes | no |
| security assumption | RSA | CDH | CDH | CRH |
| For integrity audit purpose | | | | |
| client storage | O(1) | O(1) | O(1) | N/A |
| server storage | O(n) | O(n) | O(n) | N/A |
| audit preprocessing comp. | O(mn)Ex + O(mn)Mu | O(mn)Ex + O(mn)Mu | O(n)Ex + O(mn)Mu | N/A |
| audit computation (client) | O(c)Ex + O(cm)Mu | O(c + m)Ex + O(cm)Mu | O(c)Ex + O(cm)Mu | N/A |
| audit computation (server) | O(c)Ex + O(cm)Mu | O(c)Ex + O(cm)Mu | O(c)Ex + O(cm)Mu | N/A |
| audit communication | O(ml) | O(ml) | O((m + c)l) | N/A |
| integrity assurance | 1 - (1 - Err)^c | 1 - (1 - Err)^c | 1 - (1 - Err)^c | N/A |
| For deduplication purpose | | | | |
| dedup. preprocessing comp. | N/A | N/A | O(n)Ex + O(mn)Mu | ECC + O(n)H |
| dedup. computation (client) | N/A | N/A | O(cm)Mu | O(n)H |
| dedup. computation (server) | N/A | N/A | O(c)Ex + O(cm)Mu | O(c log(n))H |
| dedup. communication | N/A | N/A | O(lm) | O(cml log(n)) |

Table 1: Efficiency comparison between some PDP, POR, POW and our POSD schemes, where n is the number of blocks of a data file, m is the number of symbols of a data block, c is the number of blocks that will be challenged, Err is the probability of block corruption, Ex represents modular exponentiation operation, Mu represents modular multiplication operation, and N/A indicates that a property is not applicable to a certain scheme.

6. CONCLUSION

We motivated the need for the cloud storage notion we call proof of storage with deduplication, or POSD, to fulfill data integrity and deduplication simultaneously. We also presented an efficient POSD scheme, which is proven secure in the Random Oracle model based on the Computational Diffie-Hellman assumption. Compared with the PDP/POR/POW schemes, our scheme is as efficient as theirs. One interesting future work is to remove the random oracle in the protocol without jeopardizing performance. Another is to seek a different design methodology for such protocols so as to achieve even substantially better performance.

Acknowledgements. This work was supported in part by an AFOSR MURI grant and an NSF grant.

7. REFERENCES

[1] The digital universe decade - are you ready? International Data Corporation, 2010. http://dcdocserv.com/95.
[2] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song. Provable data possession at untrusted stores. In Proceedings of the 14th ACM conference on Computer and communications security, CCS '07, pages 598-609, New York, NY, USA, 2007. ACM.
[3] G. Ateniese, R. Di Pietro, L. V. Mancini, and G. Tsudik. Scalable and efficient provable data possession. In Proceedings of the 4th international conference on Security and privacy in communication networks, SecureComm '08, pages 9:1-9:10, New York, NY, USA, 2008. ACM.
[4] G. Ateniese, S. Kamara, and J. Katz. Proofs of storage from homomorphic identification protocols. In Proceedings of the 15th International Conference on the Theory and Application of Cryptology and Information Security: Advances in Cryptology, ASIACRYPT '09, pages 319-333, Berlin, Heidelberg, 2009. Springer-Verlag.
[5] K. D. Bowers, A. Juels, and A. Oprea. Proofs of retrievability: theory and implementation. In Proceedings of the 2009 ACM workshop on Cloud computing security, CCSW '09, pages 43-54, New York, NY, USA, 2009. ACM.
[6] Y. Dodis, S. Vadhan, and D. Wichs. Proofs of retrievability via hardness amplification. In Proceedings of the 6th Theory of Cryptography Conference on Theory of Cryptography, TCC '09, pages 109-127, Berlin, Heidelberg, 2009. Springer-Verlag.
[7] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer. Reclaiming space from duplicate files in a serverless distributed file system. In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS '02), ICDCS '02, pages 617, Washington, DC, USA, 2002. IEEE Computer Society.
[8] C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia. Dynamic provable data possession. In Proceedings of the 16th ACM conference on Computer and communications security, CCS '09, pages 213-222, New York, NY, USA, 2009. ACM.
[9] D. Harnik, B. Pinkas, and A. Shulman-Peleg. Side channels in cloud services: Deduplication in cloud storage. IEEE Security and Privacy, 8:40-47, November 2010.
[10] A. Juels and B. S. Kaliski, Jr. PORs: proofs of retrievability for large files. In Proceedings of the 14th ACM conference on Computer and communications security, CCS '07, pages 584-597, New York, NY, USA, 2007. ACM.
[11] T. Okamoto. Provably secure and practical identification schemes and corresponding signature schemes. In Proceedings of the 12th Annual International Cryptology Conference on Advances in Cryptology, CRYPTO '92, pages 31-53, London, UK, 1993. Springer-Verlag.
[12] H. Shacham and B. Waters. Compact proofs of retrievability. In Proceedings of the 14th International Conference on the Theory and Application of Cryptology and Information Security: Advances in Cryptology, ASIACRYPT '08, pages 90-107, Berlin, Heidelberg, 2008. Springer-Verlag.
[13] S. Halevi, D. Harnik, B. Pinkas, and A. Shulman-Peleg. Proofs of ownership in remote storage systems. Cryptology ePrint Archive, Report 2011/207, 2011. http://eprint.iacr.org/.
[14] M. W. Storer, K. Greenan, D. D. Long, and E. L. Miller. Secure data deduplication. In Proceedings of the 4th ACM international workshop on Storage security and survivability, StorageSS '08, pages 1-10, New York, NY, USA, 2008. ACM.
[15] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li. Enabling public auditability and data dynamics for storage security in cloud computing. IEEE Trans. Parallel Distrib. Syst., 22:847-859, May 2011.
[16] Q. Zheng and S. Xu. Fair and dynamic proofs of retrievability. In Proceedings of the first ACM conference on Data and application security and privacy, CODASPY '11, pages 237-248, New York, NY, USA, 2011. ACM.