Secure Deduplication of Encrypted Data without Additional Servers

Jian Liu (Aalto University), N. Asokan (Aalto University and University of Helsinki), Benny Pinkas (Bar Ilan University)

Abstract

Encrypting data on the client side before uploading it to cloud storage is essential for protecting users' privacy. However, client-side encryption is at odds with the standard practice of deduplication in cloud storage services. Reconciling client-side encryption with cross-user deduplication has been an active research topic. In this paper, we present the first secure cross-user deduplication scheme that supports client-side encryption without requiring any additional independent servers. We demonstrate that our scheme provides better security guarantees than previous work. We also demonstrate that its performance is reasonable via simulations using realistic datasets.

1. INTRODUCTION

Cloud storage is a service that enables people to store their data remotely with a storage provider. With a rapid growth in user base, storage providers tend to save storage costs via cross-user deduplication: if two clients want to upload the same file, the storage server detects the duplication and stores only a single copy. Deduplication achieves high storage savings [20] and is adopted by many cloud service providers. It is also adopted widely in backup systems for enterprise workstations.

Clients who care about privacy want to encrypt their data on the client side using semantically secure encryption mechanisms. However, naive application of encryption thwarts deduplication. Reconciling deduplication and encryption is an active research topic. The current solutions either depend on the aid of additional independent servers [3, 23, 25] or are only suitable for peer-to-peer (P2P) systems [13]. Reliance on independent servers is a strong assumption that is very difficult to meet in commercial contexts, and since most of the popular cloud storage services (e.g. Dropbox, Google Drive) are client-server systems, turning them into P2P systems is unrealistic.

In this paper, we present the first single-server scheme for secure cross-user deduplication with client-encrypted data. Our scheme allows two clients uploading the same file to the cloud storage to securely agree on the same file encryption key. We do this by building upon a well-known cryptographic primitive known as password authenticated key exchange (PAKE) [6]. A PAKE protocol allows two parties to agree on a session key if and only if they share a short secret ("password"), which might have low entropy. In our deduplication scheme, PAKE protocol runs are facilitated by the storage server. This avoids the need for any additional independent servers and thus can be adopted easily into popular cloud storage services. In terms of security, our scheme makes use of per-file rate limits (namely, limiting the number of queries allowed per file), rather than per-client rate limits (limiting the number of queries allowed for each client, as was used in [3]).
This change makes the scheme significantly more resistant to brute-force guessing attacks by colluding clients or by a malicious storage server that attempts to masquerade as multiple clients.

We make the following contributions:

- We present the first single-server scheme that simultaneously enables the server to do cross-user deduplication and enables clients to encrypt their data on the client side with semantically secure encryption keys (Section 5).
- We show that our scheme has better security guarantees than previous work (Section 6). As far as we know, our proposal is the first scheme that can prevent online brute-force guessing attacks by malicious clients or a malicious storage server without the aid of an identity server.
- We demonstrate, via simulations with realistic datasets, that our scheme provides its privacy benefits while retaining a utility level (in terms of deduplication effectiveness) on par with standard industry practice. Our use of per-file rate limits implies that an incoming file will not be checked against all existing files in storage. Surprisingly, our scheme still achieves high deduplication effectiveness (Section 7).
- We implement our scheme and show that it incurs minimal overhead (computation/communication) compared to traditional deduplication schemes (Section 8).

2. PRELIMINARIES

2.1 Deduplication

Deduplication strategies can be categorized according to the basic data units they handle. One category is file-level deduplication, which eliminates redundant files [12]. The other is block-level deduplication, in which files are segmented into blocks and duplicated blocks are eliminated [24]. In this paper we use the term file to refer to both files and blocks.

Deduplication strategies can also be categorized according to the side on which deduplication happens. In server-side deduplication, all files are uploaded to the storage server S, which then deletes the duplicates. Clients (Cs) are unaware of deduplication. This strategy saves storage but not bandwidth. In client-side deduplication, C checks for the existence of its files on S (by sending cryptographic hashes) before uploading them. Duplicates are not uploaded to S. This strategy saves both storage and bandwidth.

The effectiveness of deduplication is customarily expressed by the deduplication ratio, defined as the number of bytes input to a data deduplication process divided by the number of bytes output [14]. In the case of a cloud storage service, this is the ratio between the total size of files uploaded by all clients and the total storage space used by the service. The deduplication percentage (sometimes referred to as the "space reduction percentage") [14, 17] is 1 - 1/(deduplication ratio). A perfect deduplication scheme will detect all duplicates; hence its deduplication percentage will be close to 100%. The level of deduplication achievable depends on a number of factors. In common business settings, deduplication ratios in the range of 4:1 (75%) to 500:1 (99.8%) are typical. Wendt et al. [26] suggest that a figure in the range 10:1 (90%) to 20:1 (95%) is a realistic expectation.

Although deduplication benefits storage providers (and hence, indirectly, their users), it also constitutes a privacy threat for users. For example, a cloud storage server that supports client-side, cross-user deduplication can be exploited as an oracle that answers queries of the form "did anyone upload this file?". An adversary A can do so by uploading a file and observing whether deduplication takes place [17]. For a predictable file that is known to be drawn from a small space, A can construct all possible files from that space, upload them and observe which one causes deduplication, so that A can identify the target file. Harnik et al. [17] propose a randomized threshold approach to address such online brute-force guessing attacks. Specifically, for each file F, S keeps a random threshold t_F (t_F >= 2) and a counter c_F that indicates the number of Cs that have previously uploaded F. Client-side deduplication happens only if c_F >= t_F. Otherwise, S does server-side deduplication.

2.2 Hash Collisions

A hash function H: Phi -> {0,1}^n deterministically maps a binary string in Phi of arbitrary length to a binary string h of fixed length n, where h is called the hash value (or simply the hash) and n is called the hash length of H. A cryptographic hash function maps distinct elements to hash values in such a way that it is infeasible to invert the function or to find collisions (namely, the function is collision resistant). If a hash function is not collision resistant, the expected number of inputs from Phi that map to a given hash value is |Phi| / 2^n, on condition that the hash values are evenly distributed across the available range. A hash function with a long hash length has few collisions, whereas a hash function with a short hash length has many collisions. We will use such a short hash to protect clients' privacy. For a predictable file F, drawn from a small space Phi, if A knows H(F), she can easily perform an offline brute-force guessing attack. But if A only knows SH(F), where SH() is a short hash function, it is hard for A to guess the exact content of F due to the high collision rate of SH().
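As a concrete illustration, the sketch below derives a short hash by truncating a full cryptographic hash to n = 13 bits, which is how the prototype in Section 8.1 obtains SH() and the short-hash length used in our evaluation; the Python rendering itself is only a sketch.

```python
import hashlib

def full_hash(data: bytes) -> bytes:
    """Cryptographic hash H(F): SHA-256 of the file contents."""
    return hashlib.sha256(data).digest()

def short_hash(data: bytes, n_bits: int = 13) -> int:
    """Short hash SH(F): the top n_bits of H(F), here 13 bits.
    On average |Phi| / 2**n_bits distinct files share each value,
    so the value alone does not identify the file."""
    digest = full_hash(data)
    as_int = int.from_bytes(digest, "big")
    return as_int >> (len(digest) * 8 - n_bits)

if __name__ == "__main__":
    print(short_hash(b"example file contents"))   # a value in [0, 8191]
```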
2.3 Additively Homomorphic Encryption

A public key encryption scheme is additively homomorphic if, given two ciphertexts c_1 = Enc(pk, m_1; r_1) and c_2 = Enc(pk, m_2; r_2) (i.e., encryptions of the messages m_1, m_2 under the same public key pk but with different random values r_1, r_2), it is possible to efficiently compute Enc(pk, m_1 + m_2; r) with independent randomness r, even without knowledge of the corresponding private key. Examples of such encryption schemes are Paillier's encryption [21] and El Gamal encryption [15] where addition is done in the exponent. In this paper, we use E()/D() to denote symmetric encryption/decryption, and Enc()/Dec() to denote additively homomorphic encryption/decryption. We abuse notation and use Enc(pk, m) to denote the random variable induced by Enc(pk, m; r) where r is chosen uniformly at random. In addition, we use (+) and (-) to denote homomorphic addition and subtraction, respectively.
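The toy sketch below illustrates the additive property using El Gamal with the message in the exponent, one of the two examples mentioned above. The tiny group parameters and the brute-force decryption loop are illustrative assumptions only; a scheme such as Paillier would be used in practice when the plaintext is large.

```python
import random

# Toy additively homomorphic encryption (El Gamal "in the exponent").
# Insecure demo parameters: p = 2q + 1 with q prime, and g generates
# the order-q subgroup. Decryption recovers m by brute force, so this
# only works for small messages.
q, p, g = 11, 23, 4

def keygen():
    sk = random.randrange(1, q)
    return sk, pow(g, sk, p)                     # (private, public) key pair

def enc(pk, m):
    r = random.randrange(1, q)
    return (pow(g, r, p), (pow(g, m % q, p) * pow(pk, r, p)) % p)

def add(c1, c2):                                 # Enc(m1) (+) Enc(m2) = Enc(m1 + m2)
    return ((c1[0] * c2[0]) % p, (c1[1] * c2[1]) % p)

def dec(sk, c):
    gm = (c[1] * pow(pow(c[0], sk, p), p - 2, p)) % p
    for m in range(q):                           # brute-force discrete log (toy only)
        if pow(g, m, p) == gm:
            return m

if __name__ == "__main__":
    sk, pk = keygen()
    c = add(enc(pk, 3), enc(pk, 4))
    assert dec(sk, c) == 7                       # homomorphic addition
```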

2.4 Password Authenticated Key Exchange

Password-based protocols are commonly used for user authentication. However, such protocols are vulnerable to offline brute-force guessing attacks (also referred to as "dictionary attacks") since users tend to choose passwords with relatively low entropy that are hence guessable. Bellovin and Merritt [6] were the first to propose a password authenticated key exchange (PAKE) protocol, in which an attacker making a password guess cannot verify the guess without an online attempt to authenticate itself with that password. The protocol is based on using the password as a symmetric key to encrypt the messages of a standard key exchange protocol (e.g., Diffie-Hellman [11]), so that two parties with the same password will successfully generate a session key without revealing their passwords. Following this seminal work, many protocols were proposed to improve PAKE in several aspects, e.g., describing provably secure protocols [5, 7], weakening the assumptions (i.e., working in the standard model without random oracles) [16, 2], achieving a stronger proof model [10, 9] and improving the round efficiency [18, 5, 19].

We use PAKE as a building block of our deduplication protocol. The ideal functionality F_pake is shown in Figure 1. Alice and Bob hold passwords pw_a and pw_b respectively. F_pake outputs the same key to both of them if and only if pw_a = pw_b. Otherwise, it outputs two independent random keys to Alice and Bob.

Inputs: Alice's input is a password pw_a; Bob's input is a password pw_b.
Outputs: Alice's output is k_a; Bob's output is k_b, where if pw_a = pw_b then k_a = k_b, and if pw_a != pw_b then Bob (resp. Alice) cannot distinguish k_a (resp. k_b) from a random string of the same length.
Figure 1: The ideal functionality F_pake for password authenticated key exchange.

The chosen PAKE protocol should satisfy the following requirements:

Implicit key exchange: It must be implicit, meaning that at the end of the protocol each party outputs a key but neither party learns whether the passwords matched. (Many PAKE protocols were instead designed to be explicit, where the parties learn at the end of the protocol whether the exchange succeeded.)

Single round: It must be single-round so that it can be easily facilitated by the storage server.

Pseudo-random keys: It must guarantee that if the passwords are different, then neither party can distinguish the key output by the other party from a random key.

Concurrent executions: It must allow multiple PAKE instances to run in parallel. There are two common security notions for PAKE under concurrent executions. One notion is concurrent PAKE, defined by [5, 7]. The other, stronger notion is that of UC-secure PAKE [10], which guarantees security under composition with arbitrary protocols, and with arbitrary, unknown and possibly correlated password distributions. Protocols secure according to the notion of UC security (e.g. [10]) are currently much less efficient than protocols secure according to the notion of concurrent PAKE. We therefore use a concurrent PAKE protocol in our work.

Our deduplication scheme uses the SPAKE2 protocol of Abdalla and Pointcheval [1], which is described in Figure 2. This protocol is secure in the concurrent setting, in the random oracle model. It is very efficient, requiring each party to compute just three exponentiations and send just a single group element to the other party. Theorem 5.1 in [1] states that this protocol is secure assuming that the computational Diffie-Hellman problem is hard in the group used by the protocol.

3. PROBLEM STATEMENT

3.1 General Setting

Our generic setting for cloud storage systems consists of a storage server S and a set of clients (Cs) who store their files on S. Cs never communicate directly, but exchange messages with S, and S processes the messages and/or forwards them as needed. In addition, third parties (T Ps) can be introduced to the system to assist deduplication. For example, T Ps can be used for key generation [3], for keeping track of file popularities [25] and even for encryption/decryption [23]. But independent T Ps are expensive to deploy in practice and can be bottlenecks for both security and performance. Introducing a T P which is independent of S is often unrealistic in commercial settings.

We assume that parties communicate through secure channels, so that an adversary A cannot eavesdrop on or tamper with any data in the channel. We also assume that S can authenticate and authorize Cs, and that Cs can authenticate S. We introduce new notations as needed; a summary of notations appears in Appendix A.

3.2 Ideal Model

We define an ideal model that attempts to capture the ideal functionality F_dedup of deduplicating encrypted data. The model has three types of participants: the storage server S, an uploading client C attempting to upload a new file, and existing clients {C_i} who have already uploaded a file. The ideal functionality is shown in Figure 3.

Inputs: The input of the uploading client C is F. Each existing client C_i has inputs F_i and k_Fi. S's input is the set of existing encrypted files {E(k_Fi, F_i)}.
Outputs: C learns a key k_F with which to encrypt its file; if F is identical to an existing file F_i then k_F = k_Fi, otherwise k_F is random. Each C_i's output is empty. S learns the encryption of F under the key k_F; note that if there exists F_i = F then this encryption is identical to the encryption of F_i.
Figure 3: The ideal functionality F_dedup of deduplicating encrypted data.

A deduplication system for encrypted files is secure if it implements F_dedup.
In particular, no participant learns more information than is defined in the output of the functionality. Our system implements this functionality with the following relaxation: S learns a short hash of F (say, 13 bits long), and other clients that have previously uploaded files with this hash value might learn that a new file with this short hash is being uploaded. In a large-scale system this short hash agrees with many files and therefore leaks limited information about the uploaded file.

3.3 Design Goals

Threat Model: An adversary A might control any subset of the Cs of the system. It might also have control over S: it can gain access to the data stored at S, perform offline brute-force guessing attacks, and might even mount active attacks by deviating from the deduplication protocol and potentially masquerading as Cs of the system. A may also control any third party T P used by the deduplication scheme.

When client-side deduplication takes place, A, masquerading as a C uploading a file and wishing to identify whether a specific file was already uploaded by a different C, can apply an online brute-force attack as described by Harnik et al. [17]. The randomized approach of [17], described in Section 2.1, can mitigate that attack. However, even if the deduplication scheme is secure against offline brute-force guessing attacks, an A controlling both S and some Cs can still mount a similar online brute-force attack by running the deduplication protocol for every guess of a predictable file and checking whether deduplication occurs. This attack is online since, as will become clearer in Section 4, A has to run a protocol invocation with the uploading C for every guess of the content of the uploaded file.

Public information: A finite cyclic group G of prime order p generated by an element g. Public elements M_u in G associated with each user u. A hash function H modeled as a random oracle.
Secret information: User u has a password pw_u.
The protocol is run between Alice and Bob:
1. Each side performs the following computation. Alice chooses x uniformly at random from Z_p and computes X = g^x; she defines X* = X * (M_A)^{pw_A}. Bob chooses y uniformly at random from Z_p and computes Y = g^y; he defines Y* = Y * (M_B)^{pw_B}.
2. Alice sends X* to Bob. Bob sends Y* to Alice.
3. Each side computes the shared key. Alice computes K_A = (Y* / (M_B)^{pw_A})^x and then her output SK_A = H(A, B, X*, Y*, pw_A, K_A). Bob computes K_B = (X* / (M_A)^{pw_B})^y and then his output SK_B = H(A, B, X*, Y*, pw_B, K_B).
Figure 2: The SPAKE2 protocol of Abdalla and Pointcheval [1].
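A minimal sketch of the Figure 2 message flow follows, assuming a toy group and arbitrary demo values for M_A and M_B; in our protocol the "password" would be the cryptographic hash of the file, reduced into the group's exponent space. A real deployment would use a large standardized prime-order group.

```python
import hashlib, random

# Toy run of the SPAKE2 flow from Figure 2. Demo-only parameters:
# p = 2q + 1 is a tiny safe prime and g generates the order-q subgroup.
q, p, g = 11, 23, 4
M_A, M_B = 9, 13                  # fixed public group elements for the two roles

def inv(x):                       # modular inverse in Z_p*
    return pow(x, p - 2, p)

def msg(priv, M, pw):
    """One party's single message: X* = g^x * M^pw."""
    return (pow(g, priv, p) * pow(M, pw % q, p)) % p

def session_key(ida, idb, Xs, Ys, pw, priv, M_other, i_am_alice):
    """K = (peer_msg / M_other^pw)^priv, then SK = H(transcript, pw, K)."""
    peer = Ys if i_am_alice else Xs
    K = pow((peer * inv(pow(M_other, pw % q, p))) % p, priv, p)
    return hashlib.sha256(f"{ida}|{idb}|{Xs}|{Ys}|{pw}|{K}".encode()).hexdigest()

if __name__ == "__main__":
    pw_a = pw_b = 5                               # matching "passwords"
    x, y = random.randrange(1, q), random.randrange(1, q)
    Xs, Ys = msg(x, M_A, pw_a), msg(y, M_B, pw_b)
    sk_a = session_key("Alice", "Bob", Xs, Ys, pw_a, x, M_B, True)
    sk_b = session_key("Alice", "Bob", Xs, Ys, pw_b, y, M_A, False)
    assert sk_a == sk_b                           # equal passwords -> equal session keys
```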

Security Goals: With the above threat model in mind, we aim to propose a deduplication scheme that both enables S to do deduplication and enables Cs to encrypt their files with a semantically secure encryption scheme. Our security goals are as follows.

S1 Avoid relying on any T P.
S2 Prevent brute-force guessing attacks:
S2.1 online, by compromised Cs;
S2.2 offline, by a compromised passive S;
S2.3 online, by a compromised active S (masquerading as multiple Cs).

Functional goals: In addition, our protocol should also meet certain functional goals.

F1 Maximize deduplication effectiveness (exceed realistic expectations, as discussed in Section 2.1).
F2 Minimize computational and communication overhead (i.e., the computation/communication costs should be comparable to storage systems without deduplication).

4. OVERVIEW OF THE SOLUTION

We first motivate the salient design decisions in our deduplication protocol before describing the details in Section 5. When an uploader C wants to upload a file F to S, we need to address two problems: (a) determining whether S already has an encrypted copy of F in its storage and (b) if so, securely arranging to have the encryption key transferred to C from some C_i who uploaded the original encrypted copy of F.

In traditional client-side deduplication, when C wants to upload a file F to S, she first sends a cryptographic hash h of F to S so that it can check for the existence of the file in its storage. Naively adapting this approach to the case of encrypted storage is insecure because a compromised S can easily perform offline brute-force guessing attacks on h if F is predictable. Therefore, instead of h, we let C send a short hash sh = SH(F). Due to the high collision rate of the short hash function SH(), S cannot reliably guess the content of F offline.

Now, suppose that another client C_i previously uploaded E(k_Fi, F_i), using k_Fi as the symmetric file encryption key for F_i, and that sh = sh_i. Our protocol needs to determine whether this happened because F = F_i and, in that case, arrange to have k_Fi securely transferred from C_i to C. We do this by having C and C_i engage in an oblivious key sharing protocol which allows C to receive k_Fi iff F = F_i, and a random key otherwise. We say that C_i plays the role of a checker in this protocol.

The oblivious key sharing protocol could be implemented using generic solutions for secure two-party computation, such as Yao's protocol [28], which express the desired functionality as a boolean circuit and then securely compute it.
Protocols of this type have been demonstrated to be very efficient, even with security against malicious adversaries. In our setting the circuit representation is actually quite compact, but the problem with this approach is that the inputs of the parties are relatively long (say, 288 bits, comprising a full-length hash value and a key), and known protocols require an invocation of oblivious transfer, namely public-key operations, for each input bit. There are known solutions for oblivious transfer extension, which use a preprocessing step to reduce the online computation time of oblivious transfer. However, in our setting the secure computation is run between two clients that do not have any pre-existing relationship, and therefore preprocessing cannot be carried out before the protocol is run.

Our solution for an efficient oblivious key sharing protocol is based on having C and C_i run a PAKE instance with the hash values of their files, namely h and h_i, as their respective input passwords. The protocol results in C_i getting k_i and C getting k_i', which are equal if h = h_i and are independent otherwise. The next step of the protocol, which will be described in Section 5, uses these keys to deliver to C a key k_F which is equal to k_Fi iff k_i = k_i'. C uses this key to encrypt her uploaded file, and S can deduplicate that file if it is equal to the file uploaded by C_i.

Several additional issues need to be solved:

1. A compromised S can perform an online brute-force guessing attack in which it initiates many interactions with C/C_i to identify F/F_i. Each protocol interaction essentially enables a single guess about the content of the relevant file. Cs therefore use a per-file rate limiting strategy to prevent such attacks. Specifically, they set a bound on the maximum number of PAKE requests they will service as a checker or as an uploader for each file (see Section 5.1). Our experiments with real datasets in Section 7 show that this rate limiting does not affect the success of deduplication.

2. What if C_i is not online? If S has a large number of clients, it can find enough online checkers. If there are not enough online checkers, we can let the uploader run PAKE with the currently available checkers and with additional dummy checkers to hide the number of available checkers (see Section 5.2). Again, our experiments in Section 7 show that this does not affect the success of deduplication (since checkers are likely to be found for popular files).

3. How can S do client-side deduplication securely? S will know whether C and C_i have the same file from the results of the PAKE runs. If they do have the same file, S simply informs C that she need not upload her file. In order to prevent online brute-force guessing attacks by C when client-side deduplication takes place, we use the same randomized threshold strategy as [17] (see Section 5.3).

5. DEDUPLICATION PROTOCOL

In this section we describe our deduplication protocol in detail. The data structure maintained by S is shown in Figure 4. Clients who want to upload a file also upload the corresponding short hash sh. Since different files may have the same short hash, a hash sh is associated with the list of different encrypted files whose plaintext has this hash value. S also keeps track of the clients (C_1, C_2, ...) who have uploaded the same encrypted file (therefore, with overwhelming probability, these clients have the same plaintext file encrypted with the same key).

Figure 4: S's record structure (root -> short hashes sh_1, sh_2, ... -> encrypted files E(k_F1, F_1), E(k_F2, F_2), ... -> the clients C_1, C_2, ... who uploaded each file).

Figure 5 shows the basic secure deduplication protocol. Each C_i that uploads a file F_i stores at S a short hash sh_i of that file (Step 0). When a client C wishes to upload a file F, it sends the short hash of this file, sh, to S (Step 1). S identifies the clients {C_i} that have uploaded files with the same short hash (Step 2), and runs the following protocol with each of them (in the more refined version of the system, there is a bound on the number of these protocol invocations). Consider a specific C_i who uploaded a file F_i. It runs a PAKE protocol with C, where their inputs are the cryptographic hash values of F_i and F respectively, and their outputs are keys k_i and k_i' respectively (Step 3). We assume that the keys are sufficiently long that k_i can be divided into a left key and a right key, k_i = k_iL || k_iR, where k_iL is long enough that collisions are unlikely (namely, |k_iL| >> 2 log n, where n is the number of clients), and k_iR can be used for symmetric encryption. The same also holds for k_i'.

At this point, a naive solution would have been for C_i to send to C an encryption of k_Fi under k_i, namely E(k_i, k_Fi). However, this would enable a subtle attack by C to identify whether F was previously uploaded. Therefore, the protocol continues as follows. Each C_i sends to S the pair k_iL, (k_iR + k_Fi), where addition is done in a field whose size will be defined next. C sends to S the pair k_iL' and Enc(pk, k_iR' + r), as well as its public key pk, where Enc is an additively homomorphic encryption scheme and r is chosen randomly (Steps 4-5). After receiving these messages from all C_i's, S looks for an index i for which k_iL' = k_iL. This equality happens iff F_i = F (except with negligible probability). If S finds such a pair, it sends to C the value e = Enc(pk, k_iR + k_Fi) (-) Enc(pk, k_iR' + r) = Enc(pk, k_Fi - r). Otherwise it sends e = Enc(pk, r'), where r' is chosen randomly by S (Step 6). C calculates k_F = Dec(e) + r and sends E(k_F, F) to S (Step 7). Note that if F was already uploaded to the server, then E(k_F, F) is equal to the previously stored encrypted version of the file.

This basic protocol has several issues that are addressed in the following sections: it allows online brute-force guessing attacks (which we prevent using rate limiting in Section 5.1); it does not specify how checkers are selected by S if too many checkers are available (Section 5.2); and, if client-side deduplication is used, it leaks to the client whether F was already uploaded (this is solved in Section 5.3 using a randomized threshold approach).
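The sketch below walks through the arithmetic of Steps 4-7 with the toy additively homomorphic scheme from Section 2.3. The key values are small stand-ins; the real protocol uses full-length keys and an encryption scheme (such as Paillier) whose plaintext space accommodates them.

```python
import random

# Toy additively homomorphic scheme (El Gamal in the exponent); all values
# live in Z_q. Illustrative only.
q, p, g = 11, 23, 4

def keygen():
    sk = random.randrange(1, q); return sk, pow(g, sk, p)

def enc(pk, m):
    r = random.randrange(1, q)
    return (pow(g, r, p), (pow(g, m % q, p) * pow(pk, r, p)) % p)

def hom_sub(c1, c2):              # Enc(m1) (-) Enc(m2) = Enc(m1 - m2)
    return ((c1[0] * pow(c2[0], p - 2, p)) % p,
            (c1[1] * pow(c2[1], p - 2, p)) % p)

def dec(sk, c):
    gm = (c[1] * pow(pow(c[0], sk, p), p - 2, p)) % p
    return next(m for m in range(q) if pow(g, m, p) == gm)

# PAKE gave checker C_i the key k_i and uploader C the key k_i'; each is
# split into a left half (compared by S) and a right half (used below).
k_iR = 6                          # right half of the checker's PAKE key
k_iR_prime = 6                    # right half of the uploader's PAKE key (equal iff F = F_i)
k_Fi = 9                          # the file key that C_i wants to hand over

sk, pk = keygen()                 # uploader C's key pair; pk is sent to S
r = random.randrange(q)           # uploader's blinding value

msg_checker = (k_iR + k_Fi) % q                   # Step 4: C_i -> S (plaintext)
msg_uploader = enc(pk, (k_iR_prime + r) % q)      # Step 5: C -> S (encrypted)

# Step 6a: the left halves matched, so S computes
# e = Enc(k_iR + k_Fi) (-) Enc(k_iR' + r) = Enc(k_Fi - r).
e = hom_sub(enc(pk, msg_checker), msg_uploader)

# Step 7: C decrypts and unblinds; since the files matched, k_F equals k_Fi.
k_F = (dec(sk, e) + r) % q
assert k_F == k_Fi
```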
Theorem 1. The deduplication protocol in Figure 5 implements F_dedup in the malicious model if the PAKE protocol is secure against malicious adversaries, the additively homomorphic encryption is semantically secure and the hash function is modeled as a random oracle.

Proof. We assume there is a trusted party that implements F_dedup in the ideal model. We then construct a simulator and show that the execution of F_dedup in the ideal model is computationally indistinguishable from the execution of the deduplication protocol in the real model. For the purpose of the proof we assume that the PAKE protocol is implemented as an oracle to which the parties send their inputs and from which they receive their outputs. (Note that the PAKE functionality is assumed to be implemented using a fully secure protocol; in fact, it could be implemented using generic secure computation protocols. Therefore this assumption is justified.)

A corrupt uploader C: We first assume that S and the C_i's are honest and construct a simulator for the uploader C. The simulator operates as follows. The simulator records the calls that C makes to the hash function, which is modeled as a random oracle, and records tuples of the form (F_j, h_j, sh_j) of the inputs and outputs of these calls. The simulator first receives a value sh from C. The simulator now obtains C's inputs to the different PAKE invocations. If such an input is equal to an h_j value which was in the same tuple as sh, the simulator invokes F_dedup with F_j and receives a key k_F from F_dedup. Note that an honest C uses the same h value in all PAKE invocations. A corrupt C might use different h values. Also, since sh is a short hash, there might be several (F_j, h_j, sh_j) tuples with the same sh value and therefore several legitimate h values for C to use.

0. When each C_i uploaded its file F_i, it uploaded sh_i and E(k_Fi, F_i).
1. C attempts to upload a file F. It calculates the cryptographic hash h and a short hash sh of F, and sends sh to S.
2. S finds the clients {C_i} who uploaded files {F_i} whose short hash sh_i is equal to sh, and lets them run PAKE protocols with C (note a). C's input is h and C_i's input is h_i.
3. After the invocation of the PAKE protocols, each C_i gets a session key k_i and C gets a set of session keys {k_i'} corresponding to the different C_i's (note b).
4. Each C_i splits k_i into k_iL || k_iR and sends k_iL and (k_iR + k_Fi) to S.
5. C splits each k_i' into k_iL' || k_iR', and sends to S its public key pk, {k_iL'} and {Enc(pk, k_iR' + r)}, where r is chosen randomly.
6. After receiving these messages from all C_i's, S checks whether there is an index i for which k_iL' = k_iL.
(a) If this is the case, then it uses the homomorphic properties of the encryption to send to C the value e = Enc(pk, k_iR + k_Fi) (-) Enc(pk, k_iR' + r) = Enc(pk, k_Fi - r) (note c).
(b) Otherwise it sends e = Enc(pk, r'), where r' is chosen randomly by S.
7. C calculates k_F = Dec(e) + r, and sends E(k_F, F) to S.
8. If S already stores E(k_F, F), it deletes E(k_F, F) and allows C to access the stored file E(k_Fi, F_i). Otherwise, S stores E(k_F, F).

Note a: The PAKE is run via S; there is no direct interaction between C and each C_i. C sends its first message of the PAKE protocol in Step 1.
Note b: With overwhelming probability, k_i' = k_i iff F_i = F.
Note c: If more than a single k_iL' matches, then the server picks one of the matching values at random and computes the encryption using the corresponding k_iR values. This event does not happen if C is honest, but might happen if C uses different h values that agree with the same short hash sh in the PAKE protocols.

Figure 5: The deduplication protocol.

The simulator might therefore learn several k_F values. The simulator stores for each h value a key k_F, which is equal to the corresponding result from F_dedup, or is random if h was not in a tuple with sh. The simulator sends back to C a random key k_i' for each PAKE invocation. It splits this key as k_i' = k_iL' || k_iR', and stores the result together with the h value used in this PAKE invocation and the key k_F that corresponds to h. The simulator then receives from C a set of pairs {k_iL', Enc(pk, k_iR' + r)}. Let K be the set of received left-half values that agree with the left halves of the keys stored by the simulator, and let K' be a maximal subset of K in which each element corresponds to a different h value. If K' is not empty, the simulator chooses an element of K' at random and, using the homomorphic properties, sends back to C the encryption Enc(pk, (k_iR + k_F) - (k_iR' + r)), where k_iR is the right half of the key the simulator stored for that PAKE invocation and k_F is the key that corresponds to the associated h value. Otherwise it sends to C an encryption of a random value. (Note that if F is not identical to any of the files F_i, then the key k_F that C retrieves is random, even if the simulator found a matching left-half value.)

A corrupt C_i: In the ideal model C_i always provides an input to the functionality. We prove security with respect to the relaxed functionality described in Section 3.2, where C_i also learns whether the uploaded file has the same short hash as F_i. The simulator interacts with C_i in the real protocol and should extract C_i's input to the functionality in the ideal model, namely (F_i, k_Fi).
We can assume that C_i has previously sent E(k_Fi, F_i) to the server, and therefore the simulator only needs to find k_Fi and use it to compute F_i. The simulator first observes whether sh matches the short hash of C_i's file. If that is not the case, the simulator provides a random k_Fi to the ideal functionality. Otherwise, the simulator observes C_i's input h_i to the PAKE protocol, its output k_i, and the message (k_iL, k_iR + k_Fi) that C_i sends to the server. If k_iL is different from the left part of k_i, then the simulator provides a random value k_Fi to the functionality. Otherwise, it subtracts the right part of k_i, namely k_iR, from k_iR + k_Fi, and provides the result as the input to the functionality.

A corrupt server: In the protocol the server learns whether the k_iL value matched or not. This must happen in the simulation, too, but this information can be deduced from the output in the ideal model.

We now discuss several extensions to the basic protocol to account for the types of issues we alluded to in Section 4.

5.1 Rate Limiting

A compromised active S can mount online brute-force guessing attacks on either C or C_i. Specifically, if F_i is predictable, S can pretend to be an uploader and send PAKE requests to C_i in an attempt to guess F_i. S can also pretend to be a checker and send PAKE responses to C in order to guess F. Both uploaders and checkers should therefore limit the number of PAKE runs for each file in their respective roles. We use RL_c to denote the rate limit for checkers, i.e., each C_i will process at most RL_c PAKE requests for F_i and will ignore further requests. Similarly, RL_u is the rate limit for uploaders: a C will send at most RL_u PAKE requests when uploading F. This strategy both improves security (shown in Section 6.3) and reduces overhead (shown in Section 7).
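A minimal sketch of the per-file bookkeeping a client might keep to enforce these limits follows; the RL_u and RL_c values match the choices made in Section 7, while the data structure itself is an illustrative assumption.

```python
# Per-file rate limiting on the client side (sketch).
RL_U, RL_C = 30, 70

class RateLimiter:
    def __init__(self):
        self.counts = {}                          # (file_id, role) -> PAKE runs so far

    def allow(self, file_id: str, role: str) -> bool:
        """role is 'uploader' or 'checker'; returns False once the
        per-file budget for that role is exhausted."""
        limit = RL_U if role == "uploader" else RL_C
        used = self.counts.get((file_id, role), 0)
        if used >= limit:
            return False                          # ignore further PAKE requests
        self.counts[(file_id, role)] = used + 1
        return True

rl = RateLimiter()
assert all(rl.allow("file-A", "checker") for _ in range(RL_C))
assert not rl.allow("file-A", "checker")          # request RL_C + 1 is refused
```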

5.2 Checker Selection

On receiving an upload request, S selects matching files in descending order of popularity (in terms of the number of Cs who own the file) and then chooses checkers for each file in ascending order of engagement (in terms of the number of PAKE requests they have serviced so far as checkers for that file). However, the number of PAKE instances an uploader is asked to run for a given file may reveal information about the popularity of that file. To avoid this, S may need to participate in fake PAKE runs with C using random inputs.

More concretely, after receiving an upload request from C, S generates a random value r_1 in [1, RL_u - 1] and sets r_2 = RL_u - r_1. If no stored short hash matches sh, S immediately runs r_1 PAKE instances using random inputs. If S finds an sh_i that matches sh, and there are x files under sh_i of which y have owners whose clients are currently online, then:

- if y < r_1, S runs (r_1 - y) PAKE instances using random inputs; then, for each of the y files, it selects an online checker to run PAKE, and marks the other (x - y) files as unchecked;
- if r_1 <= y, S selects r_1 of the y files based on popularity and, for each of them, selects an online checker to run PAKE; it then marks the other (x - r_1) files as unchecked.

At the end of the above procedure, if no match has been found, S picks a random PAKE instance and asks C to use the corresponding key to encrypt the file she wants to upload. Assume there are z unchecked files after the above procedure:

- if z < r_2, S runs (r_2 - z) PAKE instances using random inputs at random time intervals and, whenever a checker for any of the z files comes online, sends a PAKE request to it;
- if r_2 <= z, S selects r_2 of the z unchecked files based on popularity and, whenever a checker for any of those r_2 files comes online, sends a PAKE request to it.

During the above procedure, whenever a match is found, S stops sending PAKE requests to other checkers. But it still uses random inputs to make sure C has run PAKE r_1 times before uploading and r_2 times after uploading.

5.3 Randomized Threshold

The protocol in Figure 5 is a server-side deduplication scheme. To save bandwidth, we use the randomized threshold approach of [17] to transform it into client-side deduplication. For each file F, S keeps a random threshold t_F (t_F >= 2) and a counter c_F that indicates the number of Cs that have previously uploaded F. In Step 6 of the deduplication protocol:

- In case 6a, if c_Fi < t_Fi, S tells C to upload E(k_F, F) and then deletes it locally. Otherwise, S just informs C that the file is duplicated and there is no need to upload it.
- In case 6b, if there are still unchecked files, S asks C to run further PAKE instances when checkers owning those files come online. If S eventually finds k_jL' = k_jL, S records C as an owner of E(k_Fj, F_j): if c_Fj >= t_Fj, S tells C to switch the key of F to k_Fj and then deletes E(k_F, F) from its local storage; otherwise, S treats F as another copy of F_j and keeps Enc(pk, k_Fj - r). If c_Fj later exceeds t_Fj, S sends Enc(pk, k_Fj - r) to C, tells her to switch the key of F, and then deletes E(k_F, F) from its local storage. If no match is ever found, S stores E(k_F, F) under sh and records C as an owner of E(k_F, F).
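The following sketch shows the threshold bookkeeping on S's side; the bound 2 <= t_F <= 10 follows the evaluation in Section 7, and the rest is an illustrative assumption about how S might implement it.

```python
import random

# Randomized-threshold bookkeeping from Section 5.3 / [17] (sketch).
class ThresholdTable:
    def __init__(self, t_max: int = 10):
        self.t_max = t_max
        self.t = {}          # file -> random threshold t_F
        self.c = {}          # file -> number of clients that uploaded F so far

    def record_upload(self, file_id) -> bool:
        """Register one more owner of file_id and return True iff the server
        may now reveal the duplicate (client-side deduplication)."""
        if file_id not in self.t:
            self.t[file_id] = random.randint(2, self.t_max)   # t_F >= 2
        self.c[file_id] = self.c.get(file_id, 0) + 1
        return self.c[file_id] >= self.t[file_id]

tbl = ThresholdTable()
first_revealing_upload = next(i for i in range(1, 12) if tbl.record_upload("F"))
print("client-side deduplication starts at upload", first_revealing_upload)
```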
6. SECURITY ANALYSIS

In this section, we demonstrate how our deduplication scheme satisfies the security requirements of Section 3.3. Our scheme does not rely on any T P, so it satisfies S1.

6.1 Compromised Clients Only

A can compromise one or more legitimate Cs and attempt online brute-force guessing attacks. From Section 5.2, we know that a C receives r_1 PAKE replies immediately after sending an upload request. Then C is either required to upload the encrypted file or is given access to an already existing encrypted file. If C was asked to upload the encrypted file, it still cannot learn how many other Cs, if any, have already uploaded the same file. If C was not required to upload the encrypted file, or if, after being asked to run a further r_2 PAKE instances, C was informed that the encryption key has changed, then C can only conclude that its file is sufficiently popular (i.e., has more owners than the specific random threshold). Therefore, our protocol satisfies S2.1.

6.2 Compromised Passive Server

Next, we consider the case where A has compromised S such that A can gain access to each message received during the protocol as well as each file stored at S. A can thus attempt offline brute-force guessing attacks on predictable files. But all sensitive data (i.e., files, hashes and keys) are encrypted using a semantically secure encryption scheme with keys known only to the participating Cs. The only thing that leaks information is the short hash, and we can set its length to make it leak as little information as required. As long as the range of the short hash function is sufficiently smaller than the plaintext space from which a file is drawn, A cannot succeed in brute-force guessing via a compromised passive S. Therefore, our scheme satisfies S2.2.

6.3 Compromised Active Server

Finally, we consider the case where A has compromised S and can make it masquerade as as many Cs as she wants (this is equivalent to A compromising S and the required number of legitimate Cs). Such an adversary can attempt to mount online brute-force guessing attacks on predictable files. As introduced in Section 5.1, we use a per-file rate limiting strategy to prevent such attacks. Recall that RL_u (RL_c) is the rate limit for uploaders (checkers), n is the length of the short hash, and Phi is the space from which a predictable file F is drawn. Then A can uniquely identify F only if

|Phi| <= 2^n * x * (RL_u + RL_c),   (1)

where x is the number of Cs who potentially own F.
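A small helper for evaluating Inequality (1) follows; the parameter values match Section 7 (n = 13, RL_u = 30, RL_c = 70), and the example file-space sizes are arbitrary illustrations.

```python
# Inequality (1): an active server (or colluding clients) can uniquely
# identify a predictable file F only if |Phi| <= 2^n * x * (RL_u + RL_c).
def brute_force_feasible(phi_size: int, n: int, x: int, rl_u: int, rl_c: int) -> bool:
    return phi_size <= (2 ** n) * x * (rl_u + rl_c)

n, rl_u, rl_c = 13, 30, 70
print(brute_force_feasible(2 ** 19, n, x=1, rl_u=rl_u, rl_c=rl_c))  # True: space small enough
print(brute_force_feasible(2 ** 30, n, x=1, rl_u=rl_u, rl_c=rl_c))  # False: space too large
```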

As a comparison, DupLESS [3] uses a per-client rate limiting strategy to prevent such online brute-force guessing attacks by a single C. Since the rate limit must be chosen so that it does not degrade usability for a C that legitimately needs to upload a large number of files within a brief time interval (such as when backing up a local file system), the authors of DupLESS choose a large bound on the total number of requests a single C can make during one week. That is to say, for a predictable file drawn from the file space considered there, a compromised C needs at least one week to uniquely identify it. But if A has compromised multiple Cs, this time lag decreases linearly. Recall that a compromised active S can masquerade as any number of Cs.

To prevent a compromised active S from masquerading as multiple Cs, the authors of [25] introduce another T P called an identity server. When Cs first join the system, the identity server is responsible for verifying their identity and issuing credentials to them. But an identity server is hard to deploy in the real world. As far as we know, our scheme is the first deduplication protocol that satisfies S2.3 without the aid of an identity server.

7. SIMULATION

Our approach incurs some extra computation and communication due to the PAKE runs whenever a C wants to upload a file. Furthermore, the use of rate limits and randomized thresholds can impact deduplication effectiveness. In this section, we use realistic simulations to study the effect of various parameter choices in our protocol on its computation/communication overhead and deduplication effectiveness.

Datasets. We consider two types of storage environments. The first consists predominantly of media files, such as audio and video files. We did not have access to a dataset from such an environment. Instead, we use a dataset comprising Android application prevalence data to represent an environment with media files. This is based on the assumption that the popularity of Android applications is likely to be similar to that of media files: both are created and published by a small number of authors (artists/developers), made available on online stores or other distribution sites, and acquired by consumers either for free or for a fee. We call this the media dataset. We use a publicly available dataset consisting of data collected from Android devices.
For each device, the dataset identifies the set of (anonymized) application identifiers found on that device. We treat each application identifier as a file and the presence of an app on a device as an upload request to add the corresponding file to the storage. In total, this dataset contains far more upload requests than distinct files.

The second type of storage environment is that found in enterprise backup systems. We use data gathered by the Debian Popularity Contest to approximate such an environment. The popularity-contest package on a Debian device regularly reports the list of packages installed on that device. The resulting data consists of a list of Debian packages along with the number of devices that reported each package. We took a snapshot of this data on Nov 27 and, from it, generated our enterprise dataset, which again contains far more upload requests (Debian package installations) than distinct files (unique Debian packages).

Figure 6 shows the file popularity distribution (i.e., the number of upload requests for each file) on a logarithmic scale for both datasets. We map each dataset to a stream of upload requests by generating the requests in random order. Specifically, for a file that has x copies, we generate x upload requests for it at random time intervals.

Figure 6: File popularity in both datasets.

Parameters. To facilitate comparison with DupLESS [3], we set |Phi| to the size of the predictable-file space considered there. We then set the length of the short hash n to 13. Inequality (1) in Section 6.3 then implies a lower bound of 100 on (RL_u + RL_c) for A to be able to mount a successful online brute-force guessing attack that uniquely identifies a predictable file held by only one C. We use these parameter values in our simulations.

We measure overhead as the average number of PAKE runs,

mu = (total number of PAKE runs) / (total number of upload requests).   (2)

(We do not count the fake PAKE runs by S described in Section 5.2, since we are interested in estimating the average number of real PAKE runs; in a full implementation, an uploader will always take part in r_1 PAKE runs, where r_1 is in [1, RL_u - 1].)

We measure deduplication effectiveness using the deduplication percentage introduced in Section 2.1. We assume that all files are of equal size, so that the deduplication percentage rho is

rho = 1 - (number of all files in storage) / (total number of upload requests).   (3)

First we assume that all Cs are online during the simulation and study the impact of rate limits. Once we choose specific rate limits, we evaluate how absent Cs and the use of randomized thresholds impact deduplication effectiveness.
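For illustration, the following heavily simplified simulation computes mu and rho for a synthetic workload. It assumes every client is online, ignores checker-side limits and randomized thresholds, and checks each upload against at most RL_u stored files that share its short hash, most popular first; it is not the simulator used for the reported results, and the workload is invented.

```python
import random
from collections import defaultdict

RL_U, SHORT_HASH_BITS = 30, 13

def short_hash(file_id: int) -> int:
    return hash(("sh", file_id)) % (2 ** SHORT_HASH_BITS)

def simulate(requests):
    stored = defaultdict(list)     # short hash -> distinct file ids present in storage
    copies = defaultdict(int)      # file id -> number of owners (popularity)
    pake_runs = stored_files = 0
    for f in requests:
        sh = short_hash(f)
        bucket = sorted(stored[sh], key=lambda g: -copies[g])[:RL_U]
        pake_runs += len(bucket)                 # one PAKE run per candidate checker
        copies[f] += 1                           # f gains one more owner either way
        if f in bucket:
            continue                             # deduplicated against an existing copy
        stored_files += 1                        # a new physical copy is stored
        if f not in stored[sh]:
            stored[sh].append(f)
    mu = pake_runs / len(requests)               # average number of PAKE runs, Eq. (2)
    rho = 1 - stored_files / len(requests)       # deduplication percentage, Eq. (3)
    return mu, rho

# Skewed synthetic workload: a few very popular files, many rare ones.
reqs = [random.randint(1, random.choice([10, 10_000])) for _ in range(50_000)]
random.shuffle(reqs)
print(simulate(reqs))
```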

Rate limiting. Having selected RL_u + RL_c to be 100, we now examine how the average number of PAKE runs and the deduplication effectiveness change when this total is divided in different ways between RL_u and RL_c. Figure 7 shows the average number of PAKE runs resulting from different values of RL_u (and hence RL_c) in both datasets. We also ran the simulation without any rate limits, which led to average numbers of PAKE runs in both the media and the enterprise dataset that are significantly larger than the results with rate limiting. Figure 8 shows the deduplication percentage resulting from different rate limit choices in both datasets. We see that setting RL_u (RL_c) to 30 (70) maximizes the deduplication percentage, to values (97.58% in the media dataset) close to perfect deduplication (97.59%) in both datasets. We conclude that the use of rate limits not only improves security, but can also reduce the overhead without negatively impacting deduplication effectiveness.

Figure 7: Average number of PAKE runs vs. rate limits.

Figure 8: Deduplication percentage vs. rate limits.

Randomized threshold and offline rate. The possibility of some Cs being offline may adversely impact deduplication effectiveness. For example, as we described in Sections 5.2 and 5.3, if S finds that the upload request from C does not match any online checker available at the time of the upload, it will ask C to upload its encrypted file. S will continue to ask C to run r_2 PAKE instances afterwards. If one of these finds a checker whose existing file F matches C's file but c_F never exceeds t_F, S can no longer deduplicate C's copy of F. Had the original uploader of F been online at the time C wanted to upload its file, S would have successfully conducted server-side deduplication of F. In other words, the possibility of clients being offline will impact deduplication effectiveness. To estimate this impact, we assign an offline rate to each C as its probability of being offline during one upload request. Using the chosen rate limit values (RL_u = 30 and RL_c = 70) and fixing the upper limit of the random threshold (i.e., 2 <= t_F <= 10), we measured the deduplication percentage while varying the offline rate. The results for both datasets are shown in Figure 9. They show that the deduplication percentage remains high even when the offline rate is as high as 90%.

Figure 9: Deduplication percentage vs. offline rates.

Evolution of deduplication effectiveness. The deduplication percentage continues to increase as more files are added to the storage. Figure 10 shows that the deduplication percentage achieved by our scheme meets the realistic expectation (95%) very early on, after receiving only an initial portion of the upload requests in the media and enterprise datasets. So our system satisfies F1. However, our use of rate limits may result in a somewhat slower increase compared to schemes with perfect deduplication.
Figure 11 shows how the difference between perfect deduplication and the deduplication percentage achieved by our scheme evolves as more files are added to the storage. Figure 13 (in Appendix C) shows the secondary deviation. The difference becomes stable as more files are uploaded to the storage. This phenomenon can be explained by Zipf's law [29]. Since uploaders set a rate limit on their requests and S always selects files in descending order of popularity, our system is similar to web proxy caching, which provides temporary storage for popular web pages. Breslau et al. observe that page request distributions follow Zipf's law: the frequency of a request for the m-th most popular page is (1/m^alpha) / (sum_{i=1}^{N} 1/i^alpha), where N is the size of the cache and alpha is the exponent characterising the distribution [8]. As a result, most of the requested pages can be found in the cache. In our case, most upload requests can find a matching file within the rate limit if the file has been uploaded already.

Figure 10: Deduplication percentage vs. number of upload requests.

Figure 11: Deviation from perfect deduplication vs. number of upload requests.

8. PERFORMANCE EVALUATION

8.1 Prototype implementation

Our prototype consists of two parts: (1) a server program which simulates S and (2) a client program which simulates C (performing file uploading/downloading, encryption/decryption, and assisting S in deduplication). We use Node.js for the implementation of both parts and Redis for the implementation of S's data structure. Furthermore, we use SHA-256 as the cryptographic hash function and AES with 256-bit keys as the symmetric encryption scheme, both of which are provided by the Crypto module in Node.js. We truncate the cryptographic hash to get the short hash. We use the GNU multiple precision arithmetic library to implement the public key operations.

8.2 Test setting and methodology

We run the server-side program on an Amazon EC2 instance (Intel Xeon with 1 CPU at 2.5 GHz) and the client-side program on an Intel Core i7 machine. We measure the running time using the Date module in JavaScript and measure the bandwidth usage using tcpdump. As the downloading phase in our protocol simply downloads an encrypted file, we only consider the uploading phase. We set the length of the short hash to 13 and RL_u to 30. We consider the worst case, in which the uploader runs PAKE with 30 checkers. We therefore simulate the uploading phase of our protocol as follows:

1. An uploader C sends the short hash of the file it wants to upload to the server S.
2. S sends requests to 30 checkers C_i and lets them run PAKE with C.
3. S waits to get the responses of all instances back from C and the {C_i}.
4. S chooses one instance and sends the result to C.
5. C uses the resulting key to encrypt its file with AES and uploads it to S.

We measure both the running time and the bandwidth usage during the whole procedure above. We compare the results to two baselines: (1) simply uploading a file without encryption and (2) uploading a file with AES encryption.
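An illustrative stand-in for the E(k_F, F) step measured here: the prototype uses the Node.js Crypto module with AES-256, while the Python sketch below assumes AES-256-GCM with a nonce derived from (k_F, F) so that identical files encrypted under the same key yield identical ciphertexts, as Step 8 of the protocol expects. The paper itself does not specify the mode or nonce handling.

```python
import hashlib, os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_file(k_f: bytes, plaintext: bytes) -> bytes:
    # Deterministic nonce: same (key, file) -> same ciphertext (assumption).
    nonce = hashlib.sha256(k_f + plaintext).digest()[:12]
    return nonce + AESGCM(k_f).encrypt(nonce, plaintext, None)

def decrypt_file(k_f: bytes, blob: bytes) -> bytes:
    return AESGCM(k_f).decrypt(blob[:12], blob[12:], None)

k_f = os.urandom(32)                                  # 256-bit key k_F agreed via the protocol
blob = encrypt_file(k_f, b"file contents")
assert decrypt_file(k_f, blob) == b"file contents"
assert blob == encrypt_file(k_f, b"file contents")    # identical inputs, identical ciphertext
```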

As in [3], we repeat our experiment using files of size 2^{2i} KB for i in {0, 1, ..., 8}, which gives a file size range of 1 KB to 64 MB. For each file size, we upload the file 100 times and report the mean values. For files that are larger than the computer's buffer, we perform loading, encryption and uploading at the same time by piping the data stream. As a result, uploading encrypted files takes almost the same amount of time as uploading plain files.

8.3 Results

Figure 12 reports the uploading time and bandwidth usage of our protocol compared to the two baselines. For files that are smaller than 1 MB, the extra overhead introduced by our deduplication protocol is relatively high. For example, it takes 587 ms (2,436 bytes of bandwidth) to upload a 1 KB plain file and 594 ms (2,585 bytes) to upload a 1 KB encrypted file, while uploading the same file with our protocol takes noticeably longer and uses more bandwidth. However, the extra overhead introduced by our protocol is independent of the file size, and it becomes negligible when the file is large enough. So our system meets F2.

9. DISCUSSION

Incentives: In our scheme, Cs have to run several PAKE instances as both uploaders and checkers. This imposes a cost on each C. S is the direct beneficiary of the protocol. Cs may indirectly benefit from deduplication in that the ability to deduplicate effectively makes the storage system more efficient and can thus potentially lower the cost of using the storage incurred by each C. Nevertheless, it is desirable to have more direct incentive mechanisms to encourage Cs to take part in PAKE checks. For example, if a file uploaded by a C is found to be shared by other Cs, S could reward the Cs owning that file by giving them small increases in their respective storage quotas.

Deduplication effectiveness: We can improve deduplication effectiveness by introducing additional checks. For example, an uploader can indicate the (approximate) file size (which will be revealed anyway) so that S can limit the selection of checkers to those whose files are of a similar size. Similarly, S can keep track of similarities between Cs, based on the number of files they share, and use this information when selecting checkers by prioritizing checkers who are similar to the uploader. Nevertheless, as discussed in Section 7, our scheme surpasses what is considered a realistic level of deduplication from the very beginning. Whitehouse [27] reported that, when selecting a deduplication scheme, enterprise administrators rated considerations such as ease of deployment and use as more important than the deduplication ratio. Therefore, we argue that the small sacrifice in deduplication ratio is offset by the significant advantage of ensuring user privacy without having to use independent third-party servers.

Block-level deduplication: Our scheme can be applied to both file-level and block-level deduplication, but block-level deduplication will incur more overhead.

10. RELATED WORK

One technique proposed to enable deduplication of encrypted data is convergent encryption (CE) [12], which derives the file encryption key deterministically from the file contents. As a result, the same file will produce the same ciphertext, so that S can deduplicate ciphertexts. But if A compromises S, it can easily perform an offline brute-force guessing attack. Bellare et al.
recently proposed message-locked encryption (MLE), which uses a semantically secure encryption scheme but produces a tag in a deterministic way [4], so it still suffers from the same attack.

Puzio et al. propose a block-level deduplication scheme that uses a third party T P [23]. C first encrypts each block with CE and then sends the ciphertexts to the T P, who adds an additional layer of encryption to each block. During file retrieval, blocks are first decrypted by the T P and then sent back to C. If A compromises both S and some Cs, she can perform an online brute-force guessing attack by letting multiple Cs upload all possible files and monitoring S's storage.

Stanek et al. propose a scheme that only deduplicates popular files [25]. Cs encrypt their files with two layers of encryption: the inner layer is obtained through CE and the outer layer through a semantically secure threshold encryption scheme. S can decrypt the outer layer of a file iff the number of Cs who have uploaded this file reaches the threshold. They introduce a T P as an index server to keep track of the popularity of each file, but the index server has to know the content or hash of each file. In addition, they introduce another T P as an identity server to prevent online brute-force guessing attacks by multiple Cs. Both of these solutions [23, 25] are vulnerable to offline brute-force guessing attacks if A compromises the T Ps.

Bellare et al. propose DupLESS, which uses an additional secret in the key generation process of CE [3]. This secret is identical for all Cs and is provided by a T P. To prevent the T P from conducting offline brute-force guessing attacks, Cs run an oblivious PRF protocol with the T P to generate the key, which guarantees that Cs who input the same file will get the same encryption key without revealing their files. But if A compromises both S and the T P, S can get the secret from the T P and the scheme is reduced to normal CE.

Duan proposes a distributed architecture in which the file encryption key is generated by a threshold number of all Cs [13]. But this scheme is only suitable for peer-to-peer scenarios: a threshold number of Cs must be online and interact with one another to generate the key. Moreover, a malicious S can easily generate more than a threshold number of Cs.

In Table 1, we summarize the resilience of these schemes with respect to the design goals from Section 3.3.

Table 1: Resilience of deduplication schemes, comparing [12]/[4], [23], [25], [3], [13] and our work against a compromised C, a compromised passive S, a compromised active S colluding with Cs, compromised T Ps, and a compromised S together with the T Ps.

Acknowledgments

This work was supported in part by the Academy of Finland project Cloud Security Services (283135).

Acknowledgments

This work was supported in part by the Academy of Finland project Cloud Security Services (283135).

Figure 12: Time (left) and bandwidth usage (right) vs. file size, for our protocol, uploading encrypted files, and uploading plain files.

11. REFERENCES

[1] M. Abdalla and D. Pointcheval. Simple password-based encrypted key exchange protocols. In A. Menezes, editor, Topics in Cryptology - CT-RSA 2005, The Cryptographers' Track at the RSA Conference 2005, San Francisco, CA, USA, February 14-18, 2005, Proceedings, volume 3376 of Lecture Notes in Computer Science. Springer.
[2] B. Barak, R. Canetti, Y. Lindell, R. Pass, and T. Rabin. Secure computation without authentication. In V. Shoup, editor, Advances in Cryptology - CRYPTO 2005, volume 3621 of Lecture Notes in Computer Science. Springer Berlin Heidelberg.
[3] M. Bellare, S. Keelveedhi, and T. Ristenpart. DupLESS: Server-aided encryption for deduplicated storage. In Proceedings of the 22nd USENIX Conference on Security, SEC '13, Berkeley, CA, USA. USENIX Association.
[4] M. Bellare, S. Keelveedhi, and T. Ristenpart. Message-locked encryption and secure deduplication. In T. Johansson and P. Nguyen, editors, Advances in Cryptology - EUROCRYPT 2013, volume 7881 of Lecture Notes in Computer Science. Springer Berlin Heidelberg.
[5] M. Bellare, D. Pointcheval, and P. Rogaway. Authenticated key exchange secure against dictionary attacks. In Preneel [22].
[6] S. Bellovin and M. Merritt. Encrypted key exchange: password-based protocols secure against dictionary attacks. In Research in Security and Privacy, 1992 IEEE Computer Society Symposium on, pages 72-84, May 1992.
[7] V. Boyko, P. D. MacKenzie, and S. Patel. Provably secure password-authenticated key exchange using Diffie-Hellman. In Preneel [22].
[8] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: evidence and implications. In INFOCOM '99, Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, volume 1, Mar. 1999.
[9] R. Canetti, D. Dachman-Soled, V. Vaikuntanathan, and H. Wee. Efficient password authenticated key exchange via oblivious transfer. In M. Fischlin, J. Buchmann, and M. Manulis, editors, Public Key Cryptography - PKC 2012, volume 7293 of Lecture Notes in Computer Science. Springer Berlin Heidelberg.
[10] R. Canetti, S. Halevi, J. Katz, Y. Lindell, and P. D. MacKenzie. Universally composable password-based key exchange. In R. Cramer, editor, Advances in Cryptology - EUROCRYPT 2005, 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Aarhus, Denmark, May 22-26, 2005, Proceedings, volume 3494 of Lecture Notes in Computer Science. Springer.
[11] W. Diffie and M. E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 22(6).
[12] J. Douceur, A. Adya, W. Bolosky, P. Simon, and M. Theimer. Reclaiming space from duplicate files in a serverless distributed file system. In Proceedings of the 22nd International Conference on Distributed Computing Systems.
[13] Y. Duan. Distributed key generation for encrypted deduplication: Achieving the strongest privacy. In Proceedings of the 6th Edition of the ACM Workshop on Cloud Computing Security, CCSW '14, pages 57-68, New York, NY, USA. ACM.
[14] M. Dutch. Understanding data deduplication ratios. SNIA Data Management Forum. /l3pm284d8r1s.pdf.
[15] T. ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms. In G. Blakley and D. Chaum, editors, Advances in Cryptology, volume 196 of Lecture Notes in Computer Science. Springer Berlin Heidelberg.
[16] O. Goldreich and Y. Lindell. Session-key generation using human passwords only. In J. Kilian, editor, Advances in Cryptology - CRYPTO 2001, volume 2139 of Lecture Notes in Computer Science. Springer Berlin Heidelberg.
[17] D. Harnik, B. Pinkas, and A. Shulman-Peleg. Side channels in cloud services: Deduplication in cloud storage. IEEE Security & Privacy, 8(6):40-47, Nov.
[18] I. Jeong, J. Katz, and D. Lee. One-round protocols for two-party authenticated key exchange. In M. Jakobsson, M. Yung, and J. Zhou, editors, Applied Cryptography and Network Security, volume 3089 of Lecture Notes in Computer Science. Springer Berlin Heidelberg.
[19] J. Katz and V. Vaikuntanathan. Round-optimal password-based authenticated key exchange. Journal of Cryptology, 26(4).
[20] D. T. Meyer and W. J. Bolosky. A study of practical deduplication. In Proceedings of the 9th USENIX Conference on File and Storage Technologies, FAST '11, pages 1-1, Berkeley, CA, USA. USENIX Association.
[21] P. Paillier. Public-key cryptosystems based on composite degree residuosity classes. In J. Stern, editor, Advances in Cryptology - EUROCRYPT '99, volume 1592 of Lecture Notes in Computer Science. Springer Berlin Heidelberg.
[22] B. Preneel, editor. Advances in Cryptology - EUROCRYPT 2000, International Conference on the Theory and Application of Cryptographic Techniques, Bruges, Belgium, May 14-18, 2000, Proceedings, volume 1807 of Lecture Notes in Computer Science. Springer.
[23] P. Puzio, R. Molva, M. Önen, and S. Loureiro. ClouDedup: Secure deduplication with encrypted data for cloud storage. In Proceedings of the 2013 IEEE International Conference on Cloud Computing Technology and Science - Volume 01, CLOUDCOM '13, Washington, DC, USA. IEEE Computer Society.
[24] S. Quinlan and S. Dorward. Venti: A new approach to archival storage. In Proceedings of the 1st USENIX Conference on File and Storage Technologies, FAST '02, pages 7-7, Berkeley, CA, USA. USENIX Association.
[25] J. Stanek, A. Sorniotti, E. Androulaki, and L. Kencl. A secure data deduplication scheme for cloud storage. In N. Christin and R. Safavi-Naini, editors, Financial Cryptography and Data Security, Lecture Notes in Computer Science. Springer Berlin Heidelberg.
[26] J. M. Wendt. Getting Real About Deduplication Ratios. getting-real-about-deduplication.html.
[27] L. Whitehouse. Understanding data deduplication ratios in backup systems. TechTarget article, May. Understanding-data-deduplication-ratios-in-backup-systems.
[28] A. C.-C. Yao. How to generate and exchange secrets. In 27th Annual Symposium on Foundations of Computer Science, Oct. 1986.
[29] G. K. Zipf. Relative frequency as a determinant of phonetic change. Harvard Studies in Classical Philology, pages 1-95.

Appendix A: Notation Table

A table of notations is shown in Table 2.

Notation     Description
Entities
  C            Client
  S            Server
  A            Adversary
  TP           Third party
Cryptographic notations
  E()          Symmetric-key encryption
  D()          Symmetric-key decryption
  k            Symmetric encryption/decryption key
  Enc()        Additively homomorphic encryption
  Dec()        Additively homomorphic decryption
  ⊕            Additively homomorphic addition
  ⊖            Additively homomorphic subtraction
  H()          Cryptographic hash function
  h            Cryptographic hash
  SH()         Short hash function
  sh           Short hash
  PAKE         Password authenticated key exchange
  F_pake       Ideal functionality of PAKE
  F_dedup      Ideal functionality of the deduplication protocol
Parameters
  F            File
  Φ            File space for a predictable file
  n            Length of the short hash
  RL_u         Rate limit by uploaders
  RL_c         Rate limit by checkers
  t_F          Random threshold for a file
  c_F          Counter for a file

Table 2: Summary of notations

Appendix B: Secondary deviation from perfect deduplication

Figure 13: Secondary deviation from perfect deduplication vs. number of upload requests, for the media dataset and the enterprise dataset.
