Proof of Ownership in Remote Storage Systems
S. Halevi, D. Harnik, B. Pinkas and A. Shulman-Peleg
Summarized by Eli Haim
Advanced Topics in Storage Systems
School of Engineering - EE, Tel Aviv University
May 26, 2013
Outline
1 Introduction
2 PoW
3 Solution: A General Protocol
4 Security-Efficiency Tradeoff
5 Performance Evaluation
1 Introduction
Deduplication: Client-Side Cross-User Deduplication
Prior to uploading the file:
- The client computes a hash over the file (the key) and sends it to the server.
- The server checks, via the key, whether the file already exists in its storage.
- If not, the client uploads the file to the server.
- If so, no upload is needed.
Benefits:
- Saves storage space (at the server).
- Saves bandwidth (at both sides).
Deduplication: Principles which lead to risks, as raised in this paper
Knowing only the hash gives access to the entire file. This leads to attacks based on the following principles:
- Obtaining a small amount of data gives access to a huge amount of data.
- Distribution of huge amounts of data.
- Leakage amplification.
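The deduplication flow and the hash-only weakness can be sketched in a few lines of Python. This is a toy illustration, not the design of any real storage service: the class `NaiveDedupServer`, the function `client_backup`, and the choice of SHA-256 as the public hash are all assumptions made for the example.

```python
import hashlib


class NaiveDedupServer:
    """Toy server that deduplicates on the file hash alone (no proof of ownership)."""

    def __init__(self):
        self.store = {}   # hash key -> file bytes
        self.owners = {}  # hash key -> set of user names

    def has_file(self, key):
        return key in self.store

    def upload(self, user, data):
        key = hashlib.sha256(data).hexdigest()
        self.store[key] = data
        self.owners.setdefault(key, set()).add(user)
        return key

    def claim(self, user, key):
        """Register `user` as an owner based only on knowledge of the hash."""
        if key not in self.store:
            return False
        self.owners[key].add(user)
        return True

    def download(self, user, key):
        if user not in self.owners.get(key, set()):
            raise PermissionError("user does not own this file")
        return self.store[key]


def client_backup(server, user, data):
    """Client-side deduplication: send the hash first, upload only on a miss.

    Returns (key, number of file bytes actually uploaded).
    """
    key = hashlib.sha256(data).hexdigest()
    if server.has_file(key):
        server.claim(user, key)
        return key, 0
    server.upload(user, data)
    return key, len(data)
```

In this sketch the second client to back up the same file uploads nothing, which is the bandwidth saving. The same `claim` call also lets anyone who learns the short key download the whole file, which is the weakness that a proof of ownership is meant to close.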
Attacks: Potential Attacks
- Using the storage service as a Content Distribution Network (CDN). For example, a backup service is designed to support many uploads but very few downloads.
- Server break-in (cache).
- Malicious client software (low bandwidth).
- Leakage: the risk increases with the number of users sharing the file.
2 PoW
Proof of Ownership (PoW): A new concept
- A proof mechanism that prevents these vulnerabilities.
- A protocol by which the client can prove to the server that it has a copy of the file, without actually sending the file.
Requirements and Model: Requirements from the Protocol
- Public hash function (enables cross-user deduplication).
- Bandwidth efficient.
- The server accesses only a short piece of information per file (as the file may be stored in secondary storage).
- Client-side constraints: a single pass over the file and a reasonable amount of memory.
- Security: there is no very short state from which the proof can be computed.
- The protocol is efficient.
Requirements and Model: Attacker Model
The attacker has accomplices who have the file.
Constraints:
- The total number of bits that the accomplices send to the attacker is less than the min-entropy of the file.
- The accomplices can help only at an off-line stage, i.e., before the protocol begins.
The min-entropy of a distribution $D = \{p_1, \ldots, p_n\}$ is defined by $H_\infty(D) = -\log \max_i p_i$.
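As a small worked example of the min-entropy definition, the following sketch evaluates $-\log \max_i p_i$ for a finite distribution. The function name `min_entropy` is hypothetical, and the base-2 logarithm (so that the result is in bits) is an assumption, since the slide does not state the base.

```python
import math


def min_entropy(probs):
    """Min-entropy in bits: -log2 of the most likely outcome's probability."""
    if not probs or any(p < 0 for p in probs):
        raise ValueError("probs must be a non-empty list of non-negative numbers")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -math.log2(max(probs))


# A uniform choice among 4 files carries 2 bits of min-entropy.
uniform = min_entropy([0.25, 0.25, 0.25, 0.25])

# A skewed distribution is only as unpredictable as its most likely outcome.
skewed = min_entropy([0.5, 0.25, 0.25])
```

The skewed case shows why min-entropy is the relevant measure here: an attacker who simply guesses the most likely file succeeds with probability $\max_i p_i$, regardless of how the remaining probability mass is spread.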
Requirements and Model: Strong PoW
- The file is drawn at random from a distribution $D$.
- $t$ = number of bits of min-entropy of $D$.
- $s$ = minimal number of bits which the attacker did not get from the accomplices.
- $T$ = the event that the attacker convinces the server.
- $\epsilon$ = soundness.
Definition (Strong PoW): $\Pr(T) \le \epsilon + f(s)$, where $f(s)$ is negligible in $s$.
3 Solution: A General Protocol
Preprocessing: Construction of the Merkle-Tree
Validation
- Sibling-path of a leaf: the leaf together with the siblings of all the nodes on the path from the leaf to the root.
- The root of the tree can be computed from the sibling-path.
- A sibling-path is valid if the root computed from it equals the root of the tree.
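The sibling-path definition can be made concrete with a minimal Merkle-tree sketch. Several details here are assumptions not stated on the slides: SHA-256 as the hash, one-byte prefixes to separate leaf hashes from inner-node hashes, and padding the leaf level to a power of two. The function names `build_tree`, `sibling_path` and `root_from_path` are hypothetical.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return the tree as a list of levels: levels[0] = leaf hashes, levels[-1] = [root]."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [h(b"\x00" + b) for b in blocks]
    # Assumption: pad the leaf level to a power of two with a fixed dummy leaf.
    while len(level) & (len(level) - 1):
        level.append(h(b"\x00"))
    levels = [level]
    while len(level) > 1:
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def sibling_path(levels, index):
    """Sibling hashes of every node on the path from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the sibling differs in the lowest bit
        index //= 2
    return path


def root_from_path(block, index, path):
    """Recompute the root from a leaf block, its index, and its sibling hashes."""
    node = h(b"\x00" + block)
    for sibling in path:
        if index % 2 == 0:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
        index //= 2
    return node
```

A sibling-path is accepted when `root_from_path` returns the stored root. Its size is one block plus one hash per tree level, so validation needs only logarithmically many hashes rather than the whole file.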
Protocol
1. The server selects at random a small number of leaf indexes (poly-logarithmic in the file size).
2. The server sends these indexes to the client.
3. The client returns the sibling-path of every requested leaf index.
4. The server checks that every sibling-path is valid.
5. If all are valid, the client has proved to the server that it owns the file.
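The challenge-response steps above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it hashes with SHA-256, assumes the number of blocks is a power of two, draws challenge indexes with replacement, and omits the erasure-coding preprocessing. The names `Server`, `client_respond` and `levels_of` are hypothetical.

```python
import hashlib
import secrets


def h(*parts):
    return hashlib.sha256(b"".join(parts)).digest()


def levels_of(blocks):
    """Merkle levels over the blocks (assumes a power-of-two number of blocks)."""
    level = [h(b"leaf", b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(b"node", level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


class Server:
    """Keeps only the root and the leaf count, not the file itself."""

    def __init__(self, root, n_leaves, u):
        self.root, self.n_leaves, self.u = root, n_leaves, u
        self.depth = n_leaves.bit_length() - 1
        self.pending = []

    def challenge(self):
        # Steps 1-2: pick u random leaf indexes and send them to the client.
        self.pending = [secrets.randbelow(self.n_leaves) for _ in range(self.u)]
        return list(self.pending)

    def verify(self, responses):
        # Steps 4-5: accept only if every sibling-path recomputes the stored root.
        if not self.pending or len(responses) != len(self.pending):
            return False
        for idx, (block, path) in zip(self.pending, responses):
            if len(path) != self.depth:
                return False
            node = h(b"leaf", block)
            for sibling in path:
                node = h(b"node", node, sibling) if idx % 2 == 0 else h(b"node", sibling, node)
                idx //= 2
            if node != self.root:
                return False
        return True


def client_respond(blocks, indexes):
    # Step 3: return (leaf block, sibling hashes) for every requested index.
    levels = levels_of(blocks)
    responses = []
    for idx in indexes:
        path, i = [], idx
        for level in levels[:-1]:
            path.append(level[i ^ 1])
            i //= 2
        responses.append((blocks[idx], path))
    return responses
```

The design point this sketch tries to show is that the server's per-file state is a single root hash and a leaf count, which matches the requirement that the server access only a short piece of information per file.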
Merkle-Tree Lemma
Every prover that convinces the server with high enough probability can be converted into an extractor which extracts most of the leaves of the tree.
- $\alpha$ = the redundancy of the erasure code, i.e., knowing a $(1 - \alpha)$ fraction of the file suffices.
- $s$ = number of leaves in the tree.
- $u$ = number of leaves requested in the protocol.
- $T$ = the event that the prover convinces the server.
- $K$ = number of leaves that the extractor can extract.
Lemma (Merkle-tree Lemma): For every prover and every $\delta \in [0, 1]$, there exists an extractor which makes at most $\frac{u^2 s (1 + \log s)}{\delta}$ calls to the prover, such that if $\Pr(T) \ge (1 - \alpha)^u + \delta$, then $\Pr(K \ge (1 - \alpha)s) \ge \frac{1}{4}$.
Theorem
The Merkle-tree based protocol is a strong PoW with soundness $(1 - \alpha)^u$.
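To get a feel for the soundness bound $(1 - \alpha)^u$, the following sketch evaluates it numerically and finds the smallest number of challenged leaves needed for a target bound. The function names are hypothetical, and the value $\alpha = 0.5$ used in the example is purely illustrative, not a parameter taken from the paper.

```python
def soundness(alpha, u):
    """Soundness bound (1 - alpha)^u for redundancy alpha and u challenged leaves."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    if u < 0:
        raise ValueError("u must be non-negative")
    return (1 - alpha) ** u


def challenges_needed(alpha, target):
    """Smallest u such that (1 - alpha)^u <= target."""
    if not 0 < alpha < 1 or not 0 < target < 1:
        raise ValueError("alpha and target must be in (0, 1)")
    u, p = 0, 1.0
    while p > target:
        p *= 1 - alpha
        u += 1
    return u


# Illustrative only: with alpha = 0.5, twenty challenges give a bound of 2**-20.
bound = soundness(0.5, 20)
```

Because the bound decays exponentially in $u$, a small number of challenged leaves is enough, while each additional leaf costs only one more sibling-path of logarithmic size.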
4 Security-Efficiency Tradeoff
Protocol with Small Space
- Computing the erasure code is expensive for very large files: it requires many random accesses to the disk.
- Security assumptions (in addition to the previous ones): $T$ is now an absolute leakage threshold on the knowledge of the attacker. The attacker now knows at most $\min(T, t - s)$ bits of the file.
- Solution: the file is hashed down to $L$ bits before the Merkle-tree construction. This requires a pairwise-independent hash ensemble.
Theorem: The leakage threshold is $T = L \left( \frac{1}{3} - \frac{1}{2b} \right)$.
A Streaming Protocol
- Computing the hash is expensive for very large files: it requires many random accesses to the disk.
- Security assumption (in addition to the previous ones): the file is not arbitrary. It is drawn from a class of block-fixing distributions:
  - Every block is either completely random or fully known.
  - The random blocks are chosen from a low-rank linear space.
- Solution: use a sparse linear hash $C$.
Theorem: If for every full-rank $t \times M$ matrix $A$, with high probability (over $C$), the code generated by the rows of $AC$ has minimum distance at least $d$, then the scheme is a PoW with soundness $\left( \frac{L - d + 1}{L} \right)^u$ with respect to generalized block-fixing distributions with min-entropy $t$.
5 Performance Evaluation
Implementation Parameters
- Block size: $B = 512$ bits.
- Buffer size: $L = \min(64\ \text{MByte}, \text{file size})$.
- Number of iterations (for reduction and mixing): 5.
- Number of challenge leaves: $u = 20$.
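A back-of-the-envelope calculation shows what these parameters imply for the size of a proof. The sketch below rests on assumptions that the slides do not state: that the Merkle tree is built over the buffer split into blocks of $B$ bits, that 1 MByte means $2^{20}$ bytes, and that each tree node is a 32-byte digest (as with SHA-256). The function name `proof_size` is hypothetical.

```python
def proof_size(file_size_bytes, block_bits=512, buffer_cap=64 * 2**20, u=20, digest_bytes=32):
    """Return (number of leaves, tree depth, response size in bytes)."""
    block_bytes = block_bits // 8
    buffer_bytes = min(buffer_cap, file_size_bytes)
    # Ceiling division: a partial last block still needs its own leaf.
    leaves = max(1, -(-buffer_bytes // block_bytes))
    # Depth of a binary tree with that many leaves, i.e. ceil(log2(leaves)).
    depth = (leaves - 1).bit_length()
    # Each challenged leaf costs one block plus one sibling hash per level.
    response = u * (block_bytes + depth * digest_bytes)
    return leaves, depth, response


# For a 1 GiB file the buffer is capped at 64 MByte.
leaves, depth, response = proof_size(2**30)
```

Under these assumptions, any file of 64 MByte or more yields $2^{20}$ leaves, a tree of depth 20, and a response of 14080 bytes for $u = 20$ challenges, which is tiny compared with uploading the file.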
Performance Evaluation
[Figure: performance evaluation results]
Thank you for your attention