Side channels in cloud services, the case of deduplication in cloud storage
Danny Harnik, Benny Pinkas, Alexandra Shulman-Peleg
Presented by Yair Yona
Yair Yona (TAU) Side channels in cloud services Advanced Topics in Storage Systems

Outline
  1 Introduction
  2 Deduplication
  3 Security Loophole
  4 Solutions
  5 Conclusions

Introduction
Cloud storage
  Fast growth of data volumes: demand for online storage services
  Cloud storage services:
    Low cost, scalable, pay-per-use
    Service delivered via the internet
  Deduplication: storing only a single copy of data
    Provides the user a link to the existing copy
    Reduces the storage space used by the service provider
    Decreases bandwidth consumption from client to server
    Disk and bandwidth savings: 90%

Introduction
Main Contribution
  Pointing out security loopholes due to cross-user deduplication
  Proposing a solution that reduces the risk of data leakage

Deduplication
  Strategies
    File-level: stores a single copy of each file
    Block-level: segments each file into blocks and stores a single copy of each block
  Approaches
    Target-based approach:
      Dedup is handled by the storage service
      The user is unaware of dedup
      Does not save bandwidth
    Source-based approach:
      The user sends a hash signature of the file to the server
      If a copy already exists, the file is not sent
      Saves bandwidth and storage

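The source-based approach above can be sketched as follows. This is a minimal illustration, not the protocol of any real service: the in-memory `DedupServer` class, the `source_based_upload` helper, and the choice of SHA-256 as the hash signature are all assumptions made for the example.

```python
import hashlib


class DedupServer:
    """Hypothetical in-memory file-level store, keyed by SHA-256 digest."""

    def __init__(self):
        self.store = {}  # digest -> file content (a single copy per file)

    def has(self, digest: str) -> bool:
        return digest in self.store

    def put(self, data: bytes) -> None:
        self.store[hashlib.sha256(data).hexdigest()] = data


def source_based_upload(server: DedupServer, data: bytes) -> int:
    """Upload `data`; return the number of payload bytes actually sent."""
    digest = hashlib.sha256(data).hexdigest()
    if server.has(digest):
        # A copy already exists: only the short hash crossed the network.
        return 0
    server.put(data)
    return len(data)


server = DedupServer()
first = source_based_upload(server, b"quarterly report")   # full transfer
second = source_based_upload(server, b"quarterly report")  # deduplicated
```

The return value is exactly what the client can observe: whether the payload had to be sent.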
Security Loophole
The Loophole
  Settings
    Source-based approach
      The client knows whether dedup has occurred
    Cross-user dedup
      Other users can find out whether the file was uploaded
    Dropbox, Mozy and Memopal apply this setting
  The server answers the following question with yes/no:
    Did any user previously upload a copy of this file?

Security Loophole
Attack 1: Identifying a File
  Assumptions
    The file is known to the attacker
    The file is unlikely to be in the possession of any other user
  The attacker uploads the file and checks whether dedup was performed
  The dedup response reveals whether the targeted user possesses the file

Security Loophole
Attack 2: Finding the Content of a File
  Assume the number of possible versions of the file is limited
  The attacker uploads each possible version of the file
  The version for which dedup occurs is identical to the victim's file

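A rough sketch of Attack 2, assuming the attacker knows the file's template and only one field is unknown. The `make_oracle` and `recover_content` helpers, the salary template, and the candidate range are hypothetical; the oracle stands in for the yes/no answer a source-based, cross-user service gives on upload.

```python
import hashlib


def make_oracle(stored_files):
    """Model the server's yes/no dedup answer for a fixed set of stored files."""
    digests = {hashlib.sha256(f).hexdigest() for f in stored_files}

    def was_uploaded(data: bytes) -> bool:
        return hashlib.sha256(data).hexdigest() in digests

    return was_uploaded


def recover_content(template: str, candidates, was_uploaded):
    """Try every candidate value; return the one for which dedup occurs."""
    for value in candidates:
        guess = template.format(value).encode()
        if was_uploaded(guess):
            return value
    return None


# The victim stored a document whose only unknown part is one number.
victim_file = "Salary offer: {} USD".format(73000).encode()
oracle = make_oracle([victim_file])
found = recover_content("Salary offer: {} USD", range(50000, 100001, 1000), oracle)
```

In a real service each probe is itself an upload, so the oracle here is a simplification: it answers without changing the stored set.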
Security Loophole
Attack 3: Covert Channel
  Assume malicious software is installed on the user's machine
  The software establishes a covert channel based on cross-user dedup
  It bypasses the firewall and communicates with its control server
  Binary example
    The software saves two files on the user's machine
    The files are uploaded to the user's backup service
    The control server uploads these files to the same backup service
    The deduplicated file represents the binary value

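One way to picture the binary example is the sketch below. It assumes one reading of the slide: for each bit, the malware stores only the file that corresponds to the bit value, and the control server probes both files. The `BackupService` class, the `marker` naming scheme, and the shared secret are hypothetical.

```python
import hashlib


class BackupService:
    """Hypothetical cross-user backup service with source-based dedup."""

    def __init__(self):
        self.digests = set()

    def upload(self, data: bytes) -> bool:
        """Store `data`; return True if dedup occurred (content already stored)."""
        digest = hashlib.sha256(data).hexdigest()
        seen = digest in self.digests
        self.digests.add(digest)
        return seen


def marker(secret: bytes, index: int, bit: int) -> bytes:
    """File content that encodes (position, bit value); unique per secret."""
    return secret + f"|{index}|{bit}".encode()


def send_bits(service, secret, bits):
    """Malware side: for each bit, back up the file matching its value."""
    for index, bit in enumerate(bits):
        service.upload(marker(secret, index, bit))


def receive_bits(service, secret, count):
    """Control-server side: upload both files and see which one dedups."""
    received = []
    for index in range(count):
        hit0 = service.upload(marker(secret, index, 0))
        hit1 = service.upload(marker(secret, index, 1))
        if hit0 == hit1:
            received.append(None)  # no signal at this position
        else:
            received.append(1 if hit1 else 0)
    return received


service = BackupService()
send_bits(service, b"shared-secret", [1, 0, 1, 1])
received = receive_bits(service, b"shared-secret", 4)
```

Probing consumes each position, since both files are stored afterwards; the index in the file content gives every bit a fresh pair of files.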
Solutions
Solution 1: Encryption
  Encrypting the file with a personal key before uploading it to the service
    Different keys for identical files yield different encrypted files
    Does not allow deduplication
  This solution is vulnerable to offline dictionary attacks
    Deduplication reveals the key
    May indicate that a certain user possesses the file

Solutions
Solution 2: Target-Based Approach
  Deduplication is performed on the server side
    Eliminates the bandwidth saving
    The cost of transferring 1 GB is comparable to the cost of storing it for 2 months (Amazon S3 service, June 2010)
  Solution of MozyHome
    Relatively small files are always uploaded
    Source-based deduplication is performed only on larger files
    Effective when
      Sensitive data is stored in small files
      Most bandwidth is consumed by large files

Solutions
Solution 3: Randomization
  Weakening the correlation between the existence of files in the storage system and deduplication
  Each file is assigned a random threshold
  Source-based deduplication is performed when the number of copies uploaded by different users exceeds this threshold

Solutions
Solution 3: Description
  For every file X select a threshold $t_X \in [2, \dots, d]$
    d may be public
    $t_X$ is chosen uniformly at random
    It is known only to the server
  $c_X$ is the number of users that uploaded copies of the file
  Source-based dedup occurs when either
    $c_X \geq t_X$
    X is uploaded by a client that already uploaded it
  Otherwise the file is sent and target-based dedup occurs

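The rule described above can be sketched as a small server model. This is an illustration under stated assumptions, not the authors' implementation: the `RandomizedDedupServer` class is hypothetical, files are identified by SHA-256, and the counter is assumed to include the current uploader, which matches the security analysis on the following slides.

```python
import hashlib
import random


class RandomizedDedupServer:
    """Hypothetical server applying a per-file random dedup threshold."""

    def __init__(self, d: int, rng=None):
        self.d = d                      # upper bound on thresholds, may be public
        self.rng = rng if rng is not None else random.Random()
        self.threshold = {}             # digest -> t_X, known only to the server
        self.uploaders = {}             # digest -> set of users holding X

    def upload(self, user: str, data: bytes) -> bool:
        """Return True if source-based dedup occurs (the file is not sent)."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.threshold:
            # t_X is drawn uniformly from {2, ..., d}; randint is inclusive.
            self.threshold[digest] = self.rng.randint(2, self.d)
        users = self.uploaders.setdefault(digest, set())
        already_uploaded = user in users
        users.add(user)
        c_x = len(users)  # assumption: the count includes the current uploader
        # Otherwise the file is sent in full and deduplicated on the server.
        return already_uploaded or c_x >= self.threshold[digest]


server = RandomizedDedupServer(d=20)
sent_in_full = not server.upload("alice", b"report.pdf contents")
```

Because every threshold is at least 2, the first uploader of a file always sends it in full, so storage is unaffected; only the client-visible signal changes.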
Solutions
Solution 3: Deletions
  Possible attack
    The attacker identifies source-based dedup after t uploads
    It then deletes two of its copies and uploads them again
    If source-based dedup occurs after only one upload, it indicates that some other user uploaded the file
    Not a very practical attack, since deleted files are retained by the service for some period of time
  Solution
    Once $c_X \geq t_X$, source-based dedup is always performed
    The server must keep a copy of the file even when all copies are deleted

Solutions
Solution 3: Security Analysis
  Examine the case where either
    A single copy of the file was uploaded
    No copy was uploaded
  This seems to be the most relevant case for breaching the privacy of a single user
  The events are as follows
    The attacker uploads a single copy and dedup occurs:
      Happens only when $t_X = 2$ and a single copy was uploaded
      The attacker detects that the file was uploaded
    The attacker uploads d copies before dedup occurs:
      Happens only when $t_X = d$ and no copy was uploaded
      The attacker detects that the file was not uploaded

Solutions
Solution 3: Security Analysis
  The attacker uploads $2 \leq t < d$ copies before dedup occurs:
    Either $t_X = t$ and no copy was uploaded, or $t_X = t + 1$ and a copy was uploaded
    The probability that X was uploaded equals its a-priori probability
  For a fraction $1 - \frac{1}{d-1}$ of the files, the solution leaks no information that distinguishes between the case where a single copy was uploaded and the case where no copy was uploaded

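The fraction above can be checked by enumerating all thresholds. The sketch assumes the attacker uploads as a new user each time and that dedup is first observed on the upload that brings the user count up to $t_X$; the helper names are hypothetical.

```python
from fractions import Fraction


def uploads_until_dedup(t_x: int, prior_copies: int) -> int:
    """Index of the attacker's upload at which dedup is first observed."""
    return t_x - prior_copies


def safe_fraction(d: int, prior_copies: int) -> Fraction:
    """Fraction of thresholds whose observation fits both hypotheses."""
    thresholds = range(2, d + 1)
    other = 1 - prior_copies  # the competing hypothesis (0 or 1 prior copies)
    other_observations = {uploads_until_dedup(t, other) for t in thresholds}
    safe = sum(
        1
        for t in thresholds
        if uploads_until_dedup(t, prior_copies) in other_observations
    )
    return Fraction(safe, len(thresholds))


for d in (3, 10, 20):
    print(d, safe_fraction(d, prior_copies=0), safe_fraction(d, prior_copies=1))
```

Each ambiguous observation is produced by exactly one threshold under each hypothesis, and thresholds are uniform, so the two hypotheses are equally likely to produce it. That is why the attacker's estimate stays at the a-priori probability.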
Solutions
Solution 3: Implications
  Implications for the service provider
    No increase in storage
    For a file X, bandwidth increases by $t_X - 1$
  A new tradeoff is introduced: as d increases
    The fraction of unprotected files decreases
    The bandwidth consumption increases
  The analysis for the covert channel attack is similar to the previous one, since the threshold refers to the number of users that uploaded the file

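The tradeoff can be made concrete with a short calculation. This is a worked illustration, assuming the threshold is uniform over the $d - 1$ values $2, \dots, d$ and taking $t_X - 1$ as the per-file bandwidth quantity from the slide:

```latex
\mathbb{E}[t_X] = \frac{1}{d-1} \sum_{t=2}^{d} t
               = \frac{1}{d-1} \cdot \frac{(d+2)(d-1)}{2}
               = \frac{d+2}{2},
\qquad
\mathbb{E}[t_X - 1] = \frac{d}{2},
\qquad
\Pr[\text{file unprotected}] = \frac{1}{d-1}.
```

For example, with $d = 20$ the expected value of $t_X - 1$ is 10, while the unprotected fraction is $1/19$, roughly 5.3%.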
Conclusions
  This work reveals security loopholes created in cloud storage services due to deduplication
  Several solutions were proposed to deal with the security risk
  A randomized solution that decreases the risk of data leakage at moderate cost was proposed
  Since cloud storage services are becoming increasingly popular, this work may have a significant impact on the privacy provided to many users

Conclusions
Thank you for your attention!