CAPPRIS Meeting, 21 March 2013
Monir Azraoui, Kaoutar Elkhiyaoui, Refik Molva, Melek Önen
Cloud computing
Idea: Outsourcing
- Huge distributed data centers
- Offer storage and computation
Benefit: Cost reduction
- Parallelization
- Maintenance, reliability
Main phases
- Data upload
- Computation upload (Java classes)
- MapReduce
- Result return
Many large files
Privacy in Cloud Computing
Sensitive data
- Companies
  - Internal data
  - Human resources information
- Governmental organizations
  - Prefecture: license plates, car owners...
Challenge: Prying clouds
- Adversary = honest-but-curious cloud
- Data & computation privacy
- Do not cancel cloud advantages
- Lightweight operations at the client side
Current Research Focus
- Proof of retrievability
- Handling encrypted data
- Accountability
  - A4Cloud EU Project
Current research focus (cont'd)
Proof of Retrievability
- Integrity
- Very large amounts of data
- Integrity proofs computed by untrusted clouds
- Blockless verification
PoR: Juels 2007
Provable Data Possession: Ateniese 2007
Current research focus (cont'd)
Handling encrypted data
- Prying clouds
  - Data stored encrypted in the cloud
- Very large amounts of data
  - Operations in the cloud performed by the cloud provider
Solution for word search: PRISM
Handling encrypted data - scenario
Data retention scenario
- Internet Service Provider retains customers' log/access data (for 6 years!)
- Example: DNS logs (time, IP, hostname)
Save money: outsource logs to cloud
Challenge
- Protect customer privacy against prying clouds
  - Privacy: encrypt log entries
- Support queries: "Has x accessed y (at time z)?"
  - Word search
- Efficiency: leverage the cloud's massive parallelism
  - MapReduce
PRISM: PRIvacy-preserving Search in MapReduce
Contribution
- Allows finding files containing words in clouds
  - Contrary to server-based solutions, e.g., Boneh et al. '04 (PEKS), Song et al. '00, Popa et al. '11 (CryptDB)
- Data privacy: no (non-trivial) data analysis
- Computation privacy: query privacy, query unlinkability
- Evaluation: privacy proofs and implementation (11% overhead)
Main idea
- Word existence transformed to PIR problems
- Map: evaluate PIR problem per mapper on each InputSplit
- Reduce: combine mapper output with simple addition
- User decodes output, decides existence
PRISM: MapReduce Overview
Idea: Transform search for word into PIR
[Figure: the user asks whether a word is in a file, encrypts the query Q(word), and uploads it to the cloud. Each mapper evaluates Q(word) against the PIR matrix built from its encrypted InputSplit and outputs E(0) or E(1); the reducer combines these outputs homomorphically and returns the result to the user.]
PRISM - Upload
Data privacy → stateful cipher
- Efficient encryption → AES
- Indistinguishability → AES + plaintext counter
Example:
- $K_d = \mathrm{HMAC}(K, d)$
- Initialize: $\gamma_w = 1$
- Encrypt: $E(w, \gamma_w)$, then $\gamma_w = \gamma_w + 1$
- Maintain counter $\gamma_w$ for each $w$
[Figure: $E(w) = E(w, \gamma_w)$: the plaintext and its counter are paired (e.g., padding + concatenation) and encrypted with AES.]
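
The counter-based encryption on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the PRISM implementation: HMAC-SHA256 stands in for AES so that the sketch needs only the Python standard library (the output is therefore a deterministic tag, not a decryptable ciphertext), `d` is assumed to be a file identifier, the counter starts at 1 so that the first ciphertext of a word is E(w, 1), and the names `derive_key` and `StatefulCipher` are hypothetical.

```python
import hashlib
import hmac


def derive_key(master_key: bytes, d: bytes) -> bytes:
    """K_d = HMAC(K, d); d is assumed to be a file identifier."""
    return hmac.new(master_key, d, hashlib.sha256).digest()


class StatefulCipher:
    """Keeps one counter gamma_w per word w (hypothetical helper class)."""

    def __init__(self, key: bytes):
        self.key = key
        self.counters = {}  # word -> next counter value gamma_w

    @staticmethod
    def _pair(word: str, gamma: int) -> bytes:
        # Pairing of plaintext and counter: length prefix + word + counter.
        w = word.encode("utf-8")
        return len(w).to_bytes(2, "big") + w + gamma.to_bytes(8, "big")

    def _token(self, word: str, gamma: int) -> bytes:
        # Stand-in for AES_K(pair(w, gamma)); assumption, see lead-in.
        return hmac.new(self.key, self._pair(word, gamma), hashlib.sha256).digest()

    def encrypt(self, word: str) -> bytes:
        gamma = self.counters.get(word, 1)   # gamma_w starts at 1
        self.counters[word] = gamma + 1      # gamma_w = gamma_w + 1
        return self._token(word, gamma)

    def search_token(self, word: str) -> bytes:
        # A search always looks for the first occurrence, E(w, 1).
        return self._token(word, 1)
```

Two encryptions of the same word differ because the counter differs, while `search_token(w)` matches the first encryption of `w`, which is all the search needs.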
PIR: Private Information Retrieval
- User wants to retrieve some data $d_k$; the server should not learn what is retrieved
- Upload: data matrix $M$ with rows $d_1 = 1101$, $d_2 = 0100$, $d_3 = 1000$, $d_4 = 1010$
- Query: user computes and sends $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_k, \ldots, \alpha_t]$
  - $\alpha_k = b(1 + a_k \cdot n) \bmod p$ → E(1)
  - $\alpha_i = b(a_i \cdot n) \bmod p$ for $i \neq k$ → E(0)
- Process: server computes $\beta$ from $\alpha$ and the rows $1, 2, \ldots, t$ of $M$
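
The query and processing steps can be tried out with the small sketch below. It is an illustration under assumptions, not the scheme's reference code: the server's computation is taken to be the column sums $\sum_i \alpha_i \cdot M_{i,j} \bmod p$ used in the Map step of PRISM, the decoding step (multiply by $b^{-1} \bmod p$, then reduce mod $n$) is not shown on the slide and is assumed here, and the parameter sizes (`P`, `N`, `A_BITS`) and function names are hypothetical and far too simple for real use. Rows are indexed from 0. It needs Python 3.8+ for `pow(b, -1, p)`.

```python
import secrets

P = 2**127 - 1   # modulus p (a known Mersenne prime); toy choice
N = 2            # the slide's n; matrix entries must be smaller than n
A_BITS = 32      # bit length of the random masks a_i; toy choice


def make_query(k, t, b, p=P, n=N):
    """alpha_k = b(1 + a_k*n) mod p, alpha_i = b(a_i*n) mod p for i != k."""
    alpha = []
    for i in range(t):
        a_i = secrets.randbits(A_BITS)
        bit = 1 if i == k else 0
        alpha.append(b * (bit + a_i * n) % p)
    return alpha


def server_answer(alpha, matrix, p=P):
    """Column sums: beta_j = sum_i alpha_i * M[i][j] mod p."""
    columns = len(matrix[0])
    return [
        sum(alpha[i] * matrix[i][j] for i in range(len(matrix))) % p
        for j in range(columns)
    ]


def decode(beta, b, p=P, n=N):
    """Assumed decoding: remove b, then reduce mod n to drop the a_i*n masks."""
    b_inv = pow(b, -1, p)
    return [(b_inv * x % p) % n for x in beta]


if __name__ == "__main__":
    b = secrets.randbelow(P - 1) + 1          # secret, invertible mod p
    M = [[1, 1, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0], [1, 0, 1, 0]]
    alpha = make_query(2, len(M), b)          # ask for row index 2 (d_3)
    print(decode(server_answer(alpha, M), b)) # recovers M[2]
```

Decoding works because the unmasked sum $\sum_i M_{i,j}(\delta_{ik} + a_i n)$ stays below $p$ for these parameter sizes, so reducing mod $n$ leaves exactly $M_{k,j}$.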
PRISM Search: Query transformation
User: PrepareQuery(w)
- If $w$ exists
  - $w$ has been encrypted at least once → $E(w, 1)$ has been uploaded
- Compute candidate position
  - $CP: \langle X, Y \rangle = E(w, 1)$
- Compute PIR input $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_k, \ldots, \alpha_t]$
  - $\alpha_k = b(1 + a_k \cdot n)$
  - $\alpha_i = b(a_i \cdot n)$
  - Example: $\alpha_2 = E(1)$, all other $\alpha_i = E(0)$
- Send $\alpha$ to the cloud
Query privacy
PRISM-Search: Map & Reduce
Map: PIR matrix construction (PIR matrix $M$ from the data)
- Matrix initialization to 0
- For each ciphertext $C_i$: compute $CP_i = \langle X_i, Y_i \rangle = C_i$, then set the entry of $M$ at $CP_i$ (via $H(C_i)$) to 1
Map: Process query: column sums
- For all rows
  - Compute: $\sigma_j = \sum_i \alpha_i \cdot M_{i,j}$
  - Example: $\sigma_1 = \alpha_3 + \alpha_4 = E(0)$, $\sigma_2 = \alpha_2 + \alpha_4 = E(1)$
Map: both steps repeated $q$ times
- Send $q$ vectors $\sigma$
Reduce:
- Aggregation = addition
- Homomorphism → correctness
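
The additive aggregation in the reduce step can be illustrated with the sketch below. It only demonstrates the homomorphic property, under assumptions that are not on the slide: the matrices are given directly rather than built from ciphertexts, the plaintext modulus `N` is chosen larger than the number of mappers so that the summed bits do not wrap around, decoding multiplies by $b^{-1} \bmod p$ and reduces mod $n$, and all names and parameter sizes are hypothetical toy values. It needs Python 3.8+ for `pow(b, -1, p)`.

```python
import secrets

P = 2**127 - 1   # modulus p (a known Mersenne prime); toy choice
N = 1 << 16      # n; assumed larger than the number of mappers
A_BITS = 32      # bit length of the random masks a_i; toy choice


def make_query(x, t, b):
    """alpha selects row x: E(1) at index x, E(0) everywhere else."""
    return [
        b * ((1 if i == x else 0) + secrets.randbits(A_BITS) * N) % P
        for i in range(t)
    ]


def map_split(alpha, matrix):
    """Mapper: sigma_j = sum_i alpha_i * M[i][j] mod p for one InputSplit."""
    return [
        sum(alpha[i] * row[j] for i, row in enumerate(matrix)) % P
        for j in range(len(matrix[0]))
    ]


def reduce_sigmas(sigmas):
    """Reducer: element-wise addition of the mappers' sigma vectors."""
    return [sum(column) % P for column in zip(*sigmas)]


def decode(sigma, b):
    """Assumed decoding: strip b, then reduce mod n."""
    b_inv = pow(b, -1, P)
    return [(b_inv * s % P) % N for s in sigma]


if __name__ == "__main__":
    b = secrets.randbelow(P - 1) + 1
    splits = [
        [[0, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]],
        [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
        [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]],
    ]
    alpha = make_query(1, 4, b)                       # candidate row X = 1
    total = reduce_sigmas([map_split(alpha, m) for m in splits])
    print(decode(total, b))                           # per-column sums of row 1
```

The decoded vector is the element-wise sum of the selected row over all splits, so the entry at column Y is nonzero exactly when at least one split has a 1 at the candidate position.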
PRISM Result analysis
Receive $t$ sums
- Decrypt $\sigma_Y$
Decision
- $D(\sigma_Y) = 0$ and $h(c_i) = 1$ → contradiction, $w$ cannot be in file
- Otherwise $w$ might be in file: false positives (collisions)
Run $q > 1$ rounds of PRISM
- Depending on $t$, $q$, ..., tailor false-positive probabilities
- Result: after $q$ rounds, $w$ is in file with high probability
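
How the number of rounds trades off against false positives can be estimated with a back-of-the-envelope sketch. It rests on assumptions the slide does not state: each round is taken to produce a false positive independently with the same probability `f` (which in practice depends on $t$ and on the data), and a word is reported only if no round yields a contradiction. The function names are hypothetical.

```python
import math


def combined_false_positive(f: float, q: int) -> float:
    """Probability that an absent word survives all q independent rounds."""
    return f ** q


def rounds_needed(f: float, target: float) -> int:
    """Smallest q with f**q <= target, for 0 < f < 1 and 0 < target < 1."""
    if not (0.0 < f < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("f and target must lie strictly between 0 and 1")
    return math.ceil(math.log(target) / math.log(f))


if __name__ == "__main__":
    # With a per-round false-positive rate of 1/2, about 20 rounds push the
    # combined rate below one in a million.
    print(rounds_needed(0.5, 1e-6), combined_false_positive(0.5, 20))
```

Because the combined rate falls geometrically in $q$, a modest number of rounds is enough even when a single round is quite noisy.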
Overview: Privacy Properties
Encryption of $w$ using stateful cipher (Pseudorandom Permutation Assumption)
- Idea: instead of encrypting $w$, encrypt $w$ with counter $\gamma_w$
- $C := E(w, \gamma_w)$, $\gamma_w := \gamma_w + 1$ for each occurrence of $w$
- Initialize $\gamma_w$ to 1, search for ciphertext $E(w, 1)$
PIR scheme (computation of P-values) (Trapdoor Group Assumption)
- Query for column $k$ (= candidate position, based on $w$)
- $P_k := b(1 + a_k N) \bmod p$ → E(1)
- $P_{i \neq k} := b \, a_i N \bmod p$ → E(0)
- $a_i$: random number; $b$, $N$, $p$: system parameters
We formally prove IND-CPA
Implementation
Setup
- Log scenario, search in encrypted DNS entries
- DNS log file from local internet provider
  - 16 days, $3 \cdot 10^8$ log entries, total of 26 GByte
  - (Timestamp, customer IP, target host)
- Hadoop 0.20.2, out-of-the-box installation
  - 9 workers, 1 master
  - Fedora 11, 2.5 GHz Pentium Dual-Core, 4 GByte RAM → 16 CPUs
  - 96 MByte InputSplit (120 MByte)
- Crypto tools:
  - AES 256 bit (GNU Crypto Library v2.0.1)
  - Trapdoor Group Assumption PIR using Java BigNumber(!)
Analysis
- Comparison with two baselines (empty maps)
Evaluation Results
PRISM - Summary
Searching for data in cloud is challenging
Cloud untrusted, data encrypted
Efficient solutions required
PRISM
- Efficient search on encrypted data in MapReduce
- Main idea: map search to small PIR problem, combine partial results during reduce
- 11% overhead over non-private search
- Runs on standard MapReduce today (as offered by Amazon, Google, Microsoft, IBM)
Conclusion
Cloud computing
- Revisit old problems
- New setting
  - scalability
  - untrusted provider
Future work
- PRISM performed by third parties
- Main focus on PoR
- Accountability: secure logging