Conjunctive, Subset, and Range Queries on Encrypted Data




Dan Boneh (dabo@cs.stanford.edu)
Brent Waters (bwaters@csl.sri.com)

Abstract

We construct public-key systems that support comparison queries ($x \ge a$) on encrypted data as well as more general queries such as subset queries ($x \in S$). These systems also support arbitrary conjunctive queries ($P_1 \wedge \cdots \wedge P_\ell$) without leaking information on individual conjuncts. We present a general framework for constructing and analyzing public-key systems supporting queries on encrypted data.

1 Introduction

Queries on encrypted data are easiest to explain with an example. Consider a credit-card payment gateway that observes a stream of encrypted transactions, say encrypted under Visa's public key. The gateway needs to flag all transactions satisfying a certain predicate $P$, say all transactions whose value is over \$1000. Storing Visa's secret key on the gateway is a bad idea for both security and privacy reasons. Instead, Visa wishes to give the gateway a token $TK_P$ that enables the gateway to identify transactions satisfying $P$ without learning anything else about these transactions. Of course, generating the token $TK_P$ will require Visa's secret key.

As another example, consider a mail server that receives a stream of email messages encrypted under the recipient's public key. If the email message satisfies a certain predicate $P$ the mail server should forward the email to the recipient's pager. If the email satisfies some other predicate $P'$ the server should just discard the email. Otherwise, the server should place the email in the recipient's inbox. The recipient does not want to give the mail server the full private key. Instead, she wants to give the server two tokens $TK_P$ and $TK_{P'}$ enabling the server to test for the predicates $P$ and $P'$ without learning any other information about the email.

Our goal is to build a public-key system that supports a rich set of query predicates.
In our payment gateway example one can imagine comparison queries such as (value > 1000) or even conjunctions such as (value > 1000) and (TransactionTime > 5pm). The gateway should learn no information other than the value of the conjunctive predicate. In case a conjunction $P_1 \wedge P_2$ is false, the gateway should not learn which of the two conjuncts $P_1$ or $P_2$ is false.

In our second example involving a mail server one can imagine testing for subset queries such as (sender $\in S$) where $S$ is a set of sender emails. Conjunctive queries such as (sender $\in S$) and (subject = urgent) also make sense. Perhaps in the distant future, when highly complex queries on encrypted data are possible, one can imagine running an anti-virus/anti-spam predicate on encrypted emails. The mail server learns nothing about incoming encrypted email other than its spam status.

Supported by NSF and the Packard Foundation.
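To make the query classes concrete, the following minimal sketch evaluates the example predicates in the clear. It only illustrates what a token is meant to test; nothing here is encrypted. The record fields (`value`, `hour`, `sender`, `subject`) and the helper names are hypothetical and not taken from the paper.

```python
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]               # hypothetical plaintext record
Predicate = Callable[[Record], bool]  # a predicate P maps a record to true/false


def greater_than(field: str, threshold: int) -> Predicate:
    """Comparison predicate: record[field] > threshold."""
    return lambda record: record[field] > threshold


def equals(field: str, value: Any) -> Predicate:
    """Equality predicate: record[field] == value."""
    return lambda record: record[field] == value


def member_of(field: str, allowed: Iterable[Any]) -> Predicate:
    """Subset predicate: record[field] is an element of the set `allowed`."""
    allowed_set = frozenset(allowed)
    return lambda record: record[field] in allowed_set


def conjunction(*predicates: Predicate) -> Predicate:
    """Conjunction P_1 and ... and P_l; the caller only sees the overall result."""
    return lambda record: all(p(record) for p in predicates)


# (value > 1000) and (TransactionTime > 5pm), with the hour on a 24h clock.
flag_transaction = conjunction(greater_than("value", 1000), greater_than("hour", 17))

# (sender in S) and (subject = urgent)
page_recipient = conjunction(
    member_of("sender", {"boss@example.com"}), equals("subject", "urgent")
)
```

The point of the constructions in the paper is to obtain the same single output bit from a ciphertext and a token, without revealing which conjunct failed.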

Unfortunately, until now, only simple equality queries on encrypted data were possible. Song et al. [18] developed a mechanism for equality tests on data encrypted with a symmetric key system. Boneh et al. [7] constructed equality tests in the public-key setting.

Our results. We present a general framework for analyzing and constructing searchable public-key systems for various families of predicates. We then construct public-key systems that support comparison queries (such as greater-than) and general subset queries. We also support arbitrary conjunctions. We evaluate our results based on ciphertext size and token size. Let $T = \{1, 2, \ldots, n\}$ and suppose we encrypt a tuple $x = (x_1, \ldots, x_w) \in T^w$. Say $x_1$ is a transaction value, $x_2$ is a card expiration date, and so on. The following table summarizes our results at a high level.

| Query type | Source | Ciphertext size | Token size |
|---|---|---|---|
| Equality query: $(x_i = a)$ for any $a \in T$ | [18, 16, 7, 1] | $O(1)$ | $O(1)$ |
| Comparison query: $(x_i \ge a)$ for any $a \in T$ | [10, 11] (footnote 1) | $O(\sqrt{n})$ | $O(\sqrt{n})$ |
| Subset query: $(x_i \in A)$ for any $A \subseteq T$ | This paper | $O(n)$ | $O(n)$ |
| Equality conjunction: $(x_1 = a_1) \wedge \cdots \wedge (x_w = a_w)$ | This paper | $O(w)$ | $O(w)$ |
| Comparison conjunction: $(x_1 \ge a_1) \wedge \cdots \wedge (x_w \ge a_w)$ | This paper | $O(nw)$ | $O(w)$ |
| Subset conjunction: $(x_1 \in A_1) \wedge \cdots \wedge (x_w \in A_w)$ | This paper | $O(nw)$ | $O(nw)$ |

Here $(a_1, \ldots, a_w)$ is an arbitrary vector that defines a conjunctive equality or comparison predicate. Similarly, $A_1, \ldots, A_w$ are arbitrary subsets of $\{1, \ldots, n\}$ that define a conjunctive subset query predicate. We emphasize that when a conjunction predicate is false, the system does not leak which of the $w$ conjuncts caused it.

Prior to these results the best systems for comparison and subset queries were the trivial brute-force systems discussed in Section 3. For comparison queries these systems generate a ciphertext of size $O(n^w)$ and for subset queries they generate a ciphertext of size $O(2^{nw})$.
Note that even without conjunction, namely for $w = 1$, our subset query construction generates ciphertexts that are exponentially shorter than the best known previous solution ($O(n)$ vs. $O(2^n)$).

The main tool used in these constructions is a new primitive we call Hidden Vector Encryption, or HVE for short. This primitive can be viewed as an extreme generalization of Anonymous Identity Based Encryption (AnonIBE) [7, 1, 12]. We show how HVE implies all the results in the table.

Are there public-key systems that support larger classes of predicates? Ultimately, one would like a public-key system that supports searches for any predicate computable by a shallow circuit. Presently, this appears to be a difficult open problem.

Related work. Equality tests on encrypted data were considered in [18, 7]. Equality searches on an encrypted audit log were proposed in [19]. Equality tests in the symmetric key setting are closely related to oblivious RAM techniques [16, 13]. Equality tests in the public-key setting are closely related to Anonymous Identity Based Encryption (AnonIBE) [1, 12]. Conjunctive equality queries were first studied in [14]. Equality searches on streaming data that hide the requested predicate were discussed in [17] and [4]. Efficient equality searches in databases were recently presented in [2]. Bethencourt et al. [3] recently gave a construction for efficient range queries in a weaker security model. That is, when the encrypted index falls in the specified range, the search token reveals the index.

Footnote 1: Both papers [10, 11] focus on traitor tracing, but as we observe in Appendix C, their approach directly gives a comparison searching system without conjunctions.

2 Definitions

We begin by defining a general framework for queries on encrypted data. Let $\Sigma$ be a finite set of binary strings. A predicate $P$ over $\Sigma$ is a function $P : \Sigma \to \{0, 1\}$. We say that $S \in \Sigma$ satisfies the predicate if $P(S) = 1$.

2.1 Searchable encryption

Let $\Phi$ be a set of predicates over $\Sigma$. A $\Phi$-searchable public-key system comprises the following algorithms:

Setup($\lambda$): A probabilistic algorithm that takes as input a security parameter and outputs a public key PK and secret key SK.

Encrypt(PK, $S$, $M$): Encrypts the plaintext pair $(S, M)$ using the public key PK. We view $S \in \Sigma$ as the searchable field, called an index, and $M \in \mathcal{M}$ as the data.

GenToken(SK, $\langle P \rangle$): Takes as input a secret key SK and the description of a predicate $P \in \Phi$. It outputs a token $TK_P$.

Query(TK, $C$): Takes a token TK for some predicate $P \in \Phi$ as input and a ciphertext $C$. It outputs a message $M \in \mathcal{M}$ or $\bot$. Roughly speaking, if $C$ is an encryption of $(S, M)$ then the algorithm outputs $M$ when $P(S) = 1$ and outputs $\bot$ otherwise. The precise requirement is captured in the query correctness property below.

Correctness. The system must satisfy the following correctness property:

Query correctness: For all $(S, M) \in \Sigma \times \mathcal{M}$ and all predicates $P \in \Phi$: let (PK, SK) $\leftarrow$ Setup($\lambda$), $C \leftarrow$ Encrypt(PK, $S$, $M$), and TK $\leftarrow$ GenToken(SK, $\langle P \rangle$).
If $P(S) = 1$ then Query(TK, $C$) $= M$.
If $P(S) = 0$ then $\Pr[\text{Query}(TK, C) = \bot] > 1 - \epsilon(\lambda)$ where $\epsilon(\lambda)$ is a negligible function.

Suppose that given a ciphertext $C \leftarrow$ Encrypt(PK, $S$, $M$) we are only interested in testing whether a predicate $P(S)$ is satisfied. In this case the message space $\mathcal{M}$ can be set to a singleton, say $\mathcal{M} = \{\text{true}\}$. Algorithm Query(TK, $C$) will return true when $P(S) = 1$ and $\bot$ otherwise. A larger message space $\mathcal{M}$ is useful if TK is intended to unlock some $M \in \mathcal{M}$ whenever the predicate $P(S) = 1$.
For example, when the transaction value is over \$1000 we may want the payment gateway to obtain more information about the transaction. Otherwise, the gateway should learn nothing.

Notice that a $\Phi$-searchable system does not provide a Decrypt algorithm that uses SK to decrypt a ciphertext $C$ and output $(S, M)$. One can always add this capability by also encrypting $(S, M)$ under a standard public-key system. There is no need for the searchable system to explicitly provide this capability.

[Figure 1: Tokens for $\sigma_1, \sigma_2, \sigma_3, \sigma_4$ given to the adversary. The figure shows the interval from 1 to $n$ with the thresholds $\sigma_1, \ldots, \sigma_4$ and the points $x$, $y$, $z$ marked on it.]

An example: comparison queries. Before defining security, we first give a motivating example using comparison queries. Let $\Sigma = \{1, \ldots, n\}$ for some integer $n$. For $\sigma \in \{1, \ldots, n\}$ let $P_\sigma$ be the following comparison predicate:

\[ P_\sigma(x) = \begin{cases} 1 & \text{if } x \ge \sigma, \\ 0 & \text{otherwise.} \end{cases} \]

Let $\Phi_n = \{P_1, \ldots, P_n\}$ be the set of all $n$ comparison predicates. Suppose the adversary has the tokens for predicates $P_{\sigma_1}, P_{\sigma_2}, \ldots, P_{\sigma_w}$ where $\sigma_1 < \sigma_2 < \cdots < \sigma_w$. Let $x, y, z$ be some integers as in Figure 1. Clearly the adversary can distinguish Encrypt(PK, $x$, $m$) from Encrypt(PK, $y$, $m$) using the token for the predicate $P_{\sigma_2}$. However, the adversary should not be able to distinguish Encrypt(PK, $y$, $m$) from Encrypt(PK, $z$, $m$). Indeed, separating an encryption of $y$ from an encryption of $z$ is information that should not be exposed by the tokens at the adversary's disposal. Our definition of security captures this property using the general framework.

2.2 Security

We define security of a $\Phi$-searchable system $\mathcal{E}$ using a query security game that captures the intuition that tokens TK reveal no unintended information about the plaintext. The game gives the adversary a number of tokens and requires that the adversary cannot use these tokens to deduce unintended information. The game proceeds as follows:

Setup. The challenger runs Setup($\lambda$) and gives the adversary PK.

Query phase 1. The adversary adaptively outputs descriptions of predicates $P_1, P_2, \ldots, P_{q_1} \in \Phi$. The challenger responds with the corresponding tokens $TK_i \leftarrow$ GenToken(SK, $\langle P_i \rangle$). We refer to such queries as predicate queries.

Challenge. The adversary outputs two pairs $(S_0, M_0)$ and $(S_1, M_1)$ subject to two restrictions:
First, $P_i(S_0) = P_i(S_1)$ for all $i = 1, 2, \ldots, q_1$.
Second, if $M_0 \ne M_1$ then $P_i(S_0) = P_i(S_1) = 0$ for all $i = 1, 2, \ldots, q_1$.
The challenger flips a coin $\beta \in \{0, 1\}$ and gives $C^* \leftarrow$ Encrypt(PK, $S_\beta$, $M_\beta$) to the adversary.
The two restrictions ensure that the tokens given to the adversary do not trivially break the challenge. The first restriction ensures that tokens given to the adversary do not directly distinguish $S_0$ from $S_1$. The second restriction ensures that the tokens do not directly distinguish $M_0$ from $M_1$.

Query phase 2. The adversary continues to adaptively request tokens for predicates $P_{q_1+1}, \ldots, P_q \in \Phi$, subject to the two restrictions above. The challenger responds with the corresponding tokens $TK_i \leftarrow$ GenToken(SK, $\langle P_i \rangle$).
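The two restrictions on the challenge can be read as a simple admissibility check that a challenger could run over the predicates queried so far. The sketch below is only an illustration of those two conditions; the function names are hypothetical, and predicates are modeled as plain Python callables returning 0 or 1.

```python
from typing import Callable, Hashable, Iterable

Predicate = Callable[[int], int]  # P : Sigma -> {0, 1}


def comparison_predicate(sigma: int) -> Predicate:
    """P_sigma(x) = 1 if x >= sigma, 0 otherwise (the example of Section 2.1)."""
    return lambda x: 1 if x >= sigma else 0


def challenge_is_admissible(
    queried: Iterable[Predicate],
    s0: int, m0: Hashable,
    s1: int, m1: Hashable,
) -> bool:
    """Check the two restrictions of the query security game."""
    for p in queried:
        # Restriction 1: no queried token may separate S_0 from S_1.
        if p(s0) != p(s1):
            return False
        # Restriction 2: if the messages differ, no queried token may
        # unlock the message for either index.
        if m0 != m1 and (p(s0) == 1 or p(s1) == 1):
            return False
    return True


# Tokens for thresholds 10 < 20 < 30 < 40, in the spirit of Figure 1.
tokens = [comparison_predicate(s) for s in (10, 20, 30, 40)]
```

With these tokens, indexes 25 and 27 (both between the second and third thresholds) form an admissible challenge when the messages are equal, while 15 and 25 do not, since the token for threshold 20 separates them.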

Guess. The adversary returns a guess $\beta' \in \{0, 1\}$ of $\beta$.

We define the advantage of adversary $\mathcal{A}$ in attacking $\mathcal{E}$ as the quantity $\mathrm{QU\,Adv}_{\mathcal{A}} = |\Pr[\beta' = \beta] - 1/2|$.

Definition 2.1. We say that a $\Phi$-searchable system $\mathcal{E}$ is secure if for all polynomial time adversaries $\mathcal{A}$ attacking $\mathcal{E}$ the function $\mathrm{QU\,Adv}_{\mathcal{A}}$ is a negligible function of $\lambda$.

Another example: equality queries. Let $\Sigma$ be some finite set. For $\sigma \in \Sigma$ let $P_\sigma(x)$ be an equality predicate, namely

\[ P_\sigma(x) = \begin{cases} 1 & \text{if } x = \sigma, \\ 0 & \text{otherwise.} \end{cases} \]

Let $\Phi_{eq} = \{P_\sigma \text{ for all } \sigma \in \Sigma\}$. Then a $\Phi_{eq}$-searchable encryption supports equality queries on ciphertexts. It is easy to see that a secure $\Phi_{eq}$-searchable encryption is also an anonymous IBE system [7, 1, 12], that is, an Identity Based Encryption system where a ciphertext reveals no useful information about the identity that was used to create it. This should not be too surprising since it was previously shown [7, 1] that anonymous IBE is sufficient for equality searches. A $\Phi_{eq}$-searchable encryption system (Setup, Encrypt, GenToken, Query) gives an anonymous IBE as follows:

Setup_IBE($\lambda$) runs Setup($\lambda$) and outputs IBE parameters PK and master key SK.
Encrypt_IBE(PK, $I$, $M$) where $I \in \Sigma$ outputs Encrypt(PK, $I$, $M$).
Extract_IBE(SK, $I$) where $I \in \Sigma$ outputs $TK_I \leftarrow$ GenToken(SK, $\langle P_I \rangle$).
Decrypt_IBE($TK_I$, $C$) outputs Query($TK_I$, $C$).

The correctness property ensures that if $C$ is the result of Encrypt(PK, $I$, $M$) then Query($TK_I$, $C$) will output $M$ since $P_I(I) = 1$. It is not difficult to see that the $\Phi_{eq}$-security game ensures semantic security for both the message and the identity. Hence, the resulting system is an anonymous IBE.

By considering larger classes of predicates $\Phi$ we obtain more general searching capabilities. The challenge is then to build secure encryption schemes that are $\Phi$-searchable for the most general $\Phi$ possible.

Chosen ciphertext security. Definition 2.1 easily extends to address chosen ciphertext attacks (CCA), but we do not pursue that here.
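The mapping from a $\Phi_{eq}$-searchable system to an anonymous IBE is purely syntactic, which a thin adapter makes visible. The sketch below is an illustration only: `SearchableSystem`, `AnonIBE`, and `cleartext_mock` are hypothetical names, and the mock back end stores everything in the clear, so it is NOT secure and exists only to exercise the adapter.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class SearchableSystem:
    """The four algorithms of a Phi_eq-searchable system, passed in as callables."""
    setup: Callable[[int], Tuple[Any, Any]]       # lambda -> (PK, SK)
    encrypt: Callable[[Any, Any, Any], Any]       # (PK, S, M) -> C
    gen_token: Callable[[Any, Any], Any]          # (SK, description of P) -> TK
    query: Callable[[Any, Any], Optional[Any]]    # (TK, C) -> M, or None for "bottom"


class AnonIBE:
    """Anonymous IBE obtained by renaming the searchable-encryption algorithms."""

    def __init__(self, system: SearchableSystem) -> None:
        self.system = system

    def setup(self, lam: int) -> Tuple[Any, Any]:
        return self.system.setup(lam)                        # Setup_IBE

    def encrypt(self, pk: Any, identity: Any, message: Any) -> Any:
        return self.system.encrypt(pk, identity, message)    # Encrypt_IBE

    def extract(self, sk: Any, identity: Any) -> Any:
        # The equality predicate P_I is described by the identity I itself.
        return self.system.gen_token(sk, identity)           # Extract_IBE

    def decrypt(self, tk: Any, ciphertext: Any) -> Optional[Any]:
        return self.system.query(tk, ciphertext)             # Decrypt_IBE


# INSECURE stand-in back end: the "ciphertext" is the cleartext pair (S, M).
cleartext_mock = SearchableSystem(
    setup=lambda lam: ("pk", "sk"),
    encrypt=lambda pk, s, m: (s, m),
    gen_token=lambda sk, identity: identity,
    query=lambda tk, c: c[1] if c[0] == tk else None,
)
```

Plugging a real $\Phi_{eq}$-searchable system into the same adapter would leave the adapter unchanged; only the four callables differ.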
2.3 Selective security

We will also need a slightly weaker security definition in which the adversary commits to the search strings $S_0, S_1$ at the beginning of the game. Everything else remains the same. The game proceeds as follows:

Setup. The adversary outputs two strings $S_0, S_1 \in \Sigma$. The challenger runs Setup($\lambda$) and gives the adversary PK.

Query phase 1. The adversary adaptively outputs descriptions of predicates $P_1, P_2, \ldots, P_{q_1} \in \Phi$. The only restriction is that

\[ P_i(S_0) = P_i(S_1) \quad \text{for all } i = 1, 2, \ldots, q_1. \tag{1} \]

The challenger responds with the corresponding tokens $TK_i \leftarrow$ GenToken(SK, $\langle P_i \rangle$).

Challenge. The adversary outputs two messages $M_0, M_1 \in \mathcal{M}$ subject to the restriction that:

\[ \text{if } M_0 \ne M_1 \text{ then } P_i(S_0) = P_i(S_1) = 0 \quad \text{for all } i = 1, 2, \ldots, q_1. \tag{2} \]

The challenger flips a coin $\beta \in \{0, 1\}$ and gives $C^* \leftarrow$ Encrypt(PK, $S_\beta$, $M_\beta$) to the adversary.

Query phase 2. The adversary continues to adaptively request query tokens for predicates $P_{q_1+1}, \ldots, P_q \in \Phi$, subject to the two restrictions (1) and (2). The challenger responds with the corresponding tokens $TK_i \leftarrow$ GenToken(SK, $\langle P_i \rangle$).

Guess. The adversary returns a guess $\beta' \in \{0, 1\}$ of $\beta$.

The advantage of adversary $\mathcal{A}$ in attacking $\mathcal{E}$ is the quantity $\mathrm{sQU\,Adv}_{\mathcal{A}} = |\Pr[\beta' = \beta] - 1/2|$.

Definition 2.2. We say that a $\Phi$-searchable system $\mathcal{E}$ is selectively secure if for all polynomial time adversaries $\mathcal{A}$ attacking $\mathcal{E}$ the function $\mathrm{sQU\,Adv}_{\mathcal{A}}$ is a negligible function of $\lambda$.

3 The Trivial Construction

Let $\Sigma$ be a finite set of binary strings. We build a $\Phi$-searchable public-key system $\mathcal{E}_T$ for any set of (polynomial time computable) predicates $\Phi$. We refer to this system as the brute force $\Phi$-searchable system.

The brute force system. Let $\mathcal{E}' = (\text{Setup}', \text{Encrypt}', \text{Decrypt}')$ be a public-key system. Let $\Phi = \{P_1, P_2, \ldots, P_t\}$. The $\Phi$-searchable system $\mathcal{E}_T$ is defined as follows:

Setup($\lambda$): Run Setup'($\lambda$) $t$ times to obtain PK $\leftarrow (PK_1, \ldots, PK_t)$ and SK $\leftarrow (SK_1, \ldots, SK_t)$. Output PK and SK.

Encrypt(PK, $S$, $M$): For $i = 1, \ldots, t$ define:

\[ C_i \leftarrow \begin{cases} \text{Encrypt}'(PK_i, M) & \text{if } P_i(S) = 1, \\ \text{Encrypt}'(PK_i, \bot) & \text{otherwise.} \end{cases} \]

Output $C \leftarrow (C_1, \ldots, C_t)$. Note that the length of $C$ is linear in $t$, the number of predicates.

GenToken(SK, $\langle P \rangle$): Here $\langle P \rangle$, the description of a predicate $P \in \Phi$, is the index $i$ of $P_i$ in $\Phi$. Output TK $\leftarrow (i, SK_i)$.

Query(TK, $C$): Let $C = (C_1, \ldots, C_t)$ and TK $= (i, SK_i)$. Output Decrypt'($SK_i$, $C_i$).

The following lemma proves security of this construction. The proof is a straightforward hybrid argument and is given in Appendix A.

Lemma 3.1. The system $\mathcal{E}_T$ above is a secure $\Phi$-searchable encryption system assuming $\mathcal{E}'$ is a semantically secure public-key system against chosen plaintext attacks.

3.1 A third example: conjunctive comparison predicates

Suppose $\Sigma = \{1, \ldots, n\}^w$ for some $n, w$. Let $\Phi_{n,w}$ be the set of predicates

\[ P_{a_1 \ldots a_w}(x_1, \ldots, x_w) = \begin{cases} 1 & \text{if } x_i \ge a_i \text{ for all } i = 1, \ldots, w, \\ 0 & \text{otherwise} \end{cases} \]

for all $\bar a = (a_1, \ldots, a_w) \in \{1, \ldots, n\}^w$. Then $|\Phi_{n,w}| = n^w$. The trivial system in this case produces ciphertexts of length $O(n^w)$. Essentially, the system uses a unary encoding of the $w$ columns and assigns a private key to each cell in this $n$ by $w$ matrix. We will construct a much better system in Section 6.

4 Background on pairings and complexity assumptions

Our goal is to construct $\Phi$-searchable systems for a large class of predicates $\Phi$ that are much better than the trivial construction. To do so we will make use of bilinear maps.

4.1 Bilinear groups of composite order

We review some general notions about bilinear maps and groups, with an emphasis on groups of composite order. We follow [9], in which composite order bilinear groups were first introduced.

Let $\mathcal{G}$ be an algorithm called a group generator that takes as input a security parameter $\lambda \in \mathbb{Z}_{>0}$ and outputs a tuple $(p, q, G, G_T, e)$ where $p, q$ are two distinct primes, $G$ and $G_T$ are two cyclic groups of order $n = pq$, and $e$ is a function $e : G^2 \to G_T$ satisfying the following properties:

(Bilinear) $\forall u, v \in G$, $\forall a, b \in \mathbb{Z}$: $e(u^a, v^b) = e(u, v)^{ab}$.
(Non-degenerate) $\exists g \in G$ such that $e(g, g)$ has order $n$ in $G_T$.

We assume that the group action in $G$ and $G_T$ as well as the bilinear map $e$ are all computable in polynomial time in $\lambda$. Furthermore, we assume that the description of $G$ and $G_T$ includes generators of $G$ and $G_T$ respectively. To summarize, $\mathcal{G}$ outputs the description of a group $G$ of order $n = pq$ with an efficiently computable bilinear map. We will use the notation $G_p, G_q$ to denote the respective subgroups of order $p$ and order $q$ of $G$.

4.2 The bilinear Diffie-Hellman assumption

First we review the standard Bilinear Diffie-Hellman assumption, but in groups of composite order.
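For a given group generator $\mathcal{G}$ define the following distribution $P(\lambda)$:

\[
\begin{aligned}
&(p, q, G, G_T, e) \leftarrow \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \leftarrow G_p, \quad g_q \leftarrow G_q, \\
&a, b, c \leftarrow \mathbb{Z}_n, \\
&\bar Z \leftarrow \big( (n, G, G_T, e),\ g_q,\ g_p,\ g_p^a,\ g_p^b,\ g_p^c \big), \\
&T \leftarrow e(g_p, g_p)^{abc}, \\
&\text{Output } (\bar Z, T).
\end{aligned}
\]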

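For an algorithm $\mathcal{A}$, define $\mathcal{A}$'s advantage in solving the composite bilinear Diffie-Hellman problem for $\mathcal{G}$ as:

\[ \mathrm{cBDH\,Adv}_{\mathcal{G},\mathcal{A}}(\lambda) := \big| \Pr[\mathcal{A}(\bar Z, T) = 1] - \Pr[\mathcal{A}(\bar Z, R) = 1] \big| \]

where $(\bar Z, T) \leftarrow P(\lambda)$ and $R \leftarrow G_T$.

Definition 4.1. We say that $\mathcal{G}$ satisfies the composite bilinear Diffie-Hellman assumption (cBDH) if for any polynomial time algorithm $\mathcal{A}$ we have that $\mathrm{cBDH\,Adv}_{\mathcal{G},\mathcal{A}}(\lambda)$ is a negligible function of $\lambda$.

4.3 The composite 3-party Diffie-Hellman assumption

Our construction also makes use of a natural assumption in composite bilinear groups. For a given group generator $\mathcal{G}$ define the following distribution $P(\lambda)$:

\[
\begin{aligned}
&(p, q, G, G_T, e) \leftarrow \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \leftarrow G_p, \quad g_q \leftarrow G_q, \\
&R_1, R_2, R_3 \leftarrow G_q, \quad a, b, c \leftarrow \mathbb{Z}_n, \\
&\bar Z \leftarrow \big( (n, G, G_T, e),\ g_q,\ g_p,\ g_p^a,\ g_p^b,\ g_p^{ab} R_1,\ g_p^{abc} R_2 \big), \\
&T \leftarrow g_p^c R_3, \\
&\text{Output } (\bar Z, T).
\end{aligned}
\]

For an algorithm $\mathcal{A}$, define $\mathcal{A}$'s advantage in solving the composite 3-party Diffie-Hellman problem for $\mathcal{G}$ as:

\[ \mathrm{C3DH\,Adv}_{\mathcal{G},\mathcal{A}}(\lambda) := \big| \Pr[\mathcal{A}(\bar Z, T) = 1] - \Pr[\mathcal{A}(\bar Z, R) = 1] \big| \]

where $(\bar Z, T) \leftarrow P(\lambda)$ and $R \leftarrow G$.

Definition 4.2. We say that $\mathcal{G}$ satisfies the composite 3-party Diffie-Hellman assumption (C3DH) if for any polynomial time algorithm $\mathcal{A}$ we have that $\mathrm{C3DH\,Adv}_{\mathcal{G},\mathcal{A}}(\lambda)$ is a negligible function of $\lambda$.

5 Hidden Vector Encryption

We construct a $\Phi$-searchable encryption system for a general class of equality predicates. We call such systems Hidden Vector Systems, or HVEs for short. We then show in Section 6 that our HVE system leads to comparison and subset queries far more efficient than the trivial system.

5.1 HVE Definition

Let $\Sigma$ be a finite set and let $*$ be a special symbol not in $\Sigma$. Define $\Sigma_* = \Sigma \cup \{*\}$. The star $*$ plays the role of a wildcard or "don't care" value. In our applications we typically set $\Sigma = \{0, 1\}$.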

For $\sigma_* = (\sigma_1, \ldots, \sigma_\ell) \in \Sigma_*^\ell$ define a predicate $P^{HVE}_{\sigma_*}$ over $\Sigma^\ell$ as follows. For $x = (x_1, \ldots, x_\ell) \in \Sigma^\ell$ set:

\[ P^{HVE}_{\sigma_*}(x) = \begin{cases} 1 & \text{if for all } i = 1, \ldots, \ell : (\sigma_i = x_i \text{ or } \sigma_i = *), \\ 0 & \text{otherwise.} \end{cases} \]

In other words, the vector $x$ matches $\sigma_*$ in all the coordinates where $\sigma_*$ is not $*$. Let $\Phi_{HVE} = \{P^{HVE}_{\sigma_*} \text{ for all } \sigma_* \in \Sigma_*^\ell\}$. We refer to $\ell$ as the width of the HVE.

Definition 5.1. A Hidden Vector System (HVE) over $\Sigma^\ell$ is a selectively secure $\Phi_{HVE}$-searchable encryption system.

The case $\ell = 1$ degenerates to the example discussed in Section 2.2 where we showed equivalence to anonymous IBE [7, 1, 12]. For larger $\ell$ we obtain a more general concept that is much harder to build. In particular, the wildcard character $*$, which is essential for the applications we have in mind, makes it challenging to construct a $\Phi_{HVE}$-searchable system. We construct an HVE with the following parameters:

CT-size $= O(\ell)$ and TK-size $= O(\text{weight}(\sigma_*))$

where weight$(\sigma_* = (\sigma_1, \ldots, \sigma_\ell))$ is the number of coordinates where $\sigma_i \ne *$.

5.2 Construction

For our particular HVE construction we will let $\Sigma = \mathbb{Z}_m$ for some integer $m$. We set $\Sigma_* = \mathbb{Z}_m \cup \{*\}$. Our HVE system works as follows:

Setup($\lambda$): The setup algorithm first chooses random primes $p, q > m$ and creates a bilinear group $G$ of composite order $n = pq$, as specified in Section 4.1. Next, it picks random elements

\[ (u_1, h_1, w_1), \ldots, (u_\ell, h_\ell, w_\ell) \in G_p^3, \qquad g, v \in G_p, \qquad g_q \in G_q, \]

and an exponent $\alpha \in \mathbb{Z}_p$. It keeps all these as the secret key SK. It then chooses $3\ell + 1$ random blinding factors in $G_q$:

\[ (R_{u,1}, R_{h,1}, R_{w,1}), \ldots, (R_{u,\ell}, R_{h,\ell}, R_{w,\ell}) \in G_q \quad \text{and} \quad R_v \in G_q. \]

For the public key PK, it publishes the description of the group $G$ and the values

\[ g_q, \quad V = v R_v, \quad A = e(g, v)^\alpha, \quad \big( U_i = u_i R_{u,i},\ H_i = h_i R_{h,i},\ W_i = w_i R_{w,i} \big)_{i = 1, \ldots, \ell}. \]

The message space $\mathcal{M}$ is set to be a subset of $G_T$.

Encrypt(PK, $I \in \mathbb{Z}_m^\ell$, $M \in \mathcal{M} \subseteq G_T$): Let $I = (I_1, \ldots, I_\ell) \in \mathbb{Z}_m^\ell$. The encryption algorithm works as follows: choose a random $s \in \mathbb{Z}_n$ and random $Z, (Z_{1,1}, Z_{1,2}), \ldots, (Z_{\ell,1}, Z_{\ell,2}) \in G_q$.
(The algorithm picks random elements in $G_q$ by raising $g_q$ to random exponents from $\mathbb{Z}_n$.)

Output the ciphertext:

\[ C = \Big( C' = M A^s, \quad C_0 = V^s Z, \quad \big( C_{i,1} = (U_i^{I_i} H_i)^s Z_{i,1},\ C_{i,2} = W_i^s Z_{i,2} \big)_{i = 1, \ldots, \ell} \Big). \]

GenToken(SK, $I_* \in \Sigma_*^\ell$): The key generation algorithm takes as input the secret key and an $\ell$-tuple $I_* = (I_1, \ldots, I_\ell) \in \{\mathbb{Z}_m \cup \{*\}\}^\ell$. Let $S$ be the set of all indexes $i$ such that $I_i \ne *$. To generate a token for the predicate $P^{HVE}_{I_*}$, choose random $(r_{i,1}, r_{i,2}) \in \mathbb{Z}_p^2$ for all $i \in S$ and output:

\[ TK = \Big( I_*, \quad K_0 = g^\alpha \prod_{i \in S} (u_i^{I_i} h_i)^{r_{i,1}} w_i^{r_{i,2}}, \quad \forall i \in S : K_{i,1} = v^{r_{i,1}},\ K_{i,2} = v^{r_{i,2}} \Big). \]

Query(TK, $C$): Using the notation in the description of Encrypt and GenToken do: First, compute

\[ M \leftarrow C' \Big/ \Big( e(C_0, K_0) \Big/ \prod_{i \in S} e(C_{i,1}, K_{i,1})\, e(C_{i,2}, K_{i,2}) \Big). \tag{3} \]

If $M \notin \mathcal{M}$ output $\bot$. Otherwise, output $M$.

Correctness. Before proving security we first show that the system satisfies the correctness property defined in Section 2.1. Let $(I, M)$ be a pair in $\Sigma^\ell \times \mathcal{M}$ and let $B_* \in \Sigma_*^\ell$. This $B_*$ defines a predicate $P_{B_*}$ in $\Phi_{HVE}$. Let (PK, SK) $\leftarrow$ Setup($\lambda$), $C \leftarrow$ Encrypt(PK, $I$, $M$), and TK $\leftarrow$ GenToken(SK, $\langle B_* \rangle$).

If $P_{B_*}(I) = 1$ then a simple calculation shows that Query(TK, $C$) $= M$. Indeed, the $G_q$ blinding factors vanish when paired with the token components, which all lie in $G_p$, and for every $i \in S$ the product $e(C_{i,1}, K_{i,1})\, e(C_{i,2}, K_{i,2})$ cancels the corresponding factors of $e(C_0, K_0)$, leaving $e(g, v)^{\alpha s} = A^s$.

If $P_{B_*}(I) = 0$ we show that $\Pr[\text{Query}(TK, C) = \bot] \ge 1 - |\mathcal{M}|/n$, where the probability is over the random bits used to create the ciphertext. Hence, if $|\mathcal{M}|/n$ is sufficiently small then the probability that Query(TK, $C$) $\ne \bot$ is negligible.

Let $I = (I_1, \ldots, I_\ell) \in \Sigma^\ell$ and let $B_* = (B_1, \ldots, B_\ell) \in \Sigma_*^\ell$. Let $S$ be the set of all indexes $i$ such that $B_i$ is not a wildcard. Since $P_{B_*}(I) = 0$ we know that there is some $i \in S$ such that $B_i \ne I_i$. Then for this index $i$ the decryption equation (3) contains a factor

\[ e(v, u_i)^{(B_i - I_i)\, s\, r_{i,1}} \]

which is a uniformly distributed value in $(G_T)_p$, the order-$p$ subgroup of $G_T$, and is independent of the rest of the equation. Hence equation (3) evaluates to a value indistinguishable from random in $G_T$. It follows that $\Pr[\text{Query}(TK, C) = \bot] \ge 1 - |\mathcal{M}|/n$ as required.

Extensions. In our description above we limited the index space $\Sigma$ to be $\mathbb{Z}_m$.
We can expand this space to all of $\{0, 1\}^*$ by taking a large enough $m$ to contain the range of a collision-resistant hash function. Then Encrypt(PK, $I \in (\{0, 1\}^*)^\ell$, $M \in G_T$) first hashes all the coordinates of $I$ into $\mathbb{Z}_m$ using the collision-resistant hash and then applies the Encrypt algorithm described above.

5.3 Proof of Security

We prove our scheme selectively secure (as defined in Section 2.3) under the composite 3-party Diffie-Hellman assumption and the bilinear Diffie-Hellman assumption. We give the high-level arguments of the proof in this section and defer the proofs of some lemmas to the appendix.

Suppose the adversary commits to vectors $L_0, L_1 \in \Sigma^\ell$ at the beginning of the game. Let $X$ be the set of indexes $i$ such that $L_{0,i} = L_{1,i}$ and $\bar X$ be the set of indexes $i$ such that $L_{0,i} \ne L_{1,i}$. The proof uses a sequence of $2\ell + 2$ games to argue that the adversary cannot win the original security game of Section 2.3, which we denote by $G$. We begin by slightly modifying the game $G$ into a game $G'$. Games $G$ and $G'$ are identical except for how the challenge ciphertext is generated. In $G'$, if $M_0 \ne M_1$ then the challenger multiplies the challenge ciphertext component $C'$ by a random element of $G_{T,p}$. The rest of the ciphertext is generated as usual. Additionally, if $M_0 = M_1$ then the challenge ciphertext is generated correctly.

Lemma 5.2. Assume that the Bilinear Diffie-Hellman assumption holds. Then for any polynomial time adversary $\mathcal{A}$ the difference of advantage of $\mathcal{A}$ in game $G$ and game $G'$ is negligible.

The proof is in Appendix B.1.

Next, we define a game $\tilde G$. In this game the adversary will give two challenge messages $M_0, M_1$. If $M_0 \ne M_1$ then the challenger outputs a random element of $G_T$ as the $C'$ component of the challenge ciphertext. The rest of the ciphertext is constructed as normal. If $M_0 = M_1$ the challenger outputs the challenge ciphertext as normal.

Lemma 5.3. Assume that the Composite 3-party Diffie-Hellman assumption holds. Then for any polynomial time adversary $\mathcal{A}$ the difference of advantage of $\mathcal{A}$ in game $G'$ and game $\tilde G$ is negligible.

The proof is in Appendix B.2.

Finally, we define two sequences of hybrid games $\tilde G_j$ and $\hat G_j$ for $j = 1, \ldots, |\bar X|$. We define the game $\tilde G_j$ as follows. Let $X'$ be the $j$ lowest indexes in $\bar X$. In the challenge ciphertext the challenger creates $C_0$ and $C_{i,1}, C_{i,2}$ as normal for all $i \notin X'$.
However, for all $i \in X'$ the challenger creates $C_{i,1}, C_{i,2}$ as completely random group elements in $G$. Additionally, if $M_0 \ne M_1$ then $C'$ is replaced by a completely random element from $G_T$ (otherwise it is created as normal).

We define a game $\hat G_j$ as follows. Let $X'$ be the $j$ lowest indexes in $\bar X$ and let $\delta$ be the $(j + 1)$-th index in $\bar X$. In the challenge ciphertext the challenger creates $C_0$ and $C_{i,1}, C_{i,2}$ as normal for all $i \notin X'$ with $i \ne \delta$. For all $i \in X'$ the challenger creates $C_{i,1}, C_{i,2}$ as completely random group elements in $G$. For $i = \delta$ the challenger chooses a random $s'$ and creates

\[ C_{\delta,1} = (u_\delta^{I_\delta} h_\delta)^{s'} g_q^{z_{\delta,1}}, \qquad C_{\delta,2} = g^{s'} g_q^{z_{\delta,2}}. \]

Additionally, if $M_0 \ne M_1$ then $C'$ is replaced by a completely random element from $G_T$ (otherwise it is created as normal).

Observe that in game $\tilde G_{|\bar X|}$ the challenge ciphertext contains no information about $L_{\beta,i}$ for any $i \in \bar X$. Therefore the adversary's advantage in game $\tilde G_{|\bar X|}$ is 0. Additionally, game $\tilde G_0$ is equivalent to $\tilde G$. We state the following two lemmas, whose proofs are given in Appendix B.3 and B.4.

Lemma 5.4. Assume the Composite 3-party Diffie-Hellman assumption holds. Then for any polynomial time adversary $\mathcal{A}$ the difference of advantage of $\mathcal{A}$ in game $\tilde G_j$ and game $\hat G_j$ is negligible.

Lemma 5.5. Assume the Composite 3-party Diffie-Hellman assumption holds. Then for any polynomial time adversary $\mathcal{A}$ the difference of advantage of $\mathcal{A}$ in game $\hat G_j$ and game $\tilde G_{j+1}$ is negligible.

It now follows that if the Composite 3-party Diffie-Hellman and Bilinear Diffie-Hellman assumptions hold then no polynomial-time adversary can break our scheme with non-negligible advantage. This follows from the sequence of hybrid games starting with the original game $G$:

\[ G,\ G',\ \tilde G_0,\ \hat G_0,\ \tilde G_1,\ \hat G_1,\ \ldots,\ \tilde G_{|\bar X|}. \]

The adversary's advantage in the game $\tilde G_{|\bar X|}$ is 0 and the difference in the adversary's advantage between any two consecutive hybrid games is negligible by the lemmas above. Hence, no polynomial time adversary can win game $G$ with non-negligible advantage.

6 Applications of HVE

We show how HVE leads to efficient systems for subset queries and conjunctive comparison queries. Throughout the section we let $\Sigma_{01} = \{0, 1\}$ and $\Sigma_{01*} = \{0, 1, *\}$.

Conjunctive comparison queries. In Section 3.1 we defined conjunctive comparison queries and the predicate family $\Phi_{n,w}$. We use HVE to build a $\Phi_{n,w}$-searchable encryption system with ciphertext size $O(nw)$ and token size $O(w)$.

Let (Setup_HVE, Encrypt_HVE, GenToken_HVE, Query_HVE) be a secure HVE over $\Sigma_{01}^{nw}$. Thus, the width of this HVE is $\ell = nw$. We construct a $\Phi_{n,w}$-searchable system as follows:

Setup($\lambda$) is the same as Setup_HVE($\lambda$).

Encrypt(PK, $S$, $M$) where $S = (x_1, \ldots, x_w) \in \{1, \ldots, n\}^w$. Build a vector $\sigma(S) = (\sigma_{i,j}) \in \Sigma_{01}^{nw}$ as follows:

\[ \sigma_{i,j} = \begin{cases} 1 & \text{if } x_i \ge j, \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]

Then output Encrypt_HVE(PK, $\sigma(S)$, $M$), which gives a ciphertext of size $O(nw)$. For example, for $w = 2$ and $S = (x_1, x_2)$ the vector $\sigma(S) \in \{0, 1\}^{2n}$ consists of two blocks of $n$ bits, one per coordinate; by (4), block $i$ has a 1 in every position $j \le x_i$ and a 0 in every position $j > x_i$.

GenToken(SK, $\langle P_{\bar a} \rangle$) where $\bar a = (a_1, \ldots, a_w) \in \{1, \ldots, n\}^w$. Define $\sigma_*(\bar a) = (\sigma_{i,j}) \in \Sigma_{01*}^{nw}$ as follows:

\[ \sigma_{i,j} = \begin{cases} 1 & \text{if } a_i = j, \\ * & \text{otherwise.} \end{cases} \tag{5} \]

Output $TK_{\bar a} \leftarrow$ GenToken_HVE(SK, $\sigma_*(\bar a)$), which gives a token of size $O(w)$.
For examle, for w = 2 and ā = (x 1, x 2 ) the vector σ (ā) looks like: σ (ā) = 1 x 1 n 1 x 2 n 1 1 {0, 1, } 2n 12

Query($TK_{\bar{a}}$, $C$): output Query_HVE($TK_{\bar{a}}$, $C$).

To argue correctness and security, observe that for a predicate $P_{\bar{a}} \in \Phi_{n,w}$ and an index $S \in \{1, \ldots, n\}^w$ we have that:

$$P_{\bar{a}}(S) = 1 \quad \text{if and only if} \quad P^{HVE}_{\sigma_*(\bar{a})}(\sigma(S)) = 1.$$

Therefore, correctness and security follow from the properties of the HVE. We thus obtain the following immediate theorem.

Theorem 6.1. (Setup, Encrypt, GenToken, Query) is a selectively secure $\Phi_{n,w}$-searchable system assuming (Setup_HVE, Encrypt_HVE, GenToken_HVE, Query_HVE) is a secure HVE over $\Sigma_{01}^{nw}$.

Conjunctive range queries. We note that a system that supports comparison queries can also support range queries. To search for plaintexts where $x \in [a, b]$ the encryptor encrypts the pair $(x, x)$. The predicate then tests $x \geq a \wedge x \leq b$.

6.1 Subset queries

Next we show how to search for general subset predicates. Let $T$ be a set of size $n$. For a subset $A \subseteq T$ we define a subset predicate as follows:

$$P_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise} \end{cases}$$

We wish to support searches for any subset predicate. More generally, we wish to support searches for conjunctive subset predicates over $T^w$. That is, let $\sigma = (A_1, \ldots, A_w)$ be a $w$-tuple where $A_i \subseteq T$ for all $i = 1, \ldots, w$. Then $\sigma$ is an element of $(2^T)^w$. Define the predicate $P_\sigma : T^w \to \{0, 1\}$ as follows:

$$P_\sigma(x_1, \ldots, x_w) = \begin{cases} 1 & \text{if } x_i \in A_i \text{ for all } i = 1, \ldots, w, \\ 0 & \text{otherwise} \end{cases}$$

Let $\Phi = \{ P_\sigma \text{ for all } \sigma \in (2^T)^w \}$. Note that $\Phi$ is huge: its size is $2^{nw}$. The $\Phi$-searchable system is as follows:

Encrypt(PK, $S$, $M$) where $S = (x_1, \ldots, x_w) \in T^w$. Build a vector $\sigma(S) = (\sigma_{i,j}) \in \Sigma_{01}^{nw}$ as:

$$\sigma_{i,j} = \begin{cases} 1 & \text{if } x_i \neq j, \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

Then output Encrypt_HVE(PK, $\sigma(S)$, $M$). The ciphertext size is $O(nw)$, as was the case for comparison queries.

GenToken(SK, $\langle P_\alpha \rangle$) where $\alpha = (A_1, \ldots, A_w)$. Define $\sigma_*(\alpha) = (\sigma_{i,j}) \in \Sigma_{01\star}^{nw}$ as follows:

$$\sigma_{i,j} = \begin{cases} 1 & \text{if } j \notin A_i, \\ \star & \text{otherwise} \end{cases} \qquad (7)$$

Output $TK_\alpha \leftarrow$ GenToken_HVE(SK, $\sigma_*(\alpha)$). The token size is $O(nw)$, which is bigger than tokens for comparison queries.
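The two encodings can be sketched in a few lines of Python. This is only an illustration of equations (4) to (7) on plaintext vectors: no encryption is performed, `hve_predicate` merely evaluates the HVE predicate in the clear, the set $T$ is assumed to be $\{1, \ldots, n\}$, and all function names are hypothetical.

```python
WILDCARD = None  # stands in for the HVE wildcard symbol

def hve_predicate(token, vector):
    """Evaluate the HVE predicate in the clear: every non-wildcard
    token entry must equal the corresponding vector entry."""
    return all(t is WILDCARD or t == v for t, v in zip(token, vector))

def encode_comparison(xs, n):
    """Equation (4): block i holds 1 at positions j <= x_i and 0 after."""
    return [1 if x >= j else 0 for x in xs for j in range(1, n + 1)]

def token_comparison(a_bar, n):
    """Equation (5): block i holds a single 1 at position a_i, wildcards elsewhere."""
    return [1 if a == j else WILDCARD for a in a_bar for j in range(1, n + 1)]

def encode_subset(xs, n):
    """Equation (6): block i holds 0 only at position x_i."""
    return [1 if x != j else 0 for x in xs for j in range(1, n + 1)]

def token_subset(sets, n):
    """Equation (7): block i holds 1 outside A_i and wildcards inside A_i."""
    return [1 if j not in A else WILDCARD for A in sets for j in range(1, n + 1)]
```

For instance, with $n = 5$, `hve_predicate(token_comparison([2, 4], 5), encode_comparison([3, 4], 5))` evaluates the conjunction $3 \geq 2 \wedge 4 \geq 4$. A comparison token has only $w$ non-wildcard entries, while a subset token can have up to $nw$, which matches the token sizes stated above.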

Setup and Query are the same algorithms from the HVE system, as for comparison queries.

It is easiest to see how this works in the one-dimensional setting, namely $w = 1$. We encrypt a value $x \in T$ using the HVE vector

$$\sigma(x) = (1, \ldots, 1, \underbrace{0}_{x}, 1, \ldots, 1) \in \{0, 1\}^n$$

Consider a predicate $P_A$ where, for example, $A = \{2, 3, n\} \subseteq T$. We generate a token for $P_A$ by calling GenToken_HVE(SK, $\sigma_*(A)$) using the HVE vector

$$\sigma_*(A) = (1, \star, \star, 1, 1, \ldots, 1, \star) \in \{\star, 1\}^n$$

where the wildcards are in positions $2$, $3$ and $n$. The main point is that $x \in A$ if and only if $P^{HVE}_{\sigma_*(A)}(\sigma(x)) = 1$. Therefore, correctness and security follow from the properties of the HVE. We obtain a secure system for subset queries for arbitrary subsets.

Theorem 6.2. (Setup, Encrypt, GenToken, Query) is a selectively secure $\Phi$-searchable system assuming (Setup_HVE, Encrypt_HVE, GenToken_HVE, Query_HVE) is a secure HVE over $\Sigma_{01}^{nw}$.

Note that the trivial system of Section 3 for subset queries produces ciphertexts of size $O(2^n)$. The construction above generates ciphertexts of size $O(n)$.

7 Extensions

Privacy for search queries. In some cases one may want the token $TK_P$ not to identify which predicate $P$ is being queried. For example, in the anti-spam example from the introduction, the user may not want to reveal his anti-spam predicate to the server. A similar problem was studied by Ostrovsky and Skeith [17] and is related to Private Information Retrieval [15]. For public-key systems supporting comparison queries this is clearly not possible since, given $TK_P$, the server can encrypt values of its choice under the public key and identify the threshold in $P$ with a simple binary search. It is an open problem to convert our system to a symmetric-key system where $TK_P$ does not expose $P$. One approach is to simply keep the public key secret from the server. This, however, is not sufficient in our system.

Validating ciphertexts. Throughout the paper we assumed that the encryptor is honestly creating ciphertexts as specified by the encryption system. For some applications discussed in the introduction (e.g. spam filtering) this may not be the case.
By creating malformed ciphertexts an attacker may generate false positives or false negatives for the server using the tokens. Fortunately, in many settings, such as a payment gateway or spam filter, this is easily avoidable. One technique is to do the following. The authority who has SK will also publish a regular public key $PK_1$ and ask the encryptor to encrypt the message $S$ both with the searchable system and with $PK_1$. The resulting ciphertext is the pair $C = ($Encrypt(PK, $S$, 0), Encrypt_PKE($PK_1$, $S$)$)$. When the authority (e.g. Visa) receives a ciphertext $C = (C_0, C_1)$ it recovers $S$ from $C_1$. It then uses SK to test that $C_0$ was in fact created from the message $S$. If not, then the transaction is rejected immediately. Similarly, for anti-spam, if this test fails the email would be immediately rejected as spam. In doing so, the authority ensures that a malformed ciphertext did not fool the server.

Hence, $\Phi$-searchable systems should also provide an algorithm Test($C$, $S$, $M$, SK) that outputs true when $C$ was generated from the message $(S, M)$ and false otherwise. Our HVE system supports this type of test. Alternatively, one could require the encryptor to prove that his ciphertext is well formed, for example to prove that $C_0$ is consistent with $C_1$. This can be done using non-interactive proof techniques [5, 6], but as mentioned above, often there is no need for this.

8 Conclusion

In public key systems supporting queries on encrypted data a secret key can produce tokens for testing any supported query predicate. The token lets anyone test the predicate on a given ciphertext without learning any other information about the plaintext. We presented a general framework for analyzing the security of systems for searching on encrypted data. We then constructed systems for comparisons and subset queries as well as conjunctive versions of these predicates.

The underlying tool behind these new constructions is a primitive we call HVE. The one-dimensional version of HVE (namely $\ell = 1$) is essentially an Anonymous IBE system. For large $\ell$ we obtain a new concept that is extremely useful for a large variety of searching predicates. We note that by setting $\ell = 1$ in our HVE construction we obtain a new simple anonymous IBE system secure without random oracles.

This work poses many challenging open problems. For example, the best non-conjunctive (i.e. $w = 1$) comparison system we currently have requires ciphertexts of size $O(\sqrt{n})$ where $n$ is the domain size. In principle it should be possible to improve this to $O(\log n)$, but this is currently a wide open problem that will require new ideas. Similarly, for non-conjunctive subset queries the best we have requires ciphertexts of size $O(n)$. Again, can this be improved to $O(\log n)$? Our results mostly focus on conjunction. Are there similar results for disjunctive queries? More generally, what other classes of predicates can we search on?
References

[1] Michel Abdalla, Mihir Bellare, Dario Catalano, Eike Kiltz, Tadayoshi Kohno, Tanja Lange, John Malone-Lee, Gregory Neven, Pascal Paillier, and Haixia Shi. Searchable encryption revisited: Consistency properties, relation to anonymous IBE, and extensions. In CRYPTO, pages 205-222, 2005.
[2] Mihir Bellare, Alexandra Boldyreva, and Adam O'Neill. Efficiently-searchable and deterministic asymmetric encryption. http://eprint.iacr.org/2006/186, 2006.
[3] J. Bethencourt, H. Chan, A. Perrig, E. Shi, and D. Song. Anonymous multi-attribute encryption with range query and conditional decryption. Technical report, C.M.U, 2006. CMU-CS-06-135.
[4] John Bethencourt, Dawn Song, and Brent Waters. New constructions and practical applications for private stream searching. In Proceedings of the 2006 IEEE Symposium on Security and Privacy, 2006.
[5] Manuel Blum, Paul Feldman, and Silvio Micali. Non-interactive zero-knowledge and its applications (extended abstract). In STOC, pages 103-112, 1988.
[6] Manuel Blum, Alfredo De Santis, Silvio Micali, and Giuseppe Persiano. Noninteractive zero-knowledge. SIAM J. Comput., 20(6):1084-1118, 1991.
[7] D. Boneh, G. Di Crescenzo, R. Ostrovsky, and G. Persiano. Public key encryption with keyword search. In Proceedings of Eurocrypt '04, 2004.
[8] Dan Boneh and Xavier Boyen. Efficient selective-ID identity based encryption without random oracles. In Proceedings of Eurocrypt 2004, LNCS, pages 223-238. Springer-Verlag, 2004.
[9] Dan Boneh, Eu-Jin Goh, and Kobbi Nissim. Evaluating 2-DNF formulas on ciphertexts. In Joe Kilian, editor, Proceedings of Theory of Cryptography Conference 2005, volume 3378 of LNCS, pages 325-342. Springer, 2005.
[10] Dan Boneh, Amit Sahai, and Brent Waters. Fully collusion resistant traitor tracing with short ciphertexts and private keys. In Eurocrypt '06, 2006.
[11] Dan Boneh and Brent Waters. A fully collusion resistant broadcast trace and revoke system with public traceability. http://crypto.stanford.edu/~dabo/pubs.html, 2006.
[12] Xavier Boyen and Brent Waters. Anonymous hierarchical identity-based encryption (without random oracles). http://eprint.iacr.org/2006/085, 2006.
[13] O. Goldreich and R. Ostrovsky. Software protection and simulation by oblivious RAMs. JACM, 1996.
[14] Philippe Golle, Jessica Staddon, and Brent R. Waters. Secure conjunctive keyword search over encrypted data. In ACNS, pages 31-45, 2004.
[15] Eyal Kushilevitz and Rafail Ostrovsky. Replication is not needed: Single database, computationally-private information retrieval. In FOCS, pages 364-373, 1997.
[16] R. Ostrovsky. Software protection and simulation on oblivious RAMs. PhD thesis, M.I.T, 1992. Preliminary version in STOC 1990.
[17] Rafail Ostrovsky and William Skeith. Private searching on streaming data. In Proceedings of Crypto 2005, LNCS. Springer, 2005.
[18] D. Song, D. Wagner, and A. Perrig. Practical techniques for searches on encrypted data. In Proceedings of the 2000 IEEE Symposium on Security and Privacy (S&P 2000), 2000.
[19] B. Waters, D. Balfanz, G. Durfee, and D. Smetters. Building an encrypted and searchable audit log. In Proceedings of NDSS '04, 2004.

A Proof of Lemma 3.1

We prove that the trivial system presented in Section 3 is secure.

Proof. Showing that QU Adv$_A$ is negligible is a straightforward hybrid argument. Let $A$ be an adversary playing the query security game. For $i = 1, \ldots, n + 1$ we define experiment number $i$ as follows:

The challenger runs Setup($\lambda$) to obtain PK $\leftarrow (PK_1, \ldots, PK_n)$ and SK $\leftarrow (SK_1, \ldots, SK_n)$. It gives PK to $A$. Next, $A$ is given the tokens for any predicates of its choice. Then $A$ outputs two pairs $(S_0, M_0)$ and $(S_1, M_1)$ subject to the restrictions of the query security game challenge phase. For $j = 1, \ldots, n$ the challenger constructs the following ciphertexts:

$$C_j \leftarrow \begin{cases} \text{Encrypt}'(PK_j, M_0) & \text{if } P_j(S_0) = 1 \text{ and } j \geq i, \\ \text{Encrypt}'(PK_j, M_1) & \text{if } P_j(S_1) = 1 \text{ and } j < i, \\ \text{Encrypt}'(PK_j, \bot) & \text{otherwise} \end{cases}$$

The challenger gives $C \leftarrow (C_1, \ldots, C_n)$ to $A$. The adversary continues to adaptively request query tokens subject to the restrictions of the query security game. Finally, $A$ outputs a bit $\beta' \in \{0, 1\}$ which we denote by $\text{EXP}^{(i)}_{QU}[A]$.

Experiment 1 encrypts $(S_0, M_0)$ in every component and experiment $n + 1$ encrypts $(S_1, M_1)$ in every component, so a standard argument shows that

$$2 \cdot \text{QU Adv}_A = \left| \Pr[\text{EXP}^{(1)}_{QU}[A] = 1] - \Pr[\text{EXP}^{(n+1)}_{QU}[A] = 1] \right| \leq \sum_{i=1}^{n} \left| \Pr[\text{EXP}^{(i)}_{QU}[A] = 1] - \Pr[\text{EXP}^{(i+1)}_{QU}[A] = 1] \right|$$

But each term $\left| \Pr[\text{EXP}^{(i)}_{QU}[A] = 1] - \Pr[\text{EXP}^{(i+1)}_{QU}[A] = 1] \right|$ is clearly negligible assuming $\mathcal{E}$ is semantically secure against chosen plaintext attacks, since consecutive experiments differ only in the component $C_i$.

B Proofs for HVE Construction

Suppose the adversary commits to vectors $L_0, L_1 \in \Sigma^\ell$ at the beginning of the game. Let $X$ be the set of indexes $i$ such that $L_{0,i} = L_{1,i}$ and $\bar{X}$ be the set of indexes $i$ such that $L_{0,i} \neq L_{1,i}$. The adversary can issue predicate queries to request a token for any predicate $P^{HVE}_L$ where $L \in \Sigma_\star^\ell$. Let $S$ be all the indexes for which $L$ is not a wildcard. We distinguish between three types of queries.

Type 1: $S \cap \bar{X} = \emptyset$. That is, the predicate does not check any of the indexes in which the challenge tuples differ. These queries can only be made if in the eventual challenge stage $M_0 = M_1$.

Type 2: Case 1 is not met and there exists an $i \in S \cap \bar{X}$ such that $L_i \neq L_{0,i}$ and $L_i \neq L_{1,i}$.

Type 3: Case 1 and Case 2 are both not met and there exists an $i \in S \cap X$ such that $L_i \neq L_{0,i} = L_{1,i}$.

These cases are mutually exclusive (by definition) and complete.
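The case distinction can be written as a small classifier. The following Python sketch follows the three definitions literally; the function name, the use of `None` for the wildcard, and the 0-based indexing are assumptions made for illustration only. The function returns `None` when none of the three conditions applies to the given vectors.

```python
WILDCARD = None  # stands in for the HVE wildcard symbol

def classify_query(L, L0, L1):
    """Return 1, 2 or 3 for a token vector L against the committed
    challenge vectors L0 and L1, or None if no condition applies."""
    indexes = range(len(L0))
    S = {i for i in indexes if L[i] is not WILDCARD}   # non-wildcard positions
    X_bar = {i for i in indexes if L0[i] != L1[i]}     # challenge vectors differ
    X = set(indexes) - X_bar                           # challenge vectors agree

    # Type 1: the query checks no position where L0 and L1 differ.
    if not (S & X_bar):
        return 1
    # Type 2: some checked position in X_bar differs from both challenges.
    if any(L[i] != L0[i] and L[i] != L1[i] for i in S & X_bar):
        return 2
    # Type 3: some checked position in X differs from the common value.
    if any(L[i] != L0[i] for i in S & X):
        return 3
    return None
```

Checking the conditions in this order mirrors the wording "Case 1 is not met" and "Case 1 and Case 2 are both not met", so at most one type is ever reported for a query.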

B.1 Proof of Lemma 5.2

We prove our lemma by supposing that a poly-time adversary $A$ has non-negligible difference $\epsilon$ between its advantage in game $G$ and its advantage in game $G'$. We build a simulator that plays the Bilinear Diffie-Hellman game with advantage $\epsilon$. The challenger first creates a Bilinear Diffie-Hellman challenge as:

$$(p, q, G, G_T, e) \leftarrow \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \leftarrow G_p, \quad g_q \leftarrow G_q, \quad a, b, c \leftarrow \mathbb{Z}_n$$
$$\bar{Z} \leftarrow \left( (n, G, G_T, e),\ g_q,\ g_p,\ g_p^a,\ g_p^b,\ g_p^c \right)$$
$$T' \leftarrow e(g_p, g_p)^{abc}$$

It then randomly decides whether to give $(\bar{Z}, T = T')$ or $(\bar{Z}, T = R)$ where $R$ is a random element in $G_T$. We create the following simulation:

Init: The attacker gives the simulator two identities $L_0, L_1$. The challenger then flips the coin $\beta$ internally.

Setup: The simulator first chooses random $(R_{u,1}, R_{h,1}, R_{w,1}), \ldots, (R_{u,\ell}, R_{h,\ell}, R_{w,\ell}) \in G_q^3$, $R_v \in G_q$ and random $t_1, x_1, y_1, \ldots, t_\ell, x_\ell, y_\ell \in \mathbb{Z}_n$. The simulator first publishes the group description and $g_q$, $V = g_p R_v$. It lets $A = e(g_p^a, g_p^b)$. Finally, for all $i$ it creates:

$$U_i = (g_p^b)^{t_i} R_{u,i}, \qquad H_i = (g_p^b)^{-t_i L_{\beta,i}}\, g_p^{y_i} R_{h,i}, \qquad W_i = g_p^{x_i} R_{w,i}$$

We observe that the parameters are distributed identically to the real scheme.

Query Phase 1: The adversary will make private key queries to the simulator. The way they are handled depends upon the type of query. Suppose the adversary queries for a key $I$, where the set of indexes that are non-wildcards is denoted as $S$.

Type 1: If the adversary issues a Type 1 query, the simulator simply aborts and takes a random guess. The reason for this is that by our definition, if a Type 1 query is made then the challenge messages $M_0, M_1$ will be equal. However, in this case the games $G$ and $G'$ are identical, so there can be no difference in the adversary's advantage when he makes this type of challenge. Therefore, we can just take a random guess.

Type 2 and Type 3: We handle Type 2 and Type 3 queries in the same manner. The primary intuition is that neither a Type 2 nor a Type 3 query can distinguish the challenge ciphertext. Suppose the adversary queries for a vector $I$ and let $\gamma$ be an (arbitrary) index where $I_\gamma \neq L_{\beta,\gamma}$. The simulator first chooses random $r_{i,1}, r_{i,2} \in \mathbb{Z}_n$ for all $i \in S$. Next it creates:

$$K_0 = \prod_{i \in S} (g_p^b)^{r_{i,1}(I_i - L_{\beta,i}) t_i}\, g_p^{r_{i,1} y_i}\, g_p^{r_{i,2} x_i}$$

Additionally, it creates:

$$\forall i \in S \setminus \{\gamma\}: \quad K_{i,1} = (g_p^a)^{r_{i,1}}, \qquad K_{i,2} = (g_p^a)^{r_{i,2}}$$

Finally, it creates:

$$K_{\gamma,1} = g_p^{r_{\gamma,1}}\, (g_p^a)^{-1/(I_\gamma - L_{\beta,\gamma})}, \qquad K_{\gamma,2} = g_p^{r_{\gamma,2}}$$

The argument for the well-formedness of the keys is similar to that of the Boneh-Boyen [8] Identity-Based Encryption system.

Challenge: The adversary first gives the simulator messages $M_0, M_1$. If $M_0 = M_1$ we can abort the simulation and take a random guess for the reason given above. The simulator chooses random $Z \in G_q$ and $(Z_{1,1}, Z_{1,2}), \ldots, (Z_{\ell,1}, Z_{\ell,2}) \in G_q^2$ (this can be done since the simulator has $g_q$) and outputs the challenge as follows:

$$C' = M_\beta T, \qquad C_0 = g_p^c Z, \qquad \forall i: \ C_{i,1} = (g_p^c)^{y_i} Z_{i,1}, \quad C_{i,2} = (g_p^c)^{x_i} Z_{i,2}.$$

If $T$ forms a well-formed tuple, then the simulator is playing game $G$, otherwise it is playing game $G'$.

Query Phase 2: Same as Query Phase 1.

Guess: The adversary outputs a guess $\beta'$. If $\beta' = \beta$ output 0, otherwise output 1.

By our assumption the probability that the adversary guesses $\beta$ correctly in game $G$ has a non-negligible $\epsilon$ difference from that of it guessing it correctly in game $G'$. However, it is in game $G'$ if and only if the challenger gave the simulator $R$ instead of $T'$. Therefore the simulator has advantage $\epsilon$ in the Bilinear Diffie-Hellman game.

B.2 Proof of Lemma 5.3

We begin by reviewing an assumption called the Bilinear Subgroup Decision problem that was introduced by Boneh, Sahai, and Waters [10] and is implied by the Composite 3-Party Diffie-Hellman assumption. For a given group generator $\mathcal{G}$ define the following distribution $P(\lambda)$:

$$(p, q, G, G_T, e) \leftarrow \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \leftarrow G_p, \quad g_q \leftarrow G_q$$
$$\bar{Z} \leftarrow \left( (n, G, G_T, e),\ g_q,\ g_p \right)$$
$$T' \leftarrow G_{T,p}$$

For an algorithm $A$, define $A$'s advantage in solving the Bilinear Subgroup Decision problem for $\mathcal{G}$ as:

$$\text{BSD Adv}_{\mathcal{G},A}(\lambda) := \left| \Pr[A(\bar{Z}, T') = 1] - \Pr[A(\bar{Z}, R) = 1] \right|$$

where $(\bar{Z}, T') \leftarrow P(\lambda)$ and $R \leftarrow G_T$.

Definition B.1. We say that $\mathcal{G}$ satisfies the Bilinear Subgroup Decision assumption if for any polynomial time algorithm $A$ we have that $\text{BSD Adv}_{\mathcal{G},A}(\lambda)$ is a negligible function of $\lambda$.

It is easy to see that the Composite 3-Party Diffie-Hellman assumption implies the Bilinear Subgroup Decision assumption (see footnote 2). For simplicity we will use the Bilinear Subgroup Decision assumption directly in our proof. We suppose that there exists an adversary with non-negligible difference in advantage $\epsilon$ between winning the game $G'$ and the game $G''$. We build a simulator that takes in a Bilinear Subgroup challenge $(\bar{Z}, T)$. The simulation proceeds as follows.

Init: The attacker gives the simulator two identities $L_0, L_1$. The challenger then flips the coin $\beta$ internally.

Setup: The simulator sets up the parameters as would the real setup algorithm. All the simulator needs to do this is $g_p$ and $g_q$ from the assumption.

Query Phase 1: The simulator answers queries as the real authority would. One small difference is that the simulator chooses exponents from $\mathbb{Z}_n$ instead of $\mathbb{Z}_p$. However, this does not change anything since both the simulator and a real authority will raise elements from $G_p$ to the exponents.

Challenge: The adversary first gives the simulator messages $M_0, M_1$. If $M_0 = M_1$ then the simulator simply encrypts the message to the identity $L_\beta$. Otherwise, the simulator creates the challenge ciphertext of message $M_\beta$ to $L_\beta$ exactly as normal, with the exception that $C'$ is multiplied by $T$. If $T$ forms a well-formed tuple, then the simulator is playing game $G'$, otherwise it is playing game $G''$.

Query Phase 2: Same as Query Phase 1.

Guess: The adversary outputs a guess $\beta'$. If $\beta' = \beta$ output 0, otherwise output 1.

By our assumption the probability that the adversary guesses $\beta$ correctly in game $G'$ has a non-negligible $\epsilon$ difference from that of it guessing it correctly in game $G''$. However, it is in game $G''$ if and only if the challenger gave the simulator $R$ instead of $T'$. Therefore the simulator has advantage $\epsilon$ in the Bilinear Subgroup Decision game, which implies an advantage of $\epsilon$ in the Composite 3-Party Diffie-Hellman game.

Footnote 2: One first reverses the labellings of $p, q$ in the Composite 3-Party Diffie-Hellman assumption. Next, we can use the pairing to create an element that will be random in $G_{T,p}$ if and only if we were given a well-formed tuple. Otherwise the element is a random one in $G_T$.

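B.3 Proof of Lemma 5.4

We prove our lemma by supposing that a poly-time adversary $A$ has non-negligible difference $\epsilon$ between its advantage in game $\tilde{G}_j$ and its advantage in game $\tilde{G}'_j$ for some index $j$. We build a simulator that plays the Composite 3-Party Diffie-Hellman game with advantage $\epsilon$. The challenger first creates a 3-Party challenge as:

$$(p, q, G, G_T, e) \leftarrow \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \leftarrow G_p, \quad g_q \leftarrow G_q, \quad R_1, R_2, R_3 \leftarrow G_q, \quad a, b, c \leftarrow \mathbb{Z}_n$$
$$\bar{Z} \leftarrow \left( (n, G, G_T, e),\ g_q,\ g_p,\ g_p^a,\ g_p^b,\ \Gamma = g_p^{ab} R_1,\ Y = g_p^{abc} R_2 \right)$$
$$T' \leftarrow g_p^c R_3$$

It then randomly decides whether to give $(\bar{Z}, T = T')$ or $(\bar{Z}, T = R)$ where $R$ is a random element in $G$. We create the following simulation:

Init: The attacker gives the simulator two identities $L_0, L_1$. The challenger then flips the coin $\beta$ internally.

Setup: Let $\delta$ be the $(j+1)$-th index in $\bar{X}$. The simulator first chooses random $(R_{u,1}, R_{h,1}, R_{w,1}), \ldots, (R_{u,\ell}, R_{h,\ell}, R_{w,\ell}) \in G_q^3$ and random $t_1, x_1, y_1, \ldots, t_\ell, x_\ell, y_\ell \in \mathbb{Z}_n$. The simulator first publishes the group description and $g_q$, $V = \Gamma$. It picks a random $\alpha \in \mathbb{Z}_n$ and lets $A = e(\Gamma, g_p)^\alpha$. It next creates

$$U_\delta = (g_p^b)^{t_\delta} R_{u,\delta}, \qquad H_\delta = (g_p^b)^{-t_\delta L_{\beta,\delta}}\, g_p^{y_\delta} R_{h,\delta}, \qquad W_\delta = g_p^{x_\delta} R_{w,\delta}$$

Finally, for all $i \neq \delta$ it creates:

$$U_i = (g_p^b)^{t_i} R_{u,i}, \qquad H_i = (g_p^b)^{-t_i L_{\beta,i}}\, \Gamma^{y_i} R_{h,i}, \qquad W_i = \Gamma^{x_i} R_{w,i}$$

We observe that the parameters are distributed identically to the real scheme.

Query Phase 1: The adversary will make private key queries to the simulator. The way they are handled depends upon the type of query. Suppose the adversary queries for a key $I$, where the set of indexes that are non-wildcards is denoted as $S$.

Type 1: If $\delta \notin S$ then the simulator first chooses random $r_{i,1}, r_{i,2} \in \mathbb{Z}_n$ for all $i \in S$. Next it creates:

$$K_0 = g_p^\alpha \prod_{i \in S} g_p^{r_{i,1}(I_i - L_{\beta,i}) t_i}\, (g_p^a)^{r_{i,1} y_i}\, g_p^{r_{i,2} x_i}$$

Additionally, it creates:

$$\forall i \in S: \quad K_{i,1} = (g_p^a)^{r_{i,1}}, \qquad K_{i,2} = g_p^{r_{i,2}}$$

Type 2: Suppose $\delta \in S$, but $I_\delta \neq L_{0,\delta}$ and $I_\delta \neq L_{1,\delta}$. The simulator first chooses random $r_{i,1}, r_{i,2} \in \mathbb{Z}_n$ for all $i \in S$. Next it creates:

$$K_0 = \left( \prod_{i \in S \setminus \{\delta\}} g_p^{r_{i,1}(I_i - L_{\beta,i}) t_i}\, (g_p^a)^{r_{i,1} y_i}\, g_p^{r_{i,2} x_i} \right) g_p^{r_{\delta,1}(I_\delta - L_{\beta,\delta}) t_\delta x_\delta}$$

Additionally, it creates:

$$\forall i \in S \setminus \{\delta\}: \quad K_{i,1} = (g_p^a)^{r_{i,1}}, \qquad K_{i,2} = g_p^{r_{i,2}}$$

Finally, it creates:

$$K_{\delta,1} = (g_p^a)^{x_\delta r_{\delta,1}}\, g_p^{x_\delta r_{\delta,2}}, \qquad K_{\delta,2} = (g_p^a)^{-y_\delta r_{\delta,1}} \left( g_p^{y_\delta} (g_p^b)^{t_\delta (I_\delta - L_{\beta,\delta})} \right)^{-r_{\delta,2}}$$

The keys are distributed as if the randomness for the $\delta$ component was:

$$\tilde{r}_{\delta,1} = r_{\delta,1} x_\delta / b + r_{\delta,2} x_\delta / (ab) \pmod p$$
$$\tilde{r}_{\delta,2} = -y_\delta r_{\delta,1} / b - \left( y_\delta / (ab) + t_\delta (I_\delta - L_{\beta,\delta}) / a \right) r_{\delta,2} \pmod p$$

Since $r_{\delta,1}, r_{\delta,2}$ are independent, the keys generated from the simulation are identical to those of the real scheme.

Type 3: Suppose $\delta \in S$ and $I_\delta = L_{\beta,\delta}$, but there exists an (arbitrary) index $\gamma \in S$ such that $I_\gamma \neq L_{\beta,\gamma}$. The simulator first chooses random $r_{i,1}, r_{i,2} \in \mathbb{Z}_n$ for all $i \in S$. Next it creates:

$$K_0 = \left( \prod_{i \in S \setminus \{\delta\}} g_p^{r_{i,1}(I_i - L_{\beta,i}) t_i}\, (g_p^a)^{r_{i,1} y_i}\, g_p^{r_{i,2} x_i} \right) g_p^{r_{\delta,1} y_\delta y_\gamma}$$

Additionally, it creates:

$$\forall i \in S \setminus (\{\delta\} \cup \{\gamma\}): \quad K_{i,1} = (g_p^a)^{r_{i,1}}, \qquad K_{i,2} = g_p^{r_{i,2}}$$
$$K_{\delta,1} = g_p^{r_{\delta,2} x_\delta}\, (g_p^b)^{-r_{\delta,1}(I_\gamma - L_{\beta,\gamma}) t_\gamma}, \qquad K_{\delta,2} = g_p^{-r_{\delta,2} y_\delta}$$
$$K_{\gamma,1} = (g_p^a)^{r_{\gamma,1}}\, g_p^{y_\delta r_{\delta,1}}, \qquad K_{\gamma,2} = g_p^{r_{\gamma,2}}$$

The keys are distributed as if the randomness for the $\delta, \gamma$ components was:

$$\tilde{r}_{\delta,1} = r_{\delta,2} x_\delta / (ab) - r_{\delta,1} t_\gamma (I_\gamma - L_{\beta,\gamma}) / a \pmod p$$
$$\tilde{r}_{\delta,2} = -r_{\delta,2} y_\delta / (ab) \pmod p$$
$$\tilde{r}_{\gamma,1} = r_{\gamma,1} / b + y_\delta r_{\delta,1} / (ab) \pmod p$$

Since $r_{\delta,1}, r_{\delta,2}, r_{\gamma,1}$ are independent, the keys generated from the simulation are identical to those of the real scheme.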

Challenge: The adversary first gives the simulator messages $M_0, M_1$. Let $\bar{X}_j$ be the first $j$ indexes in $\bar{X}$. The simulator chooses random $Z \in G_q$ and $(Z_{1,1}, Z_{1,2}), \ldots, (Z_{\ell,1}, Z_{\ell,2}) \in G_q^2$ (this can be done since the simulator has $g_q$) and outputs the challenge as follows:

$$C_0 = Y Z, \qquad C_{\delta,1} = T^{y_\delta} Z_{\delta,1}, \qquad C_{\delta,2} = T^{x_\delta} Z_{\delta,2},$$
$$\forall i \text{ s.t. } i \neq \delta \text{ and } i \notin \bar{X}_j: \quad C_{i,1} = Y^{y_i} Z_{i,1}, \qquad C_{i,2} = Y^{x_i} Z_{i,2}.$$

For all $i \in \bar{X}_j$ the simulator chooses random elements in $G$ for $C_{i,1}, C_{i,2}$. If $M_0 = M_1$ the simulator creates $C'$ as $C' = e(Y, g_p)^\alpha M_0$, otherwise it chooses a random group element for $C'$. If $T$ forms a well-formed tuple, then the simulator is playing game $\tilde{G}_j$, otherwise it is playing game $\tilde{G}'_j$.

Query Phase 2: Same as Query Phase 1.

Guess: The adversary outputs a guess $\beta'$. If $\beta' = \beta$ output 0, otherwise output 1.

By our assumption the probability that the adversary guesses $\beta$ correctly in game $\tilde{G}_j$ has a non-negligible $\epsilon$ difference from that of it guessing it correctly in game $\tilde{G}'_j$. However, it is in game $\tilde{G}'_j$ if and only if the challenger gave the simulator $R$ instead of $T'$. Therefore the simulator has advantage $\epsilon$ in the Composite 3-Party Diffie-Hellman game.

B.4 Proof of Lemma 5.5

We prove our lemma by supposing that a poly-time adversary $A$ has non-negligible difference $\epsilon$ between its advantage in game $\tilde{G}'_j$ and its advantage in game $\tilde{G}_{j+1}$ for some index $j$. We build a simulator that plays the Composite 3-Party Diffie-Hellman game with advantage $\epsilon$. The challenger first creates a 3-Party challenge as:

$$(p, q, G, G_T, e) \leftarrow \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \leftarrow G_p, \quad g_q \leftarrow G_q, \quad R_1, R_2, R_3 \leftarrow G_q, \quad a, b, c \leftarrow \mathbb{Z}_n$$
$$\bar{Z} \leftarrow \left( (n, G, G_T, e),\ g_q,\ g_p,\ g_p^a,\ g_p^b,\ \Gamma = g_p^{ab} R_1,\ Y = g_p^{abc} R_2 \right)$$
$$T' \leftarrow g_p^c R_3$$

It then randomly decides whether to give $(\bar{Z}, T = T')$ or $(\bar{Z}, T = R)$ where $R$ is a random element in $G$. We create the following simulation:

Init: The attacker gives the simulator two identities $L_0, L_1$. The challenger then flips the coin $\beta$ internally.

Setup: Let $\delta$ be the $(j+1)$-th index in $\bar{X}$. The simulator first chooses random $R_v \in G_q$, $(R_{u,1}, R_{h,1}, R_{w,1}), \ldots, (R_{u,\ell}, R_{h,\ell}, R_{w,\ell}) \in G_q^3$, and random $t_1, x_1, y_1, \ldots, t_\ell, x_\ell, y_\ell \in \mathbb{Z}_n$. The simulator first publishes the group description and $g_q$, $V = g_p R_v$. It picks a random $\alpha \in \mathbb{Z}_n$ and lets $A = e(g_p, g_p)^\alpha$. It next creates

$$U_\delta = (g_p^b)^{t_\delta} R_{u,\delta}, \qquad H_\delta = (g_p^b)^{-t_\delta L_{\beta,\delta}}\, g_p^{y_\delta} R_{h,\delta}, \qquad W_\delta = \Gamma R_{w,\delta}$$

Finally, for all $i \neq \delta$ it creates:

$$U_i = (g_p^b)^{t_i} R_{u,i}, \qquad H_i = (g_p^b)^{-t_i L_{\beta,i}}\, g_p^{y_i} R_{h,i}, \qquad W_i = g_p^{x_i} R_{w,i}$$

We observe that the parameters are distributed identically to the real scheme.

Query Phase 1: The adversary will make private key queries to the simulator. The way they are handled depends upon the type of query. Suppose the adversary queries for a key $I$, where the set of indexes that are non-wildcards is denoted as $S$.

Type 1: If $\delta \notin S$ then the simulator first chooses random $r_{i,1}, r_{i,2} \in \mathbb{Z}_n$ for all $i \in S$. Next it creates:

$$K_0 = \prod_{i \in S} (g_p^b)^{r_{i,1}(I_i - L_{\beta,i}) t_i}\, g_p^{r_{i,1} y_i}\, g_p^{r_{i,2} x_i}$$

Additionally, it creates:

$$\forall i \in S: \quad K_{i,1} = g_p^{r_{i,1}}, \qquad K_{i,2} = g_p^{r_{i,2}}$$

Type 2: Suppose $\delta \in S$, but $I_\delta \neq L_{0,\delta}$ and $I_\delta \neq L_{1,\delta}$. The simulator first chooses random $r_{i,1}, r_{i,2} \in \mathbb{Z}_n$ for all $i \in S$. Next it creates:

$$K_0 = \left( \prod_{i \in S \setminus \{\delta\}} (g_p^b)^{r_{i,1}(I_i - L_{\beta,i}) t_i}\, g_p^{r_{i,1} y_i}\, g_p^{r_{i,2} x_i} \right) (g_p^a)^{r_{\delta,1} y_\delta}\, (g_p^b)^{r_{\delta,2}(I_\delta - L_{\beta,\delta}) t_\delta}\, g_p^{r_{\delta,2} y_\delta}$$

Additionally, it creates:

$$\forall i \in S \setminus \{\delta\}: \quad K_{i,1} = g_p^{r_{i,1}}, \qquad K_{i,2} = g_p^{r_{i,2}}$$

Finally, it creates:

$$K_{\delta,1} = (g_p^a)^{r_{\delta,1}}\, g_p^{r_{\delta,2}}, \qquad K_{\delta,2} = g_p^{-t_\delta (I_\delta - L_{\beta,\delta}) r_{\delta,1}}$$

The keys are distributed as if the randomness for the $\delta$ component was:

$$\tilde{r}_{\delta,1} = a\, r_{\delta,1} + r_{\delta,2} \pmod p$$
$$\tilde{r}_{\delta,2} = -t_\delta (I_\delta - L_{\beta,\delta})\, r_{\delta,1} \pmod p$$

Since $r_{\delta,1}, r_{\delta,2}$ are independent, the keys generated from the simulation are identical to those of the real scheme.

Type 3: Suppose $\delta \in S$ and $I_\delta = L_{\beta,\delta}$, but there exists an (arbitrary) index $\gamma \in S$ such that $I_\gamma \neq L_{\beta,\gamma}$. The simulator first chooses random $r_{i,1}, r_{i,2} \in \mathbb{Z}_n$ for all $i \in S$. Next it creates:

$$K_0 = \left( \prod_{i \in S \setminus \{\delta\}} (g_p^b)^{r_{i,1}(I_i - L_{\beta,i}) t_i}\, g_p^{r_{i,1} y_i}\, g_p^{r_{i,2} x_i} \right) (g_p^b)^{r_{\delta,2}(I_\delta - L_{\beta,\delta}) t_\delta}\, g_p^{r_{\delta,2} y_\delta}$$

Additionally, it creates:

$$\forall i \in S \setminus (\{\gamma\} \cup \{\delta\}): \quad K_{i,1} = g_p^{r_{i,1}}, \qquad K_{i,2} = g_p^{r_{i,2}}$$

Finally, it creates:

$$K_{\delta,1} = g_p^{r_{\delta,2}}\, (g_p^a)^{r_{\delta,1} y_\gamma / y_\delta}, \qquad K_{\delta,2} = g_p^{-t_\gamma (I_\gamma - L_{\beta,\gamma}) r_{\delta,1}}$$
$$K_{\gamma,1} = g_p^{r_{\gamma,1}}\, (g_p^a)^{r_{\delta,1}}, \qquad K_{\gamma,2} = g_p^{r_{\gamma,2}}$$

The keys are distributed as if the randomness for the $\delta, \gamma$ components was:

$$\tilde{r}_{\delta,1} = r_{\delta,2} + a\, r_{\delta,1}\, y_\gamma / y_\delta \pmod p$$
$$\tilde{r}_{\delta,2} = -t_\gamma\, r_{\delta,1} (I_\gamma - L_{\beta,\gamma}) \pmod p$$
$$\tilde{r}_{\gamma,1} = r_{\gamma,1} + a\, r_{\delta,1} \pmod p$$

Since $r_{\delta,1}, r_{\delta,2}, r_{\gamma,1}$ are independent, the keys generated from the simulation are identical to those of the real scheme.

Challenge: The adversary first gives the simulator messages $M_0, M_1$. Let $\bar{X}_j$ be the first $j$ indexes in $\bar{X}$. The simulator chooses random $Z \in G_q$ and $(Z_{1,1}, Z_{1,2}), \ldots, (Z_{\ell,1}, Z_{\ell,2}) \in G_q^2$. It also chooses random $s \in \mathbb{Z}_n$. It outputs the challenge as follows:

$$C_0 = g_p^s Z, \qquad C_{\delta,1} = T^{y_\delta} Z_{\delta,1}, \qquad C_{\delta,2} = Y^{x_\delta} Z_{\delta,2},$$
$$\forall i \text{ s.t. } i \neq \delta \text{ and } i \notin \bar{X}_j: \quad C_{i,1} = g_p^{s y_i} Z_{i,1}, \qquad C_{i,2} = g_p^{s x_i} Z_{i,2}.$$

For all $i \in \bar{X}_j$ the simulator chooses random elements in $G$ for $C_{i,1}, C_{i,2}$. If $M_0 = M_1$ the simulator creates $C'$ as $C' = e(g_p, g_p)^{s\alpha} M_0$, otherwise it chooses a random group element for $C'$. If $T$ forms a well-formed tuple, then the simulator is playing game $\tilde{G}'_j$, otherwise it is playing game $\tilde{G}_{j+1}$.

Query Phase 2: Same as Query Phase 1.

Guess: The adversary outputs a guess $\beta'$. If $\beta' = \beta$ output 0, otherwise output 1.

By our assumption the probability that the adversary guesses $\beta$ correctly in game $\tilde{G}'_j$ has a non-negligible $\epsilon$ difference from that of it guessing it correctly in game $\tilde{G}_{j+1}$. However, it is in game $\tilde{G}_{j+1}$ if and only if the challenger gave the simulator $R$ instead of $T'$. Therefore the simulator has advantage $\epsilon$ in the Composite 3-Party Diffie-Hellman game.

C Comparison queries with $\sqrt{n}$ size ciphertext

In this section we focus on the comparison searching problem discussed in Section 3.1 for the special case $w = 1$, namely the case considered in Figure 1. We let $n$ denote the domain size. Recall that the trivial system in this case achieves ciphertext size $O(n)$, as does the system based on Hidden Vector Encryption. Here, we briefly describe a construction that achieves ciphertext size of $\sqrt{n}$.

Boneh, Sahai, and Waters [10] recently described a tracing traitors system where ciphertext size is $\sqrt{n}$, where $n$ is the number of users in the system. Their construction is based on a general primitive called PLBE (Private Linear Broadcast Encryption). Boneh and Waters [11] recently generalized the construction to obtain a trace and revoke system with ciphertexts having the same size. Their generalization is based on a construction for Augmented Broadcast Encryption (ABE). Setting the recipient set $S$ to $S = \{1, \ldots, n\}$ in an ABE system results in a public variant of PLBE which we call public-PLBE. The definition of a public-PLBE is implicit in [11]. For completeness, we give the complete definition in Appendix D. The main result in [11] is an ABE system with the following parameters:

CT-size = Key-size = PK-size = $O(\sqrt{n})$

This gives a public-PLBE with similar parameters (by setting $S = \{1, \ldots, n\}$). We denote the algorithms in the BW public-PLBE by (Setup_PKLBE, Encrypt_PKLBE, Decrypt_PKLBE). We also note that the PLBE of [10] can be easily extended as in [11] to obtain a public-PLBE with parameters

Key-size = $O(1)$, CT-size = PK-size = $O(\sqrt{n})$

In Section 3.1 we defined the set of comparison predicates $\Phi_{n,w}$. We show that for $w = 1$, any secure public-PLBE gives a $\Phi_{n,1}$-searchable encryption as follows:

Setup($\lambda$): Run Setup_PKLBE($n$, $\lambda$) to obtain a public key PK and $n$ secret keys $(SK_1, \ldots, SK_n)$. Output PK and SK $:= (SK_1, \ldots, SK_n)$.

Encrypt(PK, $s$, $M$) where $s \in \{1, \ldots, n\}$: Output $C \leftarrow$ Encrypt_PKLBE(PK, $s$, $M$).

GenToken(SK, $\langle P \rangle$): A predicate $P \in \Phi_{n,1}$ is a number $i \in \{1, \ldots, n\}$. Output TK $\leftarrow (i, SK_i)$.

Query(TK, $C$): Let TK $= (i, SK_i)$. Run Decrypt_PKLBE($i$, $SK_i$, $C$).

Using a public-PLBE we thus obtain a $\Phi_{n,1}$-searchable public key encryption where ciphertext size is $\sqrt{n}$. Security follows easily from the properties of public-PLBE.

Theorem C.1. The $\Phi_{n,1}$-searchable encryption system is secure assuming the underlying public-PLBE is secure.

D Definition of public-PLBE

Boneh and Waters [11] define a primitive called Augmented Broadcast Encryption (ABE) which they use to build a trace and revoke system. Setting the recipient set $S$ to $S = \{1, \ldots, n\}$ in an ABE results in a concept we call public-PLBE. For completeness, we include the full definition here. A public-PLBE is a restricted broadcast system comprising the following algorithms:

Setup_PKLBE($N$, $\lambda$): A probabilistic algorithm that takes as input $N$, the number of users in the system, and a security parameter $\lambda$. The algorithm runs in polynomial time in $\lambda$ and outputs a public key PK and private keys $SK_1, \ldots, SK_N$, where $SK_u$ is given to user $u$.

Encrypt_PKLBE(PK, $i$, $M$): Takes as input a public key PK, an integer $i$ satisfying $1 \leq i \leq N + 1$, and a message $M$. It outputs a ciphertext $C$. This ciphertext is intended for users $\{i, i + 1, \ldots, N\}$.

Decrypt_PKLBE($j$, $SK_j$, $C$): Takes as input the private key $SK_j$ for user $j$ and a ciphertext $C$. The algorithm outputs a message $M$ or $\bot$.

The system must satisfy the following correctness property: for all $i, j \in \{1, \ldots, N + 1\}$ (where $j \leq N$), and all messages $M$: Let (PK, $(SK_1, \ldots, SK_N)$) $\leftarrow$ Setup_PKLBE($N$, $\lambda$) and $C \leftarrow$ Encrypt_PKLBE(PK, $i$, $M$). If $j \geq i$ then Decrypt_PKLBE($j$, $SK_j$, $C$) $= M$.

Security. We define security of a PKLBE system using two games. The first game is a message hiding game and says that a ciphertext created using index $i = N + 1$ is unreadable by anyone. The second game is an index hiding game and captures the intuition that a broadcast ciphertext created using index $i$ reveals no non-trivial information about $i$. We will consider all these games for a fixed number of users, $N$.

Game 1. The first game, called the Message Hiding Game, says that an adversary cannot break semantic security when encrypting using index $i = N + 1$. The game proceeds as follows:

Setup: The challenger runs the Setup_PKLBE algorithm and gives the adversary PK and all secret keys $\{SK_1, \ldots, SK_N\}$.

Challenge: The adversary outputs two equal length messages $M_0, M_1$. The challenger flips a coin $\beta \in \{0, 1\}$ and sets $C \leftarrow$ Encrypt_PKLBE(PK, $N + 1$, $M_\beta$). The challenger gives $C$ to the adversary.

Guess: The adversary returns a guess $\beta' \in \{0, 1\}$ of $\beta$.

We define the advantage of adversary $A$ in winning the game as $\text{MH Adv}_A = \left| \Pr[\beta' = \beta] - 1/2 \right|$.

Game 2. The second game, called the Index Hiding Game, says that an adversary cannot distinguish between an encryption to index $i$ and one to index $i + 1$ without the key $SK_i$. The game takes as input a parameter $i \in \{1, \ldots, N\}$ which is given to both the challenger and the adversary. The game proceeds as follows:

Setup: The challenger runs the Setup_PKLBE algorithm and gives the adversary PK and the set of private keys $\{ SK_j \text{ s.t. } j \neq i \}$.

Challenge: The adversary outputs a message $M$. The challenger flips a coin $\beta \in \{0, 1\}$ and computes $C \leftarrow$ Encrypt_PKLBE(PK, $i + \beta$, $M$). The challenger returns $C$ to the adversary.

Guess: The adversary returns a guess $\beta' \in \{0, 1\}$ of $\beta$.

We define the advantage of adversary $A$ as the quantity $\text{IH Adv}_A[i] = \left| \Pr[\beta' = \beta] - 1/2 \right|$. In words, the game captures the fact that even if all users other than $i$ collude they cannot distinguish whether $i$ or $i + 1$ was used to create a ciphertext $C$.

With these games we define a secure PKLBE as follows.

Definition D.1. We say that an $N$-user public-PLBE system is secure if for all polynomial time adversaries $A$ we have that $\text{MH Adv}_A$ and $\text{IH Adv}_A[i]$ for $i = 1, \ldots, N$ are negligible functions of $\lambda$.
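To make the interface concrete, the following Python sketch models the correctness property of a public-PLBE and the way Appendix C wraps it into a searchable system. It is a deliberately insecure toy: the ciphertext stores the index and message in the clear, no cryptography is involved, and every name (`ToyCiphertext`, `toy_setup`, and so on) is hypothetical. It only illustrates which tokens succeed on which ciphertexts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyCiphertext:
    index: int     # hidden in a real scheme; stored in the clear in this toy
    message: str

def toy_setup(N):
    """Return a placeholder public key and one placeholder secret key per user."""
    public_key = "toy-public-key"
    secret_keys = {u: ("toy-secret-key", u) for u in range(1, N + 1)}
    return public_key, secret_keys

def toy_encrypt(public_key, i, message, N):
    """Encrypt to index i, i.e. to the users {i, ..., N}; i may be N + 1."""
    if not 1 <= i <= N + 1:
        raise ValueError("index must satisfy 1 <= i <= N + 1")
    return ToyCiphertext(i, message)

def toy_decrypt(j, secret_key_j, ciphertext):
    """User j recovers the message exactly when j >= i; None models failure."""
    return ciphertext.message if j >= ciphertext.index else None

def gen_token(secret_keys, i):
    """Appendix C: the token for predicate number i is the pair (i, SK_i)."""
    return (i, secret_keys[i])

def query(token, ciphertext):
    """Appendix C: run the PLBE decryption with the key inside the token."""
    i, secret_key_i = token
    return toy_decrypt(i, secret_key_i, ciphertext)
```

With $N = 4$ and a ciphertext created for index 3, the tokens for 3 and 4 return the message while the tokens for 1 and 2 return `None`. A ciphertext created for index $N + 1$ is rejected by every token, which is the situation the Message Hiding Game is about.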