Conjunctive, Subset, and Range Queries on Encrypted Data

Dan Boneh (dabo@cs.stanford.edu)
Brent Waters (bwaters@csl.sri.com)

Abstract

We construct public-key systems that support comparison queries ($x \geq a$) on encrypted data as well as more general queries such as subset queries ($x \in S$). These systems support arbitrary conjunctive queries ($P_1 \wedge \cdots \wedge P_\ell$) without leaking information on individual conjuncts. In addition, we present a general framework for constructing and analyzing public-key systems supporting queries on encrypted data.

1 Introduction

Queries on encrypted data are easiest to explain with an example. Consider a credit card payment gateway that observes a stream of encrypted transactions, say encrypted under Visa's public key. The gateway needs to flag all transactions satisfying a certain predicate $P$, say, all transactions whose value is over \$1000. Storing Visa's secret key on the gateway is a bad idea for both security and privacy reasons. Instead, Visa wishes to give the gateway a token $TK_P$ that enables the gateway to identify transactions satisfying $P$ without learning anything else about these transactions. Of course, generating the token $TK_P$ will require Visa's secret key.

As another example, consider a mail server that receives a stream of email messages encrypted under the recipient's public key. If the email message satisfies a certain predicate $P$ the mail server should forward the email to the recipient's pager. If the email satisfies some other predicate $P'$ the server should just discard the email. Otherwise, the server should place the email in the recipient's inbox. The recipient does not want to give the mail server the full private key. Instead, she wants to give the server two tokens $TK_P$ and $TK_{P'}$ enabling the server to test for the predicates $P$ and $P'$ without learning any other information about the email.

Our goal is to build a public-key system that supports a rich set of query predicates.
In our payment gateway example one can imagine comparison queries such as (value > 1000) or even conjunctions such as (value > 1000) and (TransactionTime > 5pm). The gateway should learn no information other than the value of the conjunctive predicate. In case a conjunction $P_1 \wedge P_2$ is false, the gateway should not learn which of the two conjuncts $P_1$ or $P_2$ is false.

In our second example involving a mail server one can imagine testing for subset queries such as (sender $\in S$) where $S$ is a set of email addresses. Conjunctive queries such as (sender $\in S$) and (subject = urgent) also make sense. Perhaps in the distant future, when highly complex queries on encrypted data are possible, one can imagine running an anti-virus/anti-spam predicate on encrypted emails. The mail server learns nothing about incoming encrypted email other than its spam status.

Author footnotes: Supported by NSF and the Packard Foundation. Supported by NSF and U.S. Army Research Office under Research Grant No. W911NF-06-1-0316.

Unfortunately, until now, only simple equality queries on encrypted data were possible. Song et al. [20] developed a mechanism for equality tests on data encrypted with a symmetric key system. Boneh et al. [9] constructed equality tests in the public-key settings.

Our results. We present a general framework for analyzing and constructing searchable public-key systems for various families of predicates. We then construct public-key systems that support comparison queries (such as greater-than) and general subset queries. We also support arbitrary conjunctions.

We evaluate our results based on ciphertext size and token size. Let $T = \{1, 2, \ldots, n\}$ and suppose we encrypt a tuple $x = (x_1, \ldots, x_w) \in T^w$. Say $x_1$ is a transaction value, $x_2$ is a card expiration date, and so on. The following table summarizes our results at a high level.

Query type                                                                        Source                      Ciphertext size   Token size
Equality query: $(x_i = a)$ for any $a \in T$                                     [20, 18, 9, 1]              $O(1)$            $O(1)$
Comparison query: $(x_i \geq a)$ for any $a \in T$                                [11, 12] (see footnote 1)   $O(\sqrt{n})$     $O(\sqrt{n})$
Subset query: $(x_i \in A)$ for any $A \subseteq T$                               This paper                  $O(n)$            $O(n)$
Equality conjunction: $(x_1 = a_1) \wedge \cdots \wedge (x_w = a_w)$              This paper                  $O(w)$            $O(w)$
Comparison conjunction: $(x_1 \geq a_1) \wedge \cdots \wedge (x_w \geq a_w)$      This paper                  $O(nw)$           $O(w)$
Subset conjunction: $(x_1 \in A_1) \wedge \cdots \wedge (x_w \in A_w)$            This paper                  $O(nw)$           $O(nw)$

Here $(a_1, \ldots, a_w)$ is an arbitrary vector that defines a conjunctive equality or a comparison predicate. Similarly, $A_1, \ldots, A_w$ are arbitrary subsets of $\{1, \ldots, n\}$ that define a conjunctive subset query predicate. We emphasize that when a conjunction predicate is false, the system does not leak which of the $w$ conjuncts caused it.

Prior to these results the best systems for comparison and subset queries were the trivial brute-force systems that we discuss in Section 3. For comparison queries these systems generate a ciphertext of size $O(n^w)$ and for subset queries they generate a ciphertext of size $O(2^{nw})$.
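To make the gap between the trivial systems and the conjunctive constructions in the table concrete, the following minimal sketch evaluates the asymptotic orders for one small parameter choice. The function name and the values $n = 16$, $w = 3$ are illustrative assumptions, and constant factors are ignored, so the numbers are orders of growth and not actual ciphertext lengths.

```python
def ciphertext_orders(n: int, w: int) -> dict[str, int]:
    """Asymptotic ciphertext sizes from the results table, with constants dropped.

    The keys are hypothetical labels chosen for this illustration.
    """
    return {
        "comparison_conjunction_this_paper": n * w,      # O(nw)
        "comparison_conjunction_trivial": n ** w,        # O(n^w)
        "subset_conjunction_this_paper": n * w,          # O(nw)
        "subset_conjunction_trivial": 2 ** (n * w),      # O(2^(nw))
    }


# Illustrative parameters only: n = 16 possible values per field, w = 3 fields.
orders = ciphertext_orders(n=16, w=3)
for name, value in orders.items():
    print(f"{name}: {value}")
```

Even for these tiny parameters the trivial subset system is already out of reach, while the $O(nw)$ constructions stay at a few dozen components.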
Note that even without conjunction, namely for $w = 1$, our subset query construction generates ciphertexts that are exponentially shorter than the best known previous solution ($O(n)$ vs. $O(2^n)$).

The main tool used in these constructions is a new primitive we call Hidden Vector Encryption, or HVE for short. This primitive can be viewed as an extreme generalization of Anonymous Identity Based Encryption (AnonIBE) [9, 1, 13]. We show how HVE implies all the results in the table.

A natural question is to look for public key systems that support larger classes of predicates, such as regular expressions. Ultimately, one would like a public-key system that supports searches for any predicate computable by a poly-size circuit. Presently, this appears to be a difficult open problem.

Related work. Equality tests on encrypted data were considered in [20, 9]. Equality searches on an encrypted audit log were proposed in [21]. Equality tests in the symmetric key settings are closely related to oblivious RAM techniques [18, 15]. Equality tests in the public key settings are closely related to Anonymous Identity Based Encryption (AnonIBE) [9, 1, 13]. Conjunctive equality queries were first studied in [16]. Equality searches on streaming data that hide the requested predicate were discussed in [19] and [4]. Efficient equality searches in databases were recently presented in [2]. Bethencourt et al. [3] recently gave a construction for efficient range queries in a weaker security model. That is, when the encrypted index falls in the specified range, the search token reveals the index.

Footnote 1: Both papers [11, 12] focus on traitor tracing, but as we observe in Appendix C, their approach directly gives a comparison searching system without conjunctions.

2 Definitions

We begin by defining a general framework for queries on encrypted data. Let $\Sigma$ be a finite set of binary strings. A predicate $P$ over $\Sigma$ is a function $P : \Sigma \to \{0, 1\}$. We say that $I \in \Sigma$ satisfies the predicate if $P(I) = 1$.

2.1 Searchable encryption

Let $\Phi$ be a set of predicates over $\Sigma$. A $\Phi$-searchable public key system comprises the following algorithms:

  Setup($\lambda$): A probabilistic algorithm that takes as input a security parameter and outputs a public key PK and secret key SK.

  Encrypt(PK, $I$, $M$): Encrypts the plaintext pair $(I, M)$ using the public key PK. We view $I \in \Sigma$ as the searchable field, called an index, and $M \in \mathcal{M}$ as the data.

  GenToken(SK, $\langle P \rangle$): Takes as input a secret key SK and the description of a predicate $P \in \Phi$. It outputs a token $TK_P$.

  Query(TK, $C$): Takes a token TK for some predicate $P \in \Phi$ as input and a ciphertext $C$. It outputs a message $M \in \mathcal{M}$ or $\bot$. Roughly speaking, if $C$ is an encryption of $(I, M)$ then the algorithm outputs $M$ when $P(I) = 1$ and outputs $\bot$ otherwise. The precise requirement is captured in the query correctness property below.

Correctness. The system must satisfy the following correctness property:

  Query correctness: For all $(I, M) \in \Sigma \times \mathcal{M}$ and all predicates $P \in \Phi$: Let $(PK, SK) \xleftarrow{R} \text{Setup}(\lambda)$, $C \xleftarrow{R} \text{Encrypt}(PK, I, M)$, and $TK \xleftarrow{R} \text{GenToken}(SK, \langle P \rangle)$.
    If $P(I) = 1$ then $\text{Query}(TK, C) = M$.
    If $P(I) = 0$ then $\Pr[\text{Query}(TK, C) = \bot] > 1 - \epsilon(\lambda)$ where $\epsilon(\lambda)$ is a negligible function.

Suppose that given a ciphertext $C \xleftarrow{R} \text{Encrypt}(PK, I, M)$ we are only interested in testing whether a predicate $P(I)$ is satisfied. In this case the message space $\mathcal{M}$ can be set to a singleton, say $\mathcal{M} = \{\text{true}\}$. Algorithm Query(TK, $C$) will return true when $P(I) = 1$ and $\bot$ otherwise.
A larger message space $\mathcal{M}$ is useful if TK is intended to unlock some $M \in \mathcal{M}$ whenever the predicate $P(I) = 1$. For example, when the transaction value is over \$1000 we may want the payment gateway to obtain more information about the transaction. Otherwise, the gateway should learn nothing.

Notice that a $\Phi$-searchable system does not provide a Decrypt algorithm that uses SK to decrypt a ciphertext $C$ and outputs $(I, M)$. One can always add this capability by also encrypting $(I, M)$ under a standard public key system. There is no need for the searchable system to explicitly provide this capability.

[Figure 1: Tokens for $\sigma_1, \sigma_2, \sigma_3, \sigma_4$ given to the adversary. The figure is a number line from 1 to $n$ marking the points $x, y, z$ and the thresholds $\sigma_1, \ldots, \sigma_4$.]

An example: comparison queries. Before defining security, we first give a motivating example using comparison queries. Let $\Sigma = \{1, \ldots, n\}$ for some integer $n$. For $\sigma \in \{1, \ldots, n\}$ let $P_\sigma$ be the following comparison predicate:

\[ P_\sigma(x) = \begin{cases} 1 & \text{if } x \geq \sigma, \\ 0 & \text{otherwise.} \end{cases} \]

Let $\Phi_n = \{P_1, \ldots, P_n\}$ be the set of all $n$ comparison predicates. Suppose the adversary has the tokens for predicates $P_{\sigma_1}, P_{\sigma_2}, \ldots, P_{\sigma_w}$ where $\sigma_1 < \sigma_2 < \cdots < \sigma_w$. Let $x, y, z$ be some integers as in Figure 1. Clearly the adversary can distinguish Encrypt(PK, $x$, $m$) from Encrypt(PK, $y$, $m$) using the token for the predicate $P_{\sigma_2}$. However, the adversary should not be able to distinguish Encrypt(PK, $y$, $m$) from Encrypt(PK, $z$, $m$). Indeed, separating an encryption of $y$ from an encryption of $z$ is information that should not be exposed by the tokens at the adversary's disposal. Our definition of security captures this property using the general framework.

2.2 Security

We define security of a $\Phi$-searchable system $\mathcal{E}$ using a query security game that captures the intuition that tokens TK reveal no unintended information about the plaintext. The game gives the adversary a number of tokens and requires that the adversary cannot use these tokens to deduce unintended information. The game proceeds as follows:

Setup. The challenger runs Setup($\lambda$) and gives the adversary PK.

Query phase 1. The adversary adaptively outputs descriptions of predicates $P_1, P_2, \ldots, P_{q_1} \in \Phi$. The challenger responds with the corresponding tokens $TK_j \xleftarrow{R} \text{GenToken}(SK, \langle P_j \rangle)$. We refer to such queries as predicate queries.

Challenge. The adversary outputs two pairs $(I_0, M_0)$ and $(I_1, M_1)$ subject to two restrictions:
  First, $P_j(I_0) = P_j(I_1)$ for all $j = 1, 2, \ldots, q_1$.
  Second, if $M_0 \neq M_1$ then $P_j(I_0) = P_j(I_1) = 0$ for all $j = 1, 2, \ldots, q_1$.
The challenger flips a coin $\beta \in \{0, 1\}$ and gives $C \xleftarrow{R} \text{Encrypt}(PK, I_\beta, M_\beta)$ to the adversary.
The two restrictions ensure that the tokens given to the adversary do not trivially break the challenge. The first restriction ensures that tokens given to the adversary do not directly distinguish $I_0$ from $I_1$. The second restriction ensures that the tokens do not directly distinguish $M_0$ from $M_1$.

Query phase 2. The adversary continues to adaptively request tokens for predicates $P_{q_1+1}, \ldots, P_q \in \Phi$, subject to the two restrictions above. The challenger responds with the corresponding tokens $TK_j \xleftarrow{R} \text{GenToken}(SK, \langle P_j \rangle)$.

Guess. The adversary returns a guess $\beta' \in \{0, 1\}$ of $\beta$.

We define the advantage of adversary $A$ in attacking $\mathcal{E}$ as the quantity $\mathrm{QU\,Adv}_A = |\Pr[\beta' = \beta] - 1/2|$.

Definition 2.1. We say that a $\Phi$-searchable system $\mathcal{E}$ is secure if for all polynomial time adversaries $A$ attacking $\mathcal{E}$ the function $\mathrm{QU\,Adv}_A$ is a negligible function of $\lambda$.

Another example: equality queries. Let $\Sigma$ be some finite set. For $\sigma \in \Sigma$ let $P_\sigma(x)$ be an equality predicate, namely

\[ P_\sigma(x) = \begin{cases} 1 & \text{if } x = \sigma, \\ 0 & \text{otherwise.} \end{cases} \]

Let $\Phi_{eq} = \{P_\sigma \text{ for all } \sigma \in \Sigma\}$. Then a $\Phi_{eq}$-searchable encryption supports equality queries on ciphertexts. It is easy to see that a secure $\Phi_{eq}$-searchable encryption is also an anonymous IBE system [9, 1, 13], that is, an Identity Based Encryption system where a ciphertext reveals no useful information about the identity that was used to create it. This should not be too surprising since it was previously shown [9, 1] that anonymous IBE is sufficient for equality searches.

A $\Phi_{eq}$-searchable encryption system (Setup, Encrypt, GenToken, Query) gives an anonymous IBE as follows:

  Setup$_{IBE}$($\lambda$) runs Setup($\lambda$) and outputs IBE parameters PK and master key SK.
  Encrypt$_{IBE}$(PK, $I$, $M$) where $I \in \Sigma$ outputs Encrypt(PK, $I$, $M$).
  Extract$_{IBE}$(SK, $I$) where $I \in \Sigma$ outputs $TK_I \leftarrow \text{GenToken}(SK, \langle P_I \rangle)$.
  Decrypt$_{IBE}$($TK_I$, $C$) outputs Query($TK_I$, $C$).

The correctness property ensures that if $C$ is the result of Encrypt(PK, $I$, $M$) then Query($TK_I$, $C$) will output $M$ since $P_I(I) = 1$. It is not difficult to see that the $\Phi_{eq}$-security game ensures semantic security for both the message and the identity. Hence, the resulting system is an anonymous IBE.

By considering larger classes of predicates $\Phi$ we obtain more general searching capabilities. The challenge is then to build secure encryption schemes that are $\Phi$-searchable for the most general $\Phi$ possible.

Chosen ciphertext security. Definition 2.1 easily extends to address chosen ciphertext attacks (CCA), but we do not pursue that here.
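The two restrictions on the challenge in the query security game of Section 2.2 are easy to misread, so the following minimal sketch spells them out for the comparison predicates of the motivating example. Everything concrete here is an assumption made for illustration: the function names, the thresholds 10, 20, 30, 40, and the points $x = 15$, $y = 25$, $z = 28$ (chosen so that $x$ lies below $\sigma_2$ while $y$ and $z$ lie between $\sigma_2$ and $\sigma_3$).

```python
from typing import Callable, Hashable, Sequence

Predicate = Callable[[int], bool]


def comparison_predicate(sigma: int) -> Predicate:
    """Return P_sigma, where P_sigma(x) = 1 iff x >= sigma."""
    return lambda x: x >= sigma


def challenge_allowed(
    tokens: Sequence[Predicate],
    i0: int, m0: Hashable,
    i1: int, m1: Hashable,
) -> bool:
    """Check the two restrictions of the challenge phase.

    `tokens` lists the predicates whose tokens the adversary holds.
    """
    for p in tokens:
        # Restriction 1: no held token may separate the two indexes.
        if p(i0) != p(i1):
            return False
        # Restriction 2: if the messages differ, no held token may
        # be satisfied by either index (it would decrypt the message).
        if m0 != m1 and (p(i0) or p(i1)):
            return False
    return True


# Hypothetical thresholds and points, for illustration only.
tokens = [comparison_predicate(s) for s in (10, 20, 30, 40)]
x, y, z = 15, 25, 28

print(challenge_allowed(tokens, x, "m", y, "m"))    # the token for sigma = 20 separates x from y
print(challenge_allowed(tokens, y, "m", z, "m"))    # no held token separates y from z
print(challenge_allowed(tokens, y, "m0", z, "m1"))  # differing messages, but P_10 is satisfied
print(challenge_allowed(tokens, 3, "m0", 7, "m1"))  # differing messages, no token is satisfied
```

The second call is the case the definition is designed to protect: the adversary's tokens give identical answers on $y$ and $z$, so a secure system must make the two encryptions indistinguishable.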
2.3 Selective security

We will also need a slightly weaker security definition in which the adversary commits to the search strings $I_0, I_1$ at the beginning of the game. Everything else remains the same. The game proceeds as follows:

Setup. The adversary outputs two strings $I_0, I_1 \in \Sigma$. The challenger runs Setup($\lambda$) and gives the adversary PK.

Query phase 1. The adversary adaptively outputs descriptions of predicates $P_1, P_2, \ldots, P_{q_1} \in \Phi$. The only restriction is that

\[ P_j(I_0) = P_j(I_1) \text{ for all } j = 1, 2, \ldots, q_1 \qquad (1) \]

The challenger responds with the corresponding tokens $TK_j \xleftarrow{R} \text{GenToken}(SK, \langle P_j \rangle)$.

Challenge. The adversary outputs two messages $M_0, M_1 \in \mathcal{M}$ subject to the restriction that:

\[ \text{if } M_0 \neq M_1 \text{ then } P_j(I_0) = P_j(I_1) = 0 \text{ for all } j = 1, 2, \ldots, q_1 \qquad (2) \]

The challenger flips a coin $\beta \in \{0, 1\}$ and gives $C \xleftarrow{R} \text{Encrypt}(PK, I_\beta, M_\beta)$ to the adversary.

Query phase 2. The adversary continues to adaptively request query tokens for predicates $P_{q_1+1}, \ldots, P_q \in \Phi$, subject to the two restrictions (1) and (2). The challenger responds with the corresponding tokens $TK_j \xleftarrow{R} \text{GenToken}(SK, \langle P_j \rangle)$.

Guess. The adversary returns a guess $\beta' \in \{0, 1\}$ of $\beta$.

The advantage of adversary $A$ in attacking $\mathcal{E}$ is the quantity $\mathrm{sQU\,Adv}_A = |\Pr[\beta' = \beta] - 1/2|$.

Definition 2.2. We say that a $\Phi$-searchable system $\mathcal{E}$ is selectively secure if for all polynomial time adversaries $A$ attacking $\mathcal{E}$ the function $\mathrm{sQU\,Adv}_A$ is a negligible function of $\lambda$.

3 The Trivial Construction

Let $\Sigma$ be a finite set of binary strings. We build a $\Phi$-searchable public key system $\mathcal{E}_T$ for any set of (polynomial time computable) predicates $\Phi$. We refer to this system as the brute force $\Phi$-searchable system.

The brute force system. Let $\mathcal{E}' = (\text{Setup}', \text{Encrypt}', \text{Decrypt}')$ be a public-key system. Let $\Phi = \{P_1, P_2, \ldots, P_t\}$. The $\Phi$-searchable system $\mathcal{E}_T$ is defined as follows:

  Setup($\lambda$): Run Setup$'$($\lambda$) $t$ times to obtain $PK \leftarrow (PK_1, \ldots, PK_t)$ and $SK \leftarrow (SK_1, \ldots, SK_t)$. Output PK and SK.

  Encrypt(PK, $I$, $M$): For $j = 1, \ldots, t$ define:

  \[ C_j \xleftarrow{R} \begin{cases} \text{Encrypt}'(PK_j, M) & \text{if } P_j(I) = 1, \\ \text{Encrypt}'(PK_j, \bot) & \text{otherwise.} \end{cases} \]

  Output $C \leftarrow (C_1, \ldots, C_t)$. Note that the length of $C$ is linear in $t = |\Phi|$, the number of predicates.

  GenToken(SK, $\langle P \rangle$): Here $\langle P \rangle$ (the description of a predicate $P$) is the index $j$ of $P$ in $\Phi$. Output $TK \leftarrow (j, SK_j)$.

  Query(TK, $C$): Let $C = (C_1, \ldots, C_t)$ and $TK = (j, SK_j)$. Output Decrypt$'$($SK_j$, $C_j$).

The following lemma proves security of this construction. The proof is a straightforward hybrid argument and is given in Appendix A.

Lemma 3.1. The system $\mathcal{E}_T$ above is a secure $\Phi$-searchable encryption system assuming $\mathcal{E}'$ is a semantically secure public key system against chosen plaintext attacks.

3.1 A third example: conjunctive comparison predicates

Suppose $\Sigma = \{1, \ldots, n\}^w$ for some $n, w$. Let $\Phi_{n,w}$ be the set of $n^w$ predicates

\[ P_{a_1 \ldots a_w}(x_1, \ldots, x_w) = \begin{cases} 1 & \text{if } x_j \geq a_j \text{ for all } j = 1, \ldots, w, \\ 0 & \text{otherwise} \end{cases} \]

for all $\bar{a} = (a_1, \ldots, a_w) \in \{1, \ldots, n\}^w$. Then $|\Phi_{n,w}| = n^w$. The trivial system in this case produces ciphertexts of length $O(n^w)$. Essentially, the system uses a unary encoding of the $w$ columns and assigns a private key to each cell in this $n$ by $w$ matrix. We will construct a much better system in Section 6.

4 Background on pairings and complexity assumptions

Our goal is to construct $\Phi$-searchable systems for a large class of predicates $\Phi$ that is much better than the trivial construction. To do so we will make use of bilinear maps.

4.1 Bilinear groups of composite order

We review some general notions about bilinear maps and groups, with an emphasis on groups of composite order. We follow [10] in which composite order bilinear groups were first introduced.

Let $\mathcal{G}$ be an algorithm called a group generator that takes as input a security parameter $\lambda \in \mathbb{Z}_{>0}$ and outputs a tuple $(p, q, G, G_T, e)$ where $p, q$ are two distinct primes, $G$ and $G_T$ are two cyclic groups of order $n = pq$, and $e$ is a function $e : G^2 \to G_T$ satisfying the following properties:

  (Bilinear) $\forall u, v \in G,\ \forall a, b \in \mathbb{Z}$: $e(u^a, v^b) = e(u, v)^{ab}$.
  (Non-degenerate) $\exists g \in G$ such that $e(g, g)$ has order $n$ in $G_T$.

We assume that the group action in $G$ and $G_T$ as well as the bilinear map $e$ are all computable in polynomial time in $\lambda$. Furthermore, we assume that the description of $G$ and $G_T$ includes generators of $G$ and $G_T$ respectively. To summarize, $\mathcal{G}$ outputs the description of a group $G$ of order $n = pq$ with an efficiently computable bilinear map.

We will use the notation $G_p, G_q$ to denote the respective subgroups of order $p$ and order $q$ of $G$, and we will use the notation $G_{T,p}, G_{T,q}$ to denote the respective subgroups of order $p$ and order $q$ of $G_T$.

4.2 The bilinear Diffie-Hellman assumption

First we review the standard Bilinear Diffie-Hellman assumption, but in groups of composite order.
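For a given group generator $\mathcal{G}$ define the following distribution $P(\lambda)$:

\[
\begin{aligned}
&(p, q, G, G_T, e) \xleftarrow{R} \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \xleftarrow{R} G_p, \quad g_q \xleftarrow{R} G_q, \\
&a, b, c \xleftarrow{R} \mathbb{Z}_n, \\
&\bar{Z} \leftarrow \big( (n, G, G_T, e),\ g_q,\ g_p,\ g_p^a,\ g_p^b,\ g_p^c \big), \\
&T \leftarrow e(g_p, g_p)^{abc}, \\
&\text{Output } (\bar{Z}, T).
\end{aligned}
\]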

For an algorithm $A$, define $A$'s advantage in solving the composite bilinear Diffie-Hellman problem for $\mathcal{G}$ as:

\[ \mathrm{cBDH\,Adv}_{\mathcal{G},A}(\lambda) := \big| \Pr[A(\bar{Z}, T) = 1] - \Pr[A(\bar{Z}, R) = 1] \big| \]

where $(\bar{Z}, T) \xleftarrow{R} P(\lambda)$ and $R \xleftarrow{R} G_{T,p}$.

Definition 4.1. We say that $\mathcal{G}$ satisfies the composite bilinear Diffie-Hellman assumption (cBDH) if for any polynomial time algorithm $A$ we have that $\mathrm{cBDH\,Adv}_{\mathcal{G},A}(\lambda)$ is a negligible function of $\lambda$.

4.3 The composite 3-party Diffie-Hellman assumption

Our construction makes use of an additional assumption in composite bilinear groups. For a given group generator $\mathcal{G}$ define the following distribution $P(\lambda)$:

\[
\begin{aligned}
&(p, q, G, G_T, e) \xleftarrow{R} \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \xleftarrow{R} G_p, \quad g_q \xleftarrow{R} G_q, \\
&R_1, R_2, R_3 \xleftarrow{R} G_q, \quad a, b, c \xleftarrow{R} \mathbb{Z}_n, \\
&\bar{Z} \leftarrow \big( (n, G, G_T, e),\ g_q,\ g_p,\ g_p^a,\ g_p^b,\ g_p^{ab} R_1,\ g_p^{abc} R_2 \big), \\
&T \leftarrow g_p^c R_3, \\
&\text{Output } (\bar{Z}, T).
\end{aligned}
\]

For an algorithm $A$, define $A$'s advantage in solving the composite 3-party Diffie-Hellman problem for $\mathcal{G}$ as:

\[ \mathrm{C3DH\,Adv}_{\mathcal{G},A}(\lambda) := \big| \Pr[A(\bar{Z}, T) = 1] - \Pr[A(\bar{Z}, R) = 1] \big| \]

where $(\bar{Z}, T) \xleftarrow{R} P(\lambda)$ and $R \xleftarrow{R} G$.

Definition 4.2. We say that $\mathcal{G}$ satisfies the composite 3-party Diffie-Hellman assumption (C3DH) if for any polynomial time algorithm $A$ we have that $\mathrm{C3DH\,Adv}_{\mathcal{G},A}(\lambda)$ is a negligible function of $\lambda$.

The assumption is formed around the intuition that it is hard to test for Diffie-Hellman tuples in the order $p$ subgroup if the elements to be tested have a random order $q$ subgroup component.

5 Hidden Vector Encryption

We construct a $\Phi$-searchable encryption system for a general class of equality predicates. We call such systems Hidden Vector Systems, or HVEs for short. We then show in Section 6 that our HVE system leads to comparison and subset queries far more efficient than the trivial system.
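The correctness argument in Section 5.2 relies on the fact that the pairing of an element of $G_p$ with an element of $G_q$ is the identity. The short derivation below shows why this follows from the definitions of Section 4.1. The symbols $g$, $x$, $y$ are introduced only for this derivation: $g$ is taken to be a generator of the cyclic group $G$, and $x, y$ are arbitrary integers.

```latex
% g generates G, which is cyclic of order n = pq.
% Then g^q generates G_p (order p) and g^p generates G_q (order q).
\begin{align*}
h_p &= (g^{q})^{x} \in G_p, \qquad h_q = (g^{p})^{y} \in G_q, \\
e(h_p, h_q) &= e(g^{qx}, g^{py})
             = e(g, g)^{pq \cdot xy}   && \text{(bilinearity)} \\
            &= \big( e(g, g)^{n} \big)^{xy}
             = 1                       && \text{(since } G_T \text{ has order } n = pq\text{)}.
\end{align*}
```

This is why blinding factors from $G_q$ can be multiplied into public values without disturbing any pairing taken against an element of $G_p$.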

5.1 HVE Definition

Let $\Sigma$ be a finite set and let $*$ be a special symbol not in $\Sigma$. Define $\Sigma_* = \Sigma \cup \{*\}$. The star $*$ plays the role of a wildcard or "don't care" value. In our subset and range query applications we typically set $\Sigma = \{0, 1\}$. Note that here we use the symbol $\Sigma$ differently than how it was used in Section 2.1.

For $\sigma_* = (\sigma_1, \ldots, \sigma_\ell) \in \Sigma_*^\ell$ define a predicate $P^{\mathrm{HVE}}_{\sigma_*}$ over $\Sigma^\ell$ as follows. For $x = (x_1, \ldots, x_\ell) \in \Sigma^\ell$ set:

\[ P^{\mathrm{HVE}}_{\sigma_*}(x) = \begin{cases} 1 & \text{if for all } i = 1, \ldots, \ell:\ (\sigma_i = x_i \text{ or } \sigma_i = *), \\ 0 & \text{otherwise.} \end{cases} \]

In other words, the vector $x$ matches $\sigma_*$ in all the coordinates where $\sigma_*$ is not $*$. Let $\Phi_{\mathrm{HVE}} = \{P^{\mathrm{HVE}}_{\sigma_*} \text{ for all } \sigma_* \in \Sigma_*^\ell\}$. We refer to $\ell$ as the width of the HVE.

Definition 5.1. A Hidden Vector System (HVE) over $\Sigma^\ell$ is a selectively secure $\Phi_{\mathrm{HVE}}$-searchable encryption system.

The case $\ell = 1$ degenerates to the example discussed in Section 2.2 where we showed equivalence to anonymous IBE [9, 1, 13]. For larger $\ell$ we obtain a more general concept that is much harder to build. In particular, the wildcard character $*$, which is essential for the applications we have in mind, makes it challenging to construct a $\Phi_{\mathrm{HVE}}$-searchable system. We construct an HVE with the following parameters:

  CT-size $= O(\ell)$ and TK-size $= O(\mathrm{weight}(\sigma_*))$

where $\mathrm{weight}(\sigma_* = (\sigma_1, \ldots, \sigma_\ell))$ is the number of coordinates where $\sigma_i \neq *$.

5.2 Construction

For our particular HVE construction we will let $\Sigma = \mathbb{Z}_m$ for some integer $m$. We set $\Sigma_* = \mathbb{Z}_m \cup \{*\}$. We describe an HVE where the payload $M$ is in a small subset $\mathcal{M}$ of $G_T$, namely $|\mathcal{M}| < |G_T|^{1/4}$. This is not a serious restriction since the payload $M$ is typically a short symmetric message key. Our HVE system works as follows:

Setup($\lambda$): The setup algorithm first chooses random primes $p, q > m$ and creates a bilinear group $G$ of composite order $n = pq$, as specified in Section 4.1. Next, it picks random elements

\[ (u_1, h_1, w_1), \ldots, (u_\ell, h_\ell, w_\ell) \in G_p^3, \qquad g, v \in G_p, \qquad g_q \in G_q, \]

and an exponent $\alpha \in \mathbb{Z}_p$. It keeps all these as the secret key SK.
It then chooses $3\ell + 1$ random blinding factors in $G_q$:

\[ (R_{u,1}, R_{h,1}, R_{w,1}), \ldots, (R_{u,\ell}, R_{h,\ell}, R_{w,\ell}) \in G_q \quad \text{and} \quad R_v \in G_q. \]

For the public key, PK, it publishes the description of the group $G$ and the values

\[ g_q, \qquad V = v R_v, \qquad A = e(g, v)^{\alpha}, \]
\[ U_i = u_i R_{u,i}, \qquad H_i = h_i R_{h,i}, \qquad W_i = w_i R_{w,i} \qquad \text{for } i = 1, \ldots, \ell. \]

The message space $\mathcal{M}$ is set to be a subset of $G_T$ of size less than $n^{1/4}$.

Encrypt(PK, $I \in \mathbb{Z}_m^\ell$, $M \in \mathcal{M} \subseteq G_T$): Let $I = (I_1, \ldots, I_\ell) \in \mathbb{Z}_m^\ell$. The encryption algorithm works as follows:

  Choose a random $s \in \mathbb{Z}_n$ and random $Z, (Z_{1,1}, Z_{1,2}), \ldots, (Z_{\ell,1}, Z_{\ell,2}) \in G_q$. (The algorithm picks random elements in $G_q$ by raising $g_q$ to random exponents from $\mathbb{Z}_n$.)

  Output the ciphertext:

  \[ C = \Big( C' = M A^s,\quad C_0 = V^s Z,\quad C_{i,1} = (U_i^{I_i} H_i)^s Z_{i,1},\quad C_{i,2} = W_i^s Z_{i,2} \ \text{ for } i = 1, \ldots, \ell \Big). \]

GenToken(SK, $I_* \in \Sigma_*^\ell$): The key generation algorithm takes as input the secret key and an $\ell$-tuple $I_* = (I_1, \ldots, I_\ell) \in \{\mathbb{Z}_m \cup \{*\}\}^\ell$. Let $S$ be the set of all indexes $i$ such that $I_i \neq *$. To generate a token for the predicate $P^{\mathrm{HVE}}_{I_*}$ choose random $(r_{i,1}, r_{i,2}) \in \mathbb{Z}_p^2$ for all $i \in S$ and output:

\[ TK = \Big( I_*,\quad K_0 = g^{\alpha} \prod_{i \in S} (u_i^{I_i} h_i)^{r_{i,1}} w_i^{r_{i,2}},\quad \forall i \in S:\ K_{i,1} = v^{r_{i,1}},\ K_{i,2} = v^{r_{i,2}} \Big). \]

Query(TK, $C$): Using the notation in the description of Encrypt and GenToken do:

  First, compute

  \[ M \leftarrow C' \Big/ \Big( e(C_0, K_0) \Big/ \prod_{i \in S} e(C_{i,1}, K_{i,1})\, e(C_{i,2}, K_{i,2}) \Big) \qquad (3) \]

  If $M \notin \mathcal{M}$ output $\bot$. Otherwise, output $M$.

Correctness. Before proving security we first show that the system satisfies the correctness property defined in Section 2.1. Let $(I, M)$ be a pair in $\Sigma^\ell \times \mathcal{M}$ and let $B_* \in \Sigma_*^\ell$. This $B_*$ defines a predicate $P_{B_*}$ in $\Phi_{\mathrm{HVE}}$. Let $(PK, SK) \xleftarrow{R} \text{Setup}(\lambda)$, $C \xleftarrow{R} \text{Encrypt}(PK, I, M)$, and $TK \xleftarrow{R} \text{GenToken}(SK, \langle B_* \rangle)$.

If $P_{B_*}(I) = 1$ then a simple calculation shows that Query(TK, $C$) $= M$. This uses in a crucial way the fact that $e(h_p, h_q) = 1$ for all $h_p \in G_p$ and $h_q \in G_q$.

If $P_{B_*}(I) = 0$ the following lemma shows that when the message space $\mathcal{M}$ satisfies $|\mathcal{M}| < n^{1/4}$ then $\Pr[\text{Query}(TK, C) \neq \bot]$ is negligible. Here the probability is over the random bits used to create the ciphertext.

Lemma 5.2. With the notation as above, and assuming $|\mathcal{M}| < n^{1/4}$, whenever $P_{B_*}(I) = 0$ the quantity $\Pr[\text{Query}(TK, C) \neq \bot]$ is negligible. The probability is over the random bits used to create the ciphertext.

Proof. Let $I = (I_1, \ldots, I_\ell) \in \Sigma^\ell$ and let $B_* = (B_1, \ldots, B_\ell) \in \Sigma_*^\ell$. Let $S$ be the set of all indexes $i$ such that $B_i$ is not a wildcard at index $i$.
Since $P_{B_*}(I) = 0$ we know that there is some $i \in S$ such that $B_i \neq I_i$. Then the decryption equation (3) contains a factor

\[ e(C_0, K_0) \big/ \big( e(C_{i,1}, K_{i,1})\, e(C_{i,2}, K_{i,2}) \big) = e(v, u_i)^{(B_i - I_i)\, s\, r_{i,1}} \]

which is a uniformly distributed value in $G_{T,p}$, and is independent of the rest of the equation. Since the message space is of size $n^{1/4}$ and the size of $G_{T,p}$ is approximately $n^{1/2}$, the false positive probability is at most $1/n^{1/4}$, which is negligible in the security parameter as required.

We note that in practice there is no need to use a small message space $\mathcal{M} \subseteq G_T$ to determine if decryption succeeded. We only use $\mathcal{M}$ to simplify the description of the system. In practice, one could do the following. The encryptor first picks a random $k \in G_T$ and derives two uniform and independent $b$-bit symmetric keys $(k_0, k_1)$ from $k$. It encrypts the payload $M$ using a symmetric encryption system under key $k_0$ to obtain $C_1$. Next, it runs our Encrypt(PK, $I$, $k$) to obtain $C$. The final ciphertext is the tuple $(C, C_1, k_1)$. Now, our Query algorithm works as follows. It first recovers a $k'$ from $C$ using the given token TK. Next, it derives $(k_0', k_1')$ from $k'$ and outputs $\bot$ if $k_1' \neq k_1$. Otherwise, it outputs the decryption of $C_1$ under $k_0'$ using the symmetric system. Lemma 5.2 shows that the false error probability is now $1/2^b$. Alternatively, if the symmetric encryption system provides authenticated encryption, then one could decide if Query produced the right value based on whether symmetric decryption succeeded.

Extensions. In our description above we limited the index space $\Sigma$ to be $\mathbb{Z}_m$. We can expand this space to all of $\{0, 1\}^*$ by taking a large enough $m$ to contain the range of a collision-resistant hash function. Then Encrypt(PK, $I \in (\{0, 1\}^*)^\ell$, $M \in G_T$) first hashes all the coordinates of $I$ into $\mathbb{Z}_m$ using the collision resistant hash and then applies the Encrypt algorithm described above.

5.3 Proof of Security

We prove our scheme selectively secure (as defined in Section 2.3) under the composite 3-party Diffie-Hellman assumption and the bilinear Diffie-Hellman assumption. We give the high-level arguments of the proof in this section and defer the proofs of some lemmas to the appendix.
Suppose the adversary commits to vectors $L_0, L_1 \in \Sigma^\ell$ at the beginning of the game. Let $X$ be the set of indexes $i$ such that $L_{0,i} = L_{1,i}$ and $\bar{X}$ be the set of indexes $i$ such that $L_{0,i} \neq L_{1,i}$. The proof uses a sequence of $2\ell + 2$ games to argue that the adversary cannot win the original security game of Section 2.3, which we denote by $G$.

We begin by slightly modifying the game $G$ into a game $G'$. Games $G$ and $G'$ are identical except for how the challenge ciphertext is generated. In $G'$, if $M_0 \neq M_1$ then the challenger multiplies the challenge ciphertext component $C'$ by a random element of $G_{T,p}$. The rest of the ciphertext is generated as usual. Additionally, if $M_0 = M_1$ then the challenge ciphertext is generated correctly.

Lemma 5.3. Assume that the Bilinear Diffie-Hellman assumption holds. Then for any polynomial time adversary $A$ the difference of advantage of $A$ in game $G$ and game $G'$ is negligible.

The proof is in Appendix B.1.

Next, we define a game $\tilde{G}$. In this game the adversary will give two challenge messages, $M_0, M_1$. If $M_0 \neq M_1$ then the challenger outputs a random element of $G_T$ as the $C'$ component of the challenge ciphertext. The rest of the ciphertext is constructed as normal. If $M_0 = M_1$ the challenger outputs the challenge ciphertext as normal.

Lemma 5.4. Assume that the Composite 3-party Diffie-Hellman assumption holds. Then for any polynomial time adversary $A$ the difference of advantage of $A$ in game $G'$ and game $\tilde{G}$ is negligible.

The proof is in Appendix B.2.

Finally, we define two sequences of hybrid games $\tilde G_j$ and $\tilde G'_j$ for $j = 1, \ldots, |\bar X|$. We define the game $\tilde G_j$ as follows. Let $X'$ be a set containing the first $j$ indexes in $\bar X$. The challenger creates the challenge ciphertext components $C_0$ and $C_{i,1}, C_{i,2}$ as normal for all $i \notin X'$. However, for all $i \in X'$ the challenger creates $C_{i,1}, C_{i,2}$ as completely random group elements in $G$. Additionally, if $M_0 \neq M_1$ then $C'$ is replaced by a completely random element from $G_T$ (otherwise it is created as normal).

We define a game $\tilde G'_j$ as follows. Let $X'$ be a set containing the first $j$ indexes in $\bar X$ and let $\delta$ be the $(j+1)$-th index in $\bar X$. In the challenge ciphertext the challenger creates $C_0$ and $C_{i,1}, C_{i,2}$ as normal for all $i \notin X'$ and $i \neq \delta$. For all $i \in X'$ the challenger creates $C_{i,1}, C_{i,2}$ as completely random group elements in $G$. Finally, the challenger chooses a random $s'$ and creates

$$C_{\delta,1} = (u^{I_\delta} h)^{s'} g_q^{z_{\delta,1}}, \qquad C_{\delta,2} = g^{s'} g_q^{z_{\delta,2}}.$$

Additionally, if $M_0 \neq M_1$ then $C'$ is replaced by a completely random element from $G_T$ (otherwise it is created as normal).

Observe that in game $\tilde G_{|\bar X|}$, for all $i$ in $\bar X$ the challenge ciphertext contains no information about $L_{\beta,i}$. Therefore the adversary's advantage in game $\tilde G_{|\bar X|}$ is 0. Additionally, game $\tilde G_0$ is equivalent to $\tilde G$. We state the following two lemmas whose proofs are given in Appendix B.3 and B.4.

Lemma 5.5. Assume the Composite 3-party Diffie-Hellman assumption holds. Then for all $j$ and any polynomial time adversary $A$ the difference of advantage of $A$ in game $\tilde G_j$ and game $\tilde G'_j$ is negligible.

Lemma 5.6. Assume the Composite 3-party Diffie-Hellman assumption holds. Then for all $j$ and any polynomial time adversary $A$ the difference of advantage of $A$ in game $\tilde G'_j$ and game $\tilde G_{j+1}$ is negligible.

It now follows that if the Composite 3-party Diffie-Hellman and Bilinear Diffie-Hellman assumptions hold then no polynomial-time adversary can break our scheme with non-negligible advantage.
This follows from the sequence of hybrid games starting with the original game $G$:

$$G,\ G',\ \tilde G_0,\ \tilde G'_0,\ \tilde G_1,\ \tilde G'_1,\ \ldots,\ \tilde G_{|\bar X|}.$$

The adversary's advantage in the game $\tilde G_{|\bar X|}$ is 0 and the difference in the adversary's advantage between any two consecutive hybrid games is negligible by the lemmas above. Hence, no polynomial-time adversary can win game $G$ with non-negligible advantage.

6 Applications of HVE

We show how HVE leads to efficient systems for subset queries and conjunctive comparison queries. Throughout the section we let $\Sigma_{01} = \{0, 1\}$ and $\Sigma_{01*} = \{0, 1, *\}$.

Conjunctive comparison queries. In Section 3.1 we defined conjunctive comparison queries and the predicate family $\Phi_{n,w}$. We use HVE to build a $\Phi_{n,w}$-searchable encryption system with ciphertext size $O(nw)$ and token size $O(w)$. Let (Setup$_{\mathrm{HVE}}$, Encrypt$_{\mathrm{HVE}}$, GenToken$_{\mathrm{HVE}}$, Query$_{\mathrm{HVE}}$) be a secure HVE over $\Sigma_{01}^{nw}$. Thus, the width of this HVE is $\ell = nw$. We construct a $\Phi_{n,w}$-searchable system as follows:

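Setup($\lambda$) is the same as Setup$_{\mathrm{HVE}}(\lambda)$.

Encrypt(PK, $I$, $M$) where $I = (x_1, \ldots, x_w) \in \{1, \ldots, n\}^w$. Build a vector $\sigma(I) = (\sigma_{i,j}) \in \Sigma_{01}^{nw}$ as follows:

$$\sigma_{i,j} = \begin{cases} 1 & \text{if } j \ge x_i, \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

Then output Encrypt$_{\mathrm{HVE}}$(PK, $\sigma(I)$, $M$), which gives a ciphertext of size $O(nw)$. For example, for $w = 2$ and $I = (x_1, x_2)$ the vector $\sigma(I)$ looks like:

$$\sigma(I) = (0, \ldots, 0, 1, \ldots, 1 \mid 0, \ldots, 0, 1, \ldots, 1) \in \{0, 1\}^{2n},$$

where each block covers positions $1, \ldots, n$ and the 1s in block $i$ start at position $x_i$.

GenToken(SK, $P_{\bar a}$) where $\bar a = (a_1, \ldots, a_w) \in \{1, \ldots, n\}^w$. Define $\sigma_*(\bar a) = (\sigma_{i,j}) \in \Sigma_{01*}^{nw}$ as follows:

$$\sigma_{i,j} = \begin{cases} 1 & \text{if } a_i = j, \\ * & \text{otherwise} \end{cases} \qquad (5)$$

Output TK$_{\bar a} \leftarrow$ GenToken$_{\mathrm{HVE}}$(SK, $\sigma_*(\bar a)$), which gives a token of size $O(w)$. For example, for $w = 2$ and $\bar a = (a_1, a_2)$ the vector $\sigma_*(\bar a)$ looks like:

$$\sigma_*(\bar a) = (*, \ldots, *, 1, *, \ldots, * \mid *, \ldots, *, 1, *, \ldots, *) \in \{0, 1, *\}^{2n},$$

where the single 1 in block $i$ sits at position $a_i$.

Query(TK$_{\bar a}$, $C$): output Query$_{\mathrm{HVE}}$(TK$_{\bar a}$, $C$).

To argue correctness and security, observe that for a predicate $P_{\bar a} \in \Phi_{n,w}$ and an index $I \in \{1, \ldots, n\}^w$ we have that:

$$P_{\bar a}(I) = 1 \quad \text{if and only if} \quad P^{\mathrm{HVE}}_{\sigma_*(\bar a)}(\sigma(I)) = 1.$$

Therefore, correctness and security follow from the properties of the HVE. We thus obtain the following immediate theorem.

Theorem 6.1. (Setup, Encrypt, GenToken, Query) is a selectively secure $\Phi_{n,w}$-searchable system assuming (Setup$_{\mathrm{HVE}}$, Encrypt$_{\mathrm{HVE}}$, GenToken$_{\mathrm{HVE}}$, Query$_{\mathrm{HVE}}$) is an HVE over $\Sigma_{01}^{nw}$.

Conjunctive range queries. We note that a system that supports comparison queries can also support range queries. To search for plaintexts where $x \in [a, b]$ the encryptor encrypts the pair $(x, x)$. The predicate then tests $x \ge a \wedge x \le b$.

6.1 Subset queries

Next, we show how to search for general subset predicates. Let $T$ be a set of size $n$. For a subset $A \subseteq T$ we define a subset predicate as follows:

$$P_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise} \end{cases}$$

We wish to support searches for any subset predicate. More generally, we wish to support searches for conjunctive subset predicates over $T^w$. That is, let $\sigma = (A_1, \ldots, A_w)$ be a $w$-tuple where $A_i \subseteq T$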

for all $i = 1, \ldots, w$. Then $\sigma$ is an element of $(2^T)^w$. Define the predicate $P_\sigma : T^w \to \{0, 1\}$ as follows:

$$P_\sigma(x_1, \ldots, x_w) = \begin{cases} 1 & \text{if } x_i \in A_i \text{ for all } i = 1, \ldots, w, \\ 0 & \text{otherwise} \end{cases}$$

Let $\Phi = \{P_\sigma \text{ for all } \sigma \in (2^T)^w\}$. Note that $\Phi$ is huge: its size is $2^{nw}$. The $\Phi$-searchable system is as follows:

Encrypt(PK, $I$, $M$) where $I = (x_1, \ldots, x_w) \in T^w$. Build a vector $\sigma(I) = (\sigma_{i,j}) \in \Sigma_{01}^{nw}$ as:

$$\sigma_{i,j} = \begin{cases} 1 & \text{if } x_i = j, \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

Then output Encrypt$_{\mathrm{HVE}}$(PK, $\sigma(I)$, $M$). The ciphertext size is $O(nw)$, as was the case for comparison queries.

GenToken(SK, $P_\alpha$) where $\alpha = (A_1, \ldots, A_w)$. Define $\sigma_*(\alpha) = (\sigma_{i,j}) \in \Sigma_{01*}^{nw}$ as follows:

$$\sigma_{i,j} = \begin{cases} 0 & \text{if } j \notin A_i, \\ * & \text{otherwise} \end{cases} \qquad (7)$$

Output TK$_\alpha \leftarrow$ GenToken$_{\mathrm{HVE}}$(SK, $\sigma_*(\alpha)$). The token size is $O(nw)$, which is bigger than tokens for comparison queries.

Setup and Query are the same algorithms from the HVE system, as for comparison queries.

It is easiest to see how this works in the one-dimensional setting, namely $w = 1$. We encrypt a value $x \in T$ using an HVE vector

$$\sigma(x) = (0, \ldots, 0, 1, 0, \ldots, 0) \in \{0, 1\}^n,$$

with the single 1 at position $x$. Consider a predicate $P_A$ where, for example, $A = \{2, 3, n\} \subseteq T$. We generate a token for $P_A$ by calling GenToken$_{\mathrm{HVE}}$(SK, $\sigma_*(A)$) using the HVE vector

$$\sigma_*(A) = (0, *, *, 0, 0, \ldots, 0, *) \in \{0, *\}^n,$$

which has $*$ at positions 2, 3 and $n$ and 0 everywhere else. The main point is that $x \in A$ if and only if $P^{\mathrm{HVE}}_{\sigma_*(A)}(\sigma(x)) = 1$. Therefore, correctness and security follow from the properties of the HVE. We obtain a secure system for subset queries for arbitrary subsets.

Theorem 6.2. (Setup, Encrypt, GenToken, Query) is a selectively secure $\Phi$-searchable system assuming (Setup$_{\mathrm{HVE}}$, Encrypt$_{\mathrm{HVE}}$, GenToken$_{\mathrm{HVE}}$, Query$_{\mathrm{HVE}}$) is an HVE over $\Sigma_{01}^{nw}$.

Note that the trivial system of Section 3 for subset queries produces ciphertexts of size $O(2^n)$. The construction above generates ciphertexts of size $O(n)$.
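The two encodings of this section can be checked on plain vectors, with no cryptography involved. The Python sketch below is only an illustration: the function names are made up here, the HVE is reduced to its matching predicate (every non-wildcard token entry must equal the data entry), and the comparison encoding assumes that Equation (4) sets $\sigma_{i,j} = 1$ exactly when $j \ge x_i$. Under that assumption a comparison token for thresholds $(a_1, \ldots, a_w)$ matches exactly when $x_i \le a_i$ for every $i$.

```python
STAR = "*"  # wildcard symbol of the token alphabet {0, 1, *}


def hve_match(token_vec, data_vec):
    """HVE predicate: every non-wildcard token entry equals the data entry."""
    return all(t == STAR or t == s for t, s in zip(token_vec, data_vec))


def comparison_index(xs, n):
    """Equation (4), as assumed here: block i has 1 at positions j >= x_i."""
    return [1 if j >= x else 0 for x in xs for j in range(1, n + 1)]


def comparison_token(thresholds, n):
    """Equation (5): block i has a single 1 at position a_i, wildcards elsewhere."""
    return [1 if j == a else STAR for a in thresholds for j in range(1, n + 1)]


def subset_index(xs, n):
    """Equation (6): block i has a single 1 at position x_i."""
    return [1 if j == x else 0 for x in xs for j in range(1, n + 1)]


def subset_token(sets, n):
    """Equation (7): block i has 0 at positions outside A_i, wildcards inside."""
    return [STAR if j in A else 0 for A in sets for j in range(1, n + 1)]


if __name__ == "__main__":
    n = 6
    # Subset query with w = 2: x_1 = 2 in {2, 3} and x_2 = 5 in {5}.
    print(hve_match(subset_token([{2, 3}, {5}], n), subset_index([2, 5], n)))
    # Comparison query with w = 1: x = 4 against threshold a = 3.
    print(hve_match(comparison_token([3], n), comparison_index([4], n)))
```

The comparison token has only $w$ non-wildcard entries while the subset token may have up to $nw$, which is the plain-vector counterpart of the $O(w)$ versus $O(nw)$ token sizes stated above.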

Subset queries on large domains using Bloom filters. So far we considered subset queries over a domain of size $n$. In Section 1 we presented examples where one wishes to test a subset relation over a large domain. For example, we discussed email filtering queries of type (sender $\in S$) where $S$ is a set of email addresses. To use our construction one would first hash email addresses to a set $\{1, \ldots, n\}$ for some $n$, using a publicly known hash function, and then use the HVE for small domain.

Unfortunately, by hashing into a small domain there is some chance for false positives, namely Query may output $M$ even though (sender $\notin S$). False positives result from hash collisions. The false positive probability can be reduced by a standard application of Bloom filters [5]. Instead of using one hash function, we use multiple functions $H_1, \ldots, H_d : \{0, 1\}^* \to T$. Again, consider the one-dimensional case, namely $w = 1$. To encrypt a word $W \in \{0, 1\}^*$ the encryptor creates a vector $\sigma(W) \in \{0, 1\}^n$ that contains a 1 at positions $H_1(W), \ldots, H_d(W)$ and 0 everywhere else. The encryptor then runs Encrypt(PK, $\sigma(W)$, $M$). To generate a token for a set $A = \{W_1, \ldots, W_s\}$ the GenToken algorithm builds a vector $\sigma_*(A) \in \{0, *\}^n$ that contains $*$ at positions $H_i(W_j)$, for all $i = 1, \ldots, d$ and $j = 1, \ldots, s$, and contains 0 everywhere else. By choosing $n$ and $d$ appropriately, the false positive probability can be made arbitrarily small.

We note that false positives in basic hashing or in the Bloom filter can lead to a privacy compromise in the underlying HVE. Consequently, it is necessary to ensure that the adversary cannot find a false positive example. That is, the adversary should not be able to find a set $S$ and an element $x \notin S$ for which the Bloom filter indicates $x \in S$. In the random oracle model, where one models the Bloom filter hash functions as random oracles, this can be arranged by choosing the Bloom filter parameters $n$ and $d$ so that the false positive probability is negligible (concretely, say, less than $2^{-80}$).
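The Bloom-filter encoding can likewise be sketched on plain vectors. The Python code below is a minimal illustration, not the paper's instantiation: the hash family $H_1, \ldots, H_d$ is modeled by salting SHA-256 with the function index (an assumption made here), positions are numbered from 0 rather than 1, and the HVE is again reduced to its matching predicate.

```python
import hashlib

STAR = "*"  # wildcard symbol of the token alphabet {0, *}


def bloom_positions(word, n, d):
    """Positions H_1(W), ..., H_d(W) in {0, ..., n-1}; salted SHA-256 is an assumed hash family."""
    return {
        int.from_bytes(hashlib.sha256(f"{i}:{word}".encode()).digest(), "big") % n
        for i in range(d)
    }


def encode_word(word, n, d):
    """sigma(W): 1 at the d hashed positions of W, 0 everywhere else."""
    positions = bloom_positions(word, n, d)
    return [1 if j in positions else 0 for j in range(n)]


def encode_set_token(words, n, d):
    """sigma_*(A): wildcard at every hashed position of every word in A, 0 elsewhere."""
    allowed = set().union(*(bloom_positions(w, n, d) for w in words))
    return [STAR if j in allowed else 0 for j in range(n)]


def hve_match(token_vec, data_vec):
    """HVE predicate: every non-wildcard token entry equals the data entry."""
    return all(t == STAR or t == s for t, s in zip(token_vec, data_vec))


if __name__ == "__main__":
    n, d = 1024, 4
    senders = ["alice@example.com", "bob@example.com"]
    token = encode_set_token(senders, n, d)
    print(hve_match(token, encode_word("alice@example.com", n, d)))
```

A word in the set always matches, because all of its hashed positions are wildcards in the token. A word outside the set matches only if every one of its hashed positions happens to collide with a wildcard, which is the false positive event whose probability the parameters $n$ and $d$ control.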
Without random oracles one can first hash a given element $x$ using a collision-resistant hash function and map the resulting digest to a subset $I_x$ of $\{1, \ldots, n\}$ where $I_x$ is a member of a cover-free set system [14].

Another subset query application. In our subset query application we identified a ciphertext with an element $x$ and a user's token with a set $A$. This allowed us to test whether $x \in A$. We observe that we can easily apply HVE to achieve the opposite semantics, where a user's key is associated with an element $x$ and the ciphertext with a set $A$. This could be used by a gateway to test if a particular user was one of the (possibly) many receivers of an email. We expect there to be several other applications that one can build with HVE.

7 Extensions

Privacy for search queries. In some cases one may want the token TK$_P$ not to identify which predicate $P$ is being queried. For example, in the anti-spam example from the introduction, the user may not want to reveal his anti-spam predicate to the server. A similar problem was studied by Ostrovsky and Skeith [19] and is related to Private Information Retrieval [17]. For public-key systems supporting comparison queries this is clearly not possible since, given TK$_P$, the server can identify the threshold in $P$ with a simple binary search. It is an open problem to convert our system to a symmetric-key system where TK$_P$ does not expose $P$. One approach is to simply keep the public key secret from the server; however, this is not sufficient in our system.

Validating ciphertexts. Throughout the paper we assumed that the encryptor is honestly creating ciphertexts as specified by the encryption system. For some applications discussed in the introduction (e.g. spam filtering) this may not be the case. By creating malformed ciphertexts an attacker may generate false positives or false negatives for the server using the tokens. Fortunately, in some settings, including a payment gateway or spam filter, this is easily avoidable. Briefly, one technique is as follows. The recipient who has SK will also publish a regular public key PK$_1$ and ask the encryptor to encrypt the plaintext $(I, M)$ with both the searchable system and with PK$_1$. The resulting ciphertext is the pair

$$C = \big(\text{Encrypt}(\text{PK}, I, M),\ \text{Encrypt}_{\mathrm{PKE}}(\text{PK}_1, (I, M))\big).$$

When the recipient receives a ciphertext $C = (C_0, C_1)$ it recovers $(I, M)$ from $C_1$ and uses SK to test that $C_0$ is a valid encryption of $(I, M)$. If not then the ciphertext is immediately rejected. In doing so, the recipient automatically drops invalid ciphertexts. More precisely, a $\Phi$-searchable system could provide an algorithm Test($C$, $I$, $M$, SK) that outputs true when $C$ is a valid encryption of $(I, M)$ and false otherwise. Our HVE system supports this type of test.

Alternatively, one could require the encryptor to prove that his ciphertext is well formed, for example to prove that $C_0$ is consistent with $C_1$. This can be done using non-interactive proof techniques [6, 7].

8 Conclusion

In public key systems supporting queries on encrypted data a secret key can produce tokens for testing any supported query predicate. The token lets anyone test the predicate on a given ciphertext without learning any other information about the plaintext. We presented a general framework for analyzing security of searching on encrypted data systems. We then constructed systems for comparisons and subset queries as well as conjunctive versions of these predicates. The underlying tool behind these new constructions is a primitive we call HVE.
The one-dimensional version of HVE (namely $\ell = 1$) is essentially an Anonymous IBE system. For large $\ell$ we obtain a new concept that is extremely useful for a large variety of searching predicates. We note that by setting $\ell = 1$ in our HVE construction we obtain a new simple anonymous IBE system secure without random oracles.

This work poses many challenging open problems. For example, the best non-conjunctive (i.e. $w = 1$) comparison system we currently have requires ciphertexts of size $O(\sqrt{n})$ where $n$ is the domain size. In principle it should be possible to improve this to $O(\log n)$, but this is currently a wide open problem that will require new ideas. Similarly, for non-conjunctive subset queries the best we have requires ciphertexts of size $O(n)$. Again, can this be improved to $O(\log n)$? Our results mostly focus on conjunction. Are there similar results for disjunctive queries? More generally, what other classes of predicates can we search on?

Acknowledgments

We thank Amit Sahai and Alice Silverberg for helpful comments about this work.

References

[1] Michel Abdalla, Mihir Bellare, Dario Catalano, Eike Kiltz, Tadayoshi Kohno, Tanja Lange, John Malone-Lee, Gregory Neven, Pascal Paillier, and Haixia Shi. Searchable encryption

revisited: Consistency properties, relation to anonymous IBE, and extensions. In CRYPTO, pages 205-222, 2005.

[2] Mihir Bellare, Alexandra Boldyreva, and Adam O'Neill. Efficiently-searchable and deterministic asymmetric encryption. http://eprint.iacr.org/2006/186, 2006.

[3] J. Bethencourt, H. Chan, A. Perrig, E. Shi, and D. Song. Anonymous multi-attribute encryption with range query and conditional decryption. Technical report, C.M.U, 2006. CMU-CS-06-135.

[4] John Bethencourt, Dawn Song, and Brent Waters. New constructions and practical applications for private stream searching. In Proceedings of the 2006 IEEE Symposium on Security and Privacy, 2006.

[5] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13:422-426, 1970.

[6] Manuel Blum, Paul Feldman, and Silvio Micali. Non-interactive zero-knowledge and its applications (extended abstract). In STOC, pages 103-112, 1988.

[7] Manuel Blum, Alfredo De Santis, Silvio Micali, and Giuseppe Persiano. Noninteractive zero-knowledge. SIAM J. Comput., 20(6):1084-1118, 1991.

[8] Dan Boneh and Xavier Boyen. Efficient selective-ID identity based encryption without random oracles. In Proceedings of Eurocrypt 2004, LNCS, pages 223-238. Springer-Verlag, 2004.

[9] Dan Boneh, Giovanni Di Crescenzo, Rafail Ostrovsky, and Giuseppe Persiano. Public key encryption with keyword search. In Proceedings of Eurocrypt '04, 2004.

[10] Dan Boneh, Eu-Jin Goh, and Kobbi Nissim. Evaluating 2-DNF formulas on ciphertexts. In Joe Kilian, editor, Proceedings of Theory of Cryptography Conference 2005, volume 3378 of LNCS, pages 325-342. Springer, 2005.

[11] Dan Boneh, Amit Sahai, and Brent Waters. Fully collusion resistant traitor tracing with short ciphertexts and private keys. In Eurocrypt '06, 2006.

[12] Dan Boneh and Brent Waters. A fully collusion resistant broadcast trace and revoke system with public traceability. In ACM Conference on Computer and Communication Security (CCS), 2006.

[13] Xavier Boyen and Brent Waters.
Anonymous hierarchical identity-based encryption (without random oracles). In Crypto '06, 2006.

[14] P. Erdös, P. Frankl, and Z. Füredi. Families of finite sets in which no set is covered by the union of r others. Israel J. of Math., 51:79-89, 1985.

[15] O. Goldreich and R. Ostrovsky. Software protection and simulation by oblivious RAMs. JACM, 1996.

[16] Philippe Golle, Jessica Staddon, and Brent Waters. Secure conjunctive keyword search over encrypted data. In ACNS, pages 31-45, 2004.

[17] Eyal Kushilevitz and Rafail Ostrovsky. Replication is not needed: Single database, computationally-private information retrieval. In FOCS, pages 364-373, 1997.

[18] Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. PhD thesis, M.I.T, 1992. Preliminary version in STOC 1990.

[19] Rafail Ostrovsky and William Skeith. Private searching on streaming data. In Proceedings of Crypto 2005, LNCS. Springer, 2005.

[20] Dawn Song, David Wagner, and Adrian Perrig. Practical techniques for searches on encrypted data. In Proceedings of the 2000 IEEE Symposium on Security and Privacy (S&P 2000), 2000.

[21] Brent Waters, Dirk Balfanz, Glenn Durfee, and Dianna Smetters. Building an encrypted and searchable audit log. In Proceedings of NDSS '04, 2004.

A Proof of Lemma 3.1

We prove that the trivial system presented in Section 3 is secure.

Proof. Showing that $\mathrm{QU\,Adv}_A$ is negligible is a straightforward hybrid argument. Let $A$ be an adversary playing the query security game. For $i = 1, \ldots, n + 1$ we define experiment number $i$ as follows:

The challenger runs Setup($\lambda$) to obtain PK $\leftarrow$ (PK$_1$, ..., PK$_n$) and SK $\leftarrow$ (SK$_1$, ..., SK$_n$). It gives PK to $A$.

Next, $A$ is given the tokens for any predicates of its choice.

Then $A$ outputs two pairs $(I_0, M_0)$ and $(I_1, M_1)$ subject to the restrictions of the query security game challenge phase. For $j = 1, \ldots, n$ the challenger constructs the following ciphertexts:

$$C_j \leftarrow \begin{cases} \text{Encrypt}'(\text{PK}_j, M_0) & \text{if } P_j(I_0) = 1 \text{ and } j \ge i, \\ \text{Encrypt}'(\text{PK}_j, M_1) & \text{if } P_j(I_1) = 1 \text{ and } j < i, \\ \text{Encrypt}'(\text{PK}_j, \perp) & \text{otherwise} \end{cases}$$

The challenger gives $C \leftarrow (C_1, \ldots, C_n)$ to $A$.

The adversary continues to adaptively request query tokens subject to the restrictions of the query security game.

Finally, $A$ outputs a bit $\beta' \in \{0, 1\}$.

We let $\mathrm{EXP}^{(i)}_{QU}[A]$ denote the probability that $\beta'$ equals 1. This completes the description of experiment $i$.
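A standard argument shows that

$$2 \cdot \mathrm{QU\,Adv}_A = \left| \mathrm{EXP}^{(1)}_{QU}[A] - \mathrm{EXP}^{(n+1)}_{QU}[A] \right| \le \sum_{i=1}^{n} \left| \mathrm{EXP}^{(i)}_{QU}[A] - \mathrm{EXP}^{(i+1)}_{QU}[A] \right|$$

But $\left| \mathrm{EXP}^{(i)}_{QU}[A] - \mathrm{EXP}^{(i+1)}_{QU}[A] \right|$ is clearly negligible assuming $\mathcal{E}$ is semantically secure against chosen plaintext attacks.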

B Proofs for HVE Construction

The adversary commits to vectors $L_0, L_1 \in \Sigma^\ell$ at the beginning of the game. Let $X$ be the set of indexes $i$ such that $L_{0,i} = L_{1,i}$ and $\bar X$ be the set of indexes $i$ such that $L_{0,i} \neq L_{1,i}$. The adversary can issue predicate queries to request a token for any predicate $P^{\mathrm{HVE}}_L$ where $L \in \Sigma_*^\ell$. Let $S$ be all the indexes for which $L$ is not a wildcard. We distinguish between three types of queries.

Type 1: $S \cap \bar X = \emptyset$. That is, the predicate does not check any of the indexes in which the challenge tuples differ. These queries can only be made if in the eventual challenge stage $M_0 = M_1$.

Type 2: Case 1 is not met and there exists an $i \in S \cap \bar X$ such that $L_i \neq L_{0,i}$ and $L_i \neq L_{1,i}$.

Type 3: Case 1 and Case 2 are both not met and there exists an $i \in S \cap X$ such that $L_i \neq L_{0,i} = L_{1,i}$.

These cases are mutually exclusive (by definition) and complete.

B.1 Proof of Lemma 5.3

We prove our lemma by supposing that a poly-time adversary $A$ has non-negligible difference $\epsilon$ between its advantage in game $G$ and its advantage in game $G'$. We build a simulator that plays the Bilinear Diffie-Hellman game with advantage $\epsilon$. The challenger first creates a Bilinear Diffie-Hellman challenge as:

$$(p, q, G, G_T, e) \leftarrow \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \leftarrow G_p, \quad g_q \leftarrow G_q, \quad a, b, c \leftarrow \mathbb{Z}_n$$

$$\bar Z \leftarrow \big((n, G, G_T, e),\ g_q,\ g_p,\ g_p^a,\ g_p^b,\ g_p^c\big), \qquad T \leftarrow e(g_p, g_p)^{abc}$$

It then randomly decides whether to give $(\bar Z, T' = T)$ or $(\bar Z, T' = R)$ where $R$ is a random element in $G_T$. We create the following simulation:

Init: The attacker gives the simulator two identities $L_0, L_1$. The simulator then flips the coin $\beta$ internally.

Setup: The simulator first chooses random $(R_{u,1}, R_{h,1}, R_{w,1}), \ldots, (R_{u,\ell}, R_{h,\ell}, R_{w,\ell}) \in G_q^3$, $R_v \in G_q$ and random $t_1, x_1, y_1, \ldots, t_\ell, x_\ell, y_\ell \in \mathbb{Z}_n$. The simulator first publishes the group description and $g_q$, $V = g_p R_v$. It lets $A = e(g_p^a, g_p^b)$. Finally, for all $i$ it creates:

$$U_i = (g_p^b)^{t_i} R_{u,i}, \qquad H_i = (g_p^b)^{-t_i L_{\beta,i}}\, g_p^{y_i} R_{h,i}, \qquad W_i = g_p^{x_i} R_{w,i}$$

We observe that the parameters are distributed identically to the real scheme.

Query 1 The adversary will make rivate key queries to the simulator. The way they are handled deends uon the tye of query. Suose the adversary queries for a key I, where the set of indexes that are non-wild cards is denoted as S. Tye 1 If the adversary issue a Tye 1 query, the simulator simly aborts and takes a random guess. The reason for this is by our definition if a tye one query is made then the challenge messages M 0, M 1 will be equal. However, in this case the games G and G are identical, so there can be no difference in the adversary s advantage when he makes this tye of a challenge. Therefore, we can just take a random guess. Tye 2 and Tye 3 We handle Tye 2 and Tye 3 queries in the same manner. The rimary intuition is that neither a Tye 2 or Tye 3 query can be used to distinguish the challenge cihertext. Suose the adversary queries for a vector I and let γ be an (arbitrary) index where I γ L β,γ. The simulator first chooses random r i,1, r i,2 Z n i S. Next it creates: K 0 = ( i S (g) b r i,1(i i L β,i )t i g r i,1y i g r i,2x i ) Additionally, it creates: i S/{γ} : K i,1 = (g a ) r i,1, K i,2 = (g a ) r i,2 Finally, it creates: K γ,1 = g r γ,1 (g) a 1/(Iγ Lβ,γ) K γ,2 = g r γ,2 The argument for the well-formness of the keys is similar to that of the Boneh-Boyen [8] Identity-Based Encrytion system. Challenge The adversary first gives the simulator messages M 0, M 1. If M 0 = M 1 we can abort the simulation and take a random guess for the reason given above. The simulator chooses random Z G q (Z 1,1, Z 1,2 ),..., (Z l,1, Z l,2 ) G 2 q (this can be done since the simulator has g q ). and oututs the challenge as follows: C = M β T C 0 = g c Z, : C i,1 = (g c ) y i Z i,1, C i,2 = (g c ) x i Z i,2. If T is forms a tule, then the simulator is laying game G, otherwise it is laying game G. Query Phase 2 Same as Query Phase 1. Guess The adversary oututs a guess β. If β = β outut 0 otherwise outut 1. 
By our assumption the probability that the adversary guesses $\beta$ correctly in game $G$ has a non-negligible $\epsilon$ difference from that of it guessing it correctly in game $G'$. However, it is in game $G'$ if and only if the challenger gave the simulator $R$ instead of $T$. Therefore the simulator has advantage $\epsilon$ in the Bilinear Diffie-Hellman game.

B.2 Proof of Lemma 5.4

We begin by reviewing an assumption called the Bilinear Subgroup Decision problem that was introduced by Boneh, Sahai, and Waters [11] and is implied by the Composite 3-Party Diffie-Hellman assumption. For a given group generator $\mathcal{G}$ define the following distribution $P(\lambda)$:

$$(p, q, G, G_T, e) \leftarrow \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \leftarrow G_p, \quad g_q \leftarrow G_q$$

$$\bar Z \leftarrow \big((n, G, G_T, e),\ g_q,\ g_p\big), \qquad T \leftarrow G_{T,p}, \qquad \text{output } (\bar Z, T)$$

For an algorithm $A$, define $A$'s advantage in solving the Bilinear Subgroup Decision problem for $\mathcal{G}$ as:

$$\mathrm{BSD\,Adv}_{\mathcal{G},A}(\lambda) := \big| \Pr[A(\bar Z, T) = 1] - \Pr[A(\bar Z, R) = 1] \big|$$

where $(\bar Z, T) \leftarrow P(\lambda)$ and $R \leftarrow G_T$.

Definition B.1. We say that $\mathcal{G}$ satisfies the Bilinear Subgroup Decision assumption if for any polynomial time algorithm $A$ we have that $\mathrm{BSD\,Adv}_{\mathcal{G},A}(\lambda)$ is a negligible function of $\lambda$.

It is easy to see that the Composite 3-Party Diffie-Hellman assumption implies the Bilinear Subgroup Decision assumption (footnote 2). For simplicity we will use the Bilinear Subgroup Decision assumption directly in our proof.

We suppose that there exists an adversary with non-negligible difference in advantage $\epsilon$ between winning the game $G'$ and the game $\tilde G$. We build a simulator that takes in a Bilinear Subgroup Decision challenge $(\bar Z, T')$. The simulation proceeds as follows.

Init: The attacker gives the simulator two identities $L_0, L_1$. The simulator then flips the coin $\beta$ internally.

Setup: The simulator sets up the parameters as the real setup algorithm would. All the simulator needs to do this is $g_p$ and $g_q$ from the assumption.

Query Phase 1: The simulator answers queries as the real authority would. One small difference is that the simulator chooses exponents from $\mathbb{Z}_n$ instead of $\mathbb{Z}_p$. However, this does not change anything since both the simulator and a real authority will raise elements from $G_p$ to the exponents.

Challenge: The adversary first gives the simulator messages $M_0, M_1$. If $M_0 = M_1$ then the simulator simply encrypts the message to the identity $L_\beta$. Otherwise, the simulator creates the challenge ciphertext of message $M_\beta$ to $L_\beta$ exactly as normal with the exception that $C'$ is multiplied by $T'$.
If the challenge is a well-formed tuple, then the simulator is playing game $G'$; otherwise it is playing game $\tilde G$.

Footnote 2: One first reverses the labellings of $p, q$ in the Composite 3-Party Diffie-Hellman assumption. Next, we can use the pairing to create an element that will be random in $G_{T,p}$ if and only if we were given a well-formed tuple. Otherwise the element is a random one in $G_T$.

Query Phase 2: Same as Query Phase 1.

Guess: The adversary outputs a guess $\beta'$. If $\beta' = \beta$ output 0, otherwise output 1.

By our assumption the probability that the adversary guesses $\beta$ correctly in game $G'$ has a non-negligible $\epsilon$ difference from that of it guessing it correctly in game $\tilde G$. However, it is in game $\tilde G$ if and only if the challenger gave the simulator $R$ instead of $T$. Therefore the simulator has advantage $\epsilon$ in the Bilinear Subgroup Decision game, which implies an advantage of $\epsilon$ in the Composite 3-Party Diffie-Hellman game.

B.3 Proof of Lemma 5.5

We prove our lemma by supposing that a poly-time adversary $A$ has non-negligible difference $\epsilon$ between its advantage in game $\tilde G_j$ and its advantage in game $\tilde G'_j$ for some index $j$. We build a simulator that plays the Composite 3-Party Diffie-Hellman game with advantage $\epsilon$. The challenger first creates a 3-Party challenge as:

$$(p, q, G, G_T, e) \leftarrow \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \leftarrow G_p, \quad g_q \leftarrow G_q, \quad R_1, R_2, R_3 \leftarrow G_q, \quad a, b, c \leftarrow \mathbb{Z}_n$$

$$\bar Z \leftarrow \big((n, G, G_T, e),\ g_q,\ g_p,\ g_p^a,\ g_p^b,\ \Gamma = g_p^{ab} R_1,\ Y = g_p^{abc} R_2\big), \qquad T \leftarrow g_p^c R_3$$

It then randomly decides whether to give $(\bar Z, T' = T)$ or $(\bar Z, T' = R)$ where $R$ is a random element in $G$. We create the following simulation:

Init: The attacker gives the simulator two identities $L_0, L_1$. The simulator then flips the coin $\beta$ internally.

Setup: Let $\delta$ be the $(j+1)$-th index in $\bar X$. The simulator first chooses random $(R_{u,1}, R_{h,1}, R_{w,1}), \ldots, (R_{u,\ell}, R_{h,\ell}, R_{w,\ell}) \in G_q^3$ and random $t_1, x_1, y_1, \ldots, t_\ell, x_\ell, y_\ell \in \mathbb{Z}_n$. The simulator first publishes the group description and $g_q$, $V = \Gamma$. It picks a random $\alpha \in \mathbb{Z}_n$ and lets $A = e(\Gamma, g_p)^\alpha$. It next creates

$$U_\delta = (g_p^b)^{t_\delta} R_{u,\delta}, \qquad H_\delta = (g_p^b)^{-t_\delta L_{\beta,\delta}}\, g_p^{y_\delta} R_{h,\delta}, \qquad W_\delta = g_p^{x_\delta} R_{w,\delta}$$

Finally, for all $i \neq \delta$ it creates:

$$U_i = (g_p^b)^{t_i} R_{u,i}, \qquad H_i = (g_p^b)^{-t_i L_{\beta,i}}\, \Gamma^{y_i} R_{h,i}, \qquad W_i = \Gamma^{x_i} R_{w,i}$$

We observe that the parameters are distributed identically to the real scheme.

Query 1 The adversary will make rivate key queries to the simulator. The way they are handled deends uon the tye of query. Suose the adversary queries for a key I, where the set of indexes that are non-wild cards is denoted as S. Tye 1 If δ / S then the simulator first chooses random r i,1, r i,2 Z n i S. Next it creates: K 0 = g α g r i,1(i i L β,i )t i (g) a r i,1y i g r i,2x i Additionally, it creates: i S i S : K i,1 = (g a ) r i,1, K i,2 = g r i,2 Tye 2 Suose δ S, but I δ L 0,δ and I δ L 1,δ. The simulator first chooses random r i,1, r i,2 Z n i S. Next it creates: K 0 = ( Additionally, it creates: Finally, it creates: i S/{δ} g r i,1(i i L β,i )t i (g) a r i,1y i g r i,2x i )g r δ,1(i δ L β,δ )t δ x δ i S/{δ} : K i,1 = (g a ) r i,1, K i,2 = g r i,2 K δ,1 = (g) a x δr δ,1 g x δr δ,2, K δ,2 = (g) a y δr δ,1 (g y δ (g) b t δ(i δ L β,δ ) ) r δ,2 The keys are distributed as if the randomness for the δ comonent was: r δ,1 = r δ,1 x δ /b + r δ,2 x δ /(ab) (mod ) r δ,2 = y δ r δ,1 /b (y δ /ab + t δ (I δ L β,δ )/a)r δ,2 (mod ) Since, r δ,1, r δ,2 are indeendent the keys generated from the simulation are identical to that of the real scheme. Tye 3 Suose δ S and I δ = L β,δ, but there exists an (arbitrary) index γ S such that I γ L β,γ. The simulator first chooses random r i,1, r i,2 Z n i S. Next it creates: K 0 = ( g r i,1(i i L β,i )t 1 (g) a r i,1y i g r i,2x i )g r δ,1y δ y γ Additionally, it creates: i S/{δ} i S/({δ} {γ}) : K i,1 = (g a ) r i,1, K i,2 = g r i,2 K δ,1 = g r δ,2x δ (g) b r δ,1(i γ L β,γ )t γ, K δ,2 = g r δ,2y δ K γ,1 = (g) a r γ,1 g y δr δ,1, K γ,2 = g r γ,2 23

The keys are distributed as if the randomness for the δ, γ comonents was: r δ,1 = r δ,2 x δ /(ab) r δ,1 t γ (I γ L β,γ )/a (mod ) r δ,2 = r δ,2 y δ /(ab) (mod ) r γ,1 = r γ,1 /b + y δ r δ,1 /(ab) (mod ) Since, r δ,1, r δ,2, r γ,1 are indeendent the keys generated from the simulation are identical to that of the real scheme. Challenge The adversary first gives the simulator messages M 0, M 1. Let X j be the first j indexes in X. The simulator chooses random Z G q (Z 1,1, Z 1,2 ),..., (Z l,1, Z l,2 ) G 2 q (this can be done since the simulator has g q ). and oututs the challenge as follows: C 0 = Y Z, C δ,1 = T y δz δ,1, C δ,2 = T x δz δ,2, i s.t. i δ and i / H j : C i,1 = Y y i Z i,1, C i,2 = Y x i Z i,2. For all i H j the simulator chooses random elements in G for C i,1, C i,2. If M 0 = M 1 the simulator creates C as C = e(y, g ) α M 0, otherwise it chooses a random grou element for C. If T is forms a tule, then the simulator is laying game H j, otherwise it is laying game H j. Query Phase 2 Same as Query Phase 1. Guess The adversary oututs a guess β. If β = β outut 0 otherwise outut 1. By our assumtion the robability that the adversary guesses β correctly in game G j has a non-negligible ɛ difference from that of it guessing it correctly in game G j. However, it is in game G j if and only if the challenger gave the simulator instead of T. Therefore the simulator has advantage ɛ in the Comosite 3-Party Diffie-Hellman game. B.4 Proof of Lemma 5.6 We rove our lemma by suosing that a oly-time adversary A has non-negligible difference ɛ between its advantage in game G j and its advantage in game G j+1 for some index j. We build a simulator that lays the Comosite 3-Party Diffie-Hellman game with advantage ɛ. 
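The challenger first creates a 3-Party challenge as:

$$(p, q, G, G_T, e) \leftarrow \mathcal{G}(\lambda), \quad n \leftarrow pq, \quad g_p \leftarrow G_p, \quad g_q \leftarrow G_q, \quad R_1, R_2, R_3 \leftarrow G_q, \quad a, b, c \leftarrow \mathbb{Z}_n$$

$$\bar Z \leftarrow \big((n, G, G_T, e),\ g_q,\ g_p,\ g_p^a,\ g_p^b,\ \Gamma = g_p^{ab} R_1,\ Y = g_p^{abc} R_2\big), \qquad T \leftarrow g_p^c R_3$$

It then randomly decides whether to give $(\bar Z, T' = T)$ or $(\bar Z, T' = R)$ where $R$ is a random element in $G$. We create the following simulation: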

Init The attacker gives the simulator two identities L 0, L 1. The challenger then flis the coin β internally. Setu Let δ be the j + 1-th index in X. The simulator first chooses random G q, ( u,1, h,1, w,1 ),..., ( u,l, h,l, w,l ) G 3 q, ν Z n and random t 1, x 1, y 1,..., t l, x l, y l. The simulator first ublishes the grou descrition and g q, V = g. It icks a random α Z n and lets A = e(g, g ) α. It next creates It next creates Finally, for all i δ it creates: U δ = (g b ) t δ u,δ, H δ = (g b ) t δl β,δ g y δ h,δ, W δ = Γ w,δ U i = (g b ) t i u,i, H i = (g b ) t il β,i g y i h,i, W i = g x i w,i We observe that the arameters are distributed identically to the real scheme. Query 1 The adversary will make rivate key queries to the simulator. The way they are handled deends uon the tye of query. Suose the adversary queries for a key L, where the set of indexes that are non-wild cards is denoted as S. Tye 1 If δ / S then the simulator first chooses random r i,1, r i,2 Z n i S. Next it creates: K 0 = i S (g) b r i,1(i i L β,i )t i g r i,1y i g r i,2x i Additionally, it creates: i S : K i,1 = (g ) r i,1, K i,2 = g r i,2 Tye 2 Suose δ S, but I δ L 0,δ and I δ L 1,δ. The simulator first chooses random r i,1, r i,2 Z n i S. Next it creates: K 0 = ( i S/{δ} Additionally, it creates: (g b ) r i,1(i i L β,i )t i g r i,1y i g r i,2x i )(g) a r δ,1y δ (g) b r δ,2(i δ L β,δ )t δ g r δ,2y δ i S/{δ} : K i,1 = (g ) r i,1, K i,2 = g r i,2 Finally it creates: K δ,1 = (g) a r δ,1 g r δ,2, K δ,2 = g t δ(i δ L β,δ )r δ,1 The keys are distributed as if the randomness for the δ comonent was: r δ,1 = ar δ,1 + r δ,2 (mod ) r δ,2 = t δ (I δ L β,δ )r δ,1 (mod ) Since, r δ,1, r δ,2 are indeendent the keys generated from the simulation are identical to that of the real scheme. 25

Tye 3 Suose δ S and I δ = L β,δ, but there exists an (arbitrary) index γ S such that I γ L β,γ. The simulator first chooses random r i,1, r i,2 Z n i S. Next it creates: K 0 = ( Additionally, it creates: Finally it creates: i S/{δ} (g b ) r i,1(i i L β,i )t i g r i,1y i g r i,2x i )(g) b r δ,2(i δ L β,δ )t δ g r δ,2y δ i S/({γ} {δ} : K i,1 = (g ) r i,1, K i,2 = g r i,2 K δ,1 = g r δ,2 (g) a r γ,1y δ /y δ, K δ,2 = g tγ(iγ L β,γ)r δ,1 K γ,1 = (g ) r γ,1 (g a ) r δ,1, K γ,2 = g r γ,2 The keys are distributed as if the randomness for the δ, γ comonents was: r δ,1 = r δ,2 + ar δ,1 y γ /y γ (mod ) r δ,2 = t γ r δ1 (I γ L β,γ ) (mod ) r γ,1 = r γ,1 + ar δ,1 (mod ) Since, r δ,1, r δ,2, r γ,1 are indeendent the keys generated from the simulation are identical to that of the real scheme. Challenge The adversary first gives the simulator messages M 0, M 1. Let X j be the first j indexes in X. The simulator chooses random Z G q (Z 1,1, Z 1,2 ),..., (Z l,1, Z l,2 ) G 2 q It also chooses random s Z n. It oututs the challenge as follows: C 0 = g s Z, C δ,1 = T y δz δ,1, C δ,2 = Y x δz δ,2, i s.t. i δ and i / H j : C i,1 = g s y i Z i,1, C i,2 = g s x i Z i,2. For all i H j the simulator chooses random elements in G for C i,1, C i,2. If M 0 = M 1 the simulator creates C as C = e(g, g ) s α M 0, otherwise it chooses a random grou element for C. If T is forms a tule, then the simulator is laying game H j, otherwise it is laying game H j+1. Query Phase 2 Same as Query Phase 1. Guess The adversary oututs a guess β. If β = β outut 0 otherwise outut 1. By our assumtion the robability that the adversary guesses β correctly in game G j has a non-negligible ɛ difference from that of it guessing it correctly in game G j+1. However, it is in game G j+1 if and only if the challenger gave the simulator instead of T. Therefore the simulator has advantage ɛ in the Comosite 3-Party Diffie-Hellman game. 26

C Comparison queries with $\sqrt{n}$ size ciphertext

In this section we focus on the comparison searching problem discussed in Section 3.1 for the special case $w = 1$, namely the case considered in Figure 1. We let $n$ denote the domain size. Recall that the trivial system in this case achieves ciphertext size $O(n)$, as does the system based on Hidden Vector Encryption. Here, we briefly describe a construction that achieves ciphertext size of $O(\sqrt{n})$.

Boneh, Sahai, and Waters [11] recently described a tracing traitors system where ciphertext size is $O(\sqrt{n})$, where $n$ is the number of users in the system. Their construction is based on a general primitive called PLBE (Private Linear Broadcast Encryption). Boneh and Waters [12] recently generalized the construction to obtain a trace and revoke system with ciphertexts having the same size. Their generalization is based on a construction for Augmented Broadcast Encryption (ABE). Setting the recipient set $S$ to $S = \{1, \ldots, n\}$ in an ABE system results in a public variant of PLBE which we call public-PLBE. The definition of a public-PLBE is implicit in [12]. For completeness, we give the complete definition in Appendix D.

The main result in [12] is an ABE system with the following parameters:

\[ \text{CT-size} = \text{Key-size} = \text{PK-size} = O(\sqrt{n}). \]

This gives a public-PLBE with similar parameters (by setting $S = \{1, \ldots, n\}$). We denote the algorithms in the BW public-PLBE by $(\mathrm{Setup}_{\mathrm{PKLBE}}, \mathrm{Encrypt}_{\mathrm{PKLBE}}, \mathrm{Decrypt}_{\mathrm{PKLBE}})$. We also note that the PLBE of [11] can be easily extended as in [12] to obtain a public-PLBE with parameters

\[ \text{Key-size} = O(1), \qquad \text{CT-size} = \text{PK-size} = O(\sqrt{n}). \]

In Section 3.1 we defined the set of comparison predicates $\Phi_{n,w}$. We show that for $w = 1$, any secure public-PLBE gives a $\Phi_{n,1}$-searchable encryption as follows:

Setup($\lambda$) Run $\mathrm{Setup}_{\mathrm{PKLBE}}(n, \lambda)$ to obtain a public key PK and $n$ secret keys $(\mathrm{SK}_1, \ldots, \mathrm{SK}_n)$. Output PK and $\mathrm{SK} := (\mathrm{SK}_1, \ldots, \mathrm{SK}_n)$.

Encrypt(PK, $s$, $M$) where $s \in \{1, \ldots, n\}$. Output $C \leftarrow \mathrm{Encrypt}_{\mathrm{PKLBE}}(\mathrm{PK}, s, M)$.

GenToken(SK, $P$) A predicate $P \in \Phi_{n,1}$ is a number $i \in \{1, \ldots, n\}$.
Output $\mathrm{TK} \leftarrow (i, \mathrm{SK}_i)$.

Query(TK, $C$) Let $\mathrm{TK} = (i, \mathrm{SK}_i)$. Run $\mathrm{Decrypt}_{\mathrm{PKLBE}}(i, \mathrm{SK}_i, C)$.

Using a public-PLBE we thus obtain a $\Phi_{n,1}$-searchable public key encryption where ciphertext size is $O(\sqrt{n})$. Security follows easily from the properties of public-PLBE.

Theorem C.1. The $\Phi_{n,1}$-searchable encryption system is secure assuming the underlying public-PLBE is secure.

D Definition of public-PLBE

Boneh and Waters [12] define a primitive called Augmented Broadcast Encryption (ABE) which they use to build a trace and revoke system. Setting the recipient set $S$ to $S = \{1, \ldots, n\}$ in an ABE results in a concept we call public-PLBE. For completeness, we include the full definition here. A public-PLBE is a restricted broadcast system comprising the following algorithms:

$\mathrm{Setup}_{\mathrm{PKLBE}}(N, \lambda)$ A probabilistic algorithm that takes as input $N$, the number of users in the system, and a security parameter $\lambda$. The algorithm runs in polynomial time in $\lambda$ and outputs a public key PK and private keys $\mathrm{SK}_1, \ldots, \mathrm{SK}_N$, where $\mathrm{SK}_u$ is given to user $u$.

$\mathrm{Encrypt}_{\mathrm{PKLBE}}(\mathrm{PK}, i, M)$ Takes as input a public key PK, an integer $i$ satisfying $1 \leq i \leq N+1$, and a message $M$. It outputs a ciphertext $C$. This ciphertext is intended for users $\{i, i+1, \ldots, N\}$.

$\mathrm{Decrypt}_{\mathrm{PKLBE}}(j, \mathrm{SK}_j, C)$ Takes as input the private key $\mathrm{SK}_j$ for user $j$ and a ciphertext $C$. The algorithm outputs a message $M$ or $\perp$.

The system must satisfy the following correctness property: for all $i, j \in \{1, \ldots, N+1\}$ (where $j \leq N$), and all messages $M$: Let $(\mathrm{PK}, (\mathrm{SK}_1, \ldots, \mathrm{SK}_N)) \leftarrow \mathrm{Setup}_{\mathrm{PKLBE}}(N, \lambda)$ and $C \leftarrow \mathrm{Encrypt}_{\mathrm{PKLBE}}(\mathrm{PK}, i, M)$. If $j \geq i$ then $\mathrm{Decrypt}_{\mathrm{PKLBE}}(j, \mathrm{SK}_j, C) = M$.

Security. We define security of a PKLBE system using two games. The first game is a message hiding game and says that a ciphertext created using index $i = N+1$ is unreadable by anyone. The second game is an index hiding game and captures the intuition that a broadcast ciphertext created using index $i$ reveals no non-trivial information about $i$. We will consider all these games for a fixed number of users, $N$.

Game 1. The first game, called the Message Hiding Game, says that an adversary cannot break semantic security when encrypting using index $i = N+1$. The game proceeds as follows:

Setup The challenger runs the $\mathrm{Setup}_{\mathrm{PKLBE}}$ algorithm and gives the adversary PK and all secret keys $\{\mathrm{SK}_1, \ldots, \mathrm{SK}_N\}$.

Challenge The adversary outputs two equal length messages $M_0, M_1$. The challenger flips a coin $\beta \in \{0, 1\}$ and sets $C \leftarrow \mathrm{Encrypt}_{\mathrm{PKLBE}}(\mathrm{PK}, N+1, M_\beta)$. The challenger gives $C$ to the adversary.

Guess The adversary returns a guess $\beta' \in \{0, 1\}$ of $\beta$.

We define the advantage of adversary $\mathcal{A}$ in winning the game as $\mathrm{MH\,Adv}_{\mathcal{A}} = \left| \Pr[\beta' = \beta] - 1/2 \right|$.

Game 2.
The second game, called the Index Hiding Game, says that an adversary cannot distinguish between an encryption to index $i$ and one to index $i+1$ without the key $\mathrm{SK}_i$. The game takes as input a parameter $i \in \{1, \ldots, N\}$ which is given to both the challenger and the adversary. The game proceeds as follows:

Setup The challenger runs the $\mathrm{Setup}_{\mathrm{PKLBE}}$ algorithm and gives the adversary PK and the set of private keys $\{ \mathrm{SK}_j \text{ s.t. } j \neq i \}$.

Challenge The adversary outputs a message $M$. The challenger flips a coin $\beta \in \{0, 1\}$ and computes $C \leftarrow \mathrm{Encrypt}_{\mathrm{PKLBE}}(\mathrm{PK}, i + \beta, M)$. The challenger returns $C$ to the adversary.

Guess The adversary returns a guess $\beta' \in \{0, 1\}$ of $\beta$.

We define the advantage of adversary $\mathcal{A}$ as the quantity $\mathrm{IH\,Adv}_{\mathcal{A}}[i] = \left| \Pr[\beta' = \beta] - 1/2 \right|$. In words, the game captures the fact that even if all users other than $i$ collude they cannot distinguish whether $i$ or $i+1$ was used to create a ciphertext $C$. With these games we define a secure PKLBE as follows.

Definition D.1. We say that an $N$-user public-PLBE system is secure if for all polynomial time adversaries $\mathcal{A}$ we have that $\mathrm{MH\,Adv}_{\mathcal{A}}$ and $\mathrm{IH\,Adv}_{\mathcal{A}}[i]$ for $i = 1, \ldots, N$, are negligible functions of $\lambda$.