Security Analysis of Cryptographically Controlled Access to XML Documents



Similar documents
Computational Soundness of Symbolic Security and Implicit Complexity

Chapter 11. Asymmetric Encryption Asymmetric encryption schemes

Victor Shoup Avi Rubin. Abstract

Non-Black-Box Techniques In Crytpography. Thesis for the Ph.D degree Boaz Barak

Lecture 3: One-Way Encryption, RSA Example

MESSAGE AUTHENTICATION IN AN IDENTITY-BASED ENCRYPTION SCHEME: 1-KEY-ENCRYPT-THEN-MAC

Lecture 5 - CPA security, Pseudorandom functions

Advanced Cryptography

1 Construction of CCA-secure encryption

MACs Message authentication and integrity. Table of contents

Lecture 9 - Message Authentication Codes

Lecture 10: CPA Encryption, MACs, Hash Functions. 2 Recap of last lecture - PRGs for one time pads

YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE

Lecture 15 - Digital Signatures

Security Aspects of. Database Outsourcing. Vahid Khodabakhshi Hadi Halvachi. Dec, 2012

Topology-based network security

Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Karagpur

Simulation-Based Security with Inexhaustible Interactive Turing Machines

Cryptographic hash functions and MACs Solved Exercises for Cryptographic Hash Functions and MACs

1 Message Authentication

Yale University Department of Computer Science

An Overview of Common Adversary Models

Theoretical Computer Science

SECTION 6: FIBER BUNDLES

Network Security. Computer Networking Lecture 08. March 19, HKU SPACE Community College. HKU SPACE CC CN Lecture 08 1/23

SYMMETRIC ENCRYPTION. Mihir Bellare UCSD 1

2.1 Complexity Classes

Provable-Security Analysis of Authenticated Encryption in Kerberos

Cryptography and Network Security Department of Computer Science and Engineering Indian Institute of Technology Kharagpur

Post-Quantum Cryptography #4

Department Informatik. Privacy-Preserving Forensics. Technical Reports / ISSN Frederik Armknecht, Andreas Dewald

Breaking Generalized Diffie-Hellman Modulo a Composite is no Easier than Factoring

Lecture 2: Complexity Theory Review and Interactive Proofs

Identity-Based Encryption from the Weil Pairing

1 Signatures vs. MACs

Multi-Input Functional Encryption for Unbounded Arity Functions

Capture Resilient ElGamal Signature Protocols

Universal Hash Proofs and a Paradigm for Adaptive Chosen Ciphertext Secure Public-Key Encryption

Authenticated Encryption: Relations among Notions and Analysis of the Generic Composition Paradigm By Mihir Bellare and Chanathip Namprempre

Foundations of mathematics. 2. Set theory (continued)

FIXED INCOME ATTRIBUTION

Security Analysis of DRBG Using HMAC in NIST SP

Copyright 2005 IEEE. Reprinted from IEEE MTT-S International Microwave Symposium 2005

Cryptography: Authentication, Blind Signatures, and Digital Cash

Verifiable Outsourced Computations Outsourcing Computations to Untrusted Servers

Introduction. Digital Signature

Counter Expertise Review on the TNO Security Analysis of the Dutch OV-Chipkaart. OV-Chipkaart Security Issues Tutorial for Non-Expert Readers

The Basics of Graphical Models

A FRAMEWORK FOR AUTOMATIC FUNCTION POINT COUNTING

The Order of Encryption and Authentication for Protecting Communications (Or: How Secure is SSL?)

Message authentication

A NOVEL APPROACH FOR MULTI-KEYWORD SEARCH WITH ANONYMOUS ID ASSIGNMENT OVER ENCRYPTED CLOUD DATA

Reading 13 : Finite State Automata and Regular Expressions

The Tangled Web of Agricultural Insurance: Evaluating the Impacts of Government Policy

Authenticated Encryption: Relations among notions and analysis of the generic composition paradigm

Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson

Factoring & Primality

Message Authentication Codes 133

How To Encrypt With A 64 Bit Block Cipher

Correspondence analysis for strong three-valued logic

Security Analysis for Order Preserving Encryption Schemes

CryptographicallyEnforced

On-Line/Off-Line Digital Signatures

Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Sample or Random Security A Security Model for Segment-Based Visual Cryptography

Outline. Computer Science 418. Digital Signatures: Observations. Digital Signatures: Definition. Definition 1 (Digital signature) Digital Signatures

Is the trailing-stop strategy always good for stock trading?


Rainfall generator for the Meuse basin

Time-Memory Trade-Offs: False Alarm Detection Using Checkpoints

Mechanism Design for Online Real-Time Scheduling

A performance analysis of EtherCAT and PROFINET IRT

Offline sorting buffers on Line

CPSC 467b: Cryptography and Computer Security

A Survey and Analysis of Solutions to the. Oblivious Memory Access Problem. Erin Elizabeth Chapman

The Tangled Web of Agricultural Insurance: Evaluating the Impacts of Government Policy

Chosen-Ciphertext Security from Identity-Based Encryption

Accommodating Transient Connectivity in Ad Hoc and Mobile Settings

Computer Networks. Network Security 1. Professor Richard Harris School of Engineering and Advanced Technology

International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.1, No.3,August 2013

Secure Computation Without Authentication

Ch.9 Cryptography. The Graduate Center, CUNY.! CSc Theoretical Computer Science Konstantinos Vamvourellis

Analysis of Privacy-Preserving Element Reduction of Multiset

Associate Prof. Dr. Victor Onomza Waziri

Mining the Situation: Spatiotemporal Traffic Prediction with Big Data

Message Authentication Code

A MPCP-Based Centralized Rate Control Method for Mobile Stations in FiWi Access Networks

Integrated airline scheduling

SECURITY EVALUATION OF ENCRYPTION USING RANDOM NOISE GENERATED BY LCG

U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, Notes on Algebra

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

A first step towards modeling semistructured data in hybrid multimodal logic

Polynomials with non-negative coefficients

A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment

Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Which Semantics for Neighbourhood Semantics?

Information Security in Big Data using Encryption and Decryption

Authentication and Encryption: How to order them? Motivation

Breaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and

Transcription:

Security Analysis o Cryptographically Controlled Access to XML Documents MARTÍN ABADI University o Caliornia, Santa Cruz, and Microsot Research, Silicon Valley, USA and BOGDAN WARINSCHI University o Bristol, UK Some promising recent schemes or XML access control employ encryption or implementing security policies on published data, avoiding data duplication. In this paper we study one such scheme, due to Miklau and Suciu. That scheme was introduced with some intuitive explanations and goals, but without precise deinitions and guarantees or the use o cryptography (speciically, symmetric encryption and secret sharing). We bridge this gap in the present work. We analyze the scheme in the context o the rigorous models o modern cryptography. We obtain ormal results in simple, symbolic terms close to the vocabulary o Miklau and Suciu. We also obtain more detailed computational results that establish security against probabilistic polynomial-time adversaries. Our approach, which relates these two layers o the analysis, continues a recent thrust in security research and may be applicable to a broad class o systems that rely on cryptographic data protection. Categories and Subject Descriptors: E.3 Data Encryption: ; H.2.7 Database Administration: Security, integrity, and protection; F.1.1 Models o Computation: Relations between models General Terms: Security, Theory Additional Key Words and Phrases: access control, authorization, encryption, XML 1. INTRODUCTION A classic method or enorcing policies on access to data is to keep all data in trusted servers and to rely on these servers or mediating all requests by clients, authenticating the clients and perorming any necessary checks. An alternative method, which is sometimes more attractive, consists in publishing the data in such a way that each client can see only the appropriate parts. In a naive scheme, many sani- This work was supported in part by the National Science Foundation under Grants CCR-0204162, CCR-0208800, CCF-0524078, and ITR-0430594, and by ACI Jeunes Chercheurs JC 9005 and ARA SSIA Formacrypt. It was partly carried out while Bogdan Warinschi was ailiated with the University o Caliornia at Santa Cruz, with Stanord University, and with Loria, INRIA, in Nancy. Authors s addresses: Martín Abadi, Computer Science Department, University o Caliornia at Santa Cruz, Santa Cruz, CA 95064, USA. Bogdan Warinschi, Department o Computer Science, University o Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK. A preliminary version o this paper appears in the Proceedings o the 24th ACM Symposium on Principles o Database Systems. Permission to make digital/hard copy o all or part o this material without ee or personal or classroom use provided that the copies are not made or distributed or proit or commercial advantage, the ACM copyright/server notice, the title o the publication, and its date appear, and notice is given that copying is by permission o the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior speciic permission and/or a ee. c 20YY ACM 0004-5411/20YY/0100-0001 $5.00 Journal o the ACM, Vol. V, No. N, Month 20YY, Pages 1 0??.

2 Martín Abadi and Bogdan Warinschi tized versions o the data would be produced, each corresponding to a partial view suitable or distribution to a subset o the clients. This naive scheme is impractical in general. Accordingly, there has been much interest in more elaborate and useul schemes or ine-grained control on access to published documents, particularly or XML documents Bertino et al. 2002; Bertino et al. 2001; Crampton 2004; Damiani et al. 2002; Kudo and Hada 2000; Miklau and Suciu 2003; Yang and Li 2004. This line o research has led to eicient and elegant publication techniques that avoid data duplication by relying on cryptography. For instance, using those techniques, medical records may be published as XML documents, with parts encrypted in such a way that only the appropriate users (physicians, nurses, researchers, administrators, and patients) can see their contents. The work o Miklau and Suciu 2003 is a crisp, compelling example o this line o research. They develop a policy query language or speciying ine-grained access policies on XML documents and a logical model based on the concept o protection. They also show how to translate consistent policies into protections, and how to implement protections by XML encryption Eastlake and Reagle 2002. Roughly, a protection is an XML tree in which nodes are guarded by positive boolean ormulas over a set o symbols {K 1,K 2,...} that stand or cryptographic keys. Protections have a simple and clear intended semantics: access to the inormation contained in a node is conditioned on possession o a combination o keys that satisies the ormula that guards the node. For example, access to a node guarded by (K 1 K 2 ) K 3 requires possessing either keys K 1 and K 2 or key K 3. (See Giord s work or some o the roots o this approach Giord 1982.) Formally, a protection describes a unction that maps each possible set o keys to the set o nodes that can be accessed using those keys, treating the keys as symbols. On the other hand, the use o keys or deriving a partially encrypted document is not symbolic: this process includes replacing the symbols K 1,K 2,... with actual keys, and applying a symmetric encryption algorithm repeatedly, bottom-up, to the XML document in question. While Miklau and Suciu provide a thorough analysis o the translation o policies into protections, they leave a large gap between the abstract semantics o protections and the use o actual keys and encryption. The existence o this gap should not surprise us: an analogous gap existed in protocol analysis or 20 years, until recent eorts to bridge it (e.g., Abadi and Rogaway 2002; Backes et al. 2003; Herzog 2004; Laud 2004; Micciancio and Warinschi 2004). Concretely, the gap means that the protection semantics leaves many problematic issues unresolved. We describe two such issues, as examples: Partial inormation: It is conceivable that even when a node should be hidden according to a protection, the partially encrypted document may in act leak some inormation about the data in that node. Encryption cycles: From the point o view o the abstract semantics, encryption cycles (such as encrypting a key with itsel) are legitimate and do not contradict security. On the other hand, there are encryption algorithms that satisy standard cryptographic deinitions o security but that leak keys when encryption cycles are created (see Section 4). More generally, there are many encryption methods and many notions o security Journal o the ACM, Vol. V, No. N, Month 20YY.

Security Analysis o Cryptographically Controlled Access to XML Documents 3 or them (e.g., Bellare and Rogaway 2005; Dolev et al. 2000; Goldwasser and Micali 1984), and it is not clear which one, i any, provides adequate guarantees or this application nor is it exactly clear what those guarantees might be. The immediate goal o this work is to bridge this gap by reconciling the abstract semantics o protections with a more concrete, computational treatment o security, and to deine and establish precise security guarantees. We do not wish to replace the abstract semantics, which certainly has its place, but rather to complement it. From a broader perspective, our goal is to develop, apply, and promote useul concepts and tools or security analysis in the ield o database theory. These concepts and tools do not pertain to statistical techniques, which have long been known in database research (e.g., Adam and Worthmann 1989; Castano et al. 1995; Ullman 1983), but rather to cryptography. While sophisticated uses o cryptology in database research may have been o modest scope, there is an obvious need or database security, and we believe that cryptology has much to oer. In research on cryptographic protocols, ormal and complexity-theoretic methods have been successul in providing detailed models and in enabling security proos (sometimes automated ones). The same methods are beneicial or a broad class o systems that require security (e.g., Micciancio and Panjwani 2006). Each application, however, can necessitate non-trivial, speciic insights and results. In the techniques that we study, partial and multiple encryptions occur in (large, XML) data instances; we thereore depart rom the situations most typically considered in the cryptography literature, towards data management. It is this speciicity that motivates the present paper. Overview o Results Our analysis is directed at the core o the ramework o Miklau and Suciu, which aims to ensure data protection by an interesting combination o encryption schemes and secret sharing schemes Shamir 1979. As a ormal counterpart to their loose, inormal concept o data secrecy, we introduce a strong, precise cryptographic deinition. The deinition goes roughly as ollows. Consider a protection or an XML document. An adversary is given an arbitrary set o keys, and the liberty o selecting two instantiations or the data in all nodes that occur in the XML document. The only restriction on these instantiations is that they should coincide on the nodes to which the adversary rightully has access according to its keys and the abstract semantics o protections. In other words, the adversary selects two documents that contain the same inormation in the nodes it can access but may dier elsewhere. Then the adversary is given the partially encrypted document that corresponds to one o its two documents, and its goal is to decide which o the two instantiations was used in generating this partially encrypted document. Security means that the adversary cannot do much better than picking at random. It implies that the partially encrypted document reveals no inormation on the data in the nodes that should be hidden rom the adversary, or otherwise this inormation would be suicient to determine which instantiation was used. Technically, we adapt and extend the approach o Abadi and Rogaway 2002, which has been ollowed and developed in several pieces o recent work (e.g., Laud 2004; Micciancio and Panjwani 2005; 2006). The novelties o this paper include the Journal o the ACM, Vol. V, No. N, Month 20YY.

4 Martín Abadi and Bogdan Warinschi application to document access control, signiicant dierences in basic deinitions motivated by this application, and the treatment o secret sharing. First we provide an intermediate symbolic language or cryptographic expressions. We then deine patterns o expressions; intuitively, a pattern represents the inormation that an expression reveals to an adversary. We show how to transorm protections into cryptographic expressions, and use patterns or providing an equivalent semantics or protections. This equivalence is captured in Theorem 1. Going urther, we relate expressions to concrete computations on bit-strings. The most diicult result o this paper is Theorem 2. Inormally, it states that patterns aithully represent the inormation that expressions reveal, even when expressions and patterns are implemented with actual encryption schemes (not symbolically). More precisely, we associate probability distributions with an expression and its pattern by mapping symbols to bit-strings and implementing encryption with a semantically secure encryption scheme Goldwasser and Micali 1984, and prove that these distributions cannot be distinguished by any probabilistic polynomial-time algorithm. Our main theorem, Theorem 4, reconciles the abstract semantics o protections with the actual use o encryption. We establish that i data is hidden according to a protection, then it is secret according to our deinition o secrecy. Contents The next section, Section 2, is mostly a review. Section 3 introduces our ormal language or representing cryptographic expressions and deines an alternative semantics o XML protections. Our main results are in Section 4, which gives concrete interpretations to expressions and relates the ormal semantics o protections to a strong deinition o secrecy. Section 5 considers some variants and extensions. Section 6 concludes. The Appendix contains additional proos. 2. CONTROLLING ACCESS TO XML DOCUMENTS WITH PROTECTIONS In this section we briely recall the key aspects o the work o Miklau and Suciu. We ocus on protections. We describe the derivation o partially encrypted documents rom protections in the next section. We omit the policy query language because, or our purposes, we can discuss the protections generated rom policies rather than the policies themselves. XML Documents and Protections We model XML documents as trees labeled with elements rom a set Data = {D 1,D 2,...}, which we assume to represent XML element names and actual data values, though we make no syntactic distinction between these two possibilities. We treat D 1, D 2,... as atomic data symbols. We use pre-order representations or XML trees, described by the grammar: XML ::= (Data) (Data, XML, XML,...,XML) with terminals in Data { (, ) }. A protection consists o two components: a metadata XML tree obtained rom an XML tree by adding metadata nodes, and a mapping that attaches to each node a positive boolean ormula over Keys = {K 1,K 2,...}. Metadata nodes are nodes that have special kinds o labels, and it is assumed that they can be distinguished Journal o the ACM, Vol. V, No. N, Month 20YY.

Security Analysis o Cryptographically Controlled Access to XML Documents 5 K 1 < hosp > 1 (K1 K 3 ) K 4 2 < nurse > true < phys > 5 K 3 K 4 3 4 < pat id > < pat id > K 2 6 < admin > K 1 1 true < hosp > true K 4 OR 7 K 6 true < phys > 5 AND 8 K 12 6 2 < nurse > K 2 6 K 1 K 1 5 9 K 3 K 2 10 5 K 5 K 11 6 K 3 < pat id > 3 K 4 4 < pat id > < admin > Fig. 1. A tree protection (top) and an equivalent normalized one (bottom). {D 1, (OR, (AND, ({K 1 5} K1, {K 2 5} K1, {K 6 } K5 ), {K 6 } K4, {D 2, ({D 3 } K3, {D 4 } K4 )} K6 ), (D 5, ({D 6 } K2 )))} K1 Fig. 2. Expression associated with the normalized protection o Figure 1. Journal o the ACM, Vol. V, No. N, Month 20YY.

6 Martín Abadi and Bogdan Warinschi rom the standard XML nodes. Their labels are one o the symbols OR and AND, or hold keys in the set Keys or key shares in the set KeyShares deined by: or a ixed parameter n. grammar: KeyShares = {K j i K i Keys,j 1..n} Thus, in summary, protections are generated by the Prot ::= (BData), Cond (BData, Prot, Prot,...,Prot), Cond Cond ::= true alse Keys Cond Cond Cond Cond where BData = Data Keys KeyShares {OR, AND} is the set o node labels. Roughly, the key shares Ki 1, K2 i,..., Kn i are pieces o inormation that together allow the recovery o the key K i, but o which no proper subset suices or computing K i, or even non-trivial partial inormation about K i. For example, the key shares Ki 1,..., Kn 1 i may be random strings o the same length as K i,andki n may be the XOR o Ki 1,..., Kn 1 i with K i. We treat a ramework more general than the original one, which uses a particular way o sharing keys. We assume that the number o shares or each key is some integer constant n (with n 2), and that each key is shared only once. (This assumption holds in the original ramework, where n = 2.) In a protection, the ormula that guards a node describes a condition that needs to be satisied beore a user is granted access to that node (and its children). For example, accessing a node guarded by the ormula K 1 (K 2 K 3 ) requires having key K 1 and at least one o the keys K 2 and K 3. In addition, the user should also satisy all ormulas on the path rom the root to the node. Figure 1 gives an example (adapted rom Miklau and Suciu 2003). The tree at the top is an example o a protection over some XML medical database. Notice that we preserved the original labels we did not replace element names with symbols D 1, D 2,... A user that possesses only key K 3 cannot access any node since the root is guarded by K 1. I the user knows also K 1 then it should be able to access the data in nodes {1, 2, 3, 5}. Normalization Using simple transormations, one can rewrite any protection into an equivalent, normalized protection where all ormulas that guard nodes are atomic, that is, one o true, alse, ork or some K Keys. Normalization requires adding metadata nodes, keys, and key shares. Normalization can also include removing parts guarded by alse, so we assume that alse does not appear in normalized ormulas (departing slightly rom the original deinition but without loss o generality). Normalized protections are important because, in standard encryption schemes, one can encrypt under an atomic key but not under a boolean combination o keys. Normalized protections serve as the basis or producing partially encrypted documents by applying an encryption algorithm repeatedly. Journal o the ACM, Vol. V, No. N, Month 20YY.

Security Analysis o Cryptographically Controlled Access to XML Documents 7 In Figure 1, the tree at the bottom is a normalized protection, equivalent to the protection at the top. In the normalized protection, a user with keys K 1 and K 3 can recover the inormation in nodes 9 and 10, that is, the key shares K 1 5 and K 2 5, and thereore K 5. (Recall that, in Miklau and Suciu 2003, keys are split into only two shares.) The key K 5 can then be used to recover the inormation in node 11, that is, the key K 6 which together with K 1 provides access to the nodes {1, 2, 3, 5} o the original tree. A Semantics o Protections Formally, the semantics o a protection P is the unction Acc P : P(Keys) P(Data) that, given a set o keys T Keys, returns the set o data that can be accessed using the keys in T. Computing the unction Acc P (T ) is an iterative process. The keys in T are used to access new keys (by either obtaining them directly or recovering all their shares), and the process is repeated until a maximal set o keys is obtained. The output o Acc P (T ) is the set o all data contained in nodes that can be accessed using this last set o keys. We reer the reader to Miklau and Suciu 2003 or additional details on the deinition o Acc P (T ). In most o our analysis, we use an equivalent ormulation o the ormal semantics, given in Section 3. On Keys Derived rom Data In the description above and in the analysis that ollows, we do not consider the use o keys derived rom data (or example, rom mother s maiden names and social security numbers). There are at least three obstacles to obtaining security guarantees with such keys. These obstacles are not speciic to a particular scheme, but they do arise in this context. First, the potential lack o entropy in data implies that the resulting keys may oer no security. For example, keys computed rom 9-digit social security numbers can have at most 10 9 values, which an attacker may try in order to recover anything encrypted under those keys. An attacker may even learn the underlying social security number when decryption with one o those values succeeds. Moreover, the key derivations can lose some data entropy. Speciically, Miklau and Suciu compute a 128-bit key by breaking data into 128-bit blocks and XORing those blocks; i a piece o data yields blocks in which only the irst m bits are meaningul and the last 128 m bits contain a ixed pattern, then the resulting key has at most m bits o entropy, no matter how much entropy was initially present in the data. Finally, the resulting keys can be involved in questionable encryption cycles or instance, protecting a mother s maiden name with a social security number and vice versa. 3. FORMAL ANALYSIS In this section we introduce a language or representing cryptographic expressions that use symmetric encryption and secret sharing schemes. We rely on this language or providing a ormal account o the transormation o protections into partially encrypted documents. We also deine patterns o expressions, which capture the Journal o the ACM, Vol. V, No. N, Month 20YY.

8 Martín Abadi and Bogdan Warinschi inormation that expressions reveal to an adversary, in symbolic terms. This deinition is a variant o ones suggested in other contexts Abadi and Rogaway 2002; Herzog 2004; Micciancio and Panjwani 2005. Finally, we establish a relation between the semantics o protections and these patterns. Basically, expressions are useul as an intermediate language between protections and the world o probability distributions on bit-strings. They also serve as a bridge to other work on ormal analysis. That work includes the idea o patterns, as generalizations o expressions. We adapt and extend that idea or the present setting. An Intermediate Cryptographic Language We consider expressions built rom the set o basic data BData (deined above), using tupling and encryption with keys in Keys. The grammar or expressions is: Exp ::= BData (Exp, Exp,...,Exp) {Exp} Keys For example, an element in Exp is the expression ({(D 2,K1)} 2 K2, {(D 1, {K 2 } K3 )} K1 ) It represents the tupling o two ciphertexts. The irst ciphertext is the encryption under key K 2 o data D 2 and key share K1. 2 The second ciphertext is the encryption under key K 1 o the tuple ormed rom data D 1 and the encryption under key K 3 o key K 2. We sometimes omit parentheses in expressions. For example, we may write {D 1,K 1 } K2 rather than {(D 1,K 1 )} K2. Associating Cryptographic Expressions to Protections We use our language o expressions or giving a precise deinition o how to map a normalized protection to (a symbolic representation or) a partially encrypted document. This deinition is recursive. It relies on the intuition that each node o the protection under consideration is (recursively) mapped to a part o the inal, partially encrypted document. I the node is guarded by a key, then the corresponding part o the inal document is encrypted under that key, otherwise it is let in clear. Formally, we deine the unction E : Prot Exp inductively by: E((B,P 1,P 2,...,P l ), true) = (B,E(P 1 ), E(P 2 ),...,E(P l )) E((B,P 1,P 2,...,P l ),K) = {(B,E(P 1 ), E(P 2 ),...,E(P l ))} K where l 0, each P i (or i =1, 2,...,l) is a normalized protection, B is an arbitrary symbol in BData, andk is an arbitrary key in Keys. For example, the expression o Figure 2 corresponds to the second protection in Figure 1, with the element name contained in node i replaced with data symbol D i. Cycles The deinition o expressions allows encryption cycles o the kind discussed in the introduction. For some o our results, it is useul to ocus on a class o acyclic expressions: Deinition 1 Acyclic expressions. A plain occurrence o a key K j in an expression E 1 is one where K j is not the subscript in a subexpression {...} Kj.Key Journal o the ACM, Vol. V, No. N, Month 20YY.

Security Analysis o Cryptographically Controlled Access to XML Documents 9 K i encrypts key K j in expression E i there exists a subexpression {E 1 } Ki in E such that K j occurs plainly in E 1 or Kj k (or some k 1..n) occursine 1. The expression E is acyclic i the graph associated with its encrypts relation is acyclic. For example, {K 1 } K2 and {{K 1 } K2 } K2 are both acyclic expressions, while {K 1 } K1 and ({K 1 1} K2, {K 2 } K1 ) are not. As long as data values are not used as keys, a protection P that results rom the translation o a policy never contains keys in its nodes. In turn, i normalizing P yields P, then E(P ) is an acyclic expression. (This point ollows rom the deinition o normalization Miklau and Suciu 2003.) Recoverable Keys Given an expression E, we write keys(e) or the set o key symbols that occur in E and key symbols whose shares occur in E. Wecallakeyrecoverable i the key occurs in clear (that is, not encrypted), i the key occurs encrypted under recoverable keys, or i each o its shares occurs in clear or encrypted under recoverable keys. More precisely, a key is recoverable i it is in the least set S o expressions such that: E S, or all E 1,...,E m Exp, i(e 1,...,E m ) S, then E 1 S,..., E m S, or all E Exp and K j Keys, i{e } Kj S and K j S, then E S, or all K j Keys, ik 1 j,...,kn j S, then K j S. We write recoverable(e) or the set o all recoverable keys in expression E. For example, with n = 2, or the expression E =({K 1,K 1 2,K 1 6,K 4 } K3, {K 2 2} K2,K 3, {K 5 } K4 ) we have keys(e) ={K 1,K 2,K 3,K 4,K 5,K 6 } and recoverable(e) ={K 1,K 3,K 4,K 5 } while or the expression we have and E =({K 1,K 1 2,K 1 6,K 4 } K3, {K 5 } K2,K 3, {K 2 2} K4 ) keys(e )={K 1,K 2,K 3,K 4,K 5,K 6 } recoverable(e )={K 1,K 2,K 3,K 4,K 5 } Expression Patterns For each expression E, we deine its structure struct(e). Essentially, the structure o an expression is given by its parse tree in which the labels are replaced with resh symbols. We thereore introduce D, K 0,andK j 0 (or j 1..n), all disjoint rom BData, and deine the structure o expressions by: Journal o the ACM, Vol. V, No. N, Month 20YY.

10 Martín Abadi and Bogdan Warinschi struct(k i )=K 0 or all K i Keys, struct(k j i )=Kj 0 or all K i Keys, j 1..n, struct(d i )=D or all D i Data, struct(or) =OR, struct(and) =AND, struct((e 1,E 2,...,E m )) = (struct(e 1 ), struct(e 2 ),...,struct(e m )), struct({e} Ki )={struct(e)} K0 or all K i Keys. Thus, data symbols are replaced with D, key symbols are replaced with K 0, and key shares are replaced with corresponding shares o K 0. For example, the structure o the expression is {({K 1,D 2 } K3,K 1 3)} K2 {({K 0,D} K0,K 1 0)} K0 The encryption cycle in this structure is unimportant or our purposes. Alternative deinitions o expression structure can avoid such cycles without aecting our results. (For example, one such deinition lets the structure o {E} Ki be {struct(e)} K 0, where K 0 is a resh symbol distinct rom K 0.) We write p(e,t) or the pattern that can be observed in expression E using or decryption the keys in T keys(e). The set o patterns is deined like the set o expressions but over extended sets o atomic symbols that include D, K 0,andK j 0 (or j 1..n). The pattern p(e,t) is deined by: p(b,t) =B or all B BData, p((e 1,E 2,...,E m ),T)=(p(E 1,T), p(e 2,T),...,p(E m,t)), p({e} Ki,T)={struct(E)} Ki i K i T, p({e} Ki,T)={p(E,T)} Ki i K i T. The idea that motivates this deinition is that a ciphertext encrypted under an unknown key K i reveals at most the structure o the underlying plaintext and the identity o K i, but not the encrypted value. Section 4 explains a computational counterpart o this idea. We write pattern(e) or the pattern obtained rom E by using or decryption the keys recoverable rom E itsel. We let: pattern(e) =p(e,recoverable(e)) For example, the pattern o the expression ({D 1 } K1, {D 2 } K2,K 1 ) is ({D 1 } K1, {D} K2,K 1 ) Similarly, the pattern o the expression Journal o the ACM, Vol. V, No. N, Month 20YY. ({{D 1 } K2 } K1, {D 2 } K2,K 1 )

Security Analysis o Cryptographically Controlled Access to XML Documents 11 is ({{D} K2 } K1, {D} K2,K 1 ) Finally, when n = 2, the pattern o the expression ({D 1 } K1, {{K 1 } K2 } K3,K3,K 1 3, 2 {({K 1,D 2 } K3,K3)} 1 K2 ) is ({D} K1, {{K 0 } K2 } K3,K 1 3,K 2 3, {({K 0,D} K0,K 1 0)} K2 ) Secrecy, Symbolically Using patterns, we characterize the set o visible data (and thus the set o secret data) in ormal expressions. By analogy with the deinition o Acc P (T ), we deine the set o data accessible in expression E with keys in the set {K 1,K 2,...,K l } keys(e) by: Acc E ({K 1,K 2,...,K l })={D i Data D i occurs in pattern(e,k 1,K 2,...,K l )} The ollowing theorem relates Acc P (T )andacc E (T ), thus justiying the similarity in notation. It states that the set o data that can be accessed with keys K 1, K 2,..., K l according to protection P is precisely the set o data that can be seen in the pattern o (E(P ),K 1,K 2,...,K l ). The purpose o the tupling with the keys K 1, K 2,..., K l is to make those keys immediately recoverable. Theorem 1. Let P be a normalized protection and T keys(e(p )). Acc P (T )=Acc E(P ) (T ). Then When T is empty, the theorem holds because Acc P ( ) andacc E(P ) ( ) both consist o the data that occurs only under keys that can be recovered rom E(P ). More generally, or an arbitrary T, it holds because Acc P (T )andacc E(P ) (T ) both consist o the data that occurs only under keys that can be recovered rom the combination o E(P )andt. We omit a complete proo o this theorem; in the rest o the paper we rely primarily on Acc E (T ) rather than on Acc P (T ). 4. COMPUTATIONAL ANALYSIS In this section we give a concrete interpretation or expressions and patterns as probability distributions on bit-strings. This interpretation is computational in the sense that it relies on computations on bit-strings rather than on symbolic expressions. In particular, encryption is an actual computation on bit-strings, rather than a ormal operation. In this computational world, we give a cryptographic deinition or data secrecy, then prove our main results. We irst recall the deinition o computational indistinguishability (a central ingredient or deining cryptographic secrecy) and those o encryption and secret sharing schemes. Indistinguishability o Distribution Ensembles Let {Dη} 0 η and {Dη} 1 η be two distribution ensembles on bit-strings (sequences o probability distributions on bit-strings, indexed by a security parameter η). We Journal o the ACM, Vol. V, No. N, Month 20YY.

12 Martín Abadi and Bogdan Warinschi write x D R η i to indicate that x is sampled according to Dη.LetA i be an algorithm. The advantage o A in distinguishing between the two ensembles is the quantity: Adv dist D 0,D x 1(A, η) =Pr D R η 0 : A(x, η) =1 Pr x D R η 1 : A(x, η) =1 We say that D 0 and D 1 are computationally indistinguishable, and write D 0 D 1, i or any probabilistic polynomial-time algorithm A the quantity Adv dist D 0,D1(A, η) is negligible as a unction o η. (A unction (η) is negligible i it is smaller than the inverse o any polynomial or all suiciently large inputs η Bellare and Rogaway 2005.) Encryption Schemes An encryption scheme Π consists o algorithms (K, E, D) or key generation, encryption, and decryption, respectively. The key generation algorithm is randomized; it takes as input a security parameter η and returns a key k to be used or both encryption and decryption. We write k K(η) R or the process o generating encryption keys. The encryption algorithm is also randomized. It takes as input a key k and a plaintext m, and outputs a ciphertext c. We write c E(k, R m) or the process o encrypting message m with key k, producing c. Finally, the decryption algorithm takes as input a key k and a ciphertext c. Ic E(k, R m) orakeyk and a plaintext m, then D(k, c) =m. Decryption returns i it does not succeed. We use a standard notion o security or encryption: indistinguishability against chosen plaintext attacks (IND-CPA), also known as semantic security Goldwasser and Micali 1984. We deine it next. For a given encryption scheme Π = (K, E, D) and a bit b, we consider a let-right oracle LR Π,b (η). This oracle is a program that generates a key k (via k K(η)), R and then answers queries o the orm (m 0,m 1 ), where m 0 and m 1 are bit-strings o equal length. The oracle always returns the answer E(k, m b ), that is, the encryption o the message selected by b. Scheme Π is said to be IND-CPA secure i or any probabilistic polynomial-time adversary A with access to the oracle described above, the quantity: Adv ind-cpa Π (A, η) =Pr A LRΠ,0(η) (η) =1 Pr A LRΠ,1(η) (η) =1 is negligible as a unction o η. The probabilities are induced by the randomness used in the adversary and in the key-generation and encryption processes. Intuitively, the adversary A does not know b a priori, and it aims to discover b. The advantage Adv ind-cpa Π (A, η) will not be negligible i A can determine b by interacting with the oracle. A ortiori, the advantage Adv ind-cpa Π (A, η) will not be negligible i A can break encryptions, and thereore determine b by decrypting the oracle s response to a query (m 0,m 1 ) where m 0 m 1. According to this deinition o security, encryption need not hide the length o plaintexts (because m 0 and m 1 are required to be o equal length). Since length may depend on structure, the structure o plaintexts may be revealed as well. Furthermore, encryption need not hide the identity o keys: an adversary may be able to distinguish two encryptions under the same unknown key rom two encryptions under dierent unknown keys. Finally, since the oracles are responsible or generating and using the encryption keys, the adversary cannot guess or invent a Journal o the ACM, Vol. V, No. N, Month 20YY.

Security Analysis o Cryptographically Controlled Access to XML Documents 13 key and submit it or encryption under itsel; hence the deinition oers no guarantee or such cyclic encryptions, which may thereore leak keys. Alternative notions o security can yield stronger guarantees, but correspondingly they are harder to satisy. Section 5 considers some such notions. Secret Sharing Schemes An n-out-o-n secret sharing scheme SS =(S, C) or sharing keys o Π consists o algorithms or share creation and share combination. The randomized share creation algorithm S takes as input a key k and the security parameter η used to generate it, and outputs n shares o k: k 1, k 2,..., k n. The share combination algorithm C takes as input n shares k 1,k 2,...,k n, and attempts to reconstitute the original key. The scheme is correct i C(S(k, η)) = k or any key k and any η. Security o secret sharing means that a proper subset o the shares or a key provides no useul inormation on the key more precisely, we require that this subset cannot be distinguished rom a corresponding subset or another key. Let sh(k) be a sample rom the distribution S(k, η)othen shares output by the sharing algorithm or the key k, andsh(k) S be the restriction o sh(k) to the indexes in some set o indexes S {1, 2,...,n}. We require that, or any proper subset o indexes S {1, 2,..., n} and any probabilistic polynomial-time adversary A, the quantity: Adv ss R SS (A, η) = Pr k 0,k 1 K(η),sh(k 0 ) S(k R 0,η):A(k 0,k 1,sh(k 0 ) S )=1) R Pr k 0,k 1 K(η),sh(k 1 ) S(k R 1,η):A(k 0,k 1,sh(k 1 ) S )=1) is negligible as a unction o η. While this security requirement does not seem to be well-established, it should not be too surprising. Particular schemes may have additional properties, or example that sh(k 0 ) S and sh(k 0 ) S are indistinguishable whenever S and S are proper subsets o indexes o the same size (so key shares may all look alike). We do not need those properties. Computational Interpretation o Expressions Expressions and patterns induce distributions on bit-strings. These distributions are obtained by replacing data symbols with bit-strings and implementing encryption and key sharing with actual encryption and secret sharing schemes. Formally, or any expression or pattern E, given a unction : Data {0, 1} that maps data symbols to bit-string representations o XML element names and values, an encryption scheme Π = (K, E, D), a secret sharing scheme SS =(S, C), and a security parameter η, we deine the distribution E (and thus a distribution ensemble E Π,SS ) using a two-step procedure: (1) In the irst step, each key symbol is mapped to a bit-string. Speciically, we assume that keys(e) ={K 1,...,K m }, and we generate a vector τ rom the distribution K m+1 (η). This is a vector o m + 1 keys, each obtained by running the key generation algorithm on the security parameter. We map K i to τi, and in particular map K 0 to τ0. We then obtain shares o the keys in τ Journal o the ACM, Vol. V, No. N, Month 20YY.

14 Martín Abadi and Bogdan Warinschi D = data D i = (D i ) OR AND K i Π,SS,η = or = and = τi K j i = φij (E 1,E 2,...,E l ) =(E 1, E 2,...,E l {E} Ki = E(τi, E ) ) Fig. 3. Mapping expressions to bit-strings. by running the secret sharing scheme SS; these shares are maintained in an (m + 1)-by-n matrix φ whose rows are obtained by φi S(τi,η). R (2) In the second step, we map each expression (or pattern) E to an interpretation E that we deine inductively in Figure 3. This step assumes constant bit-strings or, and, and data, as well as a tupling operation on bitstrings. The computational interpretation E(P ) associated with the expression E(P ) is the distribution o the partially encrypted document derived rom protection P, generated with encryption scheme Π, secret sharing scheme SS, andη as security parameter. Because encryption may not hide the length o plaintexts, we typically need hypotheses on the lengths o bit-string representations. For simplicity, we assume that those lengths depend only on structure. More precisely, we assume that the. length o E always equals the length o struct(e) In the case where E is a data symbol, this assumption implies that we ocus on unctions : Data {0, 1} that map data symbols to bit-strings o a ixed length: Deinition 2. A valuation is a unction : Data {0, 1} that maps every data symbol to a bit-string o the same length as data. In the case where E is a key symbol, the assumption means that the key generator yields keys o a ixed length or each given security parameter. A similar condition applies to key shares. In the case where E is a tuple, it suices that the length o a tuple be a unction o the lengths o its components. Similarly, in the case where E is an encryption, it suices that the length o a ciphertext be a unction o the length o the underlying plaintext and o the security parameter. These conditions on keys, key shares, tuples, and encryptions hold in most usual implementations. Secrecy, Computationally We use the concept o computational indistinguishability ( ) and the computational interpretation o expressions or giving a computational characterization or secrecy Journal o the ACM, Vol. V, No. N, Month 20YY.

Security Analysis o Cryptographically Controlled Access to XML Documents 15 o data that occurs in expressions. Deinition 3. Let E be an expression, Π an encryption scheme, SS a secret sharing scheme, and S Data. ThesetS is computationally hidden in E (with Π and SS) i or any two valuations 0 and 1 such that it holds that E Π,SS 0 0 (D i )= 1 (D i ) E Π,SS 1. or all D i Data S As explained in the introduction, this deinition indicates that an adversary is allowed to choose two interpretations or the data symbols in the expression E. These interpretations must map data that is not secret to the same bit-strings, but may map other data to dierent bit-strings o the same length. The adversary is then given a bit-string selected rom the distribution determined by one o the two interpretations and its goal is to determine which interpretation was used. Secrecy means that the adversary cannot do much better than guessing. Main Results The technical core o our results is the next theorem. It states that patterns aithully represent the inormation that expressions reveal, even when expressions and patterns are mapped to bit-strings. Speciically, we prove that the distribution ensembles associated with E and pattern(e) are indistinguishable. Theorem 2. Let E be an acyclic expression. I Π is an IND-CPA secure encryption scheme and SS is a secure secret sharing scheme, then or any valuation it holds that E Π,SS pattern(e) Π,SS The proo o this theorem relies on a so-called hybrid argument. It is presented in the Appendix. Next, we build on Theorem 2 in order to establish a link between symbolic and computational notions o data secrecy. Lemma 3. Let E be an acyclic expression and T = {K 1,K 2,...,K l } keys(e) a set o keys. I Π is an IND-CPA secure encryption scheme and SS is a secure secret sharing scheme, then Data Acc E (T ) is computationally hidden in (E,K 1,K 2,...,K l ) with Π and SS. This lemma ollows rom the deinition o Acc E (T ) and Theorem 2. Since Acc E (T ) consists o the data symbols that occur in the pattern o (E,K 1,K 2,...,K l ), we have that: pattern((e,k 1,K 2,...,K l )) Π,SS 0 =pattern((e,k 1,K 2,...,K l )) Π,SS 1 or any two valuations 0 and 1 that coincide on Acc E (T ). We conclude by Theorem 2 and transitivity. Going urther, Theorem 4 relates the abstract semantics o a normalized protection P, as deined by the unction Acc P ( ), to the secrecy o data in the partially encrypted document associated with P. It requires that E(P ) be acyclic, as we would expect or protections derived rom policies (see Section 3). It states that i Journal o the ACM, Vol. V, No. N, Month 20YY.

16 Martín Abadi and Bogdan Warinschi some data is secret according to the abstract semantics o protections, then that data is in act computationally hidden. Thereore, we regard Theorem 4 as the main theorem o this paper. Theorem 4. Let P be a normalized protection such that E(P ) is an acyclic expression. Let T = {K 1,K 2,...,K l } keys(e(p )) be an arbitrary set o keys. I Π is an IND-CPA secure encryption scheme and SS is a secure secret sharing scheme, then Data Acc P (T ) is computationally hidden in (E(P ),K 1,K 2,...,K l ) with Π and SS. This theorem is an immediate corollary o Theorem 1 and Lemma 3. Consequences (Discussion) It is important to understand what our results imply, and also what they do not imply. Next we discuss some o their consequences. We ocus this discussion on the expression o Figure 2, {D 1, (OR, (AND, ({K 1 5} K1, {K 2 5} K1, {K 6 } K5 ), {K 6 } K4, {D 2, ({D 3 } K3, {D 4 } K4 )} K6 ), (D 5, ({D 6 } K2 )))} K1 which here we call E. Theorem 4 implies that i T = {K 1,K 3 } and i 0 and 1 are two valuations that coincide on D 1, D 2, D 3,andD 5, then E Π,SS 0 E Π,SS 1. The two valuations may dier on D 4 and D 6. The theorem implies that no probabilistic polynomialtime adversary can get signiicant partial inormation on the values o D 4 and D 6 (beyond any inormation that it has a priori). In other words, the set {D 4,D 6 } is computationally hidden in E, as one might have expected. The valuations 0 and 1 are allowed to result in some (partial or total) coincidences in values. For instance, we may have 0 (D 4 )= 0 (D 6 ) but 1 (D 4 ) 1 (D 6 ). The adversary cannot distinguish E Π,SS 0 and E Π,SS 1 despite these coincidences. Thus, value repetitions are concealed. In terms o the protection o Figure 1, the adversary cannot even tell whether the value o pat id o node 4 and the value o admin o node 6 are the same. Since the valuations 0 and 1 are universally quantiied, they may be chosen by the adversary. So, while the adversary obtains inormation about a document only by observing it, we allow or the possibility that the adversary may inluence the contents o the document. This possibility seems realistic in many potential applications. For instance, in our particular example, the adversary may be the patient whose record is in node 4, and may thereore have some inluence and prior knowledge on the contents o node 4. On the other hand, the guarantees that the theorem oers are perhaps not as strong as some might expect, in particular with respect to document structure. For instance, the theorem does not exclude that an adversary might be able to distinguish a bit-string representation o E rom a bit-string representation o the expression that we obtain rom E by replacing D 6 with a non-atomic expression. Theorem 4 notwithstanding, an adversary may learn a great deal about the length and even the structure o encrypted material. Journal o the ACM, Vol. V, No. N, Month 20YY.

Security Analysis o Cryptographically Controlled Access to XML Documents 17 Unortunately, this limitation o the theorem is not a shortcoming o our analysis, but rather the result o an intrinsic shortcoming in the technique that we are analyzing. Under the standard deinition o security that we adopt or encryption, one cannot expect to do much better Micciancio 2004. As explained above, encryption need not hide the length o plaintexts. (Indeed, the XML encryption method used by Miklau and Suciu does not.) Moreover, i E 0 and E 1 are expressions with dierent structures, the concrete bit-string implementations E 0 and E 1 may well have dierent lengths, so it is possible that their encryptions could be distinguished. The next section discusses stronger but less standard assumptions on encryption, and also other variants to our deinitions. 5. EXTENSIONS Alternative deinitions are certainly possible, and perhaps attractive. In this section we consider some alternative deinitions and corresponding extensions o our main results. We study additional security properties, addressing more stringent secrecy requirements and some more powerul attacks. Stronger Secrecy Requirements or Encryption In particular, we may wish to require that a ciphertext encrypted under an unknown key reveal nothing about the underlying plaintext not even its structure or its length Abadi and Rogaway 2002. Formally, we may replace the unctions p and pattern o Section 3. For instance, we may deine replacements p and pattern as ollows: p (B,T) =B or all B BData, p ((E 1,E 2,...,E m ),T)=(p (E 1,T), p(e 2,T),...,p (E m,t)), p ({E} Ki,T)={ } Ki i K i T, p ({E} Ki,T)={p (E,T)} Ki i K i T, where is a special symbol that represents undecryptable material, and pattern (E) =p (E,recoverable(E)) The equation p ({E} Ki,T)={ } Ki (or K i T ) relects the idea that encryption does not reveal anything about the encrypted plaintext. For instance, we have that pattern ({D 1 } K1 )=pattern ({({D 2 } K2, {D 3 } K3 )} K1 ). An analogue o Theorem 1 holds with this deinition. On the other hand, the other theorems o the paper are more problematic in this respect. In particular, an analogue o Theorem 2 does not hold. A stronger security requirement on encryption yields that analogue, and a corresponding strengthening o Theorem 4. That requirement is obtained rom the deinition o IND-CPA security by removing the restriction that the bit-strings m 0 and m 1 have equal length in queries (m 0,m 1 ) to the let-right oracle. Going urther, in another ormal variation we may replace p and pattern with the unctions p and pattern deined as ollows: p (B,T) =B or all B BData, p ((E 1,E 2,...,E m ),T)=(p (E 1,T), p(e 2,T),...,p (E m,t)), p ({E} Ki,T)= i K i T, Journal o the ACM, Vol. V, No. N, Month 20YY.

18 Martín Abadi and Bogdan Warinschi p ({E} Ki,T)={p (E,T)} Ki i K i T, thus hiding the occurrence o the key K i when is produced, and pattern (E) =p (E,recoverable(E)) The equation p ({E} Ki,T) = (or K i T ) implies that an adversary that does not know the encryption key K i a priori cannot identiy the use o K i. For instance, we have that pattern (({D 1 } K1, {D 2 } K1 )=pattern (({D 1 } K1, {D 2 } K2 ); this equation indicates that an adversary cannot tell whether the two pieces o data D 1 and D 2 are protected by the same key. Such guarantees might matter in settings where the security policy itsel is sensitive inormation. They do not hold with IND-CPA security, so they do not ollow rom Theorem 4. Again we need a strengthening o IND-CPA security, such as type-0 security Abadi and Rogaway 2002. For brevity, we omit precise statements and proos o the analogues o Theorems 2 and 4 or pattern and pattern. Attacks with Multiple Queries Deinition 3 is concerned with an adversary that attempts to recover secret inormation rom a single document. In this section, we consider more general settings in which the adversary attempts to extract inormation rom several related documents. As a motivating scenario, let us consider an electronic journal that oers several types o possible subscriptions. Each type enables access to a selection o journal sections. The access policy can be enorced by means o a protection: articles (and groups o articles) are encrypted and each subscriber receives an appropriate set o decryption keys. Some aspects o this scenario do not seem to be adequately captured by Deinition 3. In this scenario, an adversary may have access to several related documents (several issues o the journal), and may inluence the contents o one document on the basis o the contents o past ones (or instance, by publishing articles in the journal). User dynamics (new subscriptions and cancelled subscriptions) can cause urther complications. Next we give stronger security deinitions, with such scenarios in mind. Then we prove strengthenings o results o Section 4. These strengthenings do not require additional cryptographic hypotheses, and yield security guarantees with respect to more general, more active adversaries. For our deinitions, we introduce an oracle that enables an adversary to request encrypted versions o documents o its choosing. Formally, we let EKG(η) generate vectors o keys and key shares or security parameter η, and we write E φ,τ, b or the result o the procedure deined in Figure 3 or ixed τ and φ; then or expression E, encryption scheme Π, secret sharing scheme SS, and set S Data, and or any (τ,φ) EKG(η), R we introduce an oracle O(E Π,SS τ,φ,s,η,b) that expects to receive a pair o valuations ( 0, 1 ) that coincide on S and returns a sample rom E φ,τ, b. Thus, the oracle O uses a ixed pair (τ,φ); our results have analogues or the case in which O generates a resh pair (τ,φ) at each query. The oracle O also uses a ixed expression E; we have yet to study the case in which E may vary under control o the adversary. Journal o the ACM, Vol. V, No. N, Month 20YY.

Security Analysis o Cryptographically Controlled Access to XML Documents 19 Deinition 4 is a variant o Deinition 3 in which an adversary has access to such an oracle O: Deinition 4. Let E be an expression, Π an encryption scheme, SS asecret sharing scheme, and S Data. The set S is computationally hidden in E (with Π and SS) against multi-query adversaries i or any probabilistic polynomial-time adversary A the unction Adv dist-m Π,SS,E,S(A, η) = Pr (φ, τ) EKG(η) R Π,SS O(E :A τ,φ,s,η,1) (η) =1 Pr (φ, τ) EKG(η) R Π,SS O(E :A τ,φ,s,η,0) (η) =1 is negligible. The next theorem is an analogue o Theorem 2. Here, an adversary attempts to distinguish between E Π,SS τ,φ, and pattern(e)π,ss τ,φ, by adaptively choosing several valuations, or a randomly chosen pair (τ,φ) EKG(η). R The ormulation o the theorem relies on an oracle O(E, Π, SS, τ, φ, η) that expects to receive valuations as inputs, and that, on input, returns a sample rom distribution E τ,φ,.we call an adversary with access to such an oracle a pattern adversary. Theorem 5. Let E be an acyclic expression. I Π is and IND-CPA secure encryption scheme and SS is a secure secret sharing scheme, then or any probabilistic, polynomial-time pattern adversary A the unction Adv dist-pat Π,SS,E (τ,φ) (A, η) = Pr EKG(η) R :A O(E,Π,SS,τ,φ,η) (η) =1 Pr (τ,φ) EKG(η) R :A O(pattern(E),Π,SS,τ,φ,η) (η) =1 is negligible. The proo o this theorem resembles that o Theorem 2, and it is presented in the Appendix. Using this theorem, we obtain the ollowing result, which relates symbolic secrecy and secrecy with respect to multi-query adversaries. Lemma 6. Let E be an acyclic expression and T = {K 1,K 2,...,K l } keys(e) a set o keys. I Π is an IND-CPA encryption scheme and SS is a secure secret sharing scheme, then Data Acc E (T ) is computationally hidden in (E,K 1,K 2,...,K l ) with Π and SS against multi-query adversaries. In order to prove this theorem, we consider a multi-query adversary A, welets be the set Data Acc E (T ) and let E be the expression (E,K 1,K 2,...,K l ), and we aim to show that Adv dist-m Π,SS,E,S(A, η) is negligible. First we construct two pattern adversaries A 0 and A 1.Forb =0, 1, adversary A b executes A internally, and when A produces a pair o valuations ( 0, 1 ) as a query to its oracle, A b passes b to its oracle and orwards the answer to A. The output o A b is the inal output o A. It ollows rom the construction that or b =0, 1: Pr (τ,φ) EKG(η) R :A O(E,Π,SS,τ,φ,η) b (η) =1 = Pr (τ,φ) EKG(η) R :A O(E Π,SS τ,φ,s,η,b) (η) =1 Journal o the ACM, Vol. V, No. N, Month 20YY.

20 Martín Abadi and Bogdan Warinschi (Here, the probabilities are over the choice (τ,φ) R EKG(η), the coins o adversaries, and those o oracles; or brevity, we omit (τ,φ) R EKG(η) rom ormulas below.) We obtain: Adv dist-m Π,SS,E,S(A, η) =Pr A O(E Π,SS τ,φ,s,η,1) (η) =1 Pr A O(E Π,SS τ,φ,s,η,0) (η) =1 =Pr A O(E,Π,SS,τ,φ,η) 1 (η) =1 Pr A O(E,Π,SS,τ,φ,η) 0 (η) =1 ( ) = PrA O(E,Π,SS,τ,φ,η) 1 (η) =1 Pr A O(pattern(E ),Π,SS,τ,φ,η) 1 (η) =1 + ( ) Pr A O(pattern(E ),Π,SS,τ,φ,η) 1 (η) =1 Pr A O(pattern(E ),Π,SS,τ,φ,η) 0 (η) =1 + ( ) Pr A O(pattern(E ),Π,SS,τ,φ,η) 0 (η) =1 Pr A O(E,,τ,φ,1) 0 (η) =1 = Adv dist-pat Π,SS,E (A 1,η)+ ( ) Pr A O(pattern(E ),Π,SS,τ,φ,η) 1 (η) =1 Pr A O(pattern(E ),Π,SS,τ,φ,η) 0 (η) =1 + Adv dist-pat Π,SS,E (A 0,η) By Theorem 5, the irst and the last summands are negligible. The middle summand is 0 because A s queries ( 0, 1 ) consist o valuations that coincide on the set Acc E (T ), and (by the deinition o Acc E (T )) these are the only symbols that occur in pattern(e ). We conclude that Adv dist-m Π,SS,E,S(A, η) is negligible. Finally, Theorem 1 and Lemma 6 immediately yield the ollowing analogue o Theorem 4: Theorem 7. Let P be a normalized protection such that E(P ) is an acyclic expression. Let T = {K 1,K 2,...,K l } keys(e(p )) be an arbitrary set o keys. I Π is an IND-CPA secure encryption scheme and SS is a secure secret sharing scheme, then Data Acc P (T ) is computationally hidden in (E(P ),K 1,K 2,...,K l ) with Π and SS against multi-query adversaries. Attacks with Selective Decryption We close this section with a brie discussion o security under selective decryption attacks (SD-security or short) Canetti et al. 1996. Consider an attacker that receives a set o ciphertexts and then chooses a subset to be decrypted; the choice may depend on the ciphertexts received. The problem addressed by research on SDsecurity is to show that the remaining, undecrypted plaintexts are still protected. Generally, ormal methods do not give special consideration to selective decryption attacks; basically, selective decryption is not regarded as any more harmul than any other decryption, and it need not endanger undecrypted plaintexts. On the other hand, rom a computational perspective, research on selective decommitment Dwork et al. 2003 (a closely related subject) would suggest that proving the existence o SD-secure encryption schemes would require the development o new cryptographic techniques. This particular discrepancy between ormal and computational views need not be an immediate concern in the present setting, since it need not present the se- Journal o the ACM, Vol. V, No. N, Month 20YY.