Security Analysis for Order Preserving Encryption Schemes




Liangliang Xiao, University of Texas at Dallas, Email: xll052000@utdallas.edu
Osbert Bastani, Harvard University, Email: obastani@fas.harvard.edu
I-Ling Yen, University of Texas at Dallas, Email: ilyen@utdallas.edu

Abstract

The development of third-party hosting, IT outsourcing, service clouds, etc. raises important security concerns. It is safer to encrypt critical data that is hosted by a third party. However, a database must be able to process queries on the encrypted data. Many algorithms have been developed to support search query processing on encrypted data, including order preserving encryption (OPE) schemes. Security analysis plays an important role in the design of secure algorithms. It aids in understanding the level of security that is assured by an algorithm. Currently, security analysis of OPE schemes is limited. In [3], the authors defined an ideal OPE object and constructed an OPE scheme $SE_{m,n}$ that is computationally indistinguishable from the ideal object. Thus the security of the proposed OPE scheme is identical to that of the ideal OPE object. However, the security of the ideal object has not been analyzed. In this paper, we study the security of OPE schemes by analyzing the number of bits $z_h$ of the plaintext that remain secret from the adversary against a known plaintext attack with $h$ known plaintexts. First, we derive the upper bound $z_h \le [\ldots]$ for any OPE scheme, where $m$ is the size of the domain. We then derive an asymptotic lower bound $z_h \ge c\,[\ldots]$ for the ideal OPE object, where $c$ is a constant and $0 < c < 1$. The value of $c$ is then numerically computed for different $m$ and $n$ values, where $n$ is the size of the range. These two bounds prove that $z_h = \Theta([\ldots])$.
Based on the security analyses, we conclude that the ideal OPE object achieves one-wayness security, i.e., the probability for the adversary to fully recover the plaintext encrypted by the ideal OPE object against an $h$ known plaintext attack is a negligible function of the security parameter if $h = o(m^{\epsilon})$, $0 < \epsilon < 1$, and $n = m^3$. The results presented in the paper not only help improve our understanding of the security of OPE schemes and guide their parameter selection, but also provide a general method for analyzing their security.

Index Terms: Order preserving encryption; information theory; average min-entropy; known plaintext attacks.

I. INTRODUCTION

For over a decade, it has been common practice for companies to outsource their online business logic to Web hosting service providers. Generally, this involves the storage of databases which potentially contain sensitive information as well as the execution of access logic to the databases. The growing ubiquity of cloud computing further pushes forward this paradigm by creating a whole spectrum of third-party hosting businesses. Along with the many benefits of outsourcing, such as reduced computation and personnel management costs, major security concerns emerge. For example, if the hosting service provider is compromised, the adversary can retrieve the sensitive data of the client companies. Alternatively, if there is a change in management of the hosting company, such as a reorganization or buyout [8], the potential threat increases due to the additional exposure to multiple management personnel and to the lack of established policies regarding the handling of critical information in such situations.

The security problems with outsourced databases can be solved if the critical data are encrypted. The problem of how the database management system (DBMS) can process search queries on encrypted data then naturally arises.
An important class of methods to realize search uses order preserving encryption (OPE) schemes [1], [2], [3], [6], [7]. An OPE scheme is a deterministic symmetric-key encryption scheme that preserves the order of the plaintexts. Compared to the methods discussed above, it is not a perfectly secure encryption scheme since ciphertexts inevitably leak the order information of the plaintexts. On the other hand, search queries can be processed efficiently using conventional DBMS techniques, e.g., building a B+ tree on the ciphertexts. An OPE scheme can therefore be a good choice when it is necessary to simultaneously maintain reasonable performance for range query processing and to achieve a certain degree of security. In analyzing potential applicability to a problem, it is therefore important to know exactly how much security an OPE scheme can provide, which requires further analysis.

There are various constructions of OPE schemes. In [2], the proposed OPE algorithm first generates a sequence of random numbers and then encrypts an integer $x$ to the sum of the first $x$ random numbers. In [6], a sequence of strictly increasing polynomial functions is used to construct the OPE algorithm. The encryption of an integer $x$ is the outcome of the iterative application of those functions to $x$. In [7], the OPE algorithm is constructed by using a mapping function composed of partition and identification functions. The partition function divides the range into multiple partitions, and the identification function assigns an identifier to each partition. Then, the mapping function maps an integer $x$ to an identifier. Since different integers may be mapped to the same identifier, the OPE algorithm may output false comparison results.
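The prefix-sum construction attributed to [2] above can be illustrated with a minimal Python sketch. The function names (`keygen`, `encrypt`, `decrypt`), the increment bound `max_step`, and the seeded `random.Random` generator are assumptions made for illustration and are not taken from [2]; Python's `random` module is also not cryptographically secure, so this only shows why the mapping preserves order.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def keygen(m, max_step=1000, seed=None):
    """Key = prefix sums of m random positive increments (hypothetical layout)."""
    rng = random.Random(seed)
    steps = [rng.randint(1, max_step) for _ in range(m)]
    return list(accumulate(steps))


def encrypt(x, key):
    """Encrypt plaintext x in [1, m] as the sum of the first x increments."""
    return key[x - 1]


def decrypt(y, key):
    """Invert encrypt by binary search over the strictly increasing key."""
    idx = bisect_left(key, y)
    if idx == len(key) or key[idx] != y:
        raise ValueError("not a valid ciphertext")
    return idx + 1


key = keygen(m=16, seed=7)
ciphertexts = [encrypt(x, key) for x in range(1, 17)]
```

Because every increment is at least 1, the prefix sums are strictly increasing, which is exactly the order-preserving property.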
In [1], the authors construct the OPE algorithm in three steps: (1) modeling the input and target distributions, (2) flattening the plaintext database into a flat database, and (3) transforming the flat database into the cipher database.

Unfortunately, security analysis for OPE schemes has not been widely investigated. In [1], the authors construct an OPE scheme and analyze its security, but the analysis has some limitations: (1) it assumes that the adversaries can only view ciphertexts, and (2) the analysis is not based on cryptographic analysis, but on experiments, i.e., they use the Kolmogorov-Smirnov test to show that the distribution of the ciphertexts and the target distribution cannot be distinguished. [3] initiates the cryptographic study of OPE schemes. It defines the security of an OPE scheme using the ideal object and constructs the OPE scheme to satisfy the security implied by the ideal object. However, the security of the ideal object has not yet been analyzed.

We have proven the one-wayness security of the ideal OPE object proposed in [3]. In this paper, we discuss the proof techniques. The detailed proofs are given in our technical reports [9], [10]. Our contributions include:

1) We follow security definitions in the literature [4], [5] and define the security metric for an OPE scheme as the expected number of bits of a plaintext remaining secret against the known plaintext attack. With this definition, the security of an OPE scheme can be derived by computing the average min-entropy of the plaintext given the ciphertext and the known plaintext/ciphertext pairs. Let $z_h$ denote the security of an OPE scheme against an $h$ known plaintext attack.

2) Since analytically computing the value of $z_h$ is very difficult, we have developed novel methods to compute the lower bound on $z_h$ for an ideal OPE object. We first consider the average min-entropy in the case of no known plaintexts, i.e., we calculate $z_0$. When there are $h > 0$ known plaintexts, the domain is divided into $h + 1$ sub-domains. Within each sub-domain, there is no known plaintext and the average min-entropy can be derived using the result for $z_0$. Therefore $z_h$ is the expected average min-entropy derived from the average min-entropy of the sub-domains.
The lower bound on $z_h$ can then be determined by considering all possible positions of the $h$ known plaintexts. To compute $z_0$, we first prove that given a ciphertext $y$, the probability of a plaintext being $x$ in the domain has a hypergeometric distribution. Therefore computing $z_0$ for an ideal OPE object is equivalent to computing the average min-entropy of the hypergeometric distribution. Due to its complexity, no existing work has successfully derived the average min-entropy of the hypergeometric distribution. In this paper, we have developed mathematical techniques to derive a lower bound on this quantity and obtain a lower bound on $z_0$.

3) The upper bound on $z_h$ can be derived by computing the average min-entropy against any given set of $h$ plaintext/ciphertext pairs. We then consider the potential worst case scenario obtained from the analysis of the lower bound on $z_h$ and use it to derive an upper bound for $z_h$.

4) The analysis results of (2) and (3) yield security bounds on what the ideal OPE object can achieve, namely $z_h = \Theta([\ldots])$. The results not only help improve our understanding of the security of OPE schemes, but also provide a general method for analyzing the security of specialized encryption schemes where perfect security is not feasible.

5) The security of an OPE scheme is influenced by the relation between the size of the domain, $m$, and the size of the range, $n$. In [3], an appropriate choice of $n$ for a given $m$ is questioned but not answered. Our security analyses reflect the impact of this relation (Lemma III.9). More specifically, they show that when $n = m^3$ (and $h = o(m^{\epsilon})$, $0 < \epsilon < 1$), the ideal OPE object achieves one-wayness security, i.e., the probability for the adversary to fully recover the plaintext encrypted by the ideal OPE object is a negligible function of the security parameter.

The remainder of the paper is organized as follows. In Section II, we define and discuss the OPE scheme, attack model, and security metric $z_h$.
In Section III, we first derive an upper bound on $z_h$ for any OPE scheme (Subsection III-A). We then introduce the ideal OPE object (Subsection III-B) and derive a lower bound on $z_h$ for the ideal OPE object (Subsection III-C). Finally, we conclude the paper in Section IV.

II. DEFINITIONS

As defined in [3], an OPE scheme is a deterministic symmetric-key encryption scheme which preserves the order of plaintexts. The formal definition is as follows.

Definition II.1 (OPE scheme). For $m \le n$, let $[m] = \{i \mid 1 \le i \le m\}$ denote the domain of plaintexts and $[n] = \{j \mid 1 \le j \le n\}$ denote the range of ciphertexts. Suppose that $SE_{m,n} = (K_{m,n}, E_{m,n}, D_{m,n})$ is a deterministic symmetric-key encryption scheme, where $K_{m,n} : \{0,1\}^* \to \{0,1\}^*$ is a key generation algorithm, $E_{m,n} : [m] \times \{0,1\}^* \to [n]$ is a deterministic symmetric-key encryption algorithm, and $D_{m,n} : [n] \times \{0,1\}^* \to [m]$ is a decryption algorithm such that for $x \in [m]$ and for any valid key $k$, $D_{m,n}(E_{m,n}(x, k), k) = x$. We say that $SE_{m,n}$ is an OPE scheme if for any valid key $k$, $x < x' \Rightarrow E_{m,n}(x, k) < E_{m,n}(x', k)$.

In [3], the authors attempt to find a security game which can precisely define the security that an OPE scheme can achieve. It is proven that no OPE scheme can satisfy the standard notion of the security game, i.e., indistinguishability under chosen plaintext attacks (IND-CPA), or even certain weakened notions of security games, i.e., indistinguishability under distinct chosen-plaintext attacks (IND-DCPA) and indistinguishability under ordered chosen-plaintext attacks (IND-OCPA). We do not follow the approach of developing increasingly weakened notions of security games for the following reasons. First, these failed attempts illustrate the difficulty in finding a security game that can precisely define the security that an OPE scheme can achieve (i.e., neither excessive nor overly restrained). Second, even if such a definition is found, it will likely be overly complicated due to an excessive number of weakening conditions. This would render the security notion difficult or even impossible to apply in practice. Instead, we follow a common information theoretic approach of computing the expected number of bits $z_h$ of the plaintext that remain secure under a known plaintext attack with $h$ known plaintexts, which is defined to be the average min-entropy of the plaintext given the ciphertext and the $h$ known plaintext/ciphertext pairs.

First, we need to establish the attack model for the analysis. The security notions considered in [3], e.g., IND-CPA, IND-DCPA, and IND-OCPA, are all related to chosen plaintext attacks. In the security game of IND-CPA, the adversary is allowed to make queries of the form $\{(x_i^0, x_i^1)\}_{i=1}^{h}$. Afterwards the left-right encryption oracle will return the ciphertexts $\{E(x_i^b, k)\}_{i=1}^{h}$ to the adversary, where $b$ is a randomly selected bit. The security of the encryption scheme depends on how precisely the adversary can predict $b$. The form of queries in the game of IND-CPA is specialized to facilitate the definition of indistinguishability. IND-DCPA and IND-OCPA consider similar security games except that they impose additional constraints on the queries.

Another effective security game against the OPE scheme is to reverse the order of the chosen plaintext attack, i.e., the adversary is given the ciphertext, called the challenge, and subsequently chooses the plaintexts. In this case, the adversary can reverse $E_{m,n}(x, k)$ by the following binary-search chosen plaintext attack. The adversary begins the attack by choosing the midpoint $p = \frac{m+1}{2}$ (rounded to an integer), and asks the encryption oracle to encrypt $p$. If $E_{m,n}(x, k) = E_{m,n}(p, k)$, then the adversary knows that $x = p$. If $E_{m,n}(x, k) > E_{m,n}(p, k)$, then the adversary knows that $x > p$. She can continue the attack by choosing the plaintext $\frac{p+m}{2}$.
If $E_{m,n}(x, k) < E_{m,n}(p, k)$, then the adversary knows that $x < p$. Then she continues the attack by choosing the plaintext $\frac{1+p}{2}$. Thus, after at most $O(\log m)$ chosen plaintext queries, the adversary can reverse $E_{m,n}(x, k)$.

The security games in these models are too strong, and OPE schemes cannot achieve the security level of such security games. We develop a new attack model by considering a common scenario in third party hosting with potential external attacks. Let $O$ denote the owner of a database $DB$, where $DB$ and its corresponding querying logic are hosted on the Web by a third party $Host$. $DB$ is encrypted using an OPE scheme to protect its secrecy and $O$ holds the encryption key. $DB$ can be accessed by various clients in $CL$, and $O$ may distribute the encryption key to legitimate clients in $CL$. The goal is to protect $DB$ from potential attacks. We assume that a public key infrastructure is in place and the identities of individuals in $O$, $CL$, $Host$, and outsiders can be authenticated correctly. Note that it is not possible to protect $DB$ against any key holders in $O$ and $CL$. At the same time, it is not possible for an individual (attacker) without a key to arbitrarily choose a plaintext and obtain the corresponding ciphertext. An attacker may happen to know some plaintexts and be able to find out the corresponding ciphertexts. Thus, we do not consider chosen plaintext attacks such as those in [3]. Instead, we consider the known plaintext attack model, where the adversary is given a ciphertext $E_{m,n}(x, k)$ (called the challenge) to compromise. The attack model is formally given in Definition II.2.

Definition II.2 (attack model). The known plaintext attack model we consider involves an adversary with $h$ pairs of known plaintexts and ciphertexts. Let $KP_h = \{(x_i, E_{m,n}(x_i, k)) \mid 1 \le i \le h\}$ denote the set of $h$ plaintext/ciphertext pairs. Then, the adversary is given a ciphertext $E_{m,n}(x, k)$ (called the challenge).
The goal of the adversary is to compromise $x$ from the challenge $E_{m,n}(x, k)$ based on $KP_h$.

Next, we need to determine how to generate the challenge $E_{m,n}(x, k)$ in the attack model. Since the encryption algorithm $E_{m,n}$ is deterministic, the adversary can always reverse the ciphertexts $E_{m,n}(x_i, k)$, where $1 \le i \le h$, since $x_i$ is a known plaintext. Thus, we assume that $x$ is selected from $[m]' = [m] \setminus \{x_i \mid 1 \le i \le h\}$ instead of $[m]$. Note that in the security definition of conventional deterministic (probabilistic) encryption schemes, it is required that the adversary cannot retrieve any bit of information of any selected $x \in [m]'$ ($x \in [m]$) from the corresponding ciphertext against the known plaintext attack. That is to say, the choice of $x$ should not affect the security result. However, an OPE scheme cannot reach such a security level. Suppose that the adversary knows the plaintext/ciphertext pairs in the set $KP_2 = \{(x, E_{m,n}(x, k)), (x + 2, E_{m,n}(x + 2, k))\}$, where $1 \le x \le m - 2$. Since a ciphertext $y$ is encrypted from plaintext $x + 1$ if and only if $E_{m,n}(x, k) < y < E_{m,n}(x + 2, k)$, the adversary can reverse plaintext $x + 1$ from $E_{m,n}(x + 1, k)$ based on $KP_2$. Therefore, worst-case security is not suitable for quantifying the security of an OPE scheme.

Hence, we consider average-case security for the OPE scheme instead. We assign weights to the elements in $[m]'$, and consider the expected security on $[m]'$. Factors such as the data access distribution and the adversary's personal interest could affect weight assignments on $[m]'$. However, without prior information about the application environment, there is no way to tell which data is more or less important. Thus, in this paper, we assume that the elements in $[m]'$ are evenly weighted, i.e., $x$ is uniformly selected from $[m]'$. The security analysis based on this assumption can be the basis for further analysis considering a non-evenly weighted $[m]'$ for the choice of the challenge.
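The two attacks contrasted in this section can be sketched in Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the OPE function is modeled as a sorted random subset of the range, and the chosen plaintext attack uses a standard two-way binary search rather than the exact three-way midpoint rule described in the text.

```python
import random


def sample_ope(m, n, rng):
    """A random strictly increasing f: [m] -> [n], stored as a sorted list."""
    return sorted(rng.sample(range(1, n + 1), m))


def binary_search_attack(challenge, oracle, m):
    """Chosen plaintext attack: recover x from E(x) using an encryption oracle."""
    lo, hi, queries = 1, m, 0
    while lo < hi:
        p = (lo + hi) // 2
        queries += 1
        if oracle(p) < challenge:   # order preservation implies p < x
            lo = p + 1
        else:                       # order preservation implies x <= p
            hi = p
    return lo, queries


def known_plaintext_candidates(challenge, known_pairs, m):
    """Known plaintext attack: plaintexts consistent with the pairs in KP_h."""
    lo, hi = 1, m
    for x_i, y_i in known_pairs:
        if y_i < challenge:
            lo = max(lo, x_i + 1)
        elif y_i > challenge:
            hi = min(hi, x_i - 1)
        else:
            return [x_i]
    return list(range(lo, hi + 1))


rng = random.Random(1)
m, n = 1000, 10**9
f = sample_ope(m, n, rng)
enc = lambda x: f[x - 1]

secret = 613
recovered, queries = binary_search_attack(enc(secret), enc, m)

# The KP_2 example: knowing x = 400 and x + 2 = 402 pins down x + 1 = 401.
kp2 = [(400, enc(400)), (402, enc(402))]
pinned = known_plaintext_candidates(enc(401), kp2, m)
```

With an oracle the adversary always succeeds in a logarithmic number of queries, whereas with known pairs only the remaining candidate interval shrinks, which is why the analysis measures how many bits stay hidden on average.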
According to the attack model discussed above, the security of OPE schemes can be measured by the average min-entropy defined in [5], where the average min-entropy $\tilde{H}_\infty(X \mid Y)$ of $X$ (conditioned on $Y$) is used to measure the leakage-resilient degree (i.e., the average number of secure bits) of a random variable $X$ if some additional information $Y$ is leaked. We present the concepts of min-entropy $H_\infty(X)$ and average min-entropy $\tilde{H}_\infty(X \mid Y)$ (conditioned on $Y$) as follows.

Definition II.3 (min-entropy and average min-entropy [4], [5]). Let $X$ be a random variable. The min-entropy of $X$ is

$$H_\infty(X) = -\log \max_x \Pr[X = x].$$

Let $(X, Y)$ be a pair of random variables. The average min-entropy of $X$ conditioned on $Y$ is

$$\tilde{H}_\infty(X \mid Y) = -\log \sum_y \Pr[Y = y]\, 2^{-H_\infty(X \mid Y = y)}.$$

Now we define the security metric based on the average min-entropy in the following Definition II.4.

Definition II.4 (security metric). Consider the attack model defined in Definition II.2, where $x$ is selected uniformly at random from $[m]' = [m] \setminus \{x_i \mid 1 \le i \le h\}$. Let $X_m$ denote a random variable uniformly distributed over $[m]' = [m] \setminus \{x_i\}_{i=1}^{h}$, let $Y_{m,n} = E_{m,n}(X_m, k)$ denote the ciphertext with respect to the random variable $X_m$, and let $KP_h = \{(x_i, E_{m,n}(x_i, k))\}_{i=1}^{h}$ denote the set of $h$ known plaintext/ciphertext pairs. The security of the OPE scheme $SE_{m,n}$ is $z_h = \tilde{H}_\infty(X_m \mid Y_{m,n}, KP_h)$, where $\tilde{H}_\infty$ denotes the average min-entropy.

III. SECURITY ANALYSIS OF OPE SCHEMES

We study the security of OPE schemes by deriving upper and lower bounds on $z_h$. In Subsection III-A, we derive an upper bound on $z_h$ for any OPE scheme $SE_{m,n} = (K_{m,n}, E_{m,n}, D_{m,n})$ based on a specific known plaintext attack. We then consider the lower bound on $z_h$ based on the ideal OPE object $SE^{*}_{m,n} = (K^{*}_{m,n}, E^{*}_{m,n}, D^{*}_{m,n})$ introduced in Subsection III-B. However, since the known plaintext/ciphertext set $KP_h = \{(x_i, E^{*}_{m,n}(x_i, k)) \mid 1 \le i \le h\}$ is not determined, it is difficult to derive the lower bound on $z_h = \tilde{H}_\infty(X_m \mid Y_{m,n}, KP_h)$ directly. Instead, we take the following approach to get the lower bound on $z_h$. First, we consider the case $h = 0$ and derive the lower bound on $z_0$ (Subsubsection III-C1). Let $E_h$ denote the event that the adversary reverses the ciphertext based on $KP_h$. Then there is a one-to-one correspondence between $\Pr(E_h)$ and $z_h$, i.e., $\Pr(E_h) = 2^{-z_h}$. Therefore, the lower bound on $z_0$ can be transformed into an upper bound on $\Pr(E_0)$.
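Definitions II.3 and II.4 can be made concrete with a small numeric sketch. The function names and the toy joint distribution below are assumptions chosen for illustration; the code uses the identity $\sum_y \Pr[Y=y]\,2^{-H_\infty(X \mid Y=y)} = \sum_y \max_x \Pr[X=x, Y=y]$, which follows directly from the definition of min-entropy.

```python
from collections import defaultdict
from math import log2


def min_entropy(dist):
    """H_inf(X) = -log2 max_x Pr[X = x]; dist maps x -> probability."""
    return -log2(max(dist.values()))


def average_min_entropy(joint):
    """Average min-entropy of X given Y; joint maps (x, y) -> Pr[X=x, Y=y]."""
    best = defaultdict(float)
    for (x, y), p in joint.items():
        best[y] = max(best[y], p)   # max_x Pr[X = x, Y = y] for each y
    return -log2(sum(best.values()))


# Toy example: X uniform on {1, 2, 3, 4}; Y leaks whether X <= 2.
prior = {x: 0.25 for x in (1, 2, 3, 4)}
joint = {(x, 0 if x <= 2 else 1): 0.25 for x in (1, 2, 3, 4)}

bits_before = min_entropy(prior)          # 2.0
bits_after = average_min_entropy(joint)   # 1.0
```

In this toy case, leaking one order bit lowers the number of secret bits from 2 to 1, and the best guessing probability is $2^{-1}$, matching the relation $\Pr(E_h) = 2^{-z_h}$ used in the analysis.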
Also, note that $KP_h$ cuts the domain into $h + 1$ segments and the range into $h + 1$ segments so that the OPE algorithm encrypts the plaintexts from each sub-domain to the corresponding sub-range. Hence, we apply the upper bound on $\Pr(E_0)$ to each sub-domain and sub-range pair in order to get an upper bound on $\Pr(E_h)$ (Subsubsection III-C2). Finally, we get the lower bound on $z_h$ by reversing the one-to-one correspondence between $\Pr(E_h)$ and $z_h$.

A. Upper Bound on $z_h$ for Any OPE Scheme

In the following lemma, we give an upper bound on $z_h$ for any OPE scheme. In this lemma and for the remainder of this paper, the base of the logarithm operator $\log$ is 2 and the base of the natural logarithm operator $\ln$ is $e$.

Lemma III.1. For any OPE scheme $SE_{m,n} = (K_{m,n}, E_{m,n}, D_{m,n})$, $z_h \le [\ldots]$.¹

B. The Ideal OPE Object

We introduce the ideal OPE object as defined in [3]. The set of strictly increasing functions is denoted as $SIF_{m,n} = \{f : [m] \to [n] \mid x < x' \Rightarrow f(x) < f(x')\}$. Let $F_{m,n}$ denote a uniform random variable on $SIF_{m,n}$. The definition of the ideal OPE object is as follows.

Definition III.2 (ideal OPE object). Let $SE^{*}_{m,n} = (K^{*}_{m,n}, E^{*}_{m,n}, D^{*}_{m,n})$ denote the ideal OPE object. To initialize the encryption scheme, it chooses a function $f \in SIF_{m,n}$ uniformly at random and sets the key to be $k = f$. Given a plaintext $x$, the encryption algorithm returns the ciphertext $y = f(x)$, i.e., $E^{*}_{m,n}(x, f) = f(x)$. Given a valid ciphertext $y = E^{*}_{m,n}(x, f)$, the decryption algorithm returns $f^{-1}(y)$, i.e., $D^{*}_{m,n}(y, f) = f^{-1}(y) = f^{-1}(E^{*}_{m,n}(x, f)) = f^{-1}(f(x)) = x$.

For values of $m$ and $n$ such as $n = m^2$ and $m = \Theta(2^{1024})$, the ideal OPE object is computationally infeasible since it involves choosing $f$ uniformly at random from the set $SIF_{m,n}$. Since every set of $m$ elements selected from $[n]$ uniquely determines a strictly increasing function $f \in SIF_{m,n}$, $|SIF_{m,n}| = \binom{n}{m}$. Then $|SIF_{m,m^2}| = \binom{m^2}{m}$, which is on the order of $m^m$.
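For small parameters the ideal OPE object of Definition III.2 can be sampled directly, using the correspondence between strictly increasing functions and $m$-element subsets of $[n]$. The sketch below is illustrative only: the function names and the dictionary representation of $f$ are assumptions, and `random.Random` stands in for a source of true uniform randomness.

```python
import random
from math import comb, log2


def sample_ideal_ope(m, n, rng):
    """Pick f uniformly from SIF_{m,n} by sampling an m-subset of [n]."""
    chosen = sorted(rng.sample(range(1, n + 1), m))
    return {x: y for x, y in enumerate(chosen, start=1)}


def encrypt(x, f):
    """E*(x, f) = f(x)."""
    return f[x]


def decrypt(y, f):
    """D*(y, f) = f^{-1}(y); raises KeyError for invalid ciphertexts."""
    inverse = {v: k for k, v in f.items()}
    return inverse[y]


def key_bits(m, n):
    """log2 |SIF_{m,n}| = log2 C(n, m): bits needed to name one function."""
    return log2(comb(n, m))


f = sample_ideal_ope(8, 512, random.Random(0))
images = [encrypt(x, f) for x in range(1, 9)]
bits_for_m64 = key_bits(64, 64**2)   # at least 64 * log2(64) = 384
```

The count grows at least like $m \log m$ bits for $n = m^2$, since $\binom{m^2}{m} \ge m^m$, which is what makes direct sampling impractical for cryptographic sizes.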
Since generating a uniformly random variable on a set of order $N$ takes time $\log N$, in such a case it will take time exponential in the security parameter to generate the ideal OPE object.

In [3], Boldyreva et al. define an OPE scheme $SE_{m,n} = (K_{m,n}, E_{m,n}, D_{m,n})$ to be secure if for any $x \in [m]$, the ciphertext $E_{m,n}(x, k)$ is computationally indistinguishable from the ciphertext $E^{*}_{m,n}(x, f)$ obtained from the ideal encryption object. They then prove that the constructed scheme $SE_{m,n}$ is secure because $E_{m,n}(x, k)$ and $E^{*}_{m,n}(x, f)$ are computationally indistinguishable. We outline one of Boldyreva's constructions in the following. The encryption function $E_{m,n}$ maps a plaintext $x$ to its corresponding ciphertext by a process similar to binary search on the ciphertext space $[n]$, with the searched points being mapped back to the plaintext space $[m]$ using the hypergeometric distribution. More specifically, given $y \in [n]$ and $x \in [m]$, the probability that the encryption scheme maps $y$ to $x$, i.e., $E_{m,n}(x, k) \le y < E_{m,n}(x + 1, k)$, is given by

$$\binom{y}{x}\binom{n-y}{m-x}\binom{n}{m}^{-1}.$$

However, while the authors of [3] reduce the security of $SE_{m,n}$ to the security of the ideal object, they do not analyze the security of the ideal OPE object. As an obvious counterexample, the ideal object is not secure when $n = m$. Indeed, there exists no secure OPE scheme when $n = m$ because the encryption algorithm is necessarily the identity function. In [3], the authors left open the questions of how to measure the security of the ideal object and how to choose $n$ given $m$. We show that by choosing $n \ge m^3 > 1$, the probability for the adversary to fully recover a plaintext is a negligible function of the security parameter if the number $h$ of known plaintext/ciphertext pairs satisfies $h = o(m^{\epsilon})$, $0 < \epsilon < 1$.

¹ All proofs in this paper are omitted due to the space limitation.

C. Lower Bound on $z_h$ for the Ideal OPE Object

We take the following steps to derive the lower bound on $z_h$ for the ideal OPE object. In Subsubsection III-C1 we analyze the special case where the adversary has no knowledge of any plaintext/ciphertext pairs, i.e., $h = 0$. To do so, we first derive a formula for $z_0$, namely

$$z_0 = -\log \frac{1}{n} \sum_{j \in [n]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}.$$

We then prove that there exists a constant $0 < c < 1$ such that $z_0 \ge c \log m$ for all $n > m^2 > 1$. Thus, in the case $h = 0$, the probability for the adversary to recover $x$ is at most $2^{-z_0} \le 2^{-c \log m} = m^{-c}$. Then, in Subsubsection III-C2, we consider the case where the adversary has knowledge of $h$ plaintext/ciphertext pairs. We derive an upper bound on $\Pr(E_h)$, i.e., the expected probability for the adversary to fully recover a plaintext given its ciphertext and $h$ known plaintext/ciphertext pairs. Note that the known plaintext/ciphertext pairs will split the domain and range into intervals. We first prove a lemma giving an upper bound on the expected number of short intervals, and use the previous result to bound the probability on the remaining long intervals. Finally, we use these results to derive a lower bound on $z_h$. In Subsection III-D, we numerically compute the values of $c = z_0 / \log m$.

1) The Case $h = 0$: We begin by proving a lower bound on $z_0$ and the corresponding upper bound on $\Pr(E_0)$. To do so, we first give the formula of $z_0$ in the following Lemma III.3, which is the simplification of the average min-entropy formula of the hypergeometric random variable.

Lemma III.3. For the ideal OPE object,

$$z_0 = -\log \frac{1}{n} \sum_{j \in [n]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}.$$

We then prove four technical lemmas. Note that there is a max function over $i \in [m]$ in the formula of $z_0$.
In Lemma III.4 we show that the maximum value can be achieved only if $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$. Lemma III.5 gives a bound on each term of the hypergeometric distribution.

Lemma III.4. Given $j$, $m$, $n$, the term $\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}$ achieves the maximum value only if $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$.

Lemma III.5. Let $\frac{1}{2} < \sigma < 1$, $j \in \left[\frac{n}{m^{\sigma}}, n - \frac{n}{m^{\sigma}}\right]$, and $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$. Then for every $\epsilon > 0$ there exist $c_{\epsilon,\sigma,1}$, $c_{\epsilon,\sigma,2}$, and $m_{\epsilon,\sigma}$ such that

$$c_{\epsilon,\sigma,1}\, m^{-\frac{1}{2}} \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \le c_{\epsilon,\sigma,2}\, m^{-\frac{1-\sigma}{2}}$$

for $n > m^2$ and $m \ge m_{\epsilon,\sigma}$.

Summation will prove $z_0 \ge c \log m$ under the given conditions $n > m^2$ and $m > m_c$. We will use the conclusion of the next lemma to show that we can in fact choose $m_c = 1$.

Lemma III.6. For any $m_c > 0$, there exist $n_c > 0$ and $0 < c_{m_c,n_c} < 1$ such that $z_0 \ge c_{m_c,n_c} \log m$ for $m \le m_c$ and $n \ge n_c$.

The lemma allows us to conclude the proof of $z_0 \ge c \log m$ for $n > m^2 > 1$. Note that the set $\{(m, n) \mid 1 < m \le m_c,\ m^2 < n < n_c\}$ is finite and that $\frac{z_0}{\log m} > 0$ for $(m, n)$ in that set. Since we have already obtained two nonzero lower bounds on $\frac{z_0}{\log m}$, the first for the case $n > m^2$ and $m > m_c$ and the second for the case $n \ge n_c$ and $1 < m \le m_c$, we can choose $c$ to be the minimum of
1) the bound in the first case,
2) the bound in the second case,
3) the set of values $\frac{z_0}{\log m}$ for $(m, n)$ in the finite set of remaining values.
In other words, we can choose $m_c = 1$. We now summarize the lower bound on $z_0$ in the following Theorem III.7.

Theorem III.7. For the ideal OPE object, there exists a constant $0 < c < 1$ such that for $n > m^2 > 1$, $z_0 \ge c \log m$.

According to information theory, we have the following Corollary III.8 based on Theorem III.7. It gives an upper bound on the probability for the adversary to reverse $x$ from the ciphertext $E^{*}_{m,n}(x, f)$.

Corollary III.8. Let $f$ be chosen uniformly at random from $SIF_{m,n}$ and let $x$ be chosen uniformly at random from $[m]$. Let $E_0$ denote the event that the adversary obtains $x$ from the ciphertext $E^{*}_{m,n}(x, f)$. Then for $n > m^2 > 1$, $\Pr(E_0) \le 2^{-c \log m} = m^{-c}$.
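The quantities in this subsection can be checked numerically for small parameters. The sketch below assumes the formula for $z_0$ in Lemma III.3 reads $z_0 = -\log \frac{1}{n}\sum_{j \in [n]} \max_{i \in [m]} \binom{j-1}{i-1}\binom{n-j}{m-i} / \binom{n-1}{m-1}$ and the window in Lemma III.4 is $\left[\frac{mj}{n+1}, \frac{mj}{n+1}+1\right]$; the function names are hypothetical. It cross-checks the formula against brute-force enumeration of $SIF_{m,n}$ and evaluates the ratio $c = z_0 / \log m$.

```python
from itertools import combinations
from math import comb, log2


def z0_formula(m, n):
    """z_0 computed from the closed form assumed for Lemma III.3."""
    total = sum(
        max(comb(j - 1, i - 1) * comb(n - j, m - i) for i in range(1, m + 1))
        for j in range(1, n + 1)
    )
    return -log2(total / (n * comb(n - 1, m - 1)))


def z0_bruteforce(m, n):
    """Average min-entropy of X given Y = f(X) by enumerating all f."""
    count = {}  # (i, j) -> number of strictly increasing f with f(i) = j
    for subset in combinations(range(1, n + 1), m):
        for i, j in enumerate(subset, start=1):
            count[(i, j)] = count.get((i, j), 0) + 1
    best = {}   # j -> max_i count[(i, j)]
    for (i, j), c in count.items():
        best[j] = max(best.get(j, 0), c)
    return -log2(sum(best.values()) / (m * comb(n, m)))


def maximizers_in_window(m, n):
    """Check that every maximizing i lies in [mj/(n+1), mj/(n+1) + 1]."""
    for j in range(1, n + 1):
        terms = [comb(j - 1, i - 1) * comb(n - j, m - i) for i in range(1, m + 1)]
        top = max(terms)
        for i, t in enumerate(terms, start=1):
            if t == top and not (m * j <= i * (n + 1) <= m * j + n + 1):
                return False
    return True


c_ratio = z0_formula(20, 20**2) / log2(20)   # c = z_0 / log m for n = m^2
```

The brute-force version is only feasible for very small $m$ and $n$, but agreement with the closed form on such cases is a useful sanity check before evaluating the formula for larger parameters.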
2) The General Case: We now consider a known plaintext attack with the set of $h$ plaintext/ciphertext pairs $KP_h = \{(x_i, y_i)\}_{i=1}^{h}$, where $y_i = E_{m,n}(x_i, f)$, $1 \le i \le h$. In this case, the plaintexts $x_i$ cut the domain into $h + 1$ segments $[1, x_1), (x_1, x_2), \ldots, (x_{h-1}, x_h), (x_h, m]$, and the ciphertexts $y_i$ similarly cut the range into $h + 1$ segments $[1, y_1), (y_1, y_2), \ldots, (y_{h-1}, y_h), (y_h, n]$. Since the encryption algorithm $E_{m,n}$ is order-preserving, it encrypts the plaintexts from the sub-domains $[x_i + 1, x_{i+1} - 1]$ to the sub-ranges $[y_i + 1, y_{i+1} - 1]$, where $0 \le i \le h$ and $x_0 = y_0 = 0$, $x_{h+1} = m + 1$, $y_{h+1} = n + 1$.

We will proceed by applying Corollary III.8 to each pair of $[x_i + 1, x_{i+1} - 1]$ and $[y_i + 1, y_{i+1} - 1]$, $0 \le i \le h$. In order to do so, we first give the following lemma. It relates the distance between a pair of plaintexts $x$ and $x'$ to the distance between the corresponding ciphertexts $E_{m,n}(x, f)$ and $E_{m,n}(x', f)$; in particular, for $n \ge m^3$, $E_{m,n}(x', f) - E_{m,n}(x, f) - 1$ is greater than $(x' - x - 1)^2$ with dominant probability.

Lemma III.9. Suppose that $n \ge m^3$. Let $x \in [m]$ and $1 \le \delta \le m - x - 1$. Let $y = E_{m,n}(x, f)$ and choose $\delta'$ to satisfy

$$y + \delta' + 1 = E_{m,n}(x + \delta + 1, f),$$
where $f$ is chosen uniformly at random from $\mathrm{SIF}_{m,n}$. Then
$$\Pr(\delta' \ge \delta^2) \ge 1 - \frac{1}{m^2}.$$

Now we can apply Corollary III.8 to each sub-domain $[x_i + 1, x_{i+1} - 1]$ and sub-range $[y_i + 1, y_{i+1} - 1]$ to get an upper bound on $\Pr(E_h)$.

Proposition III.10. Let $f$ be chosen uniformly at random from $\mathrm{SIF}_{m,n}$, where $n \ge m^3$. Assume that the adversary knows $h$ plaintext/ciphertext pairs $(x_i, E_{m,n}(x_i, f))$, $1 \le i \le h$. Let $x$ be chosen uniformly at random from $[m]' = [m] \setminus \{x_i\}_{i=1}^{h}$ and let $E_h$ denote the event that the adversary obtains $x$ from the ciphertext $E_{m,n}(x, f)$ based on $KP_h$. Then
$$\Pr(E_h) \le \left(\frac{h + 1}{m - h}\right)^{c} + \frac{1}{m^2}.$$

The upper bound on $\Pr(E_h)$ can be translated into the corresponding lower bound on $z_h$ for the ideal OPE object in the following theorem.

Theorem III.11. For the ideal OPE object, there exists a constant $0 < c < 1$ such that for $n \ge m^3 > 1$,
$$z_h \ge c \log \frac{m - h}{h + 1} - \frac{1}{(\ln 2)\, m^{2 - c}}.$$

D. Numerically Computing the Value of $c$

Here we include a graph showing numerically computed values of $c = z_0 / \log m$ as a function of $m$. We include the cases $n = m^2$ and $n = m^3$. These estimates translate into estimates for $z_h$, the number of bits of information that are guaranteed to remain secret from the adversary in the case of an attack with $h$ known plaintext/ciphertext pairs. The corresponding probability for the adversary to recover the plaintext from the ciphertext given $h$ known plaintext/ciphertext pairs is at most
$$\left(\frac{h + 1}{m - h}\right)^{c} + \frac{1}{m^2}.$$

As can be seen from Fig. 1, for $20 \le m \le 500$, the value of $c$ is well over 0.4, indicating that more than 40% of the bits of a plaintext are protected from the adversary, rendering it unlikely for the adversary to recover the complete plaintext despite the order-preserving nature of the encryption scheme. A more precise analysis of the values of $c$ for large $m$ would greatly enhance our understanding of the security of the algorithm for typical values of $m$, such as $m = 2^{1024}$.
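To see how an estimate of $c$ translates into estimates for $\Pr(E_h)$ and $z_h$, the two bounds can be evaluated numerically. This is a minimal sketch under stated assumptions: it takes the bounds to read $\Pr(E_h) \le \left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2}$ and $z_h \ge c \log \frac{m-h}{h+1} - \frac{1}{(\ln 2) m^{2-c}}$ with base-2 logarithms, uses the illustrative value $c = 0.4$ (Fig. 1 only covers $20 \le m \le 500$, so applying it to a larger $m$ is an assumption), and the function names are hypothetical.

```python
from math import log, log2


def recovery_probability_bound(m: int, h: int, c: float) -> float:
    """Assumed Proposition III.10 bound: ((h + 1) / (m - h))^c + 1 / m^2."""
    return ((h + 1) / (m - h)) ** c + 1.0 / m**2


def z_h_lower_bound(m: int, h: int, c: float) -> float:
    """Assumed Theorem III.11 bound, in bits:
    c * log2((m - h) / (h + 1)) - 1 / (ln(2) * m^(2 - c))."""
    return c * log2((m - h) / (h + 1)) - 1.0 / (log(2) * m ** (2 - c))


if __name__ == "__main__":
    c = 0.4      # illustrative constant, not a proven value
    m = 2**20    # illustrative plaintext-space size
    for h in (0, 16, 1024):
        p = recovery_probability_bound(m, h, c)
        z = z_h_lower_bound(m, h, c)
        print(f"h={h}: Pr(E_h) <= {p:.6f}, z_h >= {z:.4f} bits")
```

Taking $-\log_2$ of the probability bound always gives a value at least as large as the $z_h$ bound, since $-\log(a^c + b) \ge -c \log a - \frac{b}{a^c \ln 2}$ and $\frac{b}{a^c} \le m^{c-2}$ for $a = \frac{h+1}{m-h}$, $b = \frac{1}{m^2}$.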
We have proved in Theorem III.7 that $c = z_0 / \log m$ is bounded below by a positive constant as $m \to \infty$ for both $n = m^2$ and $n = m^3$. We conjecture that $c \to 0.5$ in both cases.

IV. CONCLUSION

In this paper, we have studied the security of OPE schemes by analyzing the expected number of bits $z_h$ of the plaintext that remain secret from the adversary under known plaintext attacks. First, we derived an upper bound on $z_h$ for any OPE scheme against a known plaintext attack. Then, we derived a lower bound on $z_h$ for the ideal OPE object. These two inequalities bound the security that the ideal OPE object can achieve. In the security analysis, we also derived a nontrivial upper bound on $\Pr(E_h)$, i.e., the probability for an adversary to fully recover the plaintext $x$ under a known plaintext attack with $h$ known plaintext/ciphertext pairs. The results show that although the adversary may retrieve some information about the plaintext $x$, the probability for the adversary to fully recover the plaintext $x$ is a negligible function of $\log m$ if the number $h$ of known plaintext/ciphertext pairs satisfies $h = o(m^{\epsilon})$, $0 < \epsilon < 1$, and $n \ge m^3$.

REFERENCES

[1] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, Order-preserving encryption for numeric data, ACM SIGMOD International Conference on Management of Data 2004, pp. 563-574.
[2] G. Bebek, Anti-tamper database research: Inference control techniques, Technical Report EECS 433 Final Report, Case Western Reserve University, November 2002.
[3] A. Boldyreva, N. Chenette, Y. Lee, A. O'Neill, Order-preserving symmetric encryption, Eurocrypt 2009, pp. 224-241.
[4] Y. Dodis, L. Reyzin, A. Smith, Fuzzy extractors: How to generate strong keys from biometrics and other noisy data, SIAM Journal on Computing, 2008, 38(1):97-139.
[5] S. Dziembowski, K. Pietrzak, Leakage-Resilient Cryptography, FOCS 08, 2008, pp. 293-302.
[6] S. C. Gultekin Ozsoyoglu, David Singer, Anti-tamper databases: Querying encrypted databases, Conference on Database and Applications Security, August 2003.
[7] H. Hacigumus, B. R. Iyer, C.
Li, and S. Mehrotra, Executing SQL over encrypted data in the database-service-provider model, ACM SIGMOD Conference on Management of Data, June 2002.
[8] Nancy C. Lee and Jim Finn, Oracle buys PeopleSoft, http://www.oracle.com/corporate/press/2004 dec/acqisition.html.
[9] Liangliang Xiao, I-Ling Yen, Dongdai Lin, Security Analysis for an Order Preserving Encryption Scheme, Tech Report UTDCS-06-10, 2010, http://utdallas.edu/ xll052000/opeproof-tr1.pdf, revised version: http://utdallas.edu/ xll052000/opeproof-tr2.pdf.
[10] Liangliang Xiao, Osbert Bastani, I-Ling Yen, Security Analysis for Order Preserving Encryption Schemes, Tech Report UTDCS-01-12, 2012, http://utdallas.edu/ xll052000/opeproof-tr3.pdf.

Fig. 1. Numerically computed $c = z_0 / \log m$ against $m$.