Department Informatik. PrivacyPreserving Forensics. Technical Reports / ISSN Frederik Armknecht, Andreas Dewald


 Clyde Hicks
 2 years ago
 Views:
Transcription
1 Department Informatik Technical Reports / ISSN Frederik Armknecht, Andreas Dewald PrivacyPreserving Forensics Technical Report CS April 2015 Please cite as: Frederik Armknecht, Andreas Dewald, PrivacyPreserving Forensics, FriedrichAlexanderUniversität ErlangenNürnberg, Dept. of Computer Science, Technical Reports, CS , April FriedrichAlexanderUniversität ErlangenNürnberg Department Informatik Martensstr Erlangen Germany
2
3 PrivacyPreserving Forensics Frederik Armknecht, Andreas Dewald IT Security Infrastructures Dept. of Computer Science, University of Erlangen, Germany Abstract In many digital forensic investigations, data needs to be analyzed. However, this poses a threat to the privacy of the individual whose s are being examined and in particular becomes a problem if the investigation clashes with privacy laws. This is commonly addressed by allowing the investigator to run keyword searches and to reveal only those s that contain at least some of the keywords. While this could be realized with standard cryptographic techniques, further requirements are present that call for novel solutions: (i) for investigationtactical reasons the investigator should be able to keep the search terms secret and (ii) for efficiency reasons no regular interaction should be required between the investigator and the data owner. We close this gap by introducing a novel cryptographic scheme that allows to encrypt entire boxes before handing them over for investigation. The key feature is that the investigator can noninteractively run keyword searches on the encrypted data and decrypt those s (and only those) for which a configurable number of matches occurred. Our implementation as a plugin for a standard forensic framework confirms the practical applicability of the approach. ACKNOWLEDGMENTS We want to thank Michael Gruhn for his cooperation and help in this project. This report presents the full version of the paper published at the DFRWS USA 2015 conference [1] and includes a more detailed security proof and further additional information. I. INTRODUCTION Digital forensics denotes the application of scientific methods from the field of computer science to support answering questions of law [2], [3], which usually includes the acquisition, analysis and presentation of digital evidence. In such investigations, privacy is protected by individual domestic laws prohibiting unreasonable searches and seizures, such as the Fourth Amendment to the United States Constitution. Formally, privacy is a human right, which means following the definition of the U.N. universal declaration of human rights that [n]o one shall be subjected to arbitrary interference with his privacy, [...] or correspondence [...] [and] [e]veryone has the right to the protection of the law against such interference or attacks [4, Article 12]. In many nations, domestic laws are very stringent and regarding investigations within a company further require companies to have works councils and data protection commissioners to protect the privacy of employees and customers. Especially the analysis of data is subject to controversial discussions, especially when the private use of the corporate account is allowed. It is in this duality between forensics and privacy where special requirements to forensic tools arised [5]. A. Motivation Whenever a company becomes victim of accounting fraud or is involved in cartellike agreements, this usually prompts largescale investigations. In such investigations, mostly management documents and s are reviewed, because especially communication data often provides valuable pieces of evidence. However, it also contains very sensitive and private information. As stated above, the company may be legally obligated by domestic law to protect the privacy of the employees during the invesigation as far as possible. Current methods that are applied to this end usually make use of filtering to prevent all s from being read: The investigator is allowed to run keyword searches and only those s that contain at least some of the keywords are displayed. In this way, the investigator shall find such s that are relevant to the case, but as few unrelated s as possible to preserve privacy. The conceptually most simple realization is that the data remains at the company and that the investigator sends the search queries to the company who returns the search results. However, this comes with two problems. First, the company would learn all search queries while for investigationtactical reasons, it may be necessary to keep them secret. Second, the need for constantly contacting the company s IT puts further burden on both
4 sides. Therefore, common practice is that the investigator is granted access to the entire data and conducts the search. There, other privacy problems arise: Because of the risk to overlook the socalled smoking gun when not reading every , investigators may be tempted to sidestep the filter mechanism. A prominent example of such deviation from the search order is the case United States v. Carey [6], even though in this case no filtering was actually used at all. B. Contributions Our contribution is a novel approach for privacypreserving forensics, which allows for noninteractive threshold keyword search on encrypted s (or generally text). That is a user holding the encrypted text can autonomously search the text for selected keywords. The search process reveals the content of an if and only if it contains at least t keywords that the investigator is searching for. Otherwise, the investigator learns nothing  neither about the content of the nor whether any of the selected keywords are contained. Our scheme allows for privacypreserving forensics using the following process: 1) Company s IT extracts requested boxes. 2) Company (IT or data protection officer) encrypts the s by applying our scheme. 3) Encrypted data is extradited to third party investigators. 4) Investigators are able to run keyword searches on the encrypted data and can decrypt those (and only those) s that contain at least t of the keywords that they are looking for. This way, the investigators are able to conduct their inquiry in the usual way, while not threatening privacy more than necessary. We want to emphasize that the investigator holds the entire data and can search within the data without further interaction with the company. In particular, the company learns nothing about the search queries executed by the investigators. For the investigator, nothing changes and keyword search can be performed as usual, because the decryption is handled transparently in the background after a successful keyword search, as demonstrated later in Section VI. The only overhead introduced for the company is to determine the threshold t, i.e. decide how many relevant keywords must be matched for decryption, and to encrypt the data. Besides choosing the threshold t, the company may further decide, which keywords should be relevant for the case (i.e. whitelisting keywords for which the investigator might obtain a match), or in the contrary decide which words shall not allow the investigator to decrypt s, such as pronouns, the company s name and address, salutations or other simple words (i.e. blacklisting keywords). Note that both approaches can be applied in our scheme, however, we further refer to the blacklisting approach, as we assume it to be more practical in most cases, because a blacklist can easily be adopted from case to case once it has been built. A whitelist has to be determined strongly depending on the case, but there might be cases where it is worthwile to apply the whitelist approach. After configuring the threshold t and the white/blacklist (once), the company can just invoke the encryption algorithm on the data and hand it over for investigation. The proposed scheme uses only established cryptographic building blocks: a symmetric key encryption scheme, a hash function, and a secret sharing scheme. The facts that for each of these allegedly secure implementations do exist and that the security of the overall scheme solely relies on the security of these building blocks allow for concrete realizations with high confidence in the security. Last but not least, we present an implementation of our scheme as a plugin for the forensic framework Autopsy [7]. Experiments and performance measurements conducted on real folders demonstrate the practical applicability of our approach. C. Outline The remainder of the paper is structured as follows: We introduce our solution and give a high level description of the scheme and algorithms in Section II. In Section III, we provide an indepth explanation of the scheme and Section VI shows our prototype implementation and evaluates the practical applicability. In Section VII, we give an overview of related work and conclude our paper in Section VIII. II. THE PROPOSED SCHEME This section provides a description of our scheme, focusing on explaining the motivation behind the design, the overall structure and work flow. For readability, we omit several details on the concrete realization of the components. These will be delivered in Section III. Because an investigator should only be able to decrypt and read such s that match his search terms, the investigator should not be able to learn anything about the encryption key of other s when successfully decrypting one . Thus, we use a cryptographic scheme that protects each individually. In order
5 Identifier P C K W w n t s F Meaning Plaintext Ciphertext Secret key Set of keywords Single Keyword Number of keywords Threshold Share of the secret key Mapping keywords to shares TABLE I QUICK REFERENCE OF USED IDENTIFIERS. to encrypt an entire mailbox, this scheme is applied to each of the mailbox. The overall situation with respect to one can be described as follows: A plaintext P is given together with a list of keywords W := {w 1,..., w n }. (for quick lookup, a complete list of the used identifiers is given in Table I). In our context, P would be the that needs to be protected and W denotes the set of words within this . For security reasons, this list excludes trivial words like the, or other general words that can be configured using the already mentioned blacklist. The scheme provides two functionalities: a protection algorithm Protect and an extraction algorithm Extract. The task of the protection algorithm is to encrypt the given plaintext P to a ciphertext C to prevent an outsider to extract any information about the plaintext. However, any user who knows at least t 1 keywords from W should be able to extract the original plaintext P from the ciphertext C. To this end, in addition the protection algorithm outputs some helper data. The helper data enables a user who knows at least t keywords to extract the plaintext P from the ciphertext C by applying the extraction algorithm. The scheme uses three established cryptographic primitives: a symmetric key encryption scheme, a hash function, and a secret sharing scheme. We explain these on the go when needed in our description. A. The Protection Mechanism The protection algorithm (see Algorithm 1) has the task to encrypt an in a way that allows for a keyword search on the encrypted data. In a nutshell, it can be divided into the following three different steps: a) Step P.1: Encrypt Message: To protect the content of P, it is encrypted to C using an appropriate encryption scheme(line 2). To this end, a secret key K is used which is randomly sampled (line 1). If the encryption scheme is secure, the plaintext becomes Algorithm 1 The Protection Algorithm Protect Input: A plaintext P, a set of n keywords W, and an integer threshold t 1. Output: A ciphertext C, a mapping F (representing the helper data) {Step P.1: Encrypt Message} 1: Sample a random key K {0, 1} lk 2: Encrypt the plaintext P to ciphertext C under the secret key: C := Enc K (P ) {Step P.2: Split the Secret Key into Shares} 3: Invoke the Split algorithm on K with inputs t (= threshold) and n (= number of shares) to get n shares of K: (s 1,..., s n ) := Split(K, n, t) where s 1,..., s n {0, 1} lk {Step P.3: Connect Keywords to Shares} 4: Create a mapping F such that F (w i ) = s i for all i = 1,..., n. 5: return Ciphertext and mapping (C, F ) accessible if and only if the secret key can be determined. The two remaining steps ensure that if a user knows t keywords in W, he can reconstruct K and thus decrypt the plaintext P from C. b) Step P.2: Split Secret Key into Shares: Recall that we want only users who know at least t correct keywords from W (i.e. keywords that occur in the and that are not blacklisted), to be able to extract P. This thresholdcondition is integrated in this step, while in the subsequent (and last) step, the connection to the keywords in W is established. More precisely, in this step a secret sharing scheme (as proposed by Shamir [8], for example) is used to reencode the secret key K into n shares s 1,..., s n (line 3). This operation, referred to as Split, ensures that (i) the secret key K can be recovered (using an algorithm Recover) from any t pairwise distinct shares s i1,..., s it but (ii) if less than t shares are available, no information about K can be deduced at all. Observe that the number of shares is equal to the number n of keywords in W, that is the keywords within the except those in the blacklist. So far, only standard cryptographic techniques have
6 been used in a straightforward manner. It remains now to chain the shares derived in this step to the keywords in W. This can be seen as one of the main novel contributions of the proposed scheme. c) Step P.3: Connect Keywords to Shares: In this final step, a mapping F is constructed which maps each keyword w i to one share s i. That is F (w i ) = s i for all i = 1,..., n. This ensures that a user who knows t distinct keywords w i1,..., w it W can derive the corresponding shares s i1 = F (w i1 ),..., s it = F (w it ) using F. Given these, the secret key K is recovered by K := Recover(s i1,..., s it ) that allows to decrypt P := Dec K (C). The calculation of this mapping F is more complicated and is explained in the details in Section III. The output of the protection mechanism is the ciphertext C = Enc K (P ) and information that uniquely specifies the mapping F. Observe that F actually represents the helper data mentioned before. For efficiency reasons, it is necessary that F has a compact representation. In the next section, we now explain how a forensic investigator might reconstruct the plaintext of an P from (C, F ), if and only if this contains the t keywords he is searching for. B. The Extraction Mechanism The task of the extraction mechanism (see Alg. 2)) is to allow an investigator to decrypt the encrypted , if this contains the terms he is searching for. Formally speaking, the mechanism allows a user who has knowledge of t correct keywords to extract the original plaintext P from (C, F ). Like for the protection mechanism, the extraction mechanism is divided into three steps. Each step is more or less responsible for reverting one of the steps of Protect. This needs to be done in reversed order (i.e., the first step E.1 reverts P.3, and so on). Assume now that the ciphertext C and helper data F is given and the user aims to recover the original plaintext using t keywords w 1,..., w t, which are the words he is searching. The three steps are the following: Step E.1: Compute Possible Shares From the keywords w 1,..., w t, we compute t possible shares by applying the mapping F (line 2). That is s i := F (w i ) for i = 1,..., t. Step E.2: Determine Possible Secret Key In this step, the user aims to recover the secret key K from the shares s 1,..., s t. To this end, he invokes the corresponding algorithm Recover from the secret sharing scheme on s 1,..., s t and receives a value K (line 4). Observe that by assumption on Recover, Algorithm 2 The Extraction Algorithm Extract Input: A ciphertext C, a mapping F, t keywords w 1,..., w t Output: A plaintext P {Step E.1: Compute Possible Shares} 1: for all i = 1,..., t do 2: Apply the mapping F to keyword w i to get a possible share, i.e., 3: end for s i := F (w i). {Step E.2: Determine Possible Secret Key} 4: Given the t candidates for shares of K, the corresponding recovery algorithm of the underlying secret sharing scheme is applied to get a candidate for the secret key: K := Recover(s 1,..., s t). {Step E.3: Try to Decrypt Ciphertext} 5: Apply the decryption algorithm to C with key candidate and get a plaintext candidate P : P = Dec K (C). 6: if P is a valid plaintext then 7: return plaintext P 8: else 9: return 10: end if it holds that only if those t keywords really occured in the and were not blacklisted, the output of F will be t correct shares of the secret key. Otherwise (when there were less than t or no matches in this ) K is a random value and the user does not learn which of the used search terms were correct, if any. More formally, it holds that K = K if and only if {s 1,..., s t} {s 1,..., s n }. Step E.3: Decrypting the Encrypted Plaintext In the final step, the user aims to extract P from C. That is he computes P = Dec K (C). According to the common requirements on secure encryption algorithms, it holds that P = P if K = K. Otherwise should be some meaningless data that is not obviously related to P. We assume an efficient algorithm that can reject false plaintexts (explained later) and outputs in such cases. Otherwise the P
7 final output is P. This means, in the last step the algorithm is able to determine if K was valid and otherwise does not display the useless data to the investigator. III. DETAILED DESCRIPTION OF THE COMPONENTS In this section, we provide both, a generic description based on appropriate cryptographic primitives, but also suggest a concrete choice of algorithms and parameters. In principle the following three mechanisms need to be realized: An encryption/decryption scheme that allows to efficiently identify wrongly decrypted plaintexts (steps P.1, E.3) A secret sharing scheme (steps P.2, E.2) A method to create mapping F (steps P.3, E.1) In the course of this section, we explain those three building blocks in the given order. A. Encryption/Decryption Scheme Reasonable choices are encryption schemes that (i) have been thoroughly analyzed over a significant time period and (ii) are sufficiently efficient. A good candidate is AES [9], being the current encryption standard. Although AES has been designed to encrypt 128 bit blocks only, it can encrypt larger chunks of data if used in an appropriate mode of operation like the Cipher Block Chaining (CBC) Mode [10]. That is a plaintext P is divided into several 128bit blocks P = (p 1,..., p l ) with p i {0, 1} 128 for each i (if the bit size of P is not a multiple of 128, some padding is used at the end). With respect to the key size, the AES standard supports 128 bit, 192 bit, and 256 bit. As according to the current state of knowledge, 128 bit security is still sufficient, we propose to use AES128. That is in the following the keylength is l K = 128 and Enc refers to AES128 in CBCmode; other choices are possible. Given a plaintext P, it is first transformed into a sequence (p 1,..., p l ) of 128bit blocks. To allow for identifying wrongly decrypted plaintexts (line 6), we prepend a 128bit block p 0 which consists of a characteristic padding pad, e.g., the allzero bitstring pad = 0,..., 0. Then, given a 128bit key K {0, 1} 128, the sequence of 128bit blocks (p 0, p 1,..., p l ) is encrypted to (c 0,..., c l ). More precisely, the CBCmode involves a randomly chosen initial value IV that will be chosen at the beginning. Then, the ciphertext blocks c i are computed as follows: c 0 := Enc K (pad IV) (1) c i := Enc K (p i c i 1 ), i = 2,..., l (2) The output of the encryption process is (IV, c 0,..., c l ). With respect to the decryption procedure, it holds that pad := Dec K (c 0 ) IV (3) p i := Dec K (c i ) c i 1, i = 2,..., l (4) Observe that also a legitimate user may be forced to retry several selections of keywords until he is able to access the plaintext. Hence, it is desirable that the decryption procedure can be aborted as soon as possible if a wrong key is tried. This is the reason for the introduction of the characteristic padding pad. Given the encrypted blocks (c 0,..., c l ) and a key candidate K, a user can find out if K = K by checking (3). Only if this is fulfilled, the decryption process is continued. Otherwise, it is aborted and the next in the data set is tested for a valid match of all the t search terms. B. Secret Sharing Scheme The concept of secret sharing is well known in cryptography since long. In our proposal, we adopt the secret sharing scheme (SSS) proposed by Shamir [8]. As its concrete form has impact on the realization of the mapping construction, we recall its basic ideas here: Let K {0, 1} 128 denote the secret key. In the following, we treat K as an element of the finite field GF(2 128 ). Next, let the threshold value t be given and an integer n which should represent the number of shares (equally to the number of different, nonblacklisted keywords in the ). We aim to derive n shares s 1,..., s n such that any choice of t shares allows for reconstructing K. To this end, let n + 1 pairwise different values x 0,..., x n GF(2 128 ) be given. More precisely, the values x 1,..., x n will be the result of computations explained in Section IIIC. Then, x 0 is chosen randomly in GF(2 128 ) \ {x 1,..., x n }. These values will later play the role of xcoordinates. Next, a polynomial p(x) of degree t 1 is chosen such that p(x 0 ) = K. Finally, the n shares are computed by s i := (x i, y i ) GF(2 128 ) GF(2 128 ) with y i = p(x i ). We denote this overall procedure as (s 1,..., s n ) := Split(K, x 1,..., x n, t) (5) Observe that each share represents one evaluation of the polynomial p at a different point. It is well known that a polynomial of degree t 1 can be interpolated from any t evaluations. Hence, recovery is straightforward: Given t distinct shares s ij = (x ij, p(x ij ), the first step is to interpolate the polynomial p(x). Once this is accomplished, the secret is determined by evaluating p(x) at x 0, that is K := p(x 0 ). However, if less shares
8 are known or if at least one of them is incorrect (that is has the form (x, y ) with y p(x)), the derived value is independent of K. Hence, this realization fulfills the requirements mentioned above. This procedure is denoted by C. Creating the Mapping K := Recover(s i1,..., s it ). (6) It remains to discuss how a mapping F can be constructed that maps the keywords w i {0, 1} to the secret shares s i {0, 1} 256. We propose a mapping F that is composed of two steps as follows: The first step uses a cryptographically secure hash function H : {0, 1} {0, 1} 256. For example, one might instantiate H with SHA256 [11] or the variant of SHA3 [12] that produces 256 bit outputs. In addition, a random value salt {0, 1} is chosen. Then, the first step is to compute the hash values h i of the concatenated bitstring salt w i, that is h i = H(salt w i ) i = 1,..., n. (7) Observe that h i {0, 1} 256. In the following, we parse these values as (x i, z i ) GF(2 128 ) GF(2 128 ). The values (x 1,..., x n ) will be the xcoordinates as explained in Section IIIB. This requires that they are pairwise distinct. As each x i is 128 bits long, it holds for any pair (x i, x j ) with i j that the probability of a collision is if the hash function is secure. Due to the birthday paradox, the overall probability for a collision is roughly n 2 / This probability can be neglected in practice as n is usually by several magnitudes smaller than In case of the improbable event that two values collide, this step is repeated with another value for salt. Next, we assume that the secret sharing scheme has been executed (see eq. (5)) based on the values x 1,..., x n determined by the results from (7). That is we have now the situation that n hash values h i = (x i, z i ) GF(2 128 ) GF(2 128 ) are given and likewise n shares s i = (x i, y i ) GF(2 128 ) GF(2 128 ). We complete now the construction of F. First, a function g is constructed which fulfills g(x i ) = y i z i i = 1,..., n. (8) We elaborate below how g can be efficiently realized. Given such a function g allows to realize a bijective mapping G : GF(2 128 ) GF(2 128 ) GF(2 128 ) GF(2 128 ) (x, y) (x, g(x) y) This mapping G can be easily inverted and hence is bijective. Moreover, it maps a hash value h i = (x i, z i ) to G(h i ) = G(x i, z i ) = (x i, g(x i ) z i ) (9) (8) = (x i, (y i z i ) z i ) (10) = (x i, y i ) = s i. (11) Now F is defined as the composition of H and G, that is F = G H. (12) Algorithm 3 summarizes the steps of F. As we suggest the use of a known and widely analyzed hash function H, we can omit its specification in the output of Protect. In fact, the only information that is necessary for evaluating F is the function g(x). The realization that we propose below requires only n field elements which is optimal in the general case. As we suggest the use of GF(2 128 ), this translates to a representation using 128 n bits. The bijectivity of F ensures that only correct hash values map to valid shares. Algorithm 3 The Mapping F Input: A keyword w {0, 1}, a hash function H : {0, 1} {0, 1} 256, and a polynomial g(x) Output: A possible share s = (x, y) 1: Compute h := H(salt w) with h {0, 1} 256 2: Parse h = (x, z) GF(2 128 ) GF(2 128 ) 3: Compute s := (x, z g(x)) 4: return Possible share s It remains to explain how g : GF(2 128 ) GF(2 128 ) can be efficiently realized. Observe that any mapping from a finite field to itself can be represented by a polynomial. Hence, an obvious approach would be to use the tuples (x i, y i z i ) to interpolate a polynomial p(x) of degree w such that p(x i ) = y i z i for i = 1,..., n. However, the asymptotical effort for interpolation is cubic in the degree (here n) and our implementations revealed that this approach is too slow even for the average case. Therefore, we derived another approach: The core idea is to split the input space (here: GF(2 128 )) into several distinct sets and to define for each set an individual subfunction. That is given an input x, the function g is evaluated on x by first identifying the subset of GF(2 128 ) that contains x and then to apply the corresponding subfunction to x. This approach requires to derive the corresponding subfunctions which we accomplish by plain interpolation. However, as each
9 set now contains significantly less values x i than w, the overall effort is significantly reduced. To see why, let l denote the number of sets and assume for simplicity that each set contains n/l values x i. Then the overall effort to interpolate these l subfunctions of degree n/l each is l (n/l) 3 = n 3 /l 2. That is compared to the straightforward approach of interpolating g directly, our approach is faster by a factor of roughly l 2. For the splitting of GF(2 128 ), i.e., the set of all 128 bit strings, we suggest to use the k least significant bits where k 1 is some positive integer 1. More precisely, for a value x GF(2 128 ), we denote by x[1... k] the first k least significant bits (LSBs). This gives a natural splitting of GF(2 128 ) into 2 k pairwise disjunct subsets. Two elements x, x GF(2 128 ) belong to the same subset if and only if their k LSBs are equal, that is x[1... k] = x [1... k]. Let X := {x 1,..., x n } denote the n values determined by (7). For I {0, 1} k, we define X I := {x X x[1... k] = I}. (13) This yields a separation of X into 2 k pairwise distinct subsets: X = I {0,1} k X I. For each set X I, a corresponding subfunction g I is determined which fulfills g I (x i ) = y i z i for all x i X I. (14) As already stated, this is accomplished by simple interpolation. That is if n I := X I denotes the size of X I, the function g I can be realized by a polynomial of degree n I which will be close to n/2 k on average. This completes the construction of g and hence of G as well. The description of g requires only to specify the subfunctions g I, that is g ˆ=(g (0...00), g (0...01),..., g (1...11) ) (15) where each index represents a kbit string. As each g I is of degree n I, it can be described by n I coefficients in GF(2 128 ). Because of I {0,1} k n I = n it follows that g can be fully specified by n field elements. The evaluation of g is as explained above. Given an input x GF(2 128 ), first I := x[1... n] is determined, and then g I (x) is computed which represents the final output. That is g(x) := g I (x) where I := x[1... n]. IV. SECURITY ANALYSIS In this section, we analyze the security of our scheme. To this end, we investigate if the confidentiality of the protected message P is ensured against an attacker who 1 In our realization, k ranges from 1 to 8. knows the ciphertext C and the mapping F. In the following, we assume that the deployed cryptographic primitives are secure, i.e., the encryption scheme, the secret sharing scheme, and the hash function. For example, this means that an attacker cannot determine the plaintext P from ciphertext C in reasonable time, and that guessing sufficient shares is impractical as well. As for all these cryptographic primitives established realizations do exist which haven t shown any weaknesses up to date, this is a reasonable assumption. Moreover, if one deploys our scheme with a primitive which turns out to be insecure later on, it is (conceptually) easy to replace it by another concrete scheme that is still considered to be secure. Assume now that an attacker A has access to a ciphertext C and a specification of F and aims to recover plaintext P. As we assume the encryption scheme to be secure, the probability for a successful attack is negligible if F is not taken into account. Hence, we focus on attackers who aim to exploit knowledge of F somehow. Recall that the purpose of F is to connect the keywords to the shares of K and is in particular independent of the plaintext. Thus, the knowledge of F can only be helpful for recovering the secret key (what of course would eventually allow to reveal the plaintext). Recall that F is only indirectly related to K. The purpose of F is to have a 1:1mapping between the keywords in W and the shares connected to the key. Moreover, as the secret sharing scheme is assumed to be secure, A can only be successful if she can determine t correct shares. Hence security analysis boils down to the question of the effort to find t correct shares while possibly using the knowledge of F. We now explain why knowledge of F does not help for determining any shares. It holds for the used secret sharing scheme that for any chosen secret K any combination (x, y) {0, 1} 128 {0, 1} 128 can potentially be a share. As the mapping G is bijective, there exists for each choice of (x, y) a corresponding input (x, z) such that G((x, z)) = (x, y). Moreover, recall that according to the specification of the scheme, the input (x, z) does not represent the keywords directly but its hash value h. If the hash function H is secure, any (x, z) can be a hash value with an inifinite number of preimages. Hence, the corresponding preimage (x, z) of a share candidate (x, y) likewise doesn t tell if (x, y) can be a share. Summing up, neither the domain nor the range of F or G yields any information about the used shares. What about the structure of F resp. G? Recall that G is specified by subfunctions g I of g where the degree of g I tells an
10 upper bound of the number of x I X I (see (14)). As we suggest that each subfunction g I should have degree of at least t 1 (end of Section III), an attacker cannot decide for any X I if it contains xcoordinates of the shares. We have seen that the knowledge of F cannot be exploited for recovering any of the shares. Moreover, guessing the shares directly would be practically impossible and in fact takes more effort than guessing the key directly. Thus, the only attack that may be potentially relevant are socalled dictionary attacks. In a dictionary attack, the attacker A has access to a dictionary of N keywords W 1,..., W N and tries different combinations of t keywords until she successfully recovered the secret key K (and hence, eventually the initial plaintext as well). In the remainder of this section, we analyze the effort for such an attack. To this end, we make the assumption that each keyword combination is equally likely in principle. That is from an attacker s point of view, there exist no combinations of keywords that for whatever reason have a higher probability to be correct than others. For sure, this may not hold in practice. For example, the language of the initial plaintext and the context of this message may be taken into account for such an attack. But as this is hard to model for the general case, we consider it as out of scope of this work at hand. A further assumption we make is that the dictionary contains all keywords associated to P, that is W {W 1,..., W N }. The reasons is that security analysis should evaluate security against attackers that are as powerful as possible. In the following, we call a selection of t keywords W 1,..., W t as good if it allows to recover the plaintext by applying the Extract algorithm, i.e., P := Extract(C, F, W 1,..., W t). (16) By construction of the scheme, any choice of t keywords in W is good. That is, the number of good choices is at least ( ) n t. Theoretically, there could be more good combinations. Let p(x) the polynomial that has been used in the secret sharing scheme for determining the shares (x i, y i ), i = 1,..., n. That is it holds that p(x i ) = y i for each i. Assume now that by coincidence it happens for a keyword W W that (x, y ) := F (W ) for which p(x ) = y holds as well. Then any combination of t keywords taken from W {W } would likewise be good. In particular the number of good combinations would be at least ( ) n+1 t in such a case. While such effects can happen in principle, they are extremely unlikely. Due to the security of the involved hash function H and the bijectivity of G, randomly chosen inputs W {0, 1} to F lead to uniformly distributed outputs (x, y ) GF(2 128 ) GF(2 128 ). Hence, the probability of the event that p(x ) = y is If the dictionary contains N n keywords that do not belong to W, the expected number of keywords W that by conincidence give a correct share is N n 2 1 as N has to be for practical reasons. 2 In a similar manner, one can argue that the probability that a combination of t keywords that are not all contained in W gives the correct plaintext is Also here one cannot expect that this practically happens. Summing up, the number of good combinations is exactly ( ) n t with overwhelming probability. Hence, the dictionary attack translates to the following procedure: The) attacker continuously selects randomly one of the combinations W 1,..., W t and applies ( N t the Extract algorithm until he recovers the plaintext, i.e., until he hits by chance one of the ( ) n t good combinations. Observe that an attacker is forced to execute the complete Extract algorithm. Omitting step 1 (applying F to W 1,..., W t for getting a combination of possible shares (x 1, y 1 ),..., (x t, y t) would force the attacker to predict the hash values of W i, being infeasible if the hash function is secure. As discussed above any value (x, y) GF(2 128 ) GF(2 128 ) can be a valid share. Even if all xcoordinates x 1,..., x t would fall into the same subset X I, this combination could be possible (from an attacker s point of view) as the corresponding subfunction g I has degree t. That is there is no indication that X I may contain less than xcoordinates of the valid shares. The only event that may tell an attacker that a combination is invalid is if two xcoordinates coincide, i.e., x i = x j for i j. However, the probability for this is again and hence can be ignored. Summing up, at the end of step 1 an attacker cannot exclude yet that the considered combination of keywords may be good. Now step 2 gives a value K GF(2 128 ) that may be the correct key. Whether this is the case or not cannot be decided at this stage as the secret key does not contain any redundant information that may allow for distinguishing it from random values. Hence, the attacker has to perform the third and last step as well for figuring out whether the considered combination of keywords was good. Altogether this shows that the only possible attack is a plain dictionary attack. We evaluate the effort of a successful attack for our sample cases in Section VIB7. 2 Already N = 2 50 would imply a dictionary of several Terabytes and is still smaller than by multiple magnitudes.
11 A. Security Model V. SECURITY ANALYSIS In this section, we analyze the security of our scheme against a malicious investigator. Before we provide a formal proof of the security of the scheme, we define the underlying notion of security. 1) Preliminaries: To simplify the discussion, we express security asymptotically with respect to some security parameter λ. That is we assume that all parameters have been chosen according to λ and that all algorithms, including the attacker, has a runtime polynomial in λ. Moreover, the key length l K is chosen such that 2 lk is negligible in λ. If not otherwise stated, the term negligible will refer to a function that is negligible in λ. Rephrasing all notions and arguments for concrete security parameters is straightforward. In the following, we assume some distribution D on the set of possible plaintexts. Whenever we say that a plaintext is chosen, we mean that it is sampled according to D. The distribution D may be known to an attacker. Moreover, we assume that the size of the plaintext and the ciphertext are known to the attacker. In general, we consider a set of public parameters that are known to all involved parties, including any possible attacker. This set will contain at least the full specification of the scheme, the value t chosen for the threshold, and possibly the distribution D, but may also comprise further information. In our discussion, we also slightly change the notation for the scheme. As already explained, a description of F can be given by specifying the underlying hash function H and the function g. As H will be fixed, we say that Protect simply outputs (C, g) instead of (C, F ) to better reflect the actual values that have been generated by Protect. Likewise, Extract gets as input g instead of F. 2) Attacker model: An attacker A will always refer to a PPT (probabilistic polynomialtime Turing machine) that plays the role of the investigator and aims to gain knowledge about the plaintexts from given ciphertexts. More precisely, as each plaintext is individually encrypted using a different key, security definition restricts to one given ciphertext and the probability that the attacker can recover information about the underlying plaintext. In real world scenarios, this information may help to guess the keywords of another plaintext. This effect is captured by the distribution D. That is the distribution may change depending on the knowledge of the attacker and hence affects the security of the next ciphertext. This is also one of the main reasons why we aim for a security definition that is independent of D. Recall that an investigator needs to be able to run keyword search on the encrypted plaintexts. That is for a given ciphertext, the investigator can execute the extraction algorithm Extract with some chosen keywords w 1,..., w t to possibly decrypt the ciphertext. Consequently, a malicious auditor will have at least the same possibility as well, making for example dictionary attacks unavoidable (see also Sec. VC). In such an attack, the attacker tries different (plausible) keywords until she manages to decrypt the ciphertext. Informally, security means that no attacker can do significantly better. In other words, for any attacker A with access to Extract it does not affect the probability of success significantly whether (C, g) are known as well or not. This means that revealing C and g to an adversary does not induce any further security risks. Observe that this represents the optimal level of security unless one restricts the capabilities of a (possibly honest) investigator. To capture this basic functionality, we introduce an oracle E which is defined with respect to some fixed (C, g) and simply executes Extract on (C, g). That is the oracle E accepts as input t keywords w 1,..., w t and returns Extract(C, g, w 1,..., w t). We express this oracle by E (C,g) but omit the index (C, g) in most cases as it is clear from the context. In the following we assume that any attacker has blackbox access to E (C,g) for the given tuple (C, g). We denote this by A E. Moreover, an attacker may get as additional input some information about (C, g). This is likewise formalized by introducing an information oracle I. Similar to E, the oracle I is defined for a fixed tuple (C, g) which we often do not write down explicitly. Moreover, I = I τ is defined with respect to a deterministic function τ which operates on inputs (C, g). Upon request, I τ returns τ(c, g). For the definition of security, only the two extreme cases are interesting: either τ simply outputs the complete tuple (C, g) or outputs nothing, i.e., outputs. In the first case, the information oracle is denoted by I id while it is simply omitted in the second case. Observe that an attacker that does not learn the full input (C, g) from I can never be more powerful than an attacker who has the full knowledge. Hence, if we show that in both extreme cases the success probability is practically the same, it holds in particular for any attacker who gets partial information on (C, g) only. Summing up, any attacker A has access to two oracles: the extraction oracle E, representing the basic functionality, and an information oracle I τ which simply returns τ(c, g) to A. We express this by A E,Iτ.
12 3) Security Definition: As already stated, the fact that a (malicious) investigator should be able to do keyword search on the data inevitably allows for dictionary attacks. This cannot be prevented, independent of how the data is protected and no matter whether the scheme is interactive or not. Consequently, the best one can ask for is that using a protection scheme restricts an attacker to attacks that only use this functionality. That is using the protection scheme and revealing (C, g) provides no significant advantage to an adversary. For a formal definition, we consider the following security game between a hypothetical challenger and an attacker A E,Iτ. The challenger samples some plaintext P, runs the protection algorithm to get (C, g), and then runs the attacker who gets blackbox to E(C, g) and I τ. The security game now works as follows for an attacker A E,Iτ : 1) Choose a plaintext P (according to the distribution D) and extract the set W of keywords from P. 3 2) Execute Protect(P, W, t) and get (C, g) 3) Run A E(C,g),Iτ (C,g). The attacker wins the game if she outputs P. We denote by Adv(A E,I ) the success probability of A to win the game where the probability is taken with respect to the random coins of the challenger and the attacker. An experienced reader might wonder why winning requires to recover the whole plaintext. In cryptography, usually a weaker winning condition is considered, e.g., predicting the output of a Boolean function on input of P. The reason that we aim for full plaintext recovery is the following. As already elaborated, an unavoidable, basic attack is a dictionary attack where an attacker aims to decrypt the ciphertext by guessing correct keywords. 4 Our goal is to show that no further attacks are possible. As the dictionary attack aims to output the full plaintext, we have to adopt this as the winning condition. We are now ready to give the formal security definition. As already motivated, one cannot avoid that an attacker, i.e., a malicious investigator, misuses access to E to recover the plaintext. The only option would be to restrict the capabilities of an investigator which would severely change the overall scenario. Hence, the best we can ask for is that knowing (C, g) does not increase the probability of success significantly compared to an attacker who makes only keyword searches. Usually, one would formalize this by requesting that any attacker who 3 We assume that an attacker knows this procedure as well. 4 We call keywords correct or good if they are contained in the plaintext and hence potentially allow to decrypt the ciphertext. has access to (C, g) can be simulated by an attacker without this knowledge. However, one technical problem arises. While one can show that the knowledge of (C, g) does not help to recover the plaintext, an attacker may use this knowledge in such a way that makes it impossible to simulate it. One example is that after an attacker has recovered P, it selects several combinations of t keywords from P and checks if g always yields eventually the same key. This cannot be simulated by an algorithm that has access to E only. However, it is still possible to come up with a meaningful definition of security that can also be shown. The core idea is to show that for getting the maximum success probability, an attacker does not need to know (C, g). More precisely, the maximum success probability among the set of attackers that know (C, g) is practically the same as for the set of attackers that do not know (C, g). This is formalized as follows: Definition 1. A scheme as defined in section II is secure if it holds that max { Adv(A E,I id ) } { } max Adv(A E ) (17) A E,I id A E is negligible in the security parameter λ. Here, max A E,I id means the maximum over all attackers A that have access to the indicated oracles and so on. Observe that attackers considered in the second expression have no access to any information oracle and hence learn nothing about (C, g). Thus, the definition expresses that the best an attacker can do with the knowledge of (C, g) is practically the same as the best one can do without this knowledge. B. Proof of Security 1) Main Theorem: In this section, we prove the security of the scheme described in section II according to the security definition given in definition 1. Recall that the proposed scheme deploys three standard cryptographic components: a hash function, an encryption scheme, and a secret sharing scheme. While the latter provides informationtheoretic security, the first two can be only computational secure. To abstract from any potential weaknesses of the concrete choices of the encryption function and the hash function, we follow the common approach and model these as an ideal cipher and a random oracle, respectively. That is the hash function H is modeled by an oracle that accepts as input bit strings x {0, 1} of arbitrary length and outputs the corresponding hash value. The crucial point is that a user
13 knows only these hash values he has asked for. That is a user can only know H(x) if he has explicitly asked for. Otherwise the value is completely undetermined. Observe that we assume that E has access to H as well. Similarly, an ideal cipher Π with a key size l K can be seen as a collection of 2 lk permutations indexed by the set {0, 1} lk, selected uniformly at random from the set of all such families of permutations. While in concrete instantiations the cipher will be used in an appropriate mode of operation, this may introduce weaknesses as well. Hence, we assume that the block size of the permutation fits to the length of the plaintext which is fixed and publicly known. A user can make forward and backward requests to the ideal cipher. In a forward request, it sends a key K {0, 1} lk and a plaintext P and receives Π K (P ) while in a backward request, it sends a key K and a ciphertext C and gets Π 1 K (C ). Slightly deviating from the standard definition, we assume in addition that Π also internally stores the triple (K, P, C) that has been used to generate the challenge ciphertext. We emphasize that apart of internally storing this information, Π represents an ideal cipher. A final assumption we make is that decrypting a ciphertext under a different key, i.e., a key different from the one used to create the ciphertext, yields an valid plaintext only with negligible probability. This can be achieved for example by using a characteristic padding of a size linear in λ. We are now ready for the main theorem: Theorem 1. In the random oracle and ideal cipher model, the scheme described in section II is secure according to the security definition given in definition 1. 2) Proof: The remainder of this section contains the proof of theorem 1. We show this by a sequence of attacker types A i that differ on the oracles they can access. We show for any two subsequent types (A i, A i+1 ) that max Ai Adv(A i ) max Ai+1 Adv(A i+1 ) is negligible. The first attacker type (expressed by A 0 ) has access to E, I id, H, and Π, while the last type of attacker (expressed by A 5 ) can only access E. a) Attacker Type A 1 : Let A E,Iid,H,Π 0 be an arbitrary attacker with access to the four oracles E, I id, H, and Π. We construct now from A 0 an attacker A 1 that has access to a restricted oracle Π instead of Π. This oracle Π works as follows. It accepts as inputs keys K {0, 1} lk. It first checks if K = K. If this is the case, it outputs P. Otherwise the output is. In other words, Π tells if the given key is equal to the key that has been used to created C. The attacker A 1 invokes I id once in the beginning to learn C. From this point on, it uses A 0 in a blackbox manner. All requests to E and H, and I are simply forwarded. If A 0 makes a request Π involving a key K, it first forwards K to Π. If the output is not, A 1 learns the plaintext P, i.e., the decryption of C. If the request by A 0 was a forward query including P or a backward query including C, A 1 returns C and P, respectively. All other requests are answered by A 1 by consistent lazy sampling. The final output of A 1 is simply the final output of A 0. Although A 0 has access to the ideal cipher Π, her goal is to decrypt C = Π K (P ). By definition all outputs of Π under a different key K K are informationtheoretic independent from the queries using K and hence do not provide any useful information to A 0. Moreover, also the queries using K are perfectly simulated as the only restriction Π K (P ) = C is respected. Consequently A 1 perfectly simulates an ideal cipher to A 1. Moreover, A 1 wins if A 0 wins and vice versa. It follows that ( Adv A E,Iid,H,Π 0 ) = Adv (A E,Iid,H,Π 1 ). (18) In particular we have max A0 Adv(A 0 ) max A1 Adv(A 1 ). On the other hand, any attacker that has only access to the restricted oracle Π can be simulated by one that has access to the full ideal cipher Π. This implies that max A0 Adv(A 0 ) max A1 Adv(A 1 ). Concluding it holds 5 max { Adv ( A E,Iid,H,Π 0 )} { = max Adv (A E,Iid,H,Π 1 )}. b) Attacker Type A 2 : We consider now an attacker type that has access to a restricted information oracle I 2 nd instead of I id. The oracle I 2 nd simply outputs the second entry of the tuple, that is I 2 nd(c, g) = g. Observe that such an attacker does not learn C but knows it size according to the security model. Let A 1 = A E,Iid,H,Π 1 be an arbitrary attacker of the previous type. We construct now an attacker A 2 = A E,I 2nd,H,Π 2 of the current type. The attacker A 2 serves as simulator for A 1 as follows. At the beginning, A 2 invokes I 2 nd to learn g and then runs A 1 in a blackbox manner. All requests of A 1 and the answers are simply forwarded. The only exception is the request to I id. In this case, A 2 simply samples some ciphertext C of the correct size and returns (C, g) to A 1. Recall that A 1 has no access to the full ideal cipher Π and does not see the parameters of Π. In other 5 We omit the index of max from now on for the sake of readability.
14 words, Π at most reveals the plaintext but never the used ciphertext. Hence, the situation is indistinguishable from the case where A 1 receives the correct ciphertext C. It follows that Adv(A E,Iid,H,Π 1 ) = Adv(A E,I 2nd,H,Π 2 ). (19) Using the same arguments as above, it follows that { )} max Adv (A E,Iid,H,Π { )} = max Adv. 1 (A E,I 2nd,H,Π 2 c) Attacker Type A 3 : The next attacker type differs from the previous one that an attacker has no longer access to Π. Again, let A 2 be any attacker of the previous type. We construct an attacker A 3 = A E,I 2 nd,h 3. The attacker A 3 invokes A 2 and forwards all queries and answers with respect to the other oracles. However, to simulate Π, she performs some actions in addition. Whenever A 2 made a query w to the random oracle H and received some response h, the attacker A 3 appends this tuple (w, h) internally in some initially empty list L RO. In addition A 3 keeps a list L Keys which is empty at the beginning. L Keys will contain all keys that can be derived from the answers received by H. That is as soon as L RO = t, that is L RO = [(w 1, h 1),..., (w t, h t )], the attacker A 3 uses the mapping F and the secret sharing scheme to compute the corresponding key K and inserts (K, w 1,..., w t) into L Keys. Whenever a new tuple (w l+1, h l+1) is appended to L RO, the attacker A 3 computes for each possible combination (w i 1,..., w i t 1, w l+1 ), 1 i 1 < i 2... < i t 1 l, the corresponding K and inserts (K, w i 1,..., w i t 1, w l+1 ) into L Keys, sorted by the first entry. Observe that L RO, i.e., the number( of queries ) to H, is polynomial in λ and that L Keys = LRO t O( L RO t ), hence polynomial in λ as well. This shows that maintaining these lists can be done efficiently. Now, if A 2 wants to make a request K to Π, A 3 proceeds as follows. It first checks if K represents one of the first entries of the tuples stored in L Keys. If this is not the case, A 3 simply outputs. Otherwise, that is K is contained in L Keys, then A 3 derives the corresponding Hinputs w i 1,..., w i t, invokes E with these inputs, and forwards the return to A 2. Now, A 3 perfectly simulates Π for A 2 except for the following two cases: (i) running Extract with wrong keywords, i.e., which do not yield K, results nonetheless in a valid plaintext (and not in ), and (ii) A 2 managed to derive the correct key K but without using t correct keywords. The probability for the first event is negligible by assumption. Next, consider a key candidate K that has not been constructed from using H and F. The probability that K = K is hence 2 lk. To see why recall that the only component that is connected to K is g. As the candidate key cannot be constructed from the H responses, either the key has been purely guessed or derived from the secret sharing scheme where at least one share has been guessed. In any case, the probability of correctly yielding K is 2 lk, which is negligible in λ by assumption. Hence, it follows that Adv(A E,I 2 nd,h,π 2 ) Adv(A E,I 2 nd,h 3 ) negl(λ). (20) Consequently it holds as well { max Adv 2 { ( max Adv (A E,I 2 nd,h,π A E,I 2 nd,h 3 )} )} negl(λ). d) Attacker Type A 4 : The next attacker type differs from the previous one by having no longer access to the information oracle I 2 nd. Again, let A 3 = A E,I 2nd,H 3 denote an attacker of the previous type. We construct now an attacker A 4 = A E,H 4 of the current type. At the beginning, A 4 randomly constructs a mapping g according to some randomly chosen set of keywords. Then, A 4 runs A 3 and in principle forwards all requests and answers except of the request to I 2 nd. If A 3 makes a request to I 2 nd, the attacker A 4 simply returns g to A 3. If A 3 produces some output P, then so does A 4. However, in addition it might happen that A 4 aborts the execution of A 3 before A 3 stops and produces an output on her own. Before we explain this situation in detail, we first explain the reasoning behind it. Observe that by construction, the function g maps the coordinate x i to y i z i where the z i represent the second part of the hash value of the correct keywords and y i the second part of the share. Obviously, even if one knows g(x i ), any value for y i (or alternatively z i ) is possible. In particular, g alone does not restrict the set of possible shares. Moreover, any collection of t keywords can be potentially correct. Even if all these have hash values that share the same xcoordinate, it may be a correct as each component function has degree at least t. Thus, g and g show exactly the same structure. Moreover, as H is preimage resistant, knowing g does not help the attacker in choosing correct keywords. Hence, the success probability of finding a collection of t correct keywords is not reduced if one replaces g by g.
15 If A 3 has a collection of t correct keywords w 1,..., w t and runs the extraction procedure using g, she will get a key K which is different to K with high probability. But this doesn t matter. As K is not used anymore by any of the oracles, A 3 cannot notice the difference. Nonetheless, providing g instead of g does not yield a perfect simulation. If an attacker runs the extraction algorithm using g twice with two different collection of correct keywords, the output will be two different keys with high probability. However, running E twice with the same sets of keywords will yield the correct plaintext in both cases. This brings us to the situation where A 4 may stop the execution of A 3 and produce her own output. Observe that the situation described above can only take place if A 3 has requested H with at least t correct keywords. Similar to the discussion of the previous attacker type, A 4 uses a list L RO where she collects all requests w of A 3 to H. Moreover, A 4 runs E for any combination of t keywords in L RO. More precisely, it runs E the first time when L RO contains t entries. From this point on, whenever a new request to H is made, A 4 runs E on any combination of t elements in L RO that contains the new requested value. As already explained, the overall effort is polynomial in λ. Moreover, if E returns to one of these requests the plaintext P (instead of ), A 4 halts and outputs P. If A 3 requests to H does not involve t correct keywords, it cannot observe that g has been replaced and the simulation is perfect. Moreover, A 4 will not abort and hence both attackers have the same success probability. Otherwise, if A 3 requested t correct keywords to H, the attacker A 4 will succeed for sure. It follows that Adv(A 3 ) Adv(A 4 ) and hence max{adv(a 3 )} max{adv(a 4 )}. On the other hand, it trivially holds that max{adv(a 3 )} max{adv(a 4 )}. Consequently: max { Adv ( A E,I 2 nd,h 3 )} = max { Adv ( A E,H 4 )}. e) Attacker Type A 5 : We come now to the final attacker type. As opposed to the previous attacker type, now an attacker has no longer access to the random oracle H. Again, let A 4 = A E,H 4 be any attacker of the previous type. We construct now an attacker A 5 which does the following. It runs A 4 and forwards all requests to and answers from E. With respect to requests to H, these are answered by consistently sampling random responses. Observe that A 5 perfectly simulates a random oracle. Moreover, A 4 cannot distinguish the simulated random oracle from H as E only tells which tuples of keywords are correct but says nothing about the internal values. Obviously, A 4 and A 5 have the same success probability. Likewise, it follows that { ( )} max Adv = max { Adv ( A E )} 5 A E,H 4 We have reached the final attacker type. Observer that we have shown for each two subsequent attacker types that the maximum success probability is either equal or differ by a negligible function only. This shows the claim of the theorem, namely max { Adv ( A E,Iid,H,Π 0 This concludes the proof. C. Dictionary Attacks )} max { Adv ( A E )} 5 negl(λ). We have seen that the only attack that may be potentially relevant are socalled dictionary attacks, formalized by attackers that have only access to the E oracle. In a dictionary attack, the attacker A has access to a dictionary of N keywords W 1,..., W N and tries different combinations of t keywords until she successfully recovered the secret key K (and hence, eventually the initial plaintext as well). In the remainder of this section, we analyze the effort for such an attack. To this end, we make the assumption that each keyword combination is equally likely in principle. That is from an attacker s point of view, there exist no combinations of keywords that for whatever reason have a higher probability to be correct than others. For sure, this may not hold in practice. For example, the language of the initial plaintext and the context of this message may be taken into account for such an attack. But as this is hard to model for the general case, we consider it as out of scope of this work at hand. A further assumption we make is that the dictionary contains all keywords associated to P, that is W {W 1,..., W N }. The reasons is that security analysis should evaluate security against attackers that are as powerful as possible. In the following, we call a selection of t keywords W 1,..., W t as good if it allows to recover the plaintext by applying the Extract algorithm, i.e., P := Extract(C, F, W 1,..., W t). (21) By construction of the scheme, any choice of t keywords in W is good. That is, the number of good choices is at least ( ) n t. Theoretically, there could be more good combinations. Let p(x) the polynomial that has been used in the secret sharing scheme for determining the shares (x i, y i ), i = 1,..., n. That is it holds that p(x i ) = y i for each i. Assume now that by coincidence it happens
16 for a keyword W W that (x, y ) := F (W ) for which p(x ) = y holds as well. Then any combination of t keywords taken from W {W } would likewise be good. In particular the number of good combinations would be at least ( ) n+1 t in such a case. While such effects can happen in principle, they are extremely unlikely. Due to the security of the involved hash function H and the bijectivity of G, randomly chosen inputs W {0, 1} to F lead to uniformly distributed outputs (x, y ) GF(2 128 ) GF(2 128 ). Hence, the probability of the event that p(x ) = y is If the dictionary contains N n keywords that do not belong to W, the expected number of keywords W that by conincidence give a correct share is N n as N has to be for practical reasons. 6 In a similar manner, one can argue that the probability that a combination of t keywords that are not all contained in W gives the correct plaintext is Also here one cannot expect that this practically happens. Summing up, the number of good combinations is exactly ( ) n t with overwhelming probability. Hence, the dictionary attack translates to the following procedure: The) attacker continuously selects randomly ( N t one of the combinations W 1,..., W t and applies the Extract algorithm until he recovers the plaintext, i.e., until he hits by chance one of the ( ) n t good combinations. Observe that an attacker is forced to execute the complete Extract algorithm. Omitting step 1 (applying F to W 1,..., W t for getting a combination of possible shares (x 1, y 1 ),..., (x t, y t) would force the attacker to predict the hash values of W i, being infeasible if the hash function is secure. As discussed above any value (x, y) GF(2 128 ) GF(2 128 ) can be a valid share. Even if all xcoordinates x 1,..., x t would fall into the same subset X I, this combination could be possible (from an attacker s point of view) as the corresponding subfunction g I has degree t. That is there is no indication that X I may contain less than xcoordinates of the valid shares. The only event that may tell an attacker that a combination is invalid is if two xcoordinates coincide, i.e., x i = x j for i j. However, the probability for this is again and hence can be ignored. Summing up, at the end of step 1 an attacker cannot exclude yet that the considered combination of keywords may be good. Now step 2 gives a value K GF(2 128 ) that may be the correct key. Whether this is the case or not cannot be decided at this stage as the secret key does not contain any redundant information that may allow for 6 Already N = 2 50 would imply a dictionary of several Terabytes and is still smaller than by multiple magnitudes. distinguishing it from random values. Hence, the attacker has to perform the third and last step as well for figuring out whether the considered combination of keywords was good. Altogether this shows that the only possible attack is a plain dictionary attack. We evaluate the effort of a successful attack for our sample cases in Section VIB7. VI. PRACTICAL IMPLEMENTATION In this section, we show the prototype implementation of the encryption tool and the integration of the search and decryption routines in the standard forensic framework Autopsy [7] (Version 3) exemplarily. We further evaluate the performance of the approach on sample cases. A. Realization We want to sketch how our protoype implementation works. For brevity, we omit implementation details, as all the important algorithms are already explained in Sections II and III and the implementation is straight forward. The encryption Software is implemented platform independent as a Python script. The command syntax is as follows: python ppef_gen.py <pst mh mbox rmbox> <in> <out> <thr> <blist> We support different inbox formats, like the mbox format which is amongst others used by Mozilla Thunderbird, the pst format that is used by Microsoft Outlook, the MH (RFC822) format and Maildir format as used by many UNIX tools. For encryption, the threshold parameter t needs to be specified and the blacklist can be provided as a simple textfile. As for the encryption tool, the decryption software is implemented as a platform independent Python script. For better usability in real world cases, we also integrated it into Autopsy by implementing an Autopsyplugin called PPEF (Privacy Preserving Forensics). To perform his analysis on the resulting encrypted ppef file, the investigator can just add an encrypted ppef file to a case in Autopsy as a logical file and in step 3 of the import process, select the PPEF ingest module in order to handle the encrypted data, as shown in Fig. 1 on the next page. After successfully importing a ppef file, the file is added to the data sources tree. When selecting the file, Autopsy first shows the randomlooking encrypted data in the hexdump, as can be seen from Figure 2. The investigator can then use the PPEF search form to run keyword searches on the encrypted data using our
17 Fig. 1. Importing encrypted PPEF file to Autopsy. Fig. 3. Viewing the successfully decrypted matches. prototypes to evaluate the performance of our solution. Fig. 2. Performing a keyword search on the encrypted ppef file in Autopsy. previously explained extract algorithm. The search bar is shown in Fig. 3. Note that the number of keywords that have to be entered is obtained from the ppef file automatically. When clicking on the PPEF Search button, the searching algorithm is run transparently in the background and successfully decrypted s (i.e. s that match the entered keywords) are automatically imported to Autopsy. The s are parsed using Autopsy s standard mbox parser, so that s are added within the Messages tree and can be filtered, sorted, and viewed using the standard mechanisms of the tool, as also displayed in Fig. 3 (the contents of the s have been sanitized for privacy reasons). After shortly demonstrating the practical realization, we next use those B. Practical Evaluation In order to evaluate the practical usage of our scheme using the developed prototype, we use a data set of different real world boxes. We evaluate the performance of the prototype for encryption, search and decryption. All measurements where performed on one core of a laptop with Intel R Core TM i53320m CPU at 2.60 GHz and 16 GB RAM. 1) The data set: Our data set consists of the following 5 mailboxes: The Apache httpd user mailing list from 2001 to today consisting of s (labeled Apache) One mailbox of 1590 work s (Work) Three mailboxes of private persons with 511, 349, and 83 s (A, B, C) 2) Blacklist: As example blacklist, the 10, 000 most common words of English, German, Dutch and French [13] are used. In a real case, the blacklist may of course be choosen less restrictive, but this is a generic choice used for our performance evaluation. The keyword threshold t was set to 3. In real cases, a more case specific blacklist must be created. It must consider words such as the company name, which likely can be found in every , signatures, and generally terms often used in s, as long as they are not especially relevant to the case. We focus on the encryption scheme itself and consider building an appropriate blacklist to be a separate problem that is highly dependant on the specific case and can not be solved in general. 3) Data set details: The lexicographic details of the sample data can be seen in Table II. On average, s
18 Apache Work A B C s: Words: Max Avg Med σ Σ Keywords: Max Avg Med σ Σ TABLE II LEXICOGRAPHIC PROPERTIES OF SAMPLE MAILBOXES. for the IV and padding prepended to the plaintext to determin correct decryption and between 1 and 16 bytes PKCS#7 padding to the AES 128 blocksize. Altogether, the size increase in kilobytes of the raw s within the mailboxes compared to our ppef format can be seen in Table IV. This leads to an average storage overhead of 5.2 %. Apache Work A B C Size Raw [KB] 376, , , , 486 6, 676 Size PPEF [KB] 418, , , , 821 6, 806 Overhead 11.2 % 0.4 % 3.0 % 0.7 % 1.9 % TABLE IV SEARCH/DECRYPTION PERFORMANCE. of the sample set are composed of around 220 words. Further, disregarding multiple occurrences of the same word and via applying the blacklist, the number of resulting keywords per is reduced to an average of 35 keywords. 4) Encryption performance: Table III shows statistics about the time it takes to encrypt the sample s. It also lists the total time it takes to encrypt each sample mailbox. The rate to encrypt the mailboxes is on average only around s per second. This value is still acceptable because the encryption needs to be done only once during an investigation since our approach is noninteractive. Apache [s] Work A B C Min Max Avg Med σ Σ TABLE III ENCRYPTION PERFORMANCE OF PROTOTYPE. 5) Encryption storage overhead: Because the sole encryption of the s does not add much on the size of the data, the main factor of storage overhead is the storage for the coefficients of our polynomials used to map hashes to shares. Their number is determined directly from the keywords, with a slight overhead introduced by padding the coefficients of the subfunctions to degree t. This introduces a total average overhead of bytes for the polynomial coefficients, plus 16 byte salt per . The overhead introduced by the AES encryption is between 33 to 48 bytes consisting of 16 bytes each Apache [s] Work A B C Min Max Avg Med σ Σ TABLE V SEARCH/DECRYPTION PERFORMANCE. 6) Search and decryption performance: As the prototype always tries to decrypts each , the search and decryption is not dependent on a keyword hit and hence the times listed for search in Table V include the decryption time of successful keyword hits. Because the rate the mailboxes are searched is on average only around 98 s per second, searches take a while, but searches are still feasible. Besides timing measurements, the prototype was also evaluated with regard to functional correctness, by performing sample keyword searches for keyword combinations only included in specific s and keyword combinations not included in any s at all. The found keyword matches corresponded perfectly with the expected results, i.e., only the s actually containing the keywords were decrypted, or none at all, which also practically confirms the correctness of our scheme. 7) Attack performance: We now discuss the effort needed by an attacker to execute a dictionary attack on an entire box, such that on average, a fraction of π of all s are decrypted where 0 π < 1. The strongest possible attack to our scheme is a bruteforce dictionary attack, as long as the underlying ciphers are secure. The effort of such an attack is expressed in the
19 π N Apache Work A B C , , , , , , , , , , , , , , , ,000 27, ,000 22, ,000 3, TABLE VI ATTACK TIME IN YEARS TO DECRYPT THE ENTIRE MAILBOX WITH A PROBABILITY OF π USING A WORDLIST WITH N WORDS. number q of trials an attacker needs to make. To this end, we use the following sufficient condition to decrypt a single with probability π: ( n ) t q log 2 (1 π)/ log 2 1 ) (22) ( N t This involves several highly contextdependent parameters, making a general statement impossible. But to give an intuition nonetheless, we estimate the attack effort in the cases considered in our evaluation. For each of the cases, TABLE VI displays the effort for different scenarios: We considered the cases that each contains the measured n avg keywords for each mailbox. With respect to the dictionary size, we take as orientation the Second Edition of the 20volume Oxford English Dictionary. It contains full entries for 171,476 words in current use [14]. So, we first choose N := 171, 476. For the success rate π, we chosen the value 0.99 as this ensures for an attacker with a high probability that almost all s are decrypted. Using this parameters, we can calculate the number of tries q an attacker has to perform for the respective mailboxes. For example for the Apache mailbox, this results in q = queries to brute force a single mail. We multiply this by the number of mails in the mailbox and in order to give a better intuition, calculate the time needed for the attack based on the measured average query speed as presented in Section VIB6 before. This leads to a time of years needed for this attack on the Apache mailbox, as shown in TABLE VI. Of course, an attacker might reach some speedup by implementing the decryption software using the C language instead of python, as we did for our prototype. The fastest attack in our setting is the one on mailbox C, as this is the smallest mailbox in our dataset and the attack would take 5, years. As one might argue that not all those words in current use might really be used within a particular company and the attacker might have some kind of auxiliary information about which words are commonly used, for example from a sample set of already decrypted s, we also calculate the effort for a worse scenario, where only 50 % of the words in current use from the Oxford English Dictionary are used and the attacker exactly knows which words belong to this 50 % so he can run an optimized attack with N := 85, 738. The effort needed for this attack is obviously lower than before, as shown in the second line of Table VI. From those figures, we can see that in this case, the attack takes years for our smallest mailbox C and even longer for the other ones. Next, we consider an attacker that wants to decrypt a random half of the corpus (i.e. he does not care, which s are decrypted and which are not, just the number of successfully decrypted s is important). If the attacker wants do decrypt a specific half of all the mails (those that contain some important information), the attack performance was identical to that of full attack from our first scenario, because due to our scheme, the attacker does not know which encrypted data belongs to which portion of the corpus prior to decryption. This means, the mails decrypted after this attack might not reveal the information an attacker is looking for, but it may be sufficient and obviously takes far less time, as shown in Table VI in the third line for π = We further consider a very strong attacker, who combines both previously mentioned attack scenarios: On the one hand, he has prior knowledge about the vocabulary used within the company, so he ran reduce his wordlist by 50 %, and on the other hand, it is sufficient for him, if only half of the s can be decrypted (N := 85, 738 and π = 0.5). The respective figures are shown in the fourth line of Table VI. Finally, we also want to evaluate what happens, if
20 the attacker possesses even more prior knowledge about the words used within the s. Therefor we set the number N of words needed for a successful brute force attack to , which corresponds to the vocabulary in daily speech of an educated person and words, beeing the number of words used in the vocabulary of an uneducated person. If we combine this with the restrictions before, we see that the worst case drops to 0.16 years (1.92 months) to decrypt half of the mails of mailbox C that contains only 83 s with a propability of 50% if the attacker exactly known which restricted vocabulary is used within the s and none of those words has been blacklisted. For the larger mailboxes it still takes more than 3 years even in this worst case scenario. We consider this case to be rather unrealistic, because besides the high prior knowledge of the attacker, mailboxes in companys usually contain far more than 83 s and we see that even for mailbox B with only 349 s the attack is rather costly (3.5 years). Thus we feel that our solution provides good privacy protection even in very bad scenarios. VII. RELATED WORK There is some related work regarding forensic analysis of s, which describe the analysis of data using data mining techniques [15], [16]. Others perform similar techniques on mail collections to identify common origins for spam [17]. However, those works have in common that they work solely on the raw data and do not consider privacy. Regarding privacy issues, there are some papers that argue broadly on different aspects of privacy protection in forensic investigations in general [18], [19], [20]. Agarwal et al. [21] give a very good overview of privacy expectations especially with in work environments. Hou et al. [22], [?] address a related area to our work when proposing a solution for privacy preserving forensics for shared remote servers. However, the techniques are very different to our work. First, they deploy publickey encryption schemes for encrypting the data which incurs a higher computational overhead and larger storage requirements compared to our scheme that uses secretkey schemes only. Second, the proposed solutions require frequent interaction between the investigator and the data holder while our solution is noninteractive. Similarly, Olivier [23] introduces a system to preserve privacy of employees when surfing the web, while allowing to track the source of malicious requests for forensics. To the best of our knowledge, there has been no solution to ensure privacy in forensic investigations by cryptographic means so far. Regarding this work s cryptographic part, to the best of our knowledge, no existing solution is adequate for what we achieve. We, however, review directions that are the most related. First, searchable encryption schemes, e.g., [24], [25], [26], encrypt files in such ways that they can be searched for certain keywords. More precisely, the scenario is that one party, the data owner, encrypts his data and outsources it to another party, the data user. The data owner now can create for keywords appropriate search tokens that allow the data user to identify the encrypted files that contain these keywords. The identified ciphertexts are given to the data owner who can decrypt them. Such schemes differ in two important aspects from the proposed scheme. First, most schemes enable search for single keywords while only few support conjunctive queries, e.g., see [27], [28], [29]. Second, they require interaction between the data owner and data user while in our scenario the investigator needs to be able to autonomously search and decrypt the encrypted files without requiring the data owner (e.g., the company) to do this. VIII. CONCLUSION AND FUTURE WORK In this section, we summarize our findings, discuss limitations, and show up possible future work. A. Summary In the course of this paper, we described a modified process of forensically analyzing data in a corporate environment. This process protects the privacy of the owners as far as possible while still supporting a forensic investigation. In order to achieve this goal, the modified process needs to make use of a novel cryptographic scheme that is introduced in this paper. Our scheme encrypts a given text in such a way, that an investigator can reconstruct the encryption key and hence decrypt the text, if (and only if) he guesses a specific number of words within this text. This threshold t of the number of words that have to be guessed can be chosen when applying our scheme. As a further configuration of the scheme one can configure a list of words that shall not provide the key reconstruction. This kind of blacklist is meant to contain words that occur in almost every text, such as pronouns or salutations. As a proof of concept, we implemented a prototype software that is able to encrypt inboxes of different formats to a container file. To show the feasibility of the practical application of our approach, we further implemented a plugin to the wellknown open source forensic framework Autopsy that enables the framework
Lecture 3: OneWay Encryption, RSA Example
ICS 180: Introduction to Cryptography April 13, 2004 Lecturer: Stanislaw Jarecki Lecture 3: OneWay Encryption, RSA Example 1 LECTURE SUMMARY We look at a different security property one might require
More informationVictor Shoup Avi Rubin. fshoup,rubing@bellcore.com. Abstract
Session Key Distribution Using Smart Cards Victor Shoup Avi Rubin Bellcore, 445 South St., Morristown, NJ 07960 fshoup,rubing@bellcore.com Abstract In this paper, we investigate a method by which smart
More information1 Digital Signatures. 1.1 The RSA Function: The eth Power Map on Z n. Crypto: Primitives and Protocols Lecture 6.
1 Digital Signatures A digital signature is a fundamental cryptographic primitive, technologically equivalent to a handwritten signature. In many applications, digital signatures are used as building blocks
More information1 Construction of CCAsecure encryption
CSCI 5440: Cryptography Lecture 5 The Chinese University of Hong Kong 10 October 2012 1 Construction of secure encryption We now show how the MAC can be applied to obtain a secure encryption scheme.
More informationLecture 10: CPA Encryption, MACs, Hash Functions. 2 Recap of last lecture  PRGs for one time pads
CS 7880 Graduate Cryptography October 15, 2015 Lecture 10: CPA Encryption, MACs, Hash Functions Lecturer: Daniel Wichs Scribe: Matthew Dippel 1 Topic Covered Chosen plaintext attack model of security MACs
More information1 Recap: Perfect Secrecy. 2 Limits of Perfect Secrecy. Recall from last time:
Theoretical Foundations of Cryptography Lecture 2 Georgia Tech, Spring 2010 Computational Hardness 1 Recap: Perfect Secrecy Instructor: Chris Peikert Scribe: George P. Burdell Recall from last time: Shannon
More information1 Message Authentication
Theoretical Foundations of Cryptography Lecture Georgia Tech, Spring 200 Message Authentication Message Authentication Instructor: Chris Peikert Scribe: Daniel Dadush We start with some simple questions
More information1 Pseudorandom Permutations
Theoretical Foundations of Cryptography Lecture 9 Georgia Tech, Spring 2010 PRPs, Symmetric Encryption 1 Pseudorandom Permutations Instructor: Chris Peikert Scribe: Pushkar Tripathi In the first part of
More informationLecture 9  Message Authentication Codes
Lecture 9  Message Authentication Codes Boaz Barak March 1, 2010 Reading: BonehShoup chapter 6, Sections 9.1 9.3. Data integrity Until now we ve only been interested in protecting secrecy of data. However,
More informationTitle Goes Here An Introduction to Modern Cryptography. Mike Reiter
Title Goes Here An Introduction to Modern Cryptography Mike Reiter 1 Cryptography Study of techniques to communicate securely in the presence of an adversary Traditional scenario Goal: A dedicated, private
More informationCryptographic hash functions and MACs Solved Exercises for Cryptographic Hash Functions and MACs
Cryptographic hash functions and MACs Solved Exercises for Cryptographic Hash Functions and MACs Enes Pasalic University of Primorska Koper, 2014 Contents 1 Preface 3 2 Problems 4 2 1 Preface This is a
More informationMidterm Exam Solutions CS161 Computer Security, Spring 2008
Midterm Exam Solutions CS161 Computer Security, Spring 2008 1. To encrypt a series of plaintext blocks p 1, p 2,... p n using a block cipher E operating in electronic code book (ECB) mode, each ciphertext
More informationMACs Message authentication and integrity. Table of contents
MACs Message authentication and integrity Foundations of Cryptography Computer Science Department Wellesley College Table of contents Introduction MACs Constructing Secure MACs Secure communication and
More informationAdvanced Cryptography
Family Name:... First Name:... Section:... Advanced Cryptography Final Exam July 18 th, 2006 Start at 9:15, End at 12:00 This document consists of 12 pages. Instructions Electronic devices are not allowed.
More informationMessage Authentication Code
Message Authentication Code Ali El Kaafarani Mathematical Institute Oxford University 1 of 44 Outline 1 CBCMAC 2 Authenticated Encryption 3 Padding Oracle Attacks 4 Information Theoretic MACs 2 of 44
More informationIntroduction. Digital Signature
Introduction Electronic transactions and activities taken place over Internet need to be protected against all kinds of interference, accidental or malicious. The general task of the information technology
More informationSimulationBased Security with Inexhaustible Interactive Turing Machines
SimulationBased Security with Inexhaustible Interactive Turing Machines Ralf Küsters Institut für Informatik ChristianAlbrechtsUniversität zu Kiel 24098 Kiel, Germany kuesters@ti.informatik.unikiel.de
More informationSecure Computation Without Authentication
Secure Computation Without Authentication Boaz Barak 1, Ran Canetti 2, Yehuda Lindell 3, Rafael Pass 4, and Tal Rabin 2 1 IAS. E:mail: boaz@ias.edu 2 IBM Research. Email: {canetti,talr}@watson.ibm.com
More informationProvableSecurity Analysis of Authenticated Encryption in Kerberos
ProvableSecurity Analysis of Authenticated Encryption in Kerberos Alexandra Boldyreva Virendra Kumar Georgia Institute of Technology, School of Computer Science 266 Ferst Drive, Atlanta, GA 303320765
More informationFuzzy IdentityBased Encryption
Fuzzy IdentityBased Encryption Janek Jochheim June 20th 2013 Overview Overview Motivation (Fuzzy) IdentityBased Encryption Formal definition Security Idea Ingredients Construction Security Extensions
More informationCOM S 687 Introduction to Cryptography October 19, 2006
COM S 687 Introduction to Cryptography October 19, 2006 Lecture 16: NonMalleability and Public Key Encryption Lecturer: Rafael Pass Scribe: Michael George 1 NonMalleability Until this point we have discussed
More informationDigital Investigation
Digital Investigation 14 (2015) S127eS136 Contents lists available at ScienceDirect Digital Investigation journal homepage: www.elsevier.com/locate/diin DFRWS 2015 US Privacypreserving email forensics
More informationCryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 11 Block Cipher Standards (DES) (Refer Slide
More informationSecurity Aspects of. Database Outsourcing. Vahid Khodabakhshi Hadi Halvachi. Dec, 2012
Security Aspects of Database Outsourcing Dec, 2012 Vahid Khodabakhshi Hadi Halvachi Security Aspects of Database Outsourcing Security Aspects of Database Outsourcing 2 Outline Introduction to Database
More informationMESSAGE AUTHENTICATION IN AN IDENTITYBASED ENCRYPTION SCHEME: 1KEYENCRYPTTHENMAC
MESSAGE AUTHENTICATION IN AN IDENTITYBASED ENCRYPTION SCHEME: 1KEYENCRYPTTHENMAC by Brittanney Jaclyn Amento A Thesis Submitted to the Faculty of The Charles E. Schmidt College of Science in Partial
More informationError oracle attacks and CBC encryption. Chris Mitchell ISG, RHUL http://www.isg.rhul.ac.uk/~cjm
Error oracle attacks and CBC encryption Chris Mitchell ISG, RHUL http://www.isg.rhul.ac.uk/~cjm Agenda 1. Introduction 2. CBC mode 3. Error oracles 4. Example 1 5. Example 2 6. Example 3 7. Stream ciphers
More informationUniversal Hash Proofs and a Paradigm for Adaptive Chosen Ciphertext Secure PublicKey Encryption
Universal Hash Proofs and a Paradigm for Adaptive Chosen Ciphertext Secure PublicKey Encryption Ronald Cramer Victor Shoup December 12, 2001 Abstract We present several new and fairly practical publickey
More informationAuthenticated Encryption: Relations among Notions and Analysis of the Generic Composition Paradigm By Mihir Bellare and Chanathip Namprempre
Authenticated Encryption: Relations among Notions and Analysis of the Generic Composition Paradigm By Mihir Bellare and Chanathip Namprempre Some slides were also taken from Chanathip Namprempre's defense
More informationCSA E0 235: Cryptography (29/03/2015) (Extra) Lecture 3
CSA E0 235: Cryptography (29/03/2015) Instructor: Arpita Patra (Extra) Lecture 3 Submitted by: Mayank Tiwari Review From our discussion of perfect secrecy, we know that the notion of perfect secrecy has
More informationOverview of Cryptographic Tools for Data Security. Murat Kantarcioglu
UT DALLAS Erik Jonsson School of Engineering & Computer Science Overview of Cryptographic Tools for Data Security Murat Kantarcioglu Pag. 1 Purdue University Cryptographic Primitives We will discuss the
More informationSYMMETRIC ENCRYPTION. Mihir Bellare UCSD 1
SYMMETRIC ENCRYPTION Mihir Bellare UCSD 1 Syntax A symmetric encryption scheme SE = (K,E,D) consists of three algorithms: K and E may be randomized, but D must be deterministic. Mihir Bellare UCSD 2 Correct
More informationComputational Soundness of Symbolic Security and Implicit Complexity
Computational Soundness of Symbolic Security and Implicit Complexity Bruce Kapron Computer Science Department University of Victoria Victoria, British Columbia NII Shonan Meeting, November 37, 2013 Overview
More informationMessage Authentication Codes 133
Message Authentication Codes 133 CLAIM 4.8 Pr[Macforge A,Π (n) = 1 NewBlock] is negligible. We construct a probabilistic polynomialtime adversary A who attacks the fixedlength MAC Π and succeeds in
More informationCryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Karagpur
Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Karagpur Lecture No. #06 Cryptanalysis of Classical Ciphers (Refer
More informationCryptography and Network Security Department of Computer Science and Engineering Indian Institute of Technology Kharagpur
Cryptography and Network Security Department of Computer Science and Engineering Indian Institute of Technology Kharagpur Module No. # 01 Lecture No. # 05 Classic Cryptosystems (Refer Slide Time: 00:42)
More informationFactoring & Primality
Factoring & Primality Lecturer: Dimitris Papadopoulos In this lecture we will discuss the problem of integer factorization and primality testing, two problems that have been the focus of a great amount
More informationLecture 13: Message Authentication Codes
Lecture 13: Message Authentication Codes Last modified 2015/02/02 In CCA security, the distinguisher can ask the library to decrypt arbitrary ciphertexts of its choosing. Now in addition to the ciphertexts
More informationIntroduction to Security Proof of Cryptosystems
Introduction to Security Proof of Cryptosystems D. J. Guan November 16, 2007 Abstract Provide proof of security is the most important work in the design of cryptosystems. Problem reduction is a tool to
More informationComputer Science A Cryptography and Data Security. Claude Crépeau
Computer Science 308547A Cryptography and Data Security Claude Crépeau These notes are, largely, transcriptions by Anton Stiglic of class notes from the former course Cryptography and Data Security (308647A)
More informationNetwork Security CS 5490/6490 Fall 2015 Lecture Notes 8/26/2015
Network Security CS 5490/6490 Fall 2015 Lecture Notes 8/26/2015 Chapter 2: Introduction to Cryptography What is cryptography? It is a process/art of mangling information in such a way so as to make it
More informationA NOVEL APPROACH FOR VERIFIABLE SECRET SHARING BY USING A ONE WAY HASH FUNCTION
A NOVEL APPROACH FOR VERIFIABLE SECRET SHARING BY Abstract USING A ONE WAY HASH FUNCTION Keyur Parmar & Devesh Jinwala Sardar Vallabhbhai National Institute of Technology, Surat Email : keyur.beit@gmail.com
More information36 Toward Realizing PrivacyPreserving IPTraceback
36 Toward Realizing PrivacyPreserving IPTraceback The IPtraceback technology enables us to trace widely spread illegal users on Internet. However, to deploy this attractive technology, some problems
More informationTalk announcement please consider attending!
Talk announcement please consider attending! Where: Maurer School of Law, Room 335 When: Thursday, Feb 5, 12PM 1:30PM Speaker: Rafael Pass, Associate Professor, Cornell University, Topic: Reasoning Cryptographically
More informationThe Data Encryption Standard (DES)
The Data Encryption Standard (DES) As mentioned earlier there are two main types of cryptography in use today  symmetric or secret key cryptography and asymmetric or public key cryptography. Symmetric
More informationDiagonalization. Ahto Buldas. Lecture 3 of Complexity Theory October 8, 2009. Slides based on S.Aurora, B.Barak. Complexity Theory: A Modern Approach.
Diagonalization Slides based on S.Aurora, B.Barak. Complexity Theory: A Modern Approach. Ahto Buldas Ahto.Buldas@ut.ee Background One basic goal in complexity theory is to separate interesting complexity
More informationDr. Jinyuan (Stella) Sun Dept. of Electrical Engineering and Computer Science University of Tennessee Fall 2010
CS 494/594 Computer and Network Security Dr. Jinyuan (Stella) Sun Dept. of Electrical Engineering and Computer Science University of Tennessee Fall 2010 1 Introduction to Cryptography What is cryptography?
More informationMultiInput Functional Encryption for Unbounded Arity Functions
MultiInput Functional Encryption for Unbounded Arity Functions Saikrishna Badrinarayanan, Divya Gupta, Abhishek Jain, and Amit Sahai Abstract. The notion of multiinput functional encryption (MIFE) was
More information1 Signatures vs. MACs
CS 120/ E177: Introduction to Cryptography Salil Vadhan and Alon Rosen Nov. 22, 2006 Lecture Notes 17: Digital Signatures Recommended Reading. KatzLindell 10 1 Signatures vs. MACs Digital signatures
More informationNetwork Security. Computer Networking Lecture 08. March 19, 2012. HKU SPACE Community College. HKU SPACE CC CN Lecture 08 1/23
Network Security Computer Networking Lecture 08 HKU SPACE Community College March 19, 2012 HKU SPACE CC CN Lecture 08 1/23 Outline Introduction Cryptography Algorithms Secret Key Algorithm Message Digest
More informationLecture 5  CPA security, Pseudorandom functions
Lecture 5  CPA security, Pseudorandom functions Boaz Barak October 2, 2007 Reading Pages 82 93 and 221 225 of KL (sections 3.5, 3.6.1, 3.6.2 and 6.5). See also Goldreich (Vol I) for proof of PRF construction.
More informationCS 4803 Computer and Network Security
pk CA User ID U, pk U CS 4803 Computer and Network Security Verifies the ID, picks a random challenge R (e.g. a message to sign) R I want pk U Alexandra (Sasha) Boldyreva PKI, secret key sharing, implementation
More informationYALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE
YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE CPSC 467a: Cryptography and Computer Security Notes 1 (rev. 1) Professor M. J. Fischer September 3, 2008 1 Course Overview Lecture Notes 1 This course is
More informationThe Trip Scheduling Problem
The Trip Scheduling Problem Claudia Archetti Department of Quantitative Methods, University of Brescia Contrada Santa Chiara 50, 25122 Brescia, Italy Martin Savelsbergh School of Industrial and Systems
More informationSECURITY EVALUATION OF EMAIL ENCRYPTION USING RANDOM NOISE GENERATED BY LCG
SECURITY EVALUATION OF EMAIL ENCRYPTION USING RANDOM NOISE GENERATED BY LCG ChungChih Li, Hema Sagar R. Kandati, Bo Sun Dept. of Computer Science, Lamar University, Beaumont, Texas, USA 4098808748,
More informationMTAT.07.003 Cryptology II. Digital Signatures. Sven Laur University of Tartu
MTAT.07.003 Cryptology II Digital Signatures Sven Laur University of Tartu Formal Syntax Digital signature scheme pk (sk, pk) Gen (m, s) (m,s) m M 0 s Sign sk (m) Ver pk (m, s)? = 1 To establish electronic
More informationA lowcost Alternative for OAEP
A lowcost Alternative for OAEP Peter Schartner University of Klagenfurt Computer Science System Security peter.schartner@aau.at Technical Report TRsyssec1102 Abstract When encryption messages by use
More informationMAC. SKE in Practice. Lecture 5
MAC. SKE in Practice. Lecture 5 Active Adversary Active Adversary An active adversary can inject messages into the channel Active Adversary An active adversary can inject messages into the channel Eve
More informationEntangled Encodings and Data Entanglement
An extended abstract of this paper is published in the proceedings of the 3rd International Workshop on Security in Cloud Computing SCC@AsiaCCS 2015. This is the full version. Entangled Encodings and Data
More informationMathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson
Mathematics for Computer Science/Software Engineering Notes for the course MSM1F3 Dr. R. A. Wilson October 1996 Chapter 1 Logic Lecture no. 1. We introduce the concept of a proposition, which is a statement
More informationDeveloping and Investigation of a New Technique Combining Message Authentication and Encryption
Developing and Investigation of a New Technique Combining Message Authentication and Encryption Eyas ElQawasmeh and Saleem Masadeh Computer Science Dept. Jordan University for Science and Technology P.O.
More informationIdentityBased Encryption from the Weil Pairing
Appears in SIAM J. of Computing, Vol. 32, No. 3, pp. 586615, 2003. An extended abstract of this paper appears in the Proceedings of Crypto 2001, volume 2139 of Lecture Notes in Computer Science, pages
More informationLecture 9. Lecturer: Yevgeniy Dodis Spring 2012
CSCIGA.3210001 MATHGA.2170001 Introduction to Cryptography Macrh 21, 2012 Lecture 9 Lecturer: Yevgeniy Dodis Spring 2012 Last time we introduced the concept of Pseudo Random Function Family (PRF),
More informationSECURITY IN NETWORKS
SECURITY IN NETWORKS GOALS Understand principles of network security: Cryptography and its many uses beyond confidentiality Authentication Message integrity Security in practice: Security in application,
More informationThe Order of Encryption and Authentication for Protecting Communications (Or: How Secure is SSL?)
The Order of Encryption and Authentication for Protecting Communications (Or: How Secure is SSL?) Hugo Krawczyk Abstract. We study the question of how to generically compose symmetric encryption and authentication
More informationOutline. Computer Science 418. Digital Signatures: Observations. Digital Signatures: Definition. Definition 1 (Digital signature) Digital Signatures
Outline Computer Science 418 Digital Signatures Mike Jacobson Department of Computer Science University of Calgary Week 12 1 Digital Signatures 2 Signatures via Public Key Cryptosystems 3 Provable 4 Mike
More informationLecture 2: Universality
CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal
More informationCryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #10 Symmetric Key Ciphers (Refer
More informationClient Server Registration Protocol
Client Server Registration Protocol The ClientServer protocol involves these following steps: 1. Login 2. Discovery phase User (Alice or Bob) has K s Server (S) has hash[pw A ].The passwords hashes are
More informationAuthentication and Encryption: How to order them? Motivation
Authentication and Encryption: How to order them? Debdeep Muhopadhyay IIT Kharagpur Motivation Wide spread use of internet requires establishment of a secure channel. Typical implementations operate in
More informationNetwork Security. Modes of Operation. Steven M. Bellovin February 3, 2009 1
Modes of Operation Steven M. Bellovin February 3, 2009 1 Using Cryptography As we ve already seen, using cryptography properly is not easy Many pitfalls! Errors in use can lead to very easy attacks You
More informationProvably Secure Cryptography: State of the Art and Industrial Applications
Provably Secure Cryptography: State of the Art and Industrial Applications Pascal Paillier Gemplus/R&D/ARSC/STD/Advanced Cryptographic Services FrenchJapanese Joint Symposium on Computer Security Outline
More informationarxiv:1112.0829v1 [math.pr] 5 Dec 2011
How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly
More informationCapture Resilient ElGamal Signature Protocols
Capture Resilient ElGamal Signature Protocols Hüseyin Acan 1, Kamer Kaya 2,, and Ali Aydın Selçuk 2 1 Bilkent University, Department of Mathematics acan@fen.bilkent.edu.tr 2 Bilkent University, Department
More informationNotes from Week 1: Algorithms for sequential prediction
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 2226 Jan 2007 1 Introduction In this course we will be looking
More informationCS 6903 Modern Cryptography February 20th, 2010
CS 6903 Modern Cryptography February 20th, 2010 Lecture 5: Constructions of Symmetric Encryption Scheme Instructor: Nitesh Saxena Scribe: SherinSyriac, AvisDsouza, SaiTeja Can we use the Pseudo Random
More informationCS 161 Computer Security
Song Spring 2015 CS 161 Computer Security Discussion 11 April 7 & April 8, 2015 Question 1 RSA (10 min) (a) Describe how to find a pair of public key and private key for RSA encryption system. Find two
More informationRemotely Keyed Encryption Using NonEncrypting Smart Cards
THE ADVANCED COMPUTING SYSTEMS ASSOCIATION The following paper was originally published in the USENIX Workshop on Smartcard Technology Chicago, Illinois, USA, May 10 11, 1999 Remotely Keyed Encryption
More informationChapter 4. Symmetric Encryption. 4.1 Symmetric encryption schemes
Chapter 4 Symmetric Encryption The symmetric setting considers two parties who share a key and will use this key to imbue communicated data with various security attributes. The main security goals are
More informationCryptography: Authentication, Blind Signatures, and Digital Cash
Cryptography: Authentication, Blind Signatures, and Digital Cash Rebecca Bellovin 1 Introduction One of the most exciting ideas in cryptography in the past few decades, with the widest array of applications,
More informationCh.9 Cryptography. The Graduate Center, CUNY.! CSc 75010 Theoretical Computer Science Konstantinos Vamvourellis
Ch.9 Cryptography The Graduate Center, CUNY! CSc 75010 Theoretical Computer Science Konstantinos Vamvourellis Why is Modern Cryptography part of a Complexity course? Short answer:! Because Modern Cryptography
More informationSecurity Analysis for Order Preserving Encryption Schemes
Security Analysis for Order Preserving Encryption Schemes Liangliang Xiao University of Texas at Dallas Email: xll052000@utdallas.edu Osbert Bastani Harvard University Email: obastani@fas.harvard.edu ILing
More informationCryptography CS 555. Topic 3: Onetime Pad and Perfect Secrecy. CS555 Spring 2012/Topic 3 1
Cryptography CS 555 Topic 3: Onetime Pad and Perfect Secrecy CS555 Spring 2012/Topic 3 1 Outline and Readings Outline Onetime pad Perfect secrecy Limitation of perfect secrecy Usages of onetime pad
More informationAnalysis of PrivacyPreserving Element Reduction of Multiset
Analysis of PrivacyPreserving Element Reduction of Multiset Jae Hong Seo 1, HyoJin Yoon 2, Seongan Lim 3, Jung Hee Cheon 4 and Dowon Hong 5 1,4 Department of Mathematical Sciences and ISaCRIM, Seoul
More informationLecture 2: Complexity Theory Review and Interactive Proofs
600.641 Special Topics in Theoretical Cryptography January 23, 2007 Lecture 2: Complexity Theory Review and Interactive Proofs Instructor: Susan Hohenberger Scribe: Karyn Benson 1 Introduction to Cryptography
More informationCS Asymmetrickey Encryption. Prof. Clarkson Spring 2016
CS 5430 Asymmetrickey Encryption Prof. Clarkson Spring 2016 Review: block ciphers Encryption schemes: Enc(m; k): encrypt message m under key k Dec(c; k): decrypt ciphertext c with key k Gen(len): generate
More informationDatabase design theory, Part I
Database design theory, Part I Functional dependencies Introduction As we saw in the last segment, designing a good database is a non trivial matter. The E/R model gives a useful rapid prototyping tool,
More informationSecure LargeScale Bingo
Secure LargeScale Bingo Antoni MartínezBallesté, Francesc Sebé and Josep DomingoFerrer Universitat Rovira i Virgili, Dept. of Computer Engineering and Maths, Av. Països Catalans 26, E43007 Tarragona,
More informationProofs in Cryptography
Proofs in Cryptography Ananth Raghunathan Abstract We give a brief overview of proofs in cryptography at a beginners level. We briefly cover a general way to look at proofs in cryptography and briefly
More informationNetwork Security. Abusayeed Saifullah. CS 5600 Computer Networks. These slides are adapted from Kurose and Ross 81
Network Security Abusayeed Saifullah CS 5600 Computer Networks These slides are adapted from Kurose and Ross 81 Public Key Cryptography symmetric key crypto v requires sender, receiver know shared secret
More informationLecture 15  Digital Signatures
Lecture 15  Digital Signatures Boaz Barak March 29, 2010 Reading KL Book Chapter 12. Review Trapdoor permutations  easy to compute, hard to invert, easy to invert with trapdoor. RSA and Rabin signatures.
More informationComputing Range Queries on Obfuscated Data
Computing Range Queries on Obfuscated Data E. Damiani 1 S. De Capitani di Vimercati 1 S. Paraboschi 2 P. Samarati 1 (1) Dip. di Tecnologie dell Infomazione (2) Dip. di Ing. Gestionale e dell Informazione
More informationNonBlackBox Techniques In Crytpography. Thesis for the Ph.D degree Boaz Barak
NonBlackBox Techniques In Crytpography Introduction Thesis for the Ph.D degree Boaz Barak A computer program (or equivalently, an algorithm) is a list of symbols a finite string. When we interpret a
More information6.857 Computer and Network Security Fall Term, 1997 Lecture 4 : 16 September 1997 Lecturer: Ron Rivest Scribe: Michelle Goldberg 1 Conditionally Secure Cryptography Conditionally (or computationally) secure
More informationOutline. CSc 466/566. Computer Security. 8 : Cryptography Digital Signatures. Digital Signatures. Digital Signatures... Christian Collberg
Outline CSc 466/566 Computer Security 8 : Cryptography Digital Signatures Version: 2012/02/27 16:07:05 Department of Computer Science University of Arizona collberg@gmail.com Copyright c 2012 Christian
More informationNetwork Security: Cryptography CS/SS G513 S.K. Sahay
Network Security: Cryptography CS/SS G513 S.K. Sahay BITSPilani, K.K. Birla Goa Campus, Goa S.K. Sahay Network Security: Cryptography 1 Introduction Network security: measure to protect data/information
More informationAn example of a computable
An example of a computable absolutely normal number Verónica Becher Santiago Figueira Abstract The first example of an absolutely normal number was given by Sierpinski in 96, twenty years before the concept
More informationYale University Department of Computer Science
Yale University Department of Computer Science On Backtracking Resistance in Pseudorandom Bit Generation (preliminary version) Michael J. Fischer Michael S. Paterson Ewa Syta YALEU/DCS/TR1466 October
More informationContents. Foundations of cryptography
Contents Foundations of cryptography Security goals and cryptographic techniques Models for evaluating security A sketch of probability theory and Shannon's theorem Birthday problems Entropy considerations
More informationData Encryption A B C D E F G H I J K L M N O P Q R S T U V W X Y Z. we would encrypt the string IDESOFMARCH as follows:
Data Encryption Encryption refers to the coding of information in order to keep it secret. Encryption is accomplished by transforming the string of characters comprising the information to produce a new
More information