Automated Analysis of Security APIs. Amerson H. Lin



Automated Analysis of Security APIs

by

Amerson H. Lin

Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 2005

© Massachusetts Institute of Technology 2005. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 18, 2005

Certified by: Ronald L. Rivest, Professor, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students


Automated Analysis of Security APIs

by

Amerson H. Lin

Submitted to the Department of Electrical Engineering and Computer Science on May 18, 2005, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

Attacks on security systems within the past decade have revealed that security Application Programming Interfaces (APIs) expose a large and real attack surface, yet they remain a relatively unexplored problem. In 2000, Bond et al. discovered API-chaining and type-confusion attacks on hardware security modules used in large banking systems. While these first attacks were found through human inspection of the API specifications, we take the approach of modeling these APIs formally and using automated-reasoning tools to discover attacks. In particular, we discuss the techniques we used to model the Trusted Platform Module (TPM) v1.2 API and how we used Otter, a theorem-prover, and Alloy, a model-finder, both to find API-chaining attacks and to manage API complexity. Using Alloy, we also developed techniques to capture attacks that weaken, but do not fully compromise, a system's security. Finally, we demonstrate a number of real and near-miss vulnerabilities that were discovered against the TPM.

Thesis Supervisor: Ronald L. Rivest
Title: Professor


Acknowledgments

Firstly, I would like to thank my advisor, Professor R. Rivest, for his guidance, advice and ideas. Secondly, I would also like to thank the rest of the Cryptographic API research group, Ben Adida, Ross Anderson, Mike Bond and Jolyon Clulow, for their suggestions and comments on earlier drafts. In particular, I would like to express special gratitude to Mike Bond for working so closely with me. Next, I would also like to thank the Cambridge-MIT Institute for funding this research project and providing ample opportunities for collaboration. Lastly, I want to dedicate this thesis to my friends and family, who have given me love and support throughout this past year.

Contents

1 Introduction
  1.1 Problem with Security APIs
  1.2 General Approach
  1.3 Our Contributions

2 Previous Work
  2.1 A Taste of Security API Vulnerabilities
  2.2 Automated Analysis with Otter

3 Techniques and Tools
  3.1 Input/Output API Chaining
  3.2 Modeling Stateful APIs
  3.3 Managing API Complexity
  3.4 Partial Attacks
  3.5 Tools - Otter and Alloy

4 Otter - a Theorem Prover
  4.1 Overview
    4.1.1 First-Order Logic
    4.1.2 Resolution
    4.1.3 Set-of-support Strategy
  4.2 Attacks Modeled with Otter
    4.2.1 VSM
    4.2.2 IBM 4758 CCA

5 Alloy Analyzer - a Model Finder
  5.1 Overview
    5.1.1 Relational Logic
    5.1.2 Alloy Models
  5.2 Attacks Modeled with Alloy
    5.2.1 Triple-mode DES
    5.2.2 PIN Decimalisation

6 Trusted Platform Module
  6.1 History of Trusted Computing
  6.2 Trusted Platforms
  6.3 Use Scenarios
  6.4 TPM Design Features
  6.5 Trusted Platform Module API

7 Security Analysis of the TPM
  7.1 Initialization
  7.2 Key Management
  7.3 Protected Storage
  7.4 Delegation
  7.5 Authorization
  7.6 Model Integration

8 Insecure TPM Implementations
  8.1 Insecure Implementation X
  8.2 Insecure Implementation Y
  8.3 Insecure Implementation Z

9 Conclusions and Future Work
  9.1 Conclusions
  9.2 Tool Comparison
  9.3 TPM Security
  9.4 Future Work

A Partial Attacks
  A.1 Triple-mode DES

B TPM Models
  B.1 Authorization
  B.2 Key Management and Protected Storage

List of Figures

5-1 Alloy Analyzer - model of encryption
5-2 ECB|ECB|OFB triple-mode
6-1 Roots of trust in a TPM-enabled PC architecture
6-2 TPM Storage key hierarchy
7-1 Key-handle switching attack

List of Tables

5.1 Set and logical operators used in Alloy
6.1 TPM Permanent Flags
6.2 TPM Key Types
6.3 Delegation Table fields
6.4 Family Table fields
7.1 Otter statistics for the integrated model

Chapter 1

Introduction

In this connected age, an information security system such as an online banking application must not only offer services to customers and clients but also give them assurance of privacy and security across various usage scenarios and threat models. The complexity of this task often results in the system relying on a specialized hardware or software module for security services, accessed through a security Application Programming Interface.

An Application Programming Interface (API) is a set of commands that package and expose complex functionality to external clients in a simple way. These include commands that effect the features directly as well as any other initialization and management commands. Clients can use these commands as building blocks for software applications without having to worry about the underlying details, which are abstracted from them. For example, a client can simply call the command ConnectToHost(host) to establish an HTTP connection without having to deal with opening network sockets or performing checksums on the received packets. A good example of a software API is the Java Standard Development Kit API.

A security API has the same goal of abstraction, except that the commands in a security API are mostly aimed at providing security services and come with multiple implicitly or explicitly stated security policies. A security policy is a constraint on the behaviour of a command that guarantees security. For example, an API command in a banking system could be SecureTransfer(Integer x, Account acc1, Account acc2), which transfers x dollars from acc1 to acc2. There is an implicit assumption here that the data sent to the bank cannot be tampered with; the command may achieve this by opening a Secure-Sockets Layer connection with the bank, although this information is abstracted from the user. This assumption represents a security policy of the command. A good example of a software security API is the Microsoft Cryptographic API.

1.1 Problem with Security APIs

In the past decade, the increasing awareness of the importance of security not only fueled the development of security modules but also encouraged better security engineering: many security standards were established, the Trusted Computing [1] initiative was started and Microsoft even made security its biggest concern. As a result, cryptographic primitives, such as key generation and hashing algorithms and encryption and decryption schemes, have undergone rigorous analysis and improvement. In fact, based on past lessons, security engineers know to use established primitives like RSA. Protocols, sequences of steps that set up a working environment such as a key exchange, have also undergone much scrutiny and development, especially after the famous attack on the Needham-Schroeder protocol in 1995 [15]. In the same manner, system designers are careful to use established protocols such as Kerberos or SSL.

Despite these efforts, information security systems reliant on security modules are constantly being broken. We believe that this is often due to poor security API design, which has not received the same degree of attention as cryptographic primitives and protocols. In fact, Anderson in "Why Cryptosystems Fail" [2] pointed out that many of the failures of retail banking systems were not caused by cryptanalytic or technical weaknesses but by implementation errors and management failures.

Our conviction began when Bond and Anderson found API-level attacks on many crypto-processors, like the Visa Security Module (VSM) and the IBM 4758 Common Cryptographic Architecture (CCA) [17], in 2000. They exploited the VSM's key-typing system, where key types were used to distinguish keys intended for different uses. By manipulating the key-type information, they were able to trick the system into accepting an invalid key. They also found a similar attack on the IBM 4758 CCA, which is used in bank networks. In this case, the 4758 XORed key-typing information into key material. Using the properties of XOR, they were able to tweak the key-typing information to pass off an invalid key as a valid one for a particular operation. In the short span of a few years, more attacks were found on the Chrysalis and PRISM systems [6], giving more evidence that the problem with security APIs was indeed a real one.

The crux of the problem, it seemed, was that security policies could be subverted when API commands were chained in a totally unexpected manner. For a system with a handful of commands, protecting against these attacks would not be a problem. But what about complex real systems that have operations working across 4-5 modules? Another problem was that the security policies for the system were not translated down to API-level requirements that could be checked easily by the software developers. We believe that the addition of security statements to API-level documentation is a good start to solving the security API problem.

Despite the likelihood of vulnerability, security APIs have not received due academic attention. This thesis explores this problem with formal automated analysis.

1.2 General Approach

Many of the earlier security API vulnerabilities were found through careful human inspection and analysis, as Bond and Anderson did with the VSM. With larger systems containing more API operations, this requires thinking through an exponentially large search space, and verification becomes a tedious and unforgiving process. The most obvious response is to employ mathematics and formal methods together with automation.

But hasn't this already been done? Many ask if security API analysis is all that different from security protocol analysis. We argue that APIs present a tremendously more complex problem, since API operations could potentially use cryptographic primitives, initiate security protocols, interact with the network and operating system and even manipulate non-volatile storage. Furthermore, security APIs tend to have extensive key management features, something that protocols generally do not. Therefore, we argue that the analysis of APIs is necessarily more complex and will require greater considerations than those of protocols.

Nevertheless, studying the analysis of security protocols gave us a wide variety of techniques and tools to choose from. The formal methods used in analyzing security protocols range from belief-based approaches like the BAN logic model [18], to model-checking approaches that started with the Communicating Sequential Processes (CSP) [11] and Dolev-Yao models [8], to proof-based approaches such as induction for reasoning about sequences of steps. We needed a language that could describe API commands and a tool that could search through chains of these operations, returning the exact execution trace if an attack was found. This led us to our current choice of Otter, a theorem-prover, and Alloy, a model-finder.

Theorem-provers find a series of logical statements (i.e. a proof), derived from an initial set of axioms, that prove a particular statement. We model API commands as logical implications and attacks as goal statements. The tool then searches for a proof of the goal statement. If it terminates and finds one, the series of implications will represent the sequence of API calls for the attack. Model-finders work very differently; a model-finder takes a set of constraints as a model and attempts to find an instance that satisfies it. We model the commands and the goal as constraints. If the tool terminates and finds a satisfying instance, the objects in the instance will reveal the sequence of API calls that constitute the attack.

However, all automated analysis can only be done within bounds. Otter's proof search may terminate with no proof or may not terminate at all. Likewise, Alloy requires a user-specified bound, and not finding a satisfying instance within those bounds does not preclude an attack. In either case, we can only state that an attack is unlikely and nothing stronger: a fundamental shortcoming of bounded automated analysis.

Despite this deficiency, we were able to model some previously known attacks in Otter. Then, we saw success with this approach when Otter found an extremely complicated near-miss attack on the IBM 4758 CCA that was prevented only because the product did not fully conform to its advertised behaviour. Later in this thesis, we will present other successes of this general approach.

1.3 Our Contributions

Our contribution to the Cryptographic API group is three-fold: exploring the Alloy model-finder, understanding the Trusted Platform Module (TPM) API and developing new formal analysis techniques for APIs.

Alloy: This lightweight model-finder was designed to analyze small specifications of software systems. We used it to analyze feature-level policies, manage API complexity and even model information leakage, tasks that do not play to the strengths of Otter.

Trusted Platform Module: The TPM is the Trusted Computing Group's hardware security device, the answer to Trusted Platform Computing. This was a worthwhile target since giants like National Semiconductor and Infineon had already started producing TPM chips and IBM had already incorporated them into their Thinkpad laptops. While we were unable to find attacks consisting of long chains of commands, our analysis revealed some security flaws in the design and documentation.

New Techniques: With a new tool and a new target API, we were able to develop a number of new approaches to analyzing security APIs. These include an extension to the API-chaining technique as well as techniques to model stateful APIs, manage API complexity and model partial attacks: attacks that weaken, but do not fully compromise, the security of the system. Interestingly, we find that in terms of finding attacks, the discipline of understanding an API sufficiently to model it with a technique is often more effective than the technique itself.

The rest of this thesis is structured as follows. Chapter 2 reviews previous modeling work done by this research group. Chapter 3 describes the approaches we used in our analyses. Chapters 4 and 5 describe the automated-reasoning tools Otter and Alloy respectively. Chapter 6 introduces the TPM and chapter 7 presents our analysis of the TPM's API, with some of our findings reported in chapter 8. Finally, we conclude in chapter 9 with some discussion and suggestions for future work.

Chapter 2

Previous Work

BAN logic, introduced by Burrows, Abadi and Needham in 1989 [18], started the work of formal analysis of security protocols. Meadows further developed the notion of automated verification with the NRL protocol analyzer [19]. Then in 1995, Lowe's successful attack on, and fix of, the Needham-Schroeder protocol [15] highlighted the necessity of protocol verification. Despite these efforts on security protocols, and because of the differences between protocols and APIs already noted, verification of security APIs did not begin until Bond and Anderson attacked a number of security APIs in retail banking crypto-processors in 2000, including the VISA hardware security module (VSM) and the IBM 4758 Common Cryptographic Architecture (CCA) [17]. The attacks were made possible through an unexpected chain of API calls. In 2004, research teams at both Cambridge University and MIT, jointly known as the Cryptographic API group, began to direct research efforts towards analyzing security APIs using automated-reasoning tools.

2.1 A Taste of Security API Vulnerabilities

For a flavour of the nature of security API vulnerabilities, let us take a look at the attacks on the VISA Security Module and the IBM 4758 CCA, which were found by Bond et al. [6].

We first consider the VSM, a cryptoprocessor with a concise API set designed to protect Personal Identification Numbers (PINs) transmitted over the banking network supported by VISA. It employs a key-typing system, where key-types are used to distinguish keys meant for different uses. The VSM's Encrypt Communications Key command takes in a key and types it as a communications key. The Translate Communications Key command takes a communications key and encrypts it under a terminal master key, the master key of an Automated Teller Machine (ATM). The VSM's failure to distinguish between a PIN-derivation key and a terminal master key made it possible to type a bank account number as a communications key using Encrypt Communications Key and then encrypt it under the PIN-derivation key using Translate Communications Key, giving the attacker the bank account number encrypted under the PIN-derivation key: the PIN associated with that bank account!

Now we consider the IBM 4758 CCA, an API implemented by the majority of IBM's banking products. It has 150 functions and supports a wide range of banking operations. In particular, it uses a key-typing mechanism where keys are typed by encrypting them under a key control vector XORed with the master key. A key control vector is a simple string containing type information. At the same time, it supports a Key Part Import transaction that allows a module to combine partial keys to form the final key, thereby allowing a policy of shared control. The problem is that Key Part Import also uses the XOR operation as part of its computation, allowing adversaries to XOR cleverly chosen values with partial keys and then use the Key Part Import operation to re-type the key into a different role. Constructing such keys formed the starting point for a number of attacks [6].

2.2 Automated Analysis with Otter

While the previous attacks were found by hand, effort in modeling API operations in first-order logic (FOL) showed that attacks could be found more efficiently by machine. For example, the general decrypt operation can be stated as the implication "if A knows the value of DATA encrypted under key K and A knows the value of the key K, then A can decrypt to get DATA", which can be written as the FOL clause:

A(E(DATA, K)) ∧ A(K) → A(DATA)

where the predicate A(x) represents that x is a value known to the attacker and E(x, y) represents the value of x encrypted under y. Finally, we assert that some value y is secret by writing ¬A(y). The theorem-prover will then try to find a contradiction with ¬A(y), thereby producing an attack.

This method proved to be extremely valuable when we were able to verify known attacks on the IBM 4758, but the real success came when Otter discovered a variant of the powerful PIN-derivation attack on the IBM 4758. Again, the same weakness, that the XOR operation was used both in key typing and secret sharing, was exploited. This variant required 12 steps and involved 8 different commands, attesting to the importance of automated verification in discovering intricate and complicated attacks. More detail on this modeling approach, including techniques to reduce search-space explosion, can be found in chapter 4.
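Concretely, the decrypt clause above can be written directly in Otter's input syntax, with negative literals for the premises and a positive literal for the conclusion. The following is a minimal illustrative sketch rather than one of the models discussed later; the constants SECRET and K are invented here purely for illustration:

set(hyper_res).

list(usable).
% decrypt: knowing a ciphertext and its key yields the plaintext
-A(E(x,y)) | -A(y) | A(x).
end_of_list.

list(sos).
% the attacker's initial knowledge: a ciphertext and, say, a leaked key
A(E(SECRET,K)).
A(K).
% negated secrecy goal
-A(SECRET).
end_of_list.

Hyper-resolution derives A(SECRET) from the decrypt rule and the two positive unit clauses, and clashing it with -A(SECRET) produces the empty clause; the printed proof is exactly the attack trace.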

Chapter 3

Techniques and Tools

Any practical automated-reasoning approach consists of a technique and a tool that effectively executes the technique. Our aim was to use established techniques, with a few enhancements, in a less established field of research. These include representing values symbolically and modeling the attacker's knowledge set. This way, we could harness the power of techniques that had been proven to work. Next, we had to find tools to execute these techniques. Our choice of Otter, a theorem-prover, and Alloy, a model-finder, allowed us to achieve success with our techniques, especially since the tools complemented one another. This chapter summarizes these techniques and the tools used to effect them.

3.1 Input/Output API Chaining

This technique stems from the work that Youn et al. did on the IBM 4758 CCA [24], where the first-order logic model of the 4758's API commands allowed the tool to chain commands together in a many-to-many fashion. Basically, the model works by defining a set of attacker knowledge, a set of possible actions and a set of desired goals. The attacker is then allowed to feed any of his known values into the commands and add any output values to his set of knowledge. This framework captures the attacker's initial knowledge, his ability to use the system API methods, his ability to perform typical primitive operations (e.g. XOR, concatenation, encryption, decryption) and the scenarios that constitute attacks. This way, an exploration of the entire search space would yield any potential API-level vulnerabilities.

We extend this idea by noticing that APIs tend to re-use data structures across commands. For example, all encrypted blobs, be they encryptions of data or of keys, may be returned to the user in the same data structure. This can create opportunities for input/output confusion: an API command expecting an encrypted key could be passed an encryption meant for another command. Our new approach captures these possibilities by under-constraining the inputs and outputs of each API call.
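As an illustration of such under-constraining, consider a hypothetical pair of commands that both traffic in a generic Blob(...) wrapper. The command names and predicates here are ours, not those of any real API:

list(usable).
% Encrypt_Data: any known value can be sealed into a blob
-U(x) | U(Blob(E(x,MK))).

% Import_Key: any blob is accepted as a wrapped key, because the
% data structure does not record which command produced it
-U(Blob(E(y,MK))) | UKey(y).
end_of_list.

Because the blob's origin is left unconstrained, the output of Encrypt_Data unifies with the input of Import_Key, letting the attacker promote arbitrary encrypted data to key status; this is precisely the kind of input/output confusion the extended technique searches for.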

3.2 Modeling Stateful APIs

Many systems maintain some kind of state in memory. The problem with modeling state is that it greatly increases the search space of the automated tool. If there are k bits of state, the search space could potentially increase by a factor of 2^k. On the other hand, a stateless model of a stateful API gives the attacker more power; any real attacks will still be found, but the analysis may also yield attacks that are possible in the model but impossible in the real system. Therefore, capturing state in our models is important because it constrains the attacker's abilities.

However, many systems retain a lot of state information. We propose that a way of managing this complexity is to model state based on how it will constrain the attacker's abilities. Firstly, we can do away with all state that is not security-sensitive, state that does not have security-related dependencies. A good example is statistical data that is kept solely for performance benefits. However, it is important to note that deciding whether state is security-sensitive requires an in-depth understanding of the API's threat model. It is possible that a security-insensitive flag may affect the operation of a security-sensitive command, thereby making it actually security-sensitive. Secondly, we can partition all security-sensitive state objects into either loose or bound state. In vague terms, the attacker can independently manipulate any state that is not bound to the rest of the system; this state is described as loose. All other state is considered bound.

Loose-state objects are objects that can take on any value regardless of the values in the remaining state objects. In other words, they are not bound to the rest of the configuration and the attacker can independently manipulate their values. A good example is a system flag that can be flipped at any time whatsoever. Bound-state objects, on the other hand, take on values that are bound to the rest of the configuration, values that the attacker cannot freely manipulate. Consider two flags in a system bound by the following policy: the second can be set only if the first is set. In this case, the first flag is loose-state while the second flag is bound-state. It is important to note that deciding whether state objects are loose or bound also requires understanding the threat model of the API.

Later in this thesis, we present a method for modeling state effectively by grouping types of state together and representing them in Otter using multiple predicates. All of the state is then moved through each API command with an over-arching state predicate that captures these multiple predicates. This way, commands affecting a small portion of the state need only be concerned with one or two predicates. This technique can be seen in chapter 7.
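A schematic of the over-arching state predicate, using two invented groups of state: S(u,w) carries a flag group and a key group through every command, and each rule mentions only the group it touches. The predicate, function and constant names here are illustrative, not taken from the TPM models:

list(usable).
% a command that flips a flag: the key group z rides along unchanged
-S(Flags(OWNER_CLEAR), z) | S(Flags(OWNER_SET), z).

% a command that loads a key: the flag group y rides along unchanged
-S(y, Keys(x)) | -U(w) | S(y, Keys(load(x,w))).
end_of_list.

Only the sub-predicate a command cares about is ever pattern-matched; everything else is threaded through in a variable, which keeps the branching factor down as the amount of state grows.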

3.3 Managing API Complexity

A key component of analyzing a system and its API is managing complexity. Automated tools, however powerful, still suffer from search-space explosion. Although some tools are intelligent enough to automatically prune their search space (e.g. by recognizing symmetries), the best solution is still for the user to manage complexity at the level of the model: understanding what can be left out without compromising the analysis.

The first strategy that we present is targeting. Based on experience analyzing security systems, we find that vulnerabilities usually occur across:

Trust boundaries: Multiple systems functioning in the real world represent different trust domains. In a company, for example, one system could belong to the legal department and another to the IT department, which has a very different security policy. However, the two departments may need to share sensitive information. Such functionality is a likely vulnerability since it is always difficult for the sender to maintain control over information once it is in the hands of the receiver.

Module boundaries: Modules are a natural way to manage complex systems, and many systems use some form of key-typing to bind key material to a particular module. This is because an operation may need to know whether a particular key is a master key, a communications key or a storage key. However, key material that is passed between modules could be maliciously manipulated to produce an attack. The VSM and IBM 4758 CCA attacks discussed in the last chapter revealed such weaknesses.

Version boundaries: A commercial system that rolls out upgrades or patches must still support old versions for legacy reasons, and it is not uncommon that the development teams differ between two versions. Poor documentation and the shoe-horning required in new versions to ensure backward compatibility make it likely that security policies are overlooked in the process.

In general, enforcing security policies across boundaries often requires a complex solution, and complex solutions are often seen to break. Understanding these boundaries allows the user, first, to break the modeling problem down into smaller problems without the fear of excluding too many possible attacks and, second, to focus only on subsections of an API.

The second strategy is one familiar to mathematicians: reductions. Many system APIs are unnecessarily complex because they offer multiple ways to perform the same task. While it is true that these multiple ways could potentially offer more opportunities for vulnerabilities, it may also be the case that they are actually equivalent. In these cases, the model need only consider the simplest one. A proof by hand of their equivalence may be sufficiently tedious and complicated to deter designers. In chapter 7, we demonstrate how an automated tool can help us with these reductions.

3.4 Partial Attacks

Partial attacks can be described as weakenings of security; they differ significantly from the attacks we were trying to model earlier, where security was completely compromised. Partial attacks often lead the way for brute-force attacks to succeed much more quickly. Our work has focused on attacks that discover more-than-intended information about a secret, or what we term information leakage. Our general technique is to create models of systems that operate on reduced-size secrets so that the automated tool can reason about the secrets directly: examples include using 8-bit numbers instead of 64-bit numbers and single-digit Personal Identification Numbers (PINs) instead of 4-digit PINs. Within this technique, we have also developed two frameworks for modeling systems in order to discover such weaknesses (a sketch of the second follows this list):

Entropy Reduction: In this approach, the API of the system is modeled such that the automated tool is able to reason about the input/output (I/O) characteristics of each method. I/O characteristics can range from linearly-dependent output to completely random output. Systems that use encryption to protect secrets can then be analyzed for information leakage by examining the I/O characteristics after a chain of operations. By describing I/O properties that represent a reduction in entropy, we can ask the tool to find a sequence of events (i.e. an attack) that leads to such a characteristic. We used this technique to analyze the ECB|ECB|OFB triple-mode block-cipher, which will be described in greater detail in section 5.2.

Possibility Sets: This involves modeling APIs using variables whose values belong to possibility sets representing the attacker's possible guesses. Information leakage can clearly be observed if the number of possibilities, say for a password, reduces more quickly than it should with a series of cleverly chosen API calls. We employed this approach to model the discovery of PINs with repeated API calls to the IBM 4758 cryptoprocessor, where each call leaked enough information for the attacker to perform an efficient process of elimination. This is also detailed in section 5.2.2.
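The following Alloy fragment sketches the possibility-set idea for a single secret digit. The signatures and the elimination rule are ours, much simplified from the actual IBM 4758 model:

open util/ordering[Step]

sig Digit {}
one sig Secret { val: one Digit }

sig Step {
  guess: one Digit,  // the value fed to the API at this step
  poss: set Digit    // values still consistent with all outputs so far
}

fact {
  // the true value is never eliminated
  all s: Step | Secret.val in s.poss
  // a wrong guess is eliminated; a correct guess is detected directly
  all s: Step - last() | let s' = next(s) |
    s'.poss = s.poss - (s.guess - Secret.val)
}

pred explore() {}
run explore for 6

In a real model, the elimination rule is replaced by the actual response behaviour of the API call; leakage is then visible whenever an instance shows #poss shrinking by more than one candidate per call.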

3.5 Tools - Otter and Alloy

Our tool suite now includes both a theorem-prover, Otter, and a model-finder, Alloy. Both tools are based on first-order logic, so quantification is done only over atoms. Otter takes a set of logical statements and tries to prove a goal statement from them. Its first application was to produce derivations and proofs of mathematical theorems. Alloy is a declarative model-finder, based on first-order relational logic, designed to be a lightweight tool for reasoning about software and safety-critical systems. It takes a set of constraints and attempts to find a model that satisfies these constraints using a powerful SAT solver. Before examining in detail how Otter and Alloy work in chapters 4 and 5, let us first describe their relative strengths:

State: State can be modeled in Otter by tagging each clause with a state predicate that changes with each operation. This can be cumbersome when the amount of state increases. In Alloy, state is modeled by adding an extra dimension to all the relations. While this is a natural use of Alloy, the model requires an extra frame condition for each additional state object, since all objects have to be explicitly constrained to stay the same across states when not acted upon by an operation. A sketch of such a frame condition appears after this list.

Chaining: Otter was naturally designed for derivation and is thus very well suited to finding API-chaining attacks. Also, once an attack is found, Otter's proof trace tells us exactly how the attack occurred. On the other hand, when Alloy finds an instance containing a violation of a policy, the instance only shows the objects that constitute the violation and not how the violation arose. In most cases, however, discovering the attack trace is relatively easy.

Composition: Otter is excellent at API-chaining within a single system, but when an analysis is required across multiple systems, these systems have to be explicitly created in the model. In Alloy, however, the model-finding approach ensures that all possible combinations of multiple systems (up to a specified bound) are searched.

We will cover Otter and Alloy in greater detail, including examples and sample code, in the next two chapters.
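As a minimal sketch of the frame-condition point above, in the style of the Alloy models of chapter 5 (the signature and operation names are ours): suppose a model tracks a set of loaded keys and one flag across ordered states.

open util/ordering[State]

sig Key {}
one sig Owned {}  // an atom representing a flag being set

sig State {
  loaded: set Key,   // first state object: the loaded keys
  flags: set Owned   // second state object: the flag
}

// the operation constrains the state it changes...
pred LoadKey(s, s': State, k: Key) {
  s'.loaded = s.loaded + k
  // ...and still needs one frame condition per untouched state object
  s'.flags = s.flags
}

fact trace {
  all s: State - last() | some k: Key | LoadKey(s, next(s), k)
}

pred show() {}
run show for 4

Each additional state object adds another such conjunct to every operation, which is the bookkeeping cost noted above; in Otter, the analogous cost is the ever-growing state predicate threaded through every clause.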

Chapter 4

Otter - a Theorem Prover

4.1 Overview

The traditional theorem-prover uses a process that starts with a set of axioms, produces inferences using a set of deductive rules and constructs a proof based on a traditional proof technique. Otter (Organized Techniques for Theorem-proving and Effective Research) is a resolution-style theorem-prover based on the set-of-support strategy, designed to prove theorems written in first-order logic (FOL) by searching for a contradiction. While the main application of Otter is pure mathematical research, we found its deduction system to be very suitable for modeling APIs and searching through chains of API sequences. In this overview of Otter, we will use a simple model of encryption as an illustration of the modeling approach. We will also describe some techniques for managing search-space explosion in Otter.

4.1.1 First-Order Logic

At the core of Otter lie FOL, resolution and the set-of-support strategy. We will describe these concepts one at a time. To get a grasp of what FOL is, let us consider where it falls in the three basic categorizations of logic: zeroth-order logic, first-order logic and higher-order logic. Zeroth-order logic, also known as propositional logic, is reasoning in terms of true or false statements. First-order logic augments it by adding three notions: terms, predicates and quantifiers. These allow us to reason about quantified statements over individual atoms¹. Higher-order logic further extends the expressiveness of first-order logic by allowing quantification over atoms and sets of atoms.

¹ Most of mathematics can be symbolized and reasoned about in FOL.

The following terminology, used in Otter's FOL language syntax, will be useful in understanding this chapter:

constant: any symbol not starting with {u,v,w,x,y,z}

variable: any symbol starting with {u,v,w,x,y,z}

term: a term is either a variable, a constant or a function symbol f(t_1, ..., t_n) applied to term(s)

predicate: a predicate P expresses a property over term(s) and is either true or false

atom: an atom is a predicate applied to term(s) (e.g. P(x))

literal: a literal is an atom or the negation of an atom (e.g. ¬P(x))

clause: a clause is a disjunction of literals (e.g. P(x) ∨ Q(x) ∨ ¬R(x)). A clause can have no literals, in which case it is the empty clause.

With this language, we can express many cryptographic operations in a symbolic way. For example, in most of our Otter models, we use the predicate U(x) to express that the user knows value x and the binary function symbol E(x, k) to represent the encryption of x under key k. Now that we can read statements written in FOL, we can start to describe Otter's reasoning process.

4.1.2 Resolution

Resolution [21] was developed as a method for quickly finding a contradiction within a set of clauses by searching to see if the empty clause can be derived. When this happens, the set of clauses cannot be satisfied together (since the empty clause is always false) and the set of clauses is termed unsatisfiable. The following algorithm describes how resolution works:

1. Find two clauses p = (p_1 ∨ ... ∨ p_m) and q = (q_1 ∨ ... ∨ q_n) containing the same predicate literal, negated in one but not in the other. Let us say literals p_i and q_j clash in this way.

2. Unify the two literals, producing θ, the substitution that makes p_i equal q_j symbolically (e.g. f(x) and f(v) unify with the substitution x/v).

3. Form the disjunction of the two clauses p and q, discarding the two unified literals: (p_1 ∨ ... ∨ p_{i-1} ∨ p_{i+1} ∨ ... ∨ p_m ∨ q_1 ∨ ... ∨ q_{j-1} ∨ q_{j+1} ∨ ... ∨ q_n)

4. Substitute any remaining variables using θ, producing the final resolvent.

5. Add the resolvent to the set of clauses.

Resolution aims to derive the empty clause {}, which can only happen when two completely complementary clauses are resolved (e.g. P(x) and ¬P(x)). It is easy to see that when that happens, step 3 will produce a clause with no literals. It is also important to note that resolution is a sound inference rule, and resolution-refutation, which proves a theorem by negating the goal statement to be proved and producing a proof-by-contradiction on the goal statement, is also complete.

In Otter's FOL language, we write statements as clauses that contain a disjunction (OR) of literals. Consider the encryption operation written as a disjunction of literals:

-U(x) | -U(KEY) | U(E(x,KEY))

where the predicate U(x) represents that the user knows value x and the term E(x, y) represents the value of x encrypted under y. In Otter's syntax, there is an implicit universal quantification over variables. Therefore, this says that if the user knows any value x and the constant KEY, he can encrypt x with KEY. With all statements written in this form, Otter tries to prove that the set of clauses is unsatisfiable using resolution. When it does find a proof, Otter prints out the trace of the conflicting clauses.

Otter supports a number of inference rules, including binary-resolution and hyper-resolution. Binary-resolution is resolution performed on clauses with one positive and one negative literal (e.g. -P(x) | Q(x)). By clashing P(x) with -P(x), we can resolve to obtain Q(x). Hyper-resolution generalizes this process to clauses with multiple negative literals and one positive literal and is thus more efficient.

4.1.3 Set-of-support Strategy

Otter's basic deductive mechanism is the given-clause algorithm, a simple implementation of the set-of-support strategy [14]. Otter maintains four lists for this algorithm:

usable: contains clauses that are available for making inferences

sos: contains clauses that are waiting to participate in the search² (i.e. they must be transferred into the usable list first)

passive: contains clauses used for forward subsumption and contradiction

demodulators: contains equalities used to rewrite newly inferred clauses

² sos stands for set-of-support.

The given-clause algorithm is basically a loop:

while (sos is not empty) AND (empty clause has not been derived)
  1) select the best clause in sos to be the given_clause
  2) move given_clause from sos to usable
  3) infer all possible clauses using the given_clause
  4) apply retention tests to each newly inferred clause
  5) place the retained clauses in sos

In step (3), it is important to note that any inference made must use the given clause as a parent. In step (4), the retention tests include demodulation and subsumption.

Demodulation is the rewriting of newly inferred clauses into a canonical form (e.g. rewriting add(add(1,2),3) to add(1,add(2,3))) and subsumption is the discarding of newly inferred clauses in favour of a more general clause (e.g. discarding P(1) when P(x) has already been derived).

4.2 Attacks Modeled with Otter

This section describes previous work on modeling the Visa Security Module (VSM) and IBM 4758 Common Cryptographic Architecture (CCA) security APIs using Otter [24]. The VSM and IBM 4758 CCA are hardware-based cryptoprocessors used in bank Automated Teller Machine (ATM) networks to enable two important secure-banking features: first, communication between the bank and the ATM and second, Personal Identification Number (PIN) verification. Underlying these features are a few building blocks: symmetric-key encryption, a unique master key, key-sharing and key-typing. Key-sharing is the process of breaking a key into parts such that the key can only be fully reconstructed when all parts are present. Key-typing is the process of tagging on meta-data describing the key's intended use and usage domain. For example, data keys are used for encrypting and decrypting data and communication keys are used to protect session keys.

To authenticate the customer present at the ATM, banks issue a PIN to each customer that is cryptographically derived from their Primary Account Number³ (PAN) by encrypting the customer's PAN with a PIN-derivation key⁴. When customers change their PINs, the numerical difference (i.e. the offset) is stored in the clear on the bank card. During authentication, the crypto-processor must then be able to compute a customer's PIN, add the offset, and compare this value with the digits keyed in by the customer. To prevent fraud, both the PIN and the PIN-derivation key must be protected.

³ The primary account number is also known as the bank account number.
⁴ There are other PIN generation and verification schemes.
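In symbols, writing {m}_K for the encryption of m under key K (the notation used in section 5.2), the verification described above checks an equation of the following shape, where dec is our shorthand for the decimalisation of the encrypted block and is a simplification of the real scheme:

PIN_entered = dec({PAN}_PDK) + offset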

General Technique

Both the VSM and the CCA were modeled using the same basic technique: API calls were modeled as implication clauses in the usable list and the attacker's initial knowledge was written as clauses in the sos list. The inputs to an API command were written as negative literals and the output was written as the single positive literal. If the API command produced more than one output, more implication clauses for the same command were required. Otter was then run with hyper-resolution enabled, allowing an implication clause to be unified with multiple positive literals at once. It must be noted that this technique was initially developed by Youn [24]. The encryption command is given below:

set(hyper_res).
set(sos_queue).

list(usable).
-U(x) | -U(KEY) | U(E(x,KEY)).
end_of_list.

list(sos).
U(DATA).
U(KEY).
end_of_list.

Here, set(hyper_res) turns on the hyper-resolution inference rule and set(sos_queue) instructs Otter to pick clauses from the sos list in the order in which they were put in. On the first iteration, Otter picks U(DATA) and places it in the usable list. However, nothing new is inferred at this point. On the second iteration, Otter picks U(KEY) and this time, together with the two clauses in the usable list, it is able to produce the resolvent U(E(x,KEY)) with the substitution DATA/x, giving U(E(DATA,KEY)).

4.2.1 VSM

The following attack on the VSM, briefly mentioned in section 2.1, enables the attacker to derive any customer's original PIN. To understand the attack, it suffices to know that communications keys are used only to transfer data from module to module, master keys are used only to encrypt other keys and the PIN-derivation key is used only to calculate the PIN from a PAN. The attack uses two API commands:

Create_Communications_Key creates keys of type communications key from a user-provided blob.

Translate_Communications_Key re-encrypts communications keys under a user-provided terminal master key.

The Otter model is shown below, where the goal is to obtain the clause U(E(Acc,P)), which represents the attacker obtaining the encryption of an arbitrary account number under the PIN-derivation key, yielding the customer's PIN.

% Create_Communications_Key
-U(x) | U(E(x,TC)).

% Translate_Communications_Key
-U(E(x,TC)) | -U(E(y,TMK)) | U(E(x,y)).

% User knows the account number
U(Acc).

% User knows the encrypted PIN-derivation key
U(E(P,TMK)).

% Negation of goal
-U(E(Acc,P)).
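The refutation Otter finds corresponds to the following two-step chain, shown here as annotated clauses rather than Otter's literal output format:

% step 1: resolve U(Acc) against Create_Communications_Key (x = Acc)
U(E(Acc,TC)).

% step 2: resolve step 1 and U(E(P,TMK)) against
% Translate_Communications_Key (x = Acc, y = P)
U(E(Acc,P)).

% step 3: clash with the negated goal -U(E(Acc,P)),
% deriving the empty clause: the attack is found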

Otter was able to derive the clause U(E(Acc,TC)) and then U(E(Acc,P)), finally producing a resolution refutation (i.e. the empty clause) with -U(E(Acc,P)), even with many other extra commands added to the model, testifying to Otter's ability to search large spaces quickly.

4.2.2 IBM 4758 CCA

The following attack on the IBM 4758 CCA, briefly mentioned in section 2.1, also enables the attacker to derive any customer's original PIN. The attack uses three API operations:

Key_Import uses an importer key to decrypt an encrypted key, thereby importing it into the system.

Key_Part_Import adds a key part (split during key sharing) to the already existing key parts in the system, generating a full key.

Encrypt uses a data key to encrypt arbitrary data.

This attack leveraged key-typing, insufficient type-checks and XOR cancellation to use a legitimate command (Key_Import) in an illegitimate way. Instead of describing the attack in detail, we focus on the modeling techniques that were developed by Youn [24] to control the exponential search-space explosion in the model. For details of the attack, please refer to [24].

Knowledge Partitioning: the predicates UN(x) and UE(x) both refer to the user's knowledge of x, but one refers to x as a plaintext value and the other refers to x as an encryption. By having UE(x) represent encrypted values, we prevent the repeated encryption of encrypted values, since the encryption command only takes in non-encrypted values UN(x). This kind of partitioning can be very useful since it can dramatically reduce the branching factor.

Demodulation: the use of XORs in this security API presented a rather difficult problem: a clause containing XORs could be expressed in multiple different ways. With the list of demodulators, Otter was able to rewrite every XOR-containing clause to its canonical form.

Intermediate Clauses: the inference rule for Key_Import was broken down into two inference rules: the first produces an intermediate clause INTUE that is used as an input to the second inference rule. This was introduced to solve the problem that arises when the representation required for unification is not the canonical form (and when paramodulation⁵ is not available).

Lexicographic Weighting: Otter's given-clause algorithm picks a given clause from the sos list based on weight. By specifying a lexicographical ordering over symbols, we can guide Otter into picking certain clauses over others.

⁵ Refer to the Otter manual [14] for an explanation of paramodulation.

A portion of the Otter model with these techniques in use is shown below:

list(usable).
% Key Import; INTUE() is the intermediate clause
-UN(x) | -UE(E(z,xor(IMP,KM))) | INTUE(E(y,xor(z,x)),xor(x,KP)).
-INTUE(E(y,x),xor(w,KP)) | -UE(E(y,x)) | UE(E(y,xor(w,KM))).

% Key Part Import
-UN(xor(x,KP)) | -UN(y) | -UE(E(z,xor(x,xor(KP,KM)))) | UE(E(xor(z,y),xor(x,KM))).

% Encrypt
-UN(x) | -UE(E(y,xor(DATA,KM))) | UE(E(x,y)).

% Ability to XOR
-UN(x) | -UN(y) | UN(xor(x,y)).
end_of_list.

list(sos).
% User has the third key part
UN(K3).
% User knows the DATA key type
UN(DATA).
% User has the encrypted PIN-derivation key
UE(E(P,xor(KEK,PIN))).
end_of_list.

% XOR demodulators (rewrite rules)
list(demodulators).
xor(x,y) = xor(y,x).
xor(x,xor(y,z)) = xor(y,xor(x,z)).
xor(x,x) = ID.
xor(ID,x) = x.
end_of_list.

Although these methods all help to reduce the search-space explosion, there is a corresponding trade-off: Otter may not visit certain search paths, potentially missing certain attacks.
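The effect of these demodulators can be seen on a small example: an inferred clause containing a redundant XOR is rewritten, one rule at a time, to its canonical form. The starting clause here is invented purely for illustration:

% newly inferred clause:                       UN(xor(K3,xor(DATA,K3))).
% by xor(x,xor(y,z)) = xor(y,xor(x,z)):        UN(xor(DATA,xor(K3,K3))).
% by xor(x,x) = ID:                            UN(xor(DATA,ID)).
% by xor(x,y) = xor(y,x) and xor(ID,x) = x:    UN(DATA).

Without this canonicalization, each of these four syntactically different clauses would be retained and explored separately, multiplying the search space.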

Chapter 5

Alloy Analyzer - a Model Finder

5.1 Overview

Alloy is a lightweight modeling language based on first-order relational logic [13], designed primarily for software design. The Alloy Analyzer [10] is a model-finder developed to analyze software models written in the Alloy language. Hereafter, we will use "Alloy" to refer to both the language and the tool. As an automated reasoning tool, Alloy can be described neatly in three parts, each of which will be described in greater detail later.

logic: Alloy uses first-order relational logic, where reasoning is based on statements written in terms of atoms and relations between atoms. Any property or behaviour is expressed as a constraint using set, logical and relational operators.

language: the Alloy language supports typing, sub-typing and compile-time type-checking, giving more expressive power on top of the logic. It also allows generically-described modules to be re-used in different contexts. Also very useful is syntax support for three styles of writing Alloy model specifications that users can mix and vary at will: predicate-calculus style, relational-calculus style and navigational-expression style¹. Navigational-expression style, where expressions are sets and are addressed by navigating from quantified variables, is the most common and natural.

analysis: Alloy is a model-finder that tries either to find an instance of the model or a counterexample to any property assertions specified in the model. An instance is literally an instantiation of the atoms and relations in the model specification. Alloy performs a bounded analysis, requiring a user-specified bound on the number of atoms instantiated in the model. With this bound, it translates the model into a boolean satisfiability (SAT) problem. Then, it hands the SAT problem off to an off-the-shelf SAT solver such as Berkmin [9]. The resulting solution is then interpreted under the scope of the model and presented to the user in a graphical user-interface, as shown in Figure 5-1.

¹ Refer to the Alloy manual [13] for explanations and examples of these styles of writing.

5.1.1 Relational Logic

All systems modeled in Alloy are built from atoms, tuples and relations. An atom is an indivisible entity, a tuple is an ordered sequence of atoms and a relation is a set of tuples. Because Alloy's relational logic is first-order, relations can only contain tuples of atoms and not tuples of other relations. A good way of thinking about relations is to view them as tables. A table has multiple rows, where each row corresponds to a tuple and each column in a tuple corresponds to an atom. Then, a unary relation is a table with one column, a binary relation is a table with two columns and so on. Also, the number of rows in a relation is its size and the number of columns is its arity. A scalar quantity is represented by a singleton set containing exactly one tuple with one atom.

To express constraints and manipulate relations, we use operators. Alloy's operators fall into three classes: the standard set operators, the standard logical operators and the relational operators. The standard set and logical operators are listed in Table 5.1.

symbol   operator
+        union
-        difference
&        intersection
in       subset
=        equality
!        negation
&&       conjunction
||       disjunction

Table 5.1: Set and logical operators used in Alloy

The relational operators require a little more treatment. Let p be a relation containing k tuples of the form {p_1, ..., p_m} and q be a relation containing l tuples of the form {q_1, ..., q_n}.

p -> q: the relational product of p and q gives a new relation r = {p_1, ..., p_m, q_1, ..., q_n} for every combination of a tuple from p and a tuple from q (kl tuples in all).

p.q: the relational join of p and q is the relation containing tuples of the form {p_1, ..., p_{m-1}, q_2, ..., q_n} for each pair of tuples, the first from p and the second from q, where the last atom of the first tuple matches the first atom of the second tuple.

^p: the transitive closure of p is the smallest relation that contains p and is transitive. Transitive means that if the relation contains (a, b) and (b, c), it also contains (a, c). Note that p must be a binary relation and that the resulting relation is also a binary relation.

*p: the reflexive-transitive closure of p is the smallest relation that contains p and is both transitive and reflexive, meaning that all tuples of the form (a, a) are present. Again, p must be a binary relation and the resulting relation is also a binary relation.

~p: the transpose of a binary relation p forms a new relation with the order of the atoms in its tuples reversed. Therefore, if p contains (a, b) then ~p will contain (b, a).

p <: q: the domain restriction of p and q is the relation r that contains those tuples from q that start with an atom in p. Here, p must be a set. Therefore, a tuple {q_1, ..., q_n} only appears in r if q_1 is found in p.

p >: q: the range restriction of p and q is the relation r that contains those tuples from p that end with an atom in q. This time, q must be a set. Similarly, a tuple {p_1, ..., p_m} only appears in r if p_m is found in q.

p ++ q: the relational override of p by q results in the relation r that contains all tuples in p except those whose first atom appears as the first atom of some tuple in q; those tuples are replaced by their corresponding ones in q.

Now, let us explore a brief tutorial on writing model specifications in Alloy.

5.1.2 Alloy Models

To illustrate the basic modeling techniques in Alloy, we will consider a simple model of encryption.

Defining object types

An Alloy model consists of a number of signatures (sig), or sets of atoms, that describe the basic types in the model. Within these signatures, we can define relations that relate these basic types to one another. First, we define a set of Value and a set of Key atoms. We also define a relation enc that relates a Key to a (Value, Value) tuple, resulting in tuples of the form (Key, Value, Value). Finally, we also define a singleton set K1 of type Key that contains a single atom. All this is expressed with the following:

// define a set of Value objects
sig Value {}

// define a set of Key objects
sig Key {
  enc: Value one -> one Value
}

// singleton set K1
one sig K1 extends Key {}

We use the enc relation to represent the encryption function: a value, encrypted under a key, results in some value. The one -> one multiplicity constraint states that the encryption relation (Value, Value) for each key is a one-to-one function (there are other multiplicity symbols that can be used to describe injections and partial functions). In this relation, the tuple (k_1, v_1, v_2) reads: v_1 encrypted under k_1 gives v_2, and this generalizes to the representation (key, plaintext, ciphertext)².

Specifying constraints

After defining object types, we want to be able to specify properties about their behaviour by specifying constraints. Although the encryption relation has been sufficiently defined, we may need to add certain constraints to prevent Alloy from producing trivial instances or counterexamples during its model-finding. One reasonable constraint is that two different keys have different encryption functions, which is not necessarily the case but is typically true. In other words, the encryption relations for any two keys must differ in at least one tuple. This can be written as a fact:

fact {
  // for all distinct k, k' chosen from the set Key
  all disj k, k' : Key | some (k.enc - k'.enc)
}

² The encryption function can be expressed in other equivalent forms, such as (Value, Key, Value).

The expression k.enc is a relational join between the singleton relation k and the enc relation, giving a binary relation (Value, Value) that represents the encryption function of key k. The last statement, some (k.enc - k'.enc), says that there is at least one tuple in the difference of the encryption relations of two different key atoms.

Running analyses

With the model in place, we can proceed to perform some analyses. We can simulate the model by defining predicates on the model, which can be either true or false depending on the instance. These are different from factual constraints, which are required to be true in every instance. Simulation of a predicate is the process of instructing Alloy to find an instance for which the predicate is true. To obtain an instance of the model without imposing any further constraints, we instruct Alloy to simulate the empty predicate show(), which is effectively a non-constraint. In this case, Alloy just has to find an instance that satisfies the object definitions and their factual constraints. Since Alloy only performs bounded analyses, we specify a maximum of 4 atoms in each object type (i.e. each sig).

pred show() {}
run show for 4

Figure 5-1 shows Alloy's graphical output for this model of encryption. We added the constraint #Value = 4 to enforce that there be exactly 4 Value objects in any instance. In the instance that was generated, Alloy has chosen one key (K1) and four values. The arrow labeled enc[Value2] from K10 to Value3 means that Value2 encrypted under key K1 gives Value3.

Figure 5-1: Alloy Analyzer - model of encryption

Alternatively, we can define assertions on properties of the model that we believe to be true. Checking is the process of instructing Alloy to find an instance satisfying the model but not the assertion. For example, we might want to assert that the decryption functions of any two keys differ on at least one value, since we believe this follows from the constraint we put on encryption earlier. We write this as:

assert decryption {
  // the transpose (~) represents the decryption relation
  all disj k, k' : Key | some (~(k.enc) - ~(k'.enc))
}

check decryption for 4

Alloy will then proceed to find an instance that satisfies the model but acts as a counterexample to the assertion: an instance where two different keys have the same decryption relation. All of Alloy's analyses have to be run within a size bound on the number of atoms in each object type. Alloy will check all models within this bound until a satisfying instance or counterexample is found. Given that the running time of the analysis increases exponentially with size, we can only check small models³ in reasonable time. However, the philosophy behind Alloy is that if there are any inconsistencies in the model, they are extremely likely to appear even in small models.

³ Small models contain 8-12 atoms per object type.

5.2 Attacks Modeled with Alloy

In this section, we describe how Alloy was used to model the cryptanalytic attack found against the DES triple-mode block-cipher ECB|ECB|OFB [5] and the guessing attack against the PIN-verification API in the IBM 4758 CCA [16]. This discussion aims to draw out the strengths of Alloy that make it an excellent complement to Otter.

5.2.1 Triple-mode DES

The block cipher Data Encryption Standard (DES) was developed by IBM in 1974 in response to a U.S. government invitation for an encryption algorithm. A block-cipher is basically an algorithm that encrypts data in blocks⁴. However, DES's key strength of 56 bits became a cause for concern as computing speed and technology evolved. Subsequently, multiple modes of operation were developed as a way of improving the strength of block-ciphers, in particular DES. A multiple mode of operation is one that consists of several modes of operation chained together, each having its own key. There are many modes of operation, but the ones we are concerned with are the Electronic Code Book (ECB) and Output FeedBack (OFB) modes.

The ECB mode of operation is the simplest way to use a block cipher. The data is divided into blocks and encrypted by the cipher one block at a time. Therefore, two identical plaintext values will give the same ciphertext. The OFB mode of operation creates a stream of values by repeatedly encrypting a single initial value. This stream of values is then XORed with the input plaintext to produce the ciphertext. In this case, two identical plaintext values could yield different ciphertexts.

⁴ Usually 64 bits.

In 1994, Eli Biham showed that many multiple modes were much less secure than expected [4], and in 1996 he extended these results to conclude that all the triple modes of operation were theoretically not much more secure than a single encryption [5], except the ECB|ECB|ECB mode. We now focus on the ECB|ECB|OFB triple mode of operation and show how Alloy managed to find the same cryptanalytic attack that Biham did.

In this triple mode of operation, the plaintext is encrypted with two ECB DES ciphers in series before it is fed into the OFB DES stream for a third encryption. The schematic is shown in Figure 5-2.

Figure 5-2: ECB|ECB|OFB triple-mode (each plaintext block passes through two ECB stages and is then XORed with the OFB stream to produce the ciphertext)

To model this triple mode of operation in Alloy, we had to model i) encryption, ii) XOR, iii) the modes of operation and iv) the chaining of inputs to outputs between the various components. We have already modeled encryption above, so let us proceed with the XOR operation. First, we defined a signature Value, a set of atoms of type Value, within which we defined the xor relation containing triples of the form $(v_1, v_2, v_3)$, where each $v$ is of type Value and $v_1 \oplus v_2 = v_3$.

    sig Value {
      xor : Value one -> one Value
    }

Next, we had to define the three properties of XOR.

    all b : Value | xor.b = ~(xor.b)

The commutativity property states that $v_1 \oplus v_2 = v_2 \oplus v_1$ for any $v_1, v_2$. The relation xor.b is a binary relation over pairs $(v_1, v_2)$ representing the values $v_1$ and $v_2$ that XOR to give b. By saying that xor.b is equal to its transpose, we enforce the constraint that if $(v_1, v_2)$ exists in xor.b, then so must $(v_2, v_1)$. This succinctly captures the required property.

    all a, b, c : Value | xor.c.(xor.b.a) = xor.(xor.c.b).a

The associativity property states that $(v_1 \oplus v_2) \oplus v_3 = v_1 \oplus (v_2 \oplus v_3)$ for any $v_1, v_2, v_3$. The expression xor.c.b evaluates to the Value a such that $a \oplus b = c$. By placing the parentheses correctly, we state the property just as we would mathematically.

    one identity : Value | no identity.xor - iden

The identity is the special value such that $identity \oplus v = v$ for any $v$. The expression identity.xor represents the XOR function of exactly one chosen value, identity. The iden constant is the universal identity relation, a binary relation containing tuples of the form (a, a) for every atom a defined in the Alloy model. The identity statement therefore says that the identity's XOR relation with all identity tuples removed is empty, which is to say that the identity's XOR relation contains only identity tuples. Writing identity.xor = iden would be incorrect, since that would require the XOR function to contain identity tuples over atoms not defined in the xor relation.
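Assembled into the model, the three constraints might sit together in a single fact along the following lines (a sketch; the exact grouping in the thesis is not shown here):

    fact xorProperties {
      // commutativity: each relation xor.b is symmetric
      all b : Value | xor.b = ~(xor.b)
      // associativity
      all a, b, c : Value | xor.c.(xor.b.a) = xor.(xor.c.b).a
      // exactly one identity value, whose xor relation maps each value to itself
      one identity : Value | no identity.xor - iden
    }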

Now let us proceed to modeling the DES blocks and their input/output behaviour. We defined a set of blocks sig Block, where each block contains a plaintext p, an output-feedback stream value ofb and an output ciphertext c. We then imposed an ordering on the blocks with the general util/ordering module packaged with Alloy. The module also provides the operations first(), last(), next(x) and prev(x) to help us navigate the ordering: first() and last() return the first and last elements of the ordering, while next(x) and prev(x) return the element after or before the element x.

    open util/ordering[Block]

    sig Block {
      p: one Value,
      ofb: one Value,
      c: one Value
    }

Next, we described the data flow within these blocks. Since we were only attacking the ECB|OFB boundary, we used a single ECB encryption to represent the two ECB encryptions in series; this is equivalent if the single ECB encryption uses a key of twice the original length. The block's ciphertext is the encryption of its plaintext under K1, a Key atom, XORed with the OFB value of the block. Also, the OFB value of the next block is the encryption of this block's OFB value under the OFB key.

    fact {
      // the ciphertext is a function of the plaintext and the ofb value
      all b : Block | b.c = (K1.enc[b.p]).xor[b.ofb]
      // setting up the OFB sequence through the blocks
      all b : Block - last() | let b' = next(b) | b'.ofb = OFB.enc[b.ofb]
    }
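The constraints above refer to the key atoms K1 and OFB, whose declarations do not appear in this excerpt. A declaration consistent with their usage, and with the "exactly 2 Key" scope in the run command below, might be (an assumption on our part):

    // singleton key atoms: the (double-length) ECB key and the OFB key
    one sig K1, OFB extends Key {}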

Finally, let us consider the cryptanalytic attack on this triple mode of operation, as presented by Biham [5]. Assuming that the block size is 64 bits, providing the same 64-bit value $v$ as plaintext $2^{64}$ times will result in $2^{64}$ ciphertext values of the form

$$c_i = \{\{v\}_{K1}\}_{K2} \oplus o_i$$

where the braces $\{\cdot\}$ represent encryption, $o_i$ is the OFB stream value such that $o_{i+1} = \{o_i\}_{K_{OFB}}$, and $K1, K2$ are the two ECB encryption keys. This basically allows us to isolate the OFB layer of encryption, since the above is equivalent to a complete OFB stream XORed forward by the single value $\{\{v\}_{K1}\}_{K2}$. This means that it should take only about $2^{64}$ work to break the key used in the OFB layer. After peeling off the OFB layer, we can then break the double ECB layer with the standard meet-in-the-middle attack [23]. To break the OFB layer, we do the following:

1. choose an arbitrary value $u$;
2. encrypt $u$ under all possible keys $k$, obtaining $u_1 = \{u\}_k$ and $u_2 = \{u_1\}_k$;
3. check that $u \oplus u_1 = c \oplus c_1$ and $u_1 \oplus u_2 = c_1 \oplus c_2$ for a consecutive triple of ciphertexts $c, c_1, c_2$.

On average, we need to try about $2^{64}/\mathit{period}$ times, where $\mathit{period}$ is the expected period of the OFB stream, which is expected to be small. To observe that the cryptanalytic attack was successful, the sequence of ciphertexts must satisfy the following property:

$$\exists\, c_i, o_j \;.\; (c_i \oplus c_{i+1} = o_j \oplus o_{j+1}) \wedge (c_{i+1} \oplus c_{i+2} = o_{j+1} \oplus o_{j+2})$$

where $(c_i, c_{i+1}, c_{i+2})$ is a consecutive triple of ciphertexts and $(o_j, o_{j+1}, o_{j+2})$ is a consecutive triple of OFB stream values. We write this in Alloy as:

    pred FindAttack() {
      // the relevant property: two sequential XORs of ciphertext
      // outputs match two sequential XORs of OFB encryptions
      all b' : Block - first() - last() |
        let b = prev(b'), b'' = next(b') |
          some d : OFB.enc.Value |
            let d' = OFB.enc[d], d'' = OFB.enc[d'] |
              b.c.xor[b'.c] = d.xor[d'] && b'.c.xor[b''.c] = d'.xor[d'']
    }

    run FindAttack for 8 but exactly 2 Key

The full model can be found in Appendix A.1. Using this model, Alloy was able to verify the attack with a scope of 8 atoms in 2 minutes and 25 seconds. However, it is important to note that the ciphertext property in our model could have been achieved with a few different sets of input plaintexts. Since Alloy works non-deterministically, it could have found any one of those sets.

While the attack on the triple-mode block cipher showed Alloy's native power in dealing with mathematical properties in its relational logic, the following attack on PIN guessing demonstrates Alloy's ability to work with sets.

5.2.2 PIN Decimalisation

To use an Automated Teller Machine (ATM), a customer just has to swipe a debit card and punch in a PIN. In 2003, Bond et al. found that some banking security modules using decimalisation tables to generate PINs were susceptible to what they termed the PIN-decimalisation attack [16] [7], allowing the attacker to guess the customer's PIN in 15 tries, thereby subverting the primary security mechanism against debit card fraud. To understand the vulnerability, we must first understand a PIN generation algorithm that uses decimalisation tables. The PIN-derivation algorithm used in IBM's banking modules, summarised symbolically below, is as follows:

1. calculate the original PIN by encrypting the account number with a secret PIN-generation DES key;
2. convert the resulting ciphertext into hexadecimal;
3. convert the hexadecimal into a PIN using a decimalisation table. The actual PIN is usually the first 4 or 6 numerals of the result.
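In symbols (our compact restatement of the three steps, where $K_{PG}$ denotes the secret PIN-generation key, $\mathrm{hex}(\cdot)$ the conversion to hexadecimal, $\mathrm{dec}(\cdot)$ the decimalisation map, and a 4-digit PIN is assumed):

$$\mathrm{PIN} = \mathrm{dec}\big(\mathrm{hex}(\{\mathrm{PAN}\}_{K_{PG}})\big)[1..4]$$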

The standard decimalisation table is shown below:

    0 1 2 3 4 5 6 7 8 9 A B C D E F
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5

During decimalisation, the hexadecimal values on the top line are converted to the corresponding numerical digits on the second line (for example, a ciphertext beginning 7A9C decimalises to 7092). Some customers may choose to change their PINs. In this case, the numerical offset between the changed PIN and the real PIN is kept on the card and in the mainframe database. When the customer enters a PIN, the ATM subtracts this offset before checking it against the decimalised PIN.

The IBM Common Cryptographic Architecture (CCA) [12] is a financial API implemented by the IBM 4758, a hardware security module used in banking ATM networks. The CCA PIN-verification command Encrypted_PIN_Verify allows the user to enter a decimalisation table, the primary account number (PAN), also known as the bank account number, and an encrypted PIN. The encrypted PIN is first decrypted and then compared to the PIN generated from the PAN using the generation algorithm described earlier. The command returns true if the PINs match and false otherwise. Although we do not have access to the PIN-encryption key, any trial PIN can be encrypted using the CCA command Clear_PIN_Encrypt.

The attack, as presented by Bond et al. [16], can be broken down into two stages: the first determines which digits are present in the PIN, and the second determines the true PIN by trying all possible digit combinations. The first stage hinges on the vulnerability that the CCA developers allowed clients to specify the decimalisation table; instead, they should have hard-coded the standard decimalisation table into the commands. We will present a simpler version of the attack against a single-digit PIN, thereby eliminating the need for the second stage.

To determine the digit present in the PIN, we pass in the desired PAN, an encrypted trial PIN of 0 and a binary decimalisation table, where all entries are 0 except the entry for hex-digit i, which is 1. The trial PIN of 0 will fail to verify only if the encrypted PAN (before decimalisation) contains hex-digit i.
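As an illustration (our example; in the model below, Alloy itself chooses the tables), the binary decimalisation table probing for hex-digit 4 is:

    hex digit:   0 1 2 3 4 5 6 7 8 9 A B C D E F
    table entry: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

Against a single-digit PIN, a trial PIN of 0 submitted with this table fails to verify exactly when the true pre-decimalised digit is 4.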

However, instead of engineering this attack into our model, we would like Alloy to find it. Therefore, we set up the system by modeling the attacker's decimalisation-table and trial-PIN inputs and his actions after getting an answer from the Encrypted_PIN_Verify command. First, we defined a set of NUM (number) atoms and a set of HEX (hexadecimal) atoms, as well as a singleton set TrueHEX representing the correct pre-decimalised PIN digit. Next, we defined a set of State atoms, where in each state the user performs an Encrypted_PIN_Verify API call using the decimalisation table dectab, which is a mapping from HEX to NUM, and the trial PIN trialpin. Here, we abstracted away the fact that the input PIN is encrypted. Also, guess represents the set of possible hexadecimal guesses that the attacker has to choose from in each state. As API calls are made, this set should shrink until the attacker uncovers the real hexadecimal value. He can then compare this value against the standard decimalisation table (shown earlier) to find the real PIN.

    open util/ordering[State]

    // set of numbers
    sig NUM {}

    // set of hexadecimal digits
    sig HEX {}

    // the one hex digit representing the true pre-decimalised PIN digit
    one sig TrueHEX extends HEX {}

    // in each state, the attacker picks a trialpin and a decimalisation
    // table, and has a set of guesses of what the pre-decimalised PIN is
    sig State {
      trialpin: one NUM,
      dectab: HEX -> one NUM,
      guess: set HEX
    }

Next, we modeled the event of the attacker choosing a particular number as a trial PIN. We used a predicate to constrain the possible guesses guess in two consecutive states s, s' based on the choice of trial PIN in state s. If the trial PIN verifies, then the pre-decimalised PIN must be among the hexadecimal values corresponding to the trial PIN in the given decimalisation table, so we intersect the attacker's current possibility set with those values. Otherwise, we intersect the attacker's current possibility set with the hexadecimal values corresponding to the numbers other than the trial PIN in the decimalisation table. In either case, the attacker gains knowledge about the value of the pre-decimalised PIN.

    pred choose(s, s' : State) {
      // if the trial PIN verifies
      ( (s.trialpin = s.dectab[TrueHEX]) &&
        (s'.guess = s.guess & s.dectab.(s.trialpin)) )
      ||
      // if the trial PIN does not verify
      ( (s.trialpin != s.dectab[TrueHEX]) &&
        (s'.guess = s.guess & s.dectab.(NUM - s.trialpin)) )
    }

Finally, we set up the attack by defining transition operations across the states. Here, we defined a predicate init on a state to set the possibility set to all the hexadecimal digits. The predicate Attack is meant to simulate the attack as the attacker makes API calls with different decimalisation tables and trial-PIN choices: it defines a transition relation between each state and the next, and it also constrains the last state so that the correct value has been found. Then, by running this Attack predicate, Alloy will find an instance satisfying all the constraints in Attack, which also means satisfying the init and choose predicates.

    pred init(s : State) {
      // in the first state, the attacker has no information,
      // so his set of guesses covers all the hexadecimal values
      s.guess = HEX
    }

    pred Attack() {
      // initialize on the first state atom
      init(first())
      // for all states other than the last, the attacker makes
      // a choice after issuing an API call
      all s : State - last() | let s' = next(s) | choose(s, s')
      // in the last state, the attacker has found the correct HEX
      #(last().guess) = 1
    }

Alloy was able to find an instance of the attack for 6 State atoms, 16 HEX atoms and 10 NUM atoms in 28 seconds. The last two constraints in the Attack predicate demonstrate that some creative constraining is required to keep Alloy from finding trivial instances. Without them, Alloy would have allowed the attacker to discover the correct value on the very first try, a very lucky scenario that is unlikely to happen in practice. This is an instance of a more general difficulty with constraint modeling: while it is easy for Alloy to reason about worst-case or best-case scenarios, it is extremely challenging to model the notion of an average case. In this example, Alloy was able to find the attack, but it did not choose the decimalisation tables as we did, which would guarantee the best performance on average.
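For completeness, the scopes reported above correspond to a run command along the following lines (our reconstruction; the exact command is not shown in this excerpt):

    run Attack for 6 State, 16 HEX, 10 NUM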

Chapter 6

Trusted Platform Module

As software on computing platforms grows increasingly complex, the number of security bugs also increases. Consequently, protecting sensitive data through a software-only security mechanism suffers from two inherent weaknesses. First, large software systems will have implementation bugs and therefore can potentially be attacked. Second, it may be impossible to tell whether software has been attacked. In fact, experts in information security have concluded that some security problems are unsolvable without a bootstrap to protected hardware [22]. The Trusted Platform Module (TPM) is the Trusted Computing Group's (TCG) [1] answer to this problem. In this chapter, we present an overview of the TPM.

6.1 History of Trusted Computing

Prior to 1999, although there were a number of secure coprocessors (separate shielded computing units like the IBM 4758) that were able to attest to the integrity of their software, there were no full computing-platform security solutions that could do the same. Consequently, users were assured of security only by blindly trusting that the platform was doing what it was supposed to do. In 1999, the Trusted Computing Platform Alliance (TCPA) was formed to address this issue at two levels: increasing the security of platforms and enabling trust mechanisms between platforms. This led to the development of Trusted Platforms.

Later, in 2003, the TCPA became the Trusted Computing Group (TCG), which took on the necessary burden of encouraging worldwide industry participation in such an endeavour.

6.1.1 Trusted Platforms

Under the TCG's definition of trust, Trusted Platforms are platforms that can be expected to always behave in a certain manner for an intended purpose. Furthermore, users do not have to make this decision blindly; rather, they can ask the platform to prove its trustworthiness by requesting certain metrics and certificates. A Trusted Platform should provide at least these basic features:

Protected Capabilities: the ability to perform computation and securely store data in a trustworthy manner.

Integrity Measurement: the ability to trustworthily measure metrics describing the platform's configuration.

Attestation: the ability to vouch for information.

In essence, these features mean that a Trusted Platform should be protected against tampering, be able to attest to the authenticity and integrity of platform code and data, perform guarded execution of code and maintain the confidentiality of sensitive information. To establish trust, a Trusted Platform can reliably measure any metric about itself and attest to it. Some useful metrics include the software loaded on the platform and any device firmware. The user then needs to verify these metrics against trusted values obtained separately to decide whether the platform is trustworthy. Immediately, we can see that these abilities enable a variety of new applications.

6.1.2 Use Scenarios

A solution enabling clients or servers to trust one another's software state gives hope to many commercial applications. A Digital Rights Management (DRM) solution is possible, since servers can have confidence that client machines are not subverting the media policies.

Auctions and various fair games like online poker could leverage server-client trust relationships to execute fair game play. Massive computations requiring confidentiality can be outsourced to computing grids, as long as the grid attests that it will not reveal any secrets. The TCG's solution for turning regular computing platforms into Trusted Platforms resulted in the development of a specification known as the Trusted Platform Module.

6.2 TPM Design

The TCG specification for this single tamper-resistant chip comprises both informative comments and normative statements but does not come with an implementation. Rather, it is up to the individual chip manufacturers to satisfy the specification's requirements.

6.2.1 Features

Conceptually, the TPM creates three Roots of Trust on its parent platform that are used to effect trust and security mechanisms:

Root of Trust for Measurement (RTM): reliably measures any user-defined metric of the platform configuration. The RTM starts out as trusted code in the motherboard's Boot ROM but extends its trust domain during system boot to the entire platform through the process of inductive trust. In this process, the RTM measures the next piece of code to be loaded, checks that the measurement is correct and then transfers control. This process of extending the trust domain continues until a trusted operating system is booted.

Root of Trust for Reporting (RTR): allowed to access protected storage locations, including the Platform Configuration Registers (PCRs) and non-volatile memory, and also attests to the authenticity of these stored values using signing keys. PCRs are storage registers that record not only values but also the order in which the values were stored.

Root of Trust for Storage (RTS): protects keys and sensitive data entrusted to the TPM. The RTS basically refers to all the key-management functionality, including key generation, key management, encryption and guarded decryption.

Figure 6-1: Roots of trust in a TPM-enabled PC architecture (the RTM resides in the Boot ROM, while the RTR and RTS reside in the TPM)

Figure 6-1 shows how the TPM fits into a regular PC architecture and where the various roots of trust reside. These roots of trust provide the means of knowing whether a platform can be trusted; it is up to the client to decide whether the platform should be trusted. If the client trusts the credential certifying the correct manufacture of the platform, the RTM, the RTR and the RTS, then he can trust the values measured by the RTM, stored by the RTS and reported by the RTR. The client must then compare these reported values against other credentials (e.g. those issued by a trusted third party) to determine whether the platform state is trustable.

6.2.2 Trusted Platform Module API

In this section, we introduce the TPM's API by describing its commands in the order that a typical user would encounter them when using a TPM-enabled platform. The owner of a brand-new TPM must first take ownership of the chip. This involves setting an owner password, upon which the TPM generates for itself a storage root key (SRK) and a master secret (tpmproof). These secrets enable much of the TPM's functionality.
