BARF: A multiplatform open source Binary Analysis and Reverse engineering Framework
|
|
|
- Edward Webb
- 9 years ago
- Views:
Transcription
1 BARF: A multiplatform open source Binary Analysis and Reverse engineering Framework Christian Heitman and Iván Arce Fundación Sadosky, {cnheitman,iarce}@fundacionsadosky.org.ar Abstract. The analysis of binary code is a crucial activity in many areas of the computer sciences and software engineering disciplines ranging from software security and program analysis to reverse engineering. Manual binary analysis is a difficult and time-consuming task and there are software tools that seek to automate or assist human analysts. However, most of these tools have several technical and commercial restrictions that limit access and use by a large portion of the academic and practitioner communities. In this paper we introduce BARF, an open source binary analysis framework that aims to support a wide range of binary code analysis tasks that are common in the information security discipline. BARF is a scriptable platform that supports instruction lifting from multiple architectures, binary translation to an intermediate representation, an extensible framework for code analysis plugins and interoperation with external tools such as debuggers, SMT solvers and instrumentation tools. The framework is designed primarily for humanassisted analysis but it can be fully automated. 1 Introduction 1.1 Problem Description The ability to inspect a program and to understand how it works is often a strict requirement in many common practices of the information security discipline. Among other things, program analysis is of critical importance to classify programs as benign or malicious, to identify and cluster together semantically equivalent malicious programs in real time attack detection, to discover software security vulnerabilities in benign programs, to determine the set of input data that could trigger a known bug in the program and to assess whether a vulnerability is exploitable. There is a range of program analysis techniques with a relatively long history of research and development in the computer sciences field, such as model checking, type and effect systems, abstract interpretation, control and data flow analysis and constrain satisfaction problem solving. While these techniques have been regularly applied to security-oriented static and dynamic program analysis with varying degree of success, the vast majority of the existing work is focused on, or relies in the availability of, the source code of the program to be analyzed.
2 However, today s Information and Communications Technologies (ICT) ecosystem is characterized by a significant dependency on software systems for which the source code is not readily available to parties with a legitimate interest to determine their security properties. As a result, the tools and techniques for security-oriented program analysis that are available and can be applied by practitioners to real-world scenarios is limited. The state-of-the-art techniques developed in the academic world are usually not directly applicable to binary analysis due to a lack of suitable tools for the purpose or to their lack of effectiveness or efficiency when applied to COTS software components. Generally, publicly available open source binary analysis tools offer a limited set of capabilities and target a small number of architectures (x86, x86-64, ARM) and operating systems (Windows, Linux, OS X). On the other hand, the relatively low number of commercially available tools that are adequate for multi-architecture, multi-platform binary program analysis, such as IDA [7] and Hopper [8], come with a number of licensing restrictions and price tags that make large scale adoption by information security practitioners an onerous task. In this paper we present BARF, an extensible open source framework for multi and cross platform binary code analysis, that seeks to provide end to end coverage for all the common program analysis tasks performed by information security practitioners and reverse engineers. The design of BARF encompasses file format recognition and parsing, instruction lifting, binary translation and representation of the program to be analyzed using an intermediate language. The core analysis algorithms are implemented on top of the intermediate language, which makes them architecture- independent, facilitates reuse and reduces the cost of adding support for new architectures. This approach differs from most existing tools, which are tied to specific architectures and operating systems. 1.2 Contributions In this paper we present BARF, a cross-platform binary analysis framework. Our main contributions can be summarized as follows: We provide a design for an extensible platform-independent binary analysis framework. It operates on an intermediate representation of the binary code resulting in a cross-platform framework by design. We implement the following core functionalities: a) intermediate language support (REIL) b) an IR emulator, c) integration with a SMT solver and d) translation between intermediate language to SMT expressions. We present a design that allows analysts to model a piece of binary code as formal formulae and to reason about it very easily. We give support for the x86 architecture. Support for other architectures can be added easily to the framework as it was designed with that goal. We implement two analysis modules that are architecture-independent: a) a module that generates a graph of Basic Blocks of a given binary and b) a code analyzer that resolves constraints on code fragments and checks path satisfiability.
3 We built a tool on top of the framework which is a complex example of what can be achieved. It finds ROP gadgets[18] on binary code, classifies them in different types and verifies their semantic. The framework is open source and can be downloaded from a public repository at 2 Binary Analysis The two main approaches in binary analysis are static binary analysis and dynamic binary analysis. In the first case, the analysis is done without executing the program under analysis. The first step of the analysis involves reading and interpreting the format of the binary program such as ELF, PE, Mach-O, etc. This provides information about the file which includes the location of its different sections, e.g., data and text sections. Once the text section is located, the disassembly process may begin. This task is not straight forward. Some architectures, for instance, x86, have variable length instructions which makes the process tricky. Moreover, assembly code lacks structure, meaning code and data can be mixed together which complicates the process even further. The last step consists of translation to the intermediate representation language over which analysis algorithms are applied. In contrast, dynamic analysis requires running the program in order to obtain an execution trace. This is accomplished through dynamic binary instrumentation. There is a lot of information that can be obtained through this process, for instance, the value of each register at each step of the program execution, the list of system calls invoked by a program, etc. Then, the trace is analyzed with offline methods. Limited path coverage is one of the main disadvantages of this technique as only a small subset of all possible paths are exercised during normal execution. 2.1 BARF The framework was designed with the following goals in mind: Cross-platform support : It should be able to analyze binary programs compiled for any available platform (architecture + operating system.) Extensibility : It should be simple to add support for new architectures. User-oriented : It should provide means for users to create their own analysis tools or adapt existing ones using the framework. Two of the primary goals of BARF are: a) cross-platform support and b) extensibility, i.e. easy support for new architectures. Both goals are tackled by the use of an intermediate representation language for assembly code. This way, it is possible to abstract architecture-dependent details and work on a common ground. It also encapsulates these details and provides a common interface to the architecture so extensibility to new ones is straightforward.
4 The criteria to determine the intermediate representation language to use was based on a few desirable properties: a) expressiveness, it should be able to model any architecture; b) side-effect free, instructions should change program state explicitly (and in just one way) and c) simplicity, it should be a reduced instruction set so analysis and implementation could be done easily. These are discussed in section System Overview Figure 1 depicts the architecture of the framework. It is divided in three main components: Core, Arch and Analysis. The first contains essential modules. The second, provides functionality specific to supported architectures. The last component contains architecture-independent analysis algorithms. BARF x86 Arch REIL SMT BI Core Basic Block Code Analyzer Analysis Debugger Command Line Script Fig. 1: Architecture of BARF. The framework relies on external libraries for machine code disassembly and executable format parsing, Capstone [3] and PyBFD [16], respectively. Core Component It is divided into the following modules: REIL, SMT and BI. These are platform independent and are the building blocks of the rest of the components. The REIL module provides definitions for the REIL language. Also, it implements an emulator and a parser (which is used extensively in instruction translation.) The SMT module consists of two parts. The first one, SMTLIB, a generic interface to SMT solvers (based on PySymEmu [20] SMT support; we currently use Z3 [9] as solver) at the lowest level. It is used to create variables and set assumptions. Also, it can check satisfiability on different sets of formulas. The second part is a module that translates REIL to SMT expressions. Finally, the BI (Binary Interface) module is responsible for loading binary files for processing. Arch Component Each supported architecture is provided as a subcomponent and is structured in the same way (thus, simplifying framework extension to new
5 architectures). At the time, we provide support for the Intel x86 architecture (both, 32 and 64 bits). Each subcomponent provides five well-defined modules. The first describes the architecture, i.e., registers, memory address size and endianness. The second describes the instruction set including information such as number of operands, implicit operands, flags that modifies, etc. The third is the disassembling module which, in this case, is just an interface with the disassembling library. Then comes the parser module, that receives a string with disassembled instructions and produces a series of an annotated objects that describe them. Finally, the translation module provides functionality to transform each assembly instruction into a semantically equivalent sequence of REIL instructions. Analysis Component This component includes two sample platform-independent modules: Basic Block and Code Analyzer, which will be described in section REIL REIL (Reverse Engineering Intermediate Language) [10] meets the properties discussed in section 2.1. Other IRs were considered, particularly, LLVM IR. We choose REIL, primarily, because of its simplicity. It worth mentioning that existing research on machine code to LLVM IR translation [5,12] was reviewed but the language was judged much more complex than needed for this project. REIL is a platform-independent intermediate representation of assembly code. Originally designed for static code analysis, it is a low-level language that consists only of 17 instructions. They are grouped in five categories: a) arithmetic, b) bitwise, c) data transfer, d) conditional and c) miscellaneous. Each instruction has exactly three operands, the first two for sources and the third for the destination. There are three type of operands: Integer Literals, Registers and REIL addresses. Registers are string literals, for example, t0 or eax. REIL addresses are composed of two integer parts separated by a period, for example, The first integer represents the address of the instruction from the source architecture while the second is used as REIL instruction numbering (as one instruction can, and usually is, mapped to more than one REIL instruction.) Some instructions require less than three operand, in that case, a special operand type is used, the Empty operand. As already mentioned, there is a mapping of one to many when translating from the source architecture to REIL as REIL instructions have just one effect on program state. For example, the x86 add instruction translation includes the addition operation itself and flags modifications, such as the carry flag (CF.) 3.2 SMT Another aspect of the framework involves interfacing with existing SMT solvers. SMT (Satisfiability Modulo Theories) [9] extends the SAT problem with support for higher level theories, for example, bit-vector arithmetic. These concepts
6 allows to model semantics of assembly code quite naturally. Consequently, we can reason about code in a simple way. SMT solvers are very much in use in software security [21] both in static and dynamic analysis. They can be used for vulnerability checking, exploit generation and much more. REIL can be represented by SMT expression easily as it is a reduced instruction set and each instruction has a single explicit effect on program state. Therefore, translation between both worlds can be accomplished almost directly. Table 1 shows an example of a basic translation. Each register is modeled as a symbol, in this case, a bitvector and memory is modeled as an array of bitvectors. x86 instruction REIL instruction SMT expression mov eax, [ebp+0x8] add [DWORD ebp, DWORD 0x8, DWORD t2] (= t2_1 (bvadd ebp_0 #x )) ldm [DWORD t2, EMPTY, DWORD t1] (= (concat (select MEM (bvadd t2_1 #x )) (select MEM (bvadd t2_1 #x )) (select MEM (bvadd t2_1 #x )) (select MEM (bvadd t2_1 #x ))) t1_1) str [DWORD t1, EMPTY, DWORD eax] (= t1_1 eax_1) add ebx, eax add [DWORD ebx, DWORD eax, QWORD t3] (= t3_1 (bvadd ebx_0 eax_1)) str [QWORD t3, EMPTY, DWORD ebx] (= ebx_1 t3_1) Table 1: REIL to SMT expressions translation example. Note that the translation of add ebx,eax includes REIL instructions for the computation of the flags. In this case, they were omitted for the sake of brevity. 4 Analysis Modules 4.1 Basic Blocks This modules provides functionality for control flow graph recovery (CFG.) It operates on the intermediate language (as all modules within this component,) therefore, algorithms implemented here are architecture-independent. It provides two main ways for CFG recovery [22]: linear sweep and recursive descendant. Additional algorithms can be easily implemented. 4.2 Code Analyzer This module provides functionality for basic code analysis. It relies on the SMT support provided by the Core component. Instructions can be added to the context of the analyzer, which are taken as assumptions, and then add preand post-conditions on variables (memory locations and/or registers) and ask the solver for satisfiability. All the burden of assembly to REIL translation and REIL to SMT expression is taken care of transparently by the framework. In figure 2 we can see the disassembly code of a function that operates on local variables. In this function there are two memory accesses at addresses
7 ed push ebp ee mov ebp, esp f0 sub esp, 0x f3 mov eax, dword [ ebp 0x8 ] f6 mov edx, dword [ ebp 0xc ] f9 add eax, edx fb add eax, 0x f e mov dword [ ebp 0x4 ], eax mov eax, dword [ ebp 0x4 ] l e a v e r e t ed push ebp ee mov ebp, esp f0 sub esp, 0x f3 mov dword [ ebp 0x10 ], 0x f a cmp dword [ ebp 0xc ], 0 x j n e 0 x804841c cmp dword [ ebp 0x8 ], 0 x a j n e 0 x804841c c cmp dword [ ebp 0x4 ], 0 xabcdef j n e 0 x804841c mov dword [ ebp 0x10 ], 0 x c mov eax, dword [ ebp 0x10 ] f l e a v e r e t Fig. 2: Assembly code of a function that operates on local variables. Fig. 3: Assembly code of a function with multiple branches. ebp-0x8 and ebp-0xc, respectively (lines 4-5.) These values are then used in some calculations (two successive additions, lines 6-7) and the result is stored in memory (line 8) and returned in the eax register (line 9). One possible question that arises is what should be the values of the local variables so the function returns a specific (desired) value. Figure 4 shows how to do exactly this. First, it loads the instruction to be analyzed (lines 2-5). Then, it models registers and memory locations involved in the operation (lines 8-12) and establishes conditions that should hold (lines 15-19). Finally, it checks for satisfiability (line 22). If all conditions are satisfiable, it asks for possible values for the registers and memory (lines 24-27). In this example, the constraint set is satisfiable and one possible solution is: a = 3 and b = 5. 1 # Add i n s t r u c t i o n s to a n a l y z e 2 translation = barf. translate (0 x80483ed, 0x ) 3 f o r addr, a s m i n s t r, r e i l i n s t r s i n t r a n s l a t i o n : 4 f o r r e i l i n s t r i n r e i l i n s t r s : 5 b a r f. c a n a l y z e r. a d d i n s t r u c t i o n ( r e i l i n s t r ) 6 7 # Get SMT e x p r e s s i o n f o r r e g i s t e r s and memory 8 eax = b a r f. c a n a l y z e r. g e t r e g e x p r ( eax ) 9 ebp = b a r f. c a n a l y z e r. g e t r e g e x p r ( ebp ) 10 a = barf. c analyzer. get mem expr ( ebp 0x8, 4) 11 b = barf. c analyzer. get mem expr ( ebp 0xc, 4) 12 c = barf. c analyzer. get mem expr ( ebp 0x4, 4) # Set r a n g e f o r v a r i a b l e a and b 15 b a r f. c a n a l y z e r. s e t p r e c o n d i t i o n (a>=2, a<=100) 16 b a r f. c a n a l y z e r. s e t p r e c o n d i t i o n (b>=2, b<=100) # Set d e s i r e d v a l u e f o r t h e r e s u l t 19 b a r f. c a n a l y z e r. s e t p o s t c o n d i t i o n ( c==13) # Check s a t i s f i a b i l i t y 22 i f b a r f. c a n a l y z e r. check ( ) == s a t : 23 # Get c o n c r e t e v a l u e f o r e x p r e s s i o n s 24 p r i n t barf. c analyzer. get expr value ( eax ) 25 p r i n t b a r f. c a n a l y z e r. g e t e x p r v a l u e ( a ) 26 p r i n t b a r f. c a n a l y z e r. g e t e x p r v a l u e (b) 27 p r i n t b a r f. c a n a l y z e r. g e t e x p r v a l u e ( c ) # Recover c o n t r o l f l o w graph cfg = barf. recover cfg (0 x80483ed, 0x ) # Get SMT e x p r e s s i o n f o r r e g i s t e r s and memory esp = b a r f. c a n a l y z e r. g e t r e g e x p r ( esp ) ebp = b a r f. c a n a l y z e r. g e t r e g e x p r ( ebp ) rv = barf. c analyzer. get mem expr ( ebp 0x10, 4) a = barf. c analyzer. get mem expr ( ebp 0xc, 4) b = barf. c analyzer. get mem expr ( ebp 0x8, 4) c = barf. c analyzer. get mem expr ( ebp 0x4, 4) # Set s t a c k p o i n t e r b a r f. c a n a l y z e r. s e t p r e c o n d i t i o n ( esp==0x b f f f c e e c ) # T r a v e r s e p a t h s and check s a t i s f i a b i l i t y paths = cfg. all simple bb paths (0 x80483ed, 0x ) f o r path i n paths : # I f i t i s s a t i s f i a b l e, g e t p o s s i b l e a s s i g n m e n t s # f o r memory and r e g i s t e r s i f b a r f. c a n a l y z e r. c h e c k p a t h s a t ( path, 0 x80483ed ) : p r i n t b a r f. c a n a l y z e r. g e t e x p r v a l u e ( r v ) p r i n t b a r f. c a n a l y z e r. g e t e x p r v a l u e ( a ) p r i n t b a r f. c a n a l y z e r. g e t e x p r v a l u e (b) p r i n t b a r f. c a n a l y z e r. g e t e x p r v a l u e ( c ) Fig. 4: Extract of an analysis script for assembly code on figure 2. Fig. 5: Extract of an analysis script for assembly code on figure 3.
8 Another assembly code is shown in figure 3. This code has multiple branches (lines 6, 8, 10) which means there are multiple paths from the function entry to the exit node. The framework implements a function to list all paths between two basic blocks and the code analyzer provides functionality to know which of all these paths are feasible as well as possible values assignments for the registers and variables involved. Figure 5 shows an extract of a script that does this checking. First, it recovers the control flow graph of the function (line 2). Then, as in the previous example, it models the registers and memory locations involved as SMT expressions (lines 5-10). Finally, it iterates all simple path (path with no loops) between the entry and exit basic blocks (line 18). For each path (list of basic blocks) it check satisfiability and if so, it gets possible assignments for registers and variables (lines 22-25). The path satisfiability check burden is encapsulated within the CodeAnalyzer class. It basically adds each instruction from each basic block in the path to the analyzer and includes restrictions on the jump conditions forcing them to be true or false according to the case. 5 Case Study Return-oriented programming (ROP) is a technique for software exploitation which allows arbitrary computation by an attacker through the use of code fragments called gadgets borrowed from the attacked program without the need of code injection [24]. This technique is necessary to achieve successful exploitation on modern operating systems as most of them implement mechanisms of data execution prevention. 5.1 ROP Gadgets In addition to the analysis modules implemented and described in the previous section, we used BARF to create a tool to automatically search, classify and verify ROP gadgets [18] in x86 binary programs. The tool is based on the ideas exposed by Schwartz et al. in [17] and shows some of the capabilities of the framework. It is a step forward in the direction of tools for automatic exploit generation [17,1]. The tool has three stages. In the search stage, it looks for ret instructions in the whole text section of a given binary. Once all ret instructions have been identified, it disassembles backwards from these addresses trying to recover valid instructions (this is done according to the algorithm described in [18].) Sequences of instructions that disassembled correctly ending with a ret instruction are considered candidate gadgets. The second stage, classification, goes through the list of candidate gadgets and classifies each one of them in different types (arithmetic, memory read/write, etc) through emulation. This means that each gadget is executed using the REIL emulator provided by BARF and its output analyzed. For instance, if we found that a register contains the sum of other two (previously assigned with random values), then, we classify that gadget as an arithmetic gadget. The third stage, verification, involves the use of the SMT solver to verified that the type assigned in the previous step
9 is the correct one. It is worth noting that the last two stages are architectureindependent. So, in principle, they can be applied to any supported architecture provided that a list of gadget is given. The only architecture dependent part is the first stage, the one involved in searching for gadgets. This happens due to some architecture-specific details that have impact on gadget discovery. For instance, in the x86 architecture one can search for ret-ended gadgets through all the text section of a binary program with byte granularity. On other architectures, e.g. SPARC, this cannot be done as instructions are aligned and gadgets have to be searched in a different way. However, a general search stage can be accomplished considering the details mentioned in [11] since disassembly and translation of instructions to REIL is done transparently. 6 Related Work There are several binary analysis frameworks. All of them differ in the approach they were motivated by different applications as well as the set of functionalities they provide. BitBlaze [19] is a project that aims to provide a full binary analysis stack covering almost the whole spectrum of related tasks. It is composed of: a) Vine, a static analyzer; b) TEMU, a dynamic analyzer and c) Rudder, a symbolic execution engine. BAP [2] is a another framework based on Vine. It operates on an intermediate language called BIL which explicitly represents all side effects of assembly instructions. BAP has several built-in analyses and can perform symbolic execution. Insight [13] is a static analysis framework designed for Unix-like systems, supports i386 and amd64 architectures and can load many file formats, i.e., PE, ELF, Mach-0. It has its own IR called Microcode and provides tools to manipulate and transform binary code. Jakstab [14] is another static analysis framework based on abstract interpretation and designed to support multiple architectures using customized instruction decoding (currently, supports x86 (32 bits) for PE and ELF file formats.) It uses an IR inspired by the semantics specification language (SSL) [6]. Analysis algorithms run on top of this language and translation is done on the fly as the analysis is performed. Finally, the radare project [23] is built under the Unix-liked concepts such as everything is a file and small programs that interact through standard I/O. Currently is composed of a set of utilities including a debugger, an assembler/disassembler and a binary diffing tool. 7 Conclusion & Future Work In this paper, we presented BARF, a binary analysis framework that aims at being a cross-platform analysis tool. BARF provides means for analysis and verification on binary code. Its intermediate language, based on REIL, encode explicitly the side effects of the instructions of the source architecture. We, also, presented a tool for ROP gadget finding built upon the framework. Our next step is to provide support for the ARM architecture. Also, we plan to give support for
10 symbolic execution and taint analysis to cover a broader spectrum of common binary analysis requirements. References 1. Avgerinos, Thanassis, et al. AEG: Automatic Exploit Generation. NDSS. Vol Brumley, David, et al. BAP: A binary analysis platform. Computer Aided Verification. Springer Berlin Heidelberg, Capstone, 4. Cha, Sang Kil, et al. Unleashing mayhem on binary code. Security and Privacy (SP), 2012 IEEE Symposium on. IEEE, Chipounov, Vitaly, and George Candea. Dynamically Translating x86 to LLVM using QEMU. No. EPFL-REPORT Cifuentes, Cristina, and Shane Sendall. Specifying the semantics of machine instructions. Program Comprehension, IWPC 98. Proceedings., 6th International Workshop on. IEEE, IDA, Interactive Disassembler 8. Hopper, OS X and Linux disassembler 9. De Moura, Leonardo, and Nikolaj Bjørner. Z3: An efficient SMT solver. Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, Dullien, Thomas, and Sebastian Porst. REIL: A platform-independent intermediate representation of disassembled code for static code analysis. Proceeding of CanSecWest (2009). 11. Dullien, Thomas, Tim Kornau, and Ralf-Philipp Weinmann. A Framework for Automated Architecture-Independent Gadget Search. WOOT Fracture Insight, Kinder, Johannes, and Helmut Veith. Jakstab: A static analysis platform for binaries. Computer Aided Verification. Springer Berlin Heidelberg, Li, Lixin, and Chao Wang. Dynamic analysis and debugging of binary code for security applications. Runtime Verification. Springer Berlin Heidelberg, PyBFD, Schwartz, Edward J., Thanassis Avgerinos, and David Brumley. Q: Exploit Hardening Made Easy. USENIX Security Symposium Shacham, Hovav. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). Proceedings of the 14th ACM conference on Computer and communications security. ACM, Song, Dawn, et al. BitBlaze: A new approach to computer security via binary analysis. Information systems security. Springer Berlin Heidelberg, PySymEmu, Vanegue, Julien, Sean Heelan, and Rolf Rolles. SMT Solvers in Software Security. WOOT Linn, Cullen, and Saumya Debray. Obfuscation of executable code to improve resistance to static disassembly. Proceedings of the 10th ACM conference on Computer and communications security. ACM, radare, Bratus, Sergey, et al. Exploit Programming: From Buffer Overflows to Weird Machines and Theory of Computation. USENIX; login (2011):
Platform-independent static binary code analysis using a metaassembly
Platform-independent static binary code analysis using a metaassembly language Thomas Dullien, Sebastian Porst zynamics GmbH CanSecWest 2009 Overview The REIL Language Abstract Interpretation MonoREIL
Return-oriented programming without returns
Faculty of Computer Science Institute for System Architecture, Operating Systems Group Return-oriented programming without urns S. Checkoway, L. Davi, A. Dmitrienko, A. Sadeghi, H. Shacham, M. Winandy
Introduction. Figure 1 Schema of DarunGrim2
Reversing Microsoft patches to reveal vulnerable code Harsimran Walia Computer Security Enthusiast 2011 Abstract The paper would try to reveal the vulnerable code for a particular disclosed vulnerability,
MSc Computer Science Dissertation
University of Oxford Computing Laboratory MSc Computer Science Dissertation Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities Author: Sean Heelan Supervisor: Dr. Daniel
Instruction Set Design
Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,
Instruction Set Architecture (ISA)
Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine
Automating Mimicry Attacks Using Static Binary Analysis
Automating Mimicry Attacks Using Static Binary Analysis Christopher Kruegel and Engin Kirda Technical University Vienna [email protected], [email protected] Darren Mutz, William Robertson,
Jakstab: A Static Analysis Platform for Binaries
Jakstab: A Static Analysis Platform for Binaries (Tool Paper) Johannes Kinder and Helmut Veith Technische Universität Darmstadt, 64289 Darmstadt, Germany Abstract. For processing compiled code, model checkers
Introduction. Application Security. Reasons For Reverse Engineering. This lecture. Java Byte Code
Introduction Application Security Tom Chothia Computer Security, Lecture 16 Compiled code is really just data which can be edit and inspected. By examining low level code protections can be removed and
Automatic Network Protocol Analysis
Gilbert Wondracek, Paolo M ilani C omparetti, C hristopher Kruegel and E ngin Kirda {gilbert,pmilani}@ seclab.tuwien.ac.at chris@ cs.ucsb.edu engin.kirda@ eurecom.fr Reverse Engineering Network Protocols
The Beast is Resting in Your Memory On Return-Oriented Programming Attacks and Mitigation Techniques To appear at USENIX Security & BlackHat USA, 2014
Intelligent Things, Vehicles and Factories: Intel Workshop on Cyberphysical and Mobile Security 2014, Darmstadt, June 11 The Beast is Resting in Your Memory On Return-Oriented Programming Attacks and Mitigation
CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08
CS412/CS413 Introduction to Compilers Tim Teitelbaum Lecture 20: Stack Frames 7 March 08 CS 412/413 Spring 2008 Introduction to Compilers 1 Where We Are Source code if (b == 0) a = b; Low-level IR code
How To Trace
CS510 Software Engineering Dynamic Program Analysis Asst. Prof. Mathias Payer Department of Computer Science Purdue University TA: Scott A. Carr Slides inspired by Xiangyu Zhang http://nebelwelt.net/teaching/15-cs510-se
Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers
CS 4 Introduction to Compilers ndrew Myers Cornell University dministration Prelim tomorrow evening No class Wednesday P due in days Optional reading: Muchnick 7 Lecture : Instruction scheduling pr 0 Modern
Evaluating a ROP Defense Mechanism. Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis Columbia University
Evaluating a ROP Defense Mechanism Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis Columbia University Outline Background on ROP attacks ROP Smasher Evaluation strategy and results Discussion
Dongwoo Kim : Hyeon-jeong Lee s Husband
2/ 32 Who we are Dongwoo Kim : Hyeon-jeong Lee s Husband Ph.D. Candidate at Chungnam National University in South Korea Majoring in Computer Communications & Security Interested in mobile hacking, digital
Stitching the Gadgets On the Ineffectiveness of Coarse-Grained Control-Flow Integrity Protection
USENIX Security Symposium 2014, San Diego, CA, USA Stitching the Gadgets On the Ineffectiveness of Coarse-Grained Control-Flow Integrity Protection Lucas Davi Intel Collaborative Research Institute for
Attacking Obfuscated Code with IDA Pro. Chris Eagle
Attacking Obfuscated Code with IDA Pro Chris Eagle Outline Introduction Operation Demos Summary 2 First Order Of Business MOVE UP AND IN! There is plenty of room up front I can't increase the font size
Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: [email protected] Alfonso Ortega: alfonso.ortega@uam.
Compilers Spring term Mick O Donnell: [email protected] Alfonso Ortega: [email protected] Lecture 1 to Compilers 1 Topic 1: What is a Compiler? 3 What is a Compiler? A compiler is a computer
I Control Your Code Attack Vectors Through the Eyes of Software-based Fault Isolation. Mathias Payer, ETH Zurich
I Control Your Code Attack Vectors Through the Eyes of Software-based Fault Isolation Mathias Payer, ETH Zurich Motivation Applications often vulnerable to security exploits Solution: restrict application
Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation
Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation David Brumley, Juan Caballero, Zhenkai Liang, James Newsome, and Dawn
Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters
Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.
TitanMist: Your First Step to Reversing Nirvana TitanMist. mist.reversinglabs.com
TitanMist: Your First Step to Reversing Nirvana TitanMist mist.reversinglabs.com Contents Introduction to TitanEngine.. 3 Introduction to TitanMist 4 Creating an unpacker for TitanMist.. 5 References and
Language Processing Systems
Language Processing Systems Evaluation Active sheets 10 % Exercise reports 30 % Midterm Exam 20 % Final Exam 40 % Contact Send e-mail to [email protected] Course materials at www.u-aizu.ac.jp/~hamada/education.html
esrever gnireenigne tfosorcim seiranib
esrever gnireenigne tfosorcim seiranib Alexander Sotirov [email protected] CanSecWest / core06 Reverse Engineering Microsoft Binaries Alexander Sotirov [email protected] CanSecWest / core06 Overview
GameTime: A Toolkit for Timing Analysis of Software
GameTime: A Toolkit for Timing Analysis of Software Sanjit A. Seshia and Jonathan Kotker EECS Department, UC Berkeley {sseshia,jamhoot}@eecs.berkeley.edu Abstract. Timing analysis is a key step in the
Identification and Removal of
RIVERSIDE RESEARCH INSTITUTE Deobfuscator: An Automated Approach to the Identification and Removal of Code Obfuscation Ei Eric Laspe, Reverse Engineer Jason Raber, Lead Reverse Engineer Overview The Problem:
An Introduction to Assembly Programming with the ARM 32-bit Processor Family
An Introduction to Assembly Programming with the ARM 32-bit Processor Family G. Agosta Politecnico di Milano December 3, 2011 Contents 1 Introduction 1 1.1 Prerequisites............................. 2
Hacking Techniques & Intrusion Detection. Ali Al-Shemery arabnix [at] gmail
Hacking Techniques & Intrusion Detection Ali Al-Shemery arabnix [at] gmail All materials is licensed under a Creative Commons Share Alike license http://creativecommonsorg/licenses/by-sa/30/ # whoami Ali
How to Grow a TREE from CBASS Interactive Binary Analysis for Security Professionals
How to Grow a TREE from CBASS Interactive Binary Analysis for Security Professionals Lixin (Nathan) Li, Xing Li, Loc Nguyen, James E. Just Outline Background Interactive Binary Analysis with TREE and CBASS
Binary Code Extraction and Interface Identification for Security Applications
Binary Code Extraction and Interface Identification for Security Applications Juan Caballero Noah M. Johnson Stephen McCamant Dawn Song UC Berkeley Carnegie Mellon University Abstract Binary code reuse
Software Fingerprinting for Automated Malicious Code Analysis
Software Fingerprinting for Automated Malicious Code Analysis Philippe Charland Mission Critical Cyber Security Section October 25, 2012 Terms of Release: This document is approved for release to Defence
Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com
CSCI-UA.0201-003 Computer Systems Organization Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com Some slides adapted (and slightly modified)
2) Write in detail the issues in the design of code generator.
COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage
X86-64 Architecture Guide
X86-64 Architecture Guide For the code-generation project, we shall expose you to a simplified version of the x86-64 platform. Example Consider the following Decaf program: class Program { int foo(int
İSTANBUL AYDIN UNIVERSITY
İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER
Dynamic Behavior Analysis Using Binary Instrumentation
Dynamic Behavior Analysis Using Binary Instrumentation Jonathan Salwan [email protected] St'Hack Bordeaux France March 27 2015 Keywords: program analysis, DBI, DBA, Pin, concrete execution, symbolic
Automated Program Behavior Analysis
Automated Program Behavior Analysis Stacy Prowell [email protected] March 2005 SQRL / SEI Motivation: Semantics Development: Most engineering designs are subjected to extensive analysis; software is
Topics. Introduction. Java History CS 146. Introduction to Programming and Algorithms Module 1. Module Objectives
Introduction to Programming and Algorithms Module 1 CS 146 Sam Houston State University Dr. Tim McGuire Module Objectives To understand: the necessity of programming, differences between hardware and software,
Introduction to Reverse Engineering
Introduction to Reverse Engineering Inbar Raz Malware Research Lab Manager December 2011 What is Reverse Engineering? Reverse engineering is the process of discovering the technological principles of a
64-Bit NASM Notes. Invoking 64-Bit NASM
64-Bit NASM Notes The transition from 32- to 64-bit architectures is no joke, as anyone who has wrestled with 32/64 bit incompatibilities will attest We note here some key differences between 32- and 64-bit
Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2
Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of
Cloud Computing. Up until now
Cloud Computing Lecture 11 Virtualization 2011-2012 Up until now Introduction. Definition of Cloud Computing Grid Computing Content Distribution Networks Map Reduce Cycle-Sharing 1 Process Virtual Machines
Computer Organization and Architecture
Computer Organization and Architecture Chapter 11 Instruction Sets: Addressing Modes and Formats Instruction Set Design One goal of instruction set design is to minimize instruction length Another goal
LASTLINE WHITEPAPER. Why Anti-Virus Solutions Based on Static Signatures Are Easy to Evade
LASTLINE WHITEPAPER Why Anti-Virus Solutions Based on Static Signatures Are Easy to Evade Abstract Malicious code is an increasingly important problem that threatens the security of computer systems. The
Dynamic Spyware Analysis
Dynamic Spyware Analysis Manuel Egele, Christopher Kruegel, Engin Kirda, Heng Yin, and Dawn Song Secure Systems Lab Technical University Vienna {pizzaman,chris,ek}@seclab.tuwien.ac.at Carnegie Mellon University
Static detection of C++ vtable escape vulnerabilities in binary code
Static detection of C++ vtable escape vulnerabilities in binary code David Dewey Jonathon Giffin School of Computer Science Georgia Institute of Technology ddewey, [email protected] Common problem in C++
Hotpatching and the Rise of Third-Party Patches
Hotpatching and the Rise of Third-Party Patches Alexander Sotirov [email protected] BlackHat USA 2006 Overview In the next one hour, we will cover: Third-party security patches _ recent developments
Obfuscation: know your enemy
Obfuscation: know your enemy Ninon EYROLLES [email protected] Serge GUELTON [email protected] Prelude Prelude Plan 1 Introduction What is obfuscation? 2 Control flow obfuscation 3 Data flow
Testing Automation for Distributed Applications By Isabel Drost-Fromm, Software Engineer, Elastic
Testing Automation for Distributed Applications By Isabel Drost-Fromm, Software Engineer, Elastic The challenge When building distributed, large-scale applications, quality assurance (QA) gets increasingly
Why? A central concept in Computer Science. Algorithms are ubiquitous.
Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online
Towards a Framework for Generating Tests to Satisfy Complex Code Coverage in Java Pathfinder
Towards a Framework for Generating Tests to Satisfy Complex Code Coverage in Java Pathfinder Matt Department of Computer Science and Engineering University of Minnesota [email protected] Abstract We present
CS61: Systems Programing and Machine Organization
CS61: Systems Programing and Machine Organization Fall 2009 Section Notes for Week 2 (September 14 th - 18 th ) Topics to be covered: I. Binary Basics II. Signed Numbers III. Architecture Overview IV.
Optimised Realistic Test Input Generation
Optimised Realistic Test Input Generation Mustafa Bozkurt and Mark Harman {m.bozkurt,m.harman}@cs.ucl.ac.uk CREST Centre, Department of Computer Science, University College London. Malet Place, London
Harnessing Intelligence from Malware Repositories
Harnessing Intelligence from Malware Repositories Arun Lakhotia and Vivek Notani Software Research Lab University of Louisiana at Lafayette [email protected], [email protected] 7/22/2015 (C) 2015
Memory Systems. Static Random Access Memory (SRAM) Cell
Memory Systems This chapter begins the discussion of memory systems from the implementation of a single bit. The architecture of memory chips is then constructed using arrays of bit implementations coupled
Transparent Monitoring of a Process Self in a Virtual Environment
Transparent Monitoring of a Process Self in a Virtual Environment PhD Lunchtime Seminar Università di Pisa 24 Giugno 2008 Outline Background Process Self Attacks Against the Self Dynamic and Static Analysis
Reverse Engineering and Computer Security
Reverse Engineering and Computer Security Alexander Sotirov [email protected] Introduction Security researcher at Determina, working on our LiveShield product Responsible for vulnerability analysis and
Central Processing Unit Simulation Version v2.5 (July 2005) Charles André University Nice-Sophia Antipolis
Central Processing Unit Simulation Version v2.5 (July 2005) Charles André University Nice-Sophia Antipolis 1 1 Table of Contents 1 Table of Contents... 3 2 Overview... 5 3 Installation... 7 4 The CPU
The V8 JavaScript Engine
The V8 JavaScript Engine Design, Implementation, Testing and Benchmarking Mads Ager, Software Engineer Agenda Part 1: What is JavaScript? Part 2: V8 internals Part 3: V8 testing and benchmarking What is
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
Software Reverse Engineering
Software Reverse Engineering Jacco Krijnen June 19, 2013 Abstract While reverse engineering probably started with the analysis of hardware, today it plays a significant role in the software world. We discuss
Detecting the One Percent: Advanced Targeted Malware Detection
Detecting the One Percent: Advanced Targeted Malware Detection Tomer Teller Check Point Software Technologies Session ID: SP02-T19 Session Classification: Intermediate Antivirus 20 th+ Anniversary The
Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis
Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis Juan Caballero, Heng Yin, Zhenkai Liang Carnegie Mellon University Dawn Song Carnegie Mellon University & UC Berkeley
The Advantages of Block-Based Protocol Analysis for Security Testing
The Advantages of Block-Based Protocol Analysis for Security Testing Dave Aitel Immunity,Inc. 111 E. 7 th St. Suite 64, NY NY 10009, USA [email protected] February, 4 2002 Abstract. This paper describes
Pattern Insight Clone Detection
Pattern Insight Clone Detection TM The fastest, most effective way to discover all similar code segments What is Clone Detection? Pattern Insight Clone Detection is a powerful pattern discovery technology
Secure Web Application Coding Team Introductory Meeting December 1, 2005 1:00 2:00PM Bits & Pieces Room, Sansom West Room 306 Agenda
Secure Web Application Coding Team Introductory Meeting December 1, 2005 1:00 2:00PM Bits & Pieces Room, Sansom West Room 306 Agenda 1. Introductions for new members (5 minutes) 2. Name of group 3. Current
From Faust to Web Audio: Compiling Faust to JavaScript using Emscripten
From Faust to Web Audio: Compiling Faust to JavaScript using Emscripten Myles Borins Center For Computer Research in Music and Acoustics Stanford University Stanford, California United States, [email protected]
A Business Process Services Portal
A Business Process Services Portal IBM Research Report RZ 3782 Cédric Favre 1, Zohar Feldman 3, Beat Gfeller 1, Thomas Gschwind 1, Jana Koehler 1, Jochen M. Küster 1, Oleksandr Maistrenko 1, Alexandru
Off-by-One exploitation tutorial
Off-by-One exploitation tutorial By Saif El-Sherei www.elsherei.com Introduction: I decided to get a bit more into Linux exploitation, so I thought it would be nice if I document this as a good friend
Glossary of Object Oriented Terms
Appendix E Glossary of Object Oriented Terms abstract class: A class primarily intended to define an instance, but can not be instantiated without additional methods. abstract data type: An abstraction
Full System Emulation:
Full System Emulation: Achieving Successful Automated Dynamic Analysis of Evasive Malware Christopher Kruegel Lastline, Inc. [email protected] 1 Introduction Automated malware analysis systems (or sandboxes)
Format string exploitation on windows Using Immunity Debugger / Python. By Abysssec Inc WwW.Abysssec.Com
Format string exploitation on windows Using Immunity Debugger / Python By Abysssec Inc WwW.Abysssec.Com For real beneficiary this post you should have few assembly knowledge and you should know about classic
System/Networking performance analytics with perf. Hannes Frederic Sowa <[email protected]>
System/Networking performance analytics with perf Hannes Frederic Sowa Prerequisites Recent Linux Kernel CONFIG_PERF_* CONFIG_DEBUG_INFO Fedora: debuginfo-install kernel for
Efficient Program Exploration by Input Fuzzing
Efficient Program Exploration by Input Fuzzing towards a new approach in malcious code detection Guillaume Bonfante Jean-Yves Marion Ta Thanh Dinh Université de Lorraine CNRS - INRIA Nancy First Botnet
4D and SQL Server: Powerful Flexibility
4D and SQL Server: Powerful Flexibility OVERVIEW MS SQL Server has become a standard in many parts of corporate America. It can manage large volumes of data and integrates well with other products from
Peach Fuzzer Platform
Fuzzing is a software testing technique that introduces invalid, malformed, or random data to parts of a computer system, such as files, network packets, environment variables, or memory. How the tested
A Test Suite for Basic CWE Effectiveness. Paul E. Black. [email protected]. http://samate.nist.gov/
A Test Suite for Basic CWE Effectiveness Paul E. Black [email protected] http://samate.nist.gov/ Static Analysis Tool Exposition (SATE V) News l We choose test cases by end of May l Tool output uploaded
Operating Systems. Virtual Memory
Operating Systems Virtual Memory Virtual Memory Topics. Memory Hierarchy. Why Virtual Memory. Virtual Memory Issues. Virtual Memory Solutions. Locality of Reference. Virtual Memory with Segmentation. Page
l C-Programming l A real computer language l Data Representation l Everything goes down to bits and bytes l Machine representation Language
198:211 Computer Architecture Topics: Processor Design Where are we now? C-Programming A real computer language Data Representation Everything goes down to bits and bytes Machine representation Language
ios applications reverse engineering Julien Bachmann [email protected]
ios applications reverse engineering 1 Julien Bachmann [email protected] Agenda Motivations The architecture Mach-O Objective-C ARM AppStore binaries Find'em Decrypt'em Reverse'em What to look for Where to
How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer
How to make the computer understand? Fall 2005 Lecture 15: Putting it all together From parsing to code generation Write a program using a programming language Microprocessors talk in assembly language
Exploiting nginx chunked overflow bug, the undisclosed attack vector
Exploiting nginx chunked overflow bug, the undisclosed attack vector Long Le [email protected] About VNSECURITY.NET CLGT CTF team 2 VNSECURITY.NET In this talk Nginx brief introduction Nginx chunked
CPU Organisation and Operation
CPU Organisation and Operation The Fetch-Execute Cycle The operation of the CPU 1 is usually described in terms of the Fetch-Execute cycle. 2 Fetch-Execute Cycle Fetch the Instruction Increment the Program
Static Taint-Analysis on Binary Executables
Static Taint-Analysis on Binary Executables Sanjay Rawat, Laurent Mounier, Marie-Laure Potet VERIMAG University of Grenoble October 2011 Static Taint-Analysis on Binary Executables 1/29 Outline 1 Introduction
The BINCOA Framework for Binary Code Analysis
1/ 13 The BINCOA Framework for Binary Code Analysis Sébastien Bardin, Philippe Herrmann, Jérôme Leroux, Olivier Ly, Renaud Tabary, Aymeric Vincent CEA LIST (Saclay, Paris) LABRI (Bordeaux) Binary code
Next-Generation Protection Against Reverse Engineering
Next-Generation Protection Against Reverse Engineering 4 February 2005 Chris Coakley, Jay Freeman, Robert Dick [email protected]; [email protected]; [email protected] Purpose This white
Compiler-Assisted Binary Parsing
Compiler-Assisted Binary Parsing Tugrul Ince [email protected] PD Week 2012 26 27 March 2012 Parsing Binary Files Binary analysis is common for o Performance modeling o Computer security o Maintenance
Dynamic Spyware Analysis
Dynamic Spyware Analysis Manuel Egele, Christopher Kruegel, Engin Kirda Secure Systems Lab Technical University Vienna {pizzaman,chris,ek}@seclab.tuwien.ac.at Heng Yin Carnegie Mellon University and College
Chapter 7D The Java Virtual Machine
This sub chapter discusses another architecture, that of the JVM (Java Virtual Machine). In general, a VM (Virtual Machine) is a hypothetical machine (implemented in either hardware or software) that directly
How to Sandbox IIS Automatically without 0 False Positive and Negative
How to Sandbox IIS Automatically without 0 False Positive and Negative Professor Tzi-cker Chiueh Computer Science Department Stony Brook University [email protected] 2/8/06 Blackhat Federal 2006 1 Big
Outline. Introduction. State-of-the-art Forensic Methods. Hardware-based Workload Forensics. Experimental Results. Summary. OS level Hypervisor level
Outline Introduction State-of-the-art Forensic Methods OS level Hypervisor level Hardware-based Workload Forensics Process Reconstruction Experimental Results Setup Result & Overhead Summary 1 Introduction
Hardware Assisted Virtualization
Hardware Assisted Virtualization G. Lettieri 21 Oct. 2015 1 Introduction In the hardware-assisted virtualization technique we try to execute the instructions of the target machine directly on the host
Visualizing Information Flow through C Programs
Visualizing Information Flow through C Programs Joe Hurd, Aaron Tomb and David Burke Galois, Inc. {joe,atomb,davidb}@galois.com Systems Software Verification Workshop 7 October 2010 Joe Hurd, Aaron Tomb
Certification Authorities Software Team (CAST) Position Paper CAST-13
Certification Authorities Software Team (CAST) Position Paper CAST-13 Automatic Code Generation Tools Development Assurance Completed June 2002 NOTE: This position paper has been coordinated among the
Heap-based Buffer Overflow Vulnerability in Adobe Flash Player
Analysis of Zero-Day Exploit_Issue 03 Heap-based Buffer Overflow Vulnerability in Adobe Flash Player CVE-2014-0556 20 December 2014 Table of Content Overview... 3 1. CVE-2014-0556 Vulnerability... 3 2.
Attacking x86 Windows Binaries by Jump Oriented Programming
Attacking x86 Windows Binaries by Jump Oriented Programming L. Erdődi * * Faculty of John von Neumann, Óbuda University, Budapest, Hungary [email protected] Abstract Jump oriented programming
