Java Obfuscation Salah Malik BSc Computer Science 2001/2002

Summary

Java has become a popular language in both academia and industry. Its strength lies in the "Write Once, Run Anywhere" paradigm, which is achieved by compiling the source code into "byte code" for the Java Virtual Machine (JVM). Unfortunately, this byte code can very easily be reverse engineered, that is, changed from byte code back into something close to the original source code. The problem of decompilation has been addressed through the use of "obfuscation": the byte code is altered in such a way as to render the decompiled source code difficult for humans to read. The less human-readable the code is, the more successful the obfuscation can be considered.

This report includes an outline of the way the JVM operates and of the class file format into which Java source programs are transformed. There is also a description of the currently known obfuscation techniques and how they affect Java programs.

The main aim of this report is to investigate the effects of obfuscation on Java byte codes. This includes an evaluation of currently available obfuscation tools and an investigation into the possibility and technical problems of obfuscation via unconditional jump statements. Initially the development of an obfuscator for private/public methods, variables and class names was considered, but this was later rejected for reasons that are made clear in this report.

An extensive strategy was implemented for the background research, focusing on the implications of code protection and reverse engineering, the structure of the JVM, the class file format and the byte code instruction set, the principles and techniques of obfuscation, and the actual services offered by currently available obfuscators. Most of the references are web-based, as Java obfuscation is on the cutting edge of software security.

The initial and revised project schedules are available in Appendix A. The initial plan aimed to finish the initial research by February 2002, the evaluation of obfuscation software by March 2002, the development of a name obfuscator by April 2002 and the investigation of jump statement obfuscation by May 2002. This, however, did not take into account the writing of this report, and the development of the name obfuscator was rejected, as stated above. The revised schedule reflects this.

Acknowledgements

I would like to thank my supervisor Chris Gillespie and my project assessor Dr. Nick Efford for the invaluable advice they have given me throughout this project. I would also like to thank Dr. Sara Fores, the project administrator, and Mr. Martyn Clark for their advice.

Table of Contents

Summary
Acknowledgements
Table of Contents

1. Introduction to Code Protection
    Introduction
    Server Side Execution
    Encryption
    Signed Native Code Execution
    Code Obfuscation
    Decompilation
    Deobfuscation
    Why Java?

2. Java Virtual Machine
    Introduction to the Java Virtual Machine
    Introduction to Java Architecture
    Introduction to Java Byte Codes
    Class Loader
    Byte Code Verifier
    Supported Data Types
    JVM Registers
    Method Area
    Java Stack
    Garbage-Collected Heap
    Frames
    Java Instruction Set

3. Class File Format
    Introduction to Class File Format
    Magic Number
    Major Number, Minor Number
    Constant Pool
    Access Flags
    This Class
    Super Class
    Interfaces
    Fields
    Methods
    Attributes
    Class File Descriptors
    Field Descriptors
    Method Descriptors

4. Code Obfuscation Techniques
    Obfuscation Transformation
    Definition
    Quality
    Potency
    Definition
    Measure Scale
    Resilience
    Definition
    Measure Scale
    Stealth
    Definition
    Measure Scale
    Cost
    Definition
    Measure Scale
    Types of Obfuscation
    Layout Transformations
    Change Formatting
    Scrambling Identifier Names
    Remove Comments
    Data Transformations
    Data Storage
    Data Encoding
    Data Aggregation
    Data Ordering
    Control Transformations
    Opaque Constructs
    Definition
    Trivial Constructs
    Weak Constructs
    Control Aggregation
    Control Ordering
    Control Computations
    Preventive Transformations
    Inherent Preventive Transformations
    Targeted Preventive Transformations

5. Evaluation of Currently Available Tools
    Purpose of Evaluation
    Web Resources
    Comparison of Obfuscation Reviews Available
    Evaluation Criteria
    Cost
    Availability
    Range of Transformations Offered
    Potency
    Resilience
    Stealth
    Execution Cost
    Effectiveness against Decompilation
    Usability
    Documentation
    Test Data
    Obfuscation Software Tools
    Zelix KlassMaster
    Jshrink
    DashO-Pro
    File Obfuscator
    RetroGuard
    1stBarrier
    2LKitObfuscator
    Aubjex
    CafeBabe
    CodeShield
    Condensity
    Crema
    Elixir
    Excelsior Jet
    Helseth JObfuscator
    Jammer
    JCloak
    Jopt
    Mocha Source Obfuscator
    Marvin Obfuscator
    Obfuscate
    ShroudIt!
    WingGuard
    SmokeScreen
    JMangle
    JODE
    JOBE
    HashJava
    SourceGuard
    Evaluation Results
    Conclusion

6. Investigation Goto Statement Obfuscation
    Purpose of Investigation
    Justification for Choice of Transformation
    Investigation Method
    Test Data
    Test Results
    Statement Rearranging
    For Loop
    For Loop with Opaque Constructs
    Method Parameters
    Evaluation of Test Results
    Implementation
    Byte Code Engineering Library
    jclasslib
    Kawa
    Evaluation of Obfuscator
    Difficulties in Development
    Investigation Conclusion

7. Conclusion
    Obfuscation Evaluation
    Goto Statement Obfuscation

Bibliography

Appendix A - Reflection
Appendix B - Software Metrics Table
Appendix C - Evaluation Table and Results
Appendix D - Obfuscated Program Code
    D.1. Test Data
    D.2. Disassembled Class Files
    D.3. Decompiled Class Files
    D.4. Zelix KlassMaster
    D.5. Jshrink
    D.6. DashO-Pro
    D.7. File Obfuscator
    D.8. RetroGuard
    D.9. 1stBarrier
    D.10. 2LKitObfuscator
    D.11. Aubjex
    D.12. CafeBabe
    D.13. JCloak
    D.14. Jopt
    D.15. Mocha Source Obfuscator
    D.16. ShroudIt!
    D.17. WingGuard
    D.18. SmokeScreen
    D.19. JMangle
    D.20. SourceGuard

1. Introduction to Code Protection

This section introduces the main techniques for protecting programs written in network-friendly formats and explains why obfuscation is the preferred solution. It also describes decompilation and deobfuscation and how they affect obfuscation.

1.1. Introduction

The advent of network-independent programs has increased the need for code protection. Previously, programs were compiled for specific hardware and operating systems [34]. During compilation, information such as variable names and references to library routines was removed, producing hardware-dependent machine code that was large and had low portability, i.e. it was executable only on computers of the same hardware specification as the computer on which the original source code was compiled ([34], [8] p.1, [41], [42], [12]). As the machine code was stored in binary files, it proved difficult to read.

This has changed with the introduction of hardware-independent specifications, which partially compile the code into a format that can be run on a separate software implementation ([8] p.1). This not only allows programs to be run on different platforms (i.e. they have high portability), but, because the code does not contain any hardware-specific library routine calls, the programs are considerably smaller and therefore easier to transfer over networks ([8] p.1).

There is, however, one major drawback: because there are no hardware/operating system specific code constructs, the programs have proven easy to decompile into something close to the original code. This gives pirates and rival software developers ample opportunity to obtain vital algorithms and data structures contained within the code ([8] p.1).

A software developer can take legal action to protect their code. Software artefacts are covered by copyright law; however, legal action can prove expensive for a small software house taking on a larger and more powerful corporation ([8] p.3). In response to this problem, a number of technical code protection techniques have been devised.

1.2. Server Side Execution

The most secure approach is for users to connect to a web site set up by the software developer and run the program remotely, paying a small amount of electronic money each time ([8] p.3). The program is executed on the developer's server and input/output is via the web. The reverse engineer never gains physical access to the application and so is unable to decompile the code ([8] p.3).

Figure 1.1: Protection by Server-Side Execution ([33] p.6)

However, due to limits on network bandwidth and latency, the application will not perform as well as it would if it were run locally ([8] p.3). A way round this is partial server-side execution, where the application is broken into two parts: one part runs on the user's site and the other part (containing the code to be protected) is run remotely ([8] p.3, [34]).

Figure 1.2: Protection by Partial Server-Side Execution ([33] p.6)

1.3. Encryption

The software developer could encrypt the code and send the encrypted code to users. This would protect against any software attack, but there are two problems. The first is that it only works if the entire encryption/decryption process takes place in hardware, because most encryption methods involve running a tamper-proofed environment (on a separate machine) to encrypt the code ([68] p.2, [11], [36], [69]). Compiled Java code, for example, is run on a software implementation of a machine, and as the tamper-proofed environment runs processor-specific code, use of encryption methods is much more difficult, if not impossible. The second drawback is that specialised hardware tends to limit the portability of programs.

Figure 1.3: Protection by Encryption ([33] p.7)

1.4. Signed Native Code Execution

The software developer can use just-in-time compilers to create an executable for each popular architecture ([8] p.3). A just-in-time (JIT) compiler is a program that turns code into instructions that can be sent directly to the processor [67]. When downloading the application, the user's site would have to identify the architecture/operating system combination it is running, and the corresponding version would be transmitted ([8] p.3). As the native code is processor-specific, it will prove harder for the reverse engineer to decompile ([8] p.3).

Figure 1.4: Protection through Signed Native Code ([33] p.9)

There is still one drawback to this approach: native code cannot be run with complete security on the user's machine ([8] p.3). To ensure that the code is safe to run on the user's system, digital signatures would be required. A digital signature can be thought of as the digital equivalent of a handwritten signature ([45] p.613). It is appended to a message (as extra data) to identify and authenticate the sender and the message data using public-key encryption [23].

In public-key encryption, each person gets a pair of keys, called the public key and the private key. Each person's public key is published while the private key is kept secret. Messages are encrypted using the intended recipient's public key and can only be decrypted using the recipient's private key [24]. To sign a message, the sender uses a one-way hash function (see [25]) to generate a short, fixed-length hash-code from the message data (for example, 128 bits with MD5 or 160 bits with SHA-1) and then encrypts the hash-code with his private key. The receiver recomputes the hash-code from the data and decrypts the received hash with the sender's public key. If the two hash-codes are equal, the receiver can be sure that the data has not been corrupted and that it came from the given sender [23].

This method also increases software maintenance effort, as a different version of the application would have to be made for each of the different hardware specifications ([8] p.3).

1.5. Code Obfuscation

The software developer could use an obfuscator to obfuscate the program. The process of obfuscation transforms a program (by adding or removing code) so that it is more difficult to understand, yet is functionally identical to the original ([34], [26]). The program still provides the same functionality as the original application, except that it may be larger, run more slowly and have side effects such as creating files ([34], [8] p.3).

Obfuscation does not try to prevent someone from gaining access to the source code; instead, it makes the task of using the data structures and algorithms within it much more difficult, as the code will be harder for a human reverse engineer to read ([8] p.3). Although deobfuscators have been developed to counter this, a good obfuscation technique would prove effective even against this sort of application.

Figure 1.5: Protection through Code Obfuscation ([33] p.10)
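To make the idea concrete, the sketch below shows a small method before and after a hand-applied identifier scrambling. It is only an illustration: the names `Payroll`, `netPay`, `a` and `b` are invented here and are not the output of any obfuscator discussed in this report. The point is that both versions compute the same result, while the second gives a human reader no hint of its purpose.

```java
// Hypothetical example: every name below is invented for illustration.

// Original, readable version.
final class Payroll {
    static int netPay(int grossPay, int taxPercent) {
        int tax = grossPay * taxPercent / 100;
        return grossPay - tax;
    }
}

// The same logic after identifier scrambling applied by hand.
final class a {
    static int b(int c, int d) {
        int e = c * d / 100;
        return c - e;
    }
}

final class ObfuscationDemo {
    public static void main(String[] args) {
        // Both calls compute the same value for the same inputs.
        System.out.println(Payroll.netPay(2000, 25));
        System.out.println(a.b(2000, 25));
    }
}
```

A real obfuscator works on the compiled class file rather than on the source, but the effect on the decompiled output is comparable.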

1.6. Decompilation

A decompiler is a program that reads a program written in a machine language (the source language) and translates it into an equivalent program in a high-level language (the target language). A decompiler, or reverse compiler, attempts to reverse the process of a compiler, which translates a high-level program into a binary or executable program ([5] p.1). Decompilation is usually used for software maintenance and security ([5] p.15). However, it is also used by reverse engineers to gain access to key data structures and control constructs of other developers' software ([8] p.2, [34]). Obfuscation aims to counter this by altering the object code so that the decompiled code resembles the original source code as little as possible, making decompilation a futile exercise. See [4], [5], [6], [15], [16], [17], [19], [20], [27], [38], [44], [46], [47], [48] and [49] for more details.

1.7. Deobfuscation

Deobfuscation attempts to undo the transformations applied to the program code by an obfuscator ([8] p.3), using techniques such as static analysis, data dependency analysis [29] and program slicing ([8] p.24). Reverse engineers can use it on obfuscated code to recover readable source code. Because of this, obfuscation techniques must be able to withstand deobfuscation attacks.

1.8. Why Java?

This report examines Java code obfuscation in particular for three reasons. First, the Java byte code format into which Java programs are compiled is designed to retain as much symbolic information about the original program as possible. Second, the byte codes contain commands that run on a virtual machine, enabling them to be run on any hardware/operating system. Third, the instruction set of the byte code is designed to be as small as possible. These properties not only make Java programs easier to transmit over networks, but also easier to decompile.

Programs in other languages, such as C++, are compiled into machine code that is specific to the processor and operating system for which the program was compiled, so the program can only run on machines of that kind. While C++ executables can be decompiled (see Section 1.6, [6] p.2), the decompiler must be specific to the hardware/operating system for which the program was compiled.
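The amount of symbolic information retained can be observed from within Java itself. The following sketch uses the standard reflection API to list the field and method names of a compiled class, including private ones. The class `Account` and its members are hypothetical and exist only for this illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class whose private member names we want to recover.
final class Account {
    private int secretBalance;

    private int applyInterest(int ratePercent) {
        secretBalance += secretBalance * ratePercent / 100;
        return secretBalance;
    }
}

final class SymbolDump {
    /** Lists the declared field and method names carried by the compiled class. */
    static List<String> memberNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            names.add("field " + field.getName());
        }
        for (Method method : type.getDeclaredMethods()) {
            names.add("method " + method.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        memberNames(Account.class).forEach(System.out::println);
    }
}
```

The names are available at run time because they are stored in the class file. A decompiler reads the same information directly from the file, which is why scrambling identifier names is one of the most common obfuscating transformations.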

2. Java Virtual Machine

This section describes the basic structure of the Java Virtual Machine architecture.

2.1. Introduction to the Java Virtual Machine

The Java Virtual Machine, or JVM, is an abstract computer that runs compiled Java programs. The JVM is "virtual" because it is generally implemented in software on top of a "real" hardware platform and operating system. All Java programs are compiled for the JVM. Therefore, the JVM must be implemented on a particular platform before compiled Java programs will run on that platform [51].

The JVM plays a central role in making Java portable. It provides a layer of abstraction between the compiled Java program and the underlying hardware platform and operating system, so compiled Java programs run on the JVM independently of whatever lies underneath a particular JVM implementation. The JVM is small when implemented in software. It was designed to be small so that it can fit in as many places as possible, such as TV sets, cell phones and personal computers. The JVM aims to be everywhere, and its success is indicated by the extent to which programs written in Java will run everywhere [51].

2.2. Introduction to Java Architecture

At the heart of Java technology lies the Java virtual machine. Although the name "Java" is generally used to refer to the Java programming language, there is more to Java than the language. The Java virtual machine, the Java API and the Java class file work together with the language to make Java programs run. The components of the Java architecture are therefore the JVM, the class file, the API and the language. Reference [64] gives an overview of Java's architecture, discusses why Java is important, and looks at Java's pros and cons.

2.3. Introduction to Java Byte Codes

Java byte codes can be thought of as the machine language of the JVM. The Java compiler reads Java language source (.java) files, translates the source into Java byte codes, and places the byte codes into class (.class) files. The compiler generates one class file per class in the source [51].

To the JVM, a stream of byte codes is a sequence of instructions. Each instruction consists of a one-byte opcode and zero or more operands. The opcode tells the JVM what action to take. If the JVM requires more information to perform the action than just the opcode, the required information immediately follows the opcode as operands. A mnemonic is defined for each byte code instruction. The mnemonics can be thought of as an assembly language for the JVM. For example, there is an instruction that will cause the JVM to push a zero onto the stack. The mnemonic for this instruction is iconst_0, and its byte code value is 03 hex. This instruction takes no operands [51].

2.4. Class Loader

Class files are loaded by the class loader, either locally or through a network. The Java class libraries required are also loaded at this stage. Before the class files are executed, they must be checked by the Java verifier. If no verification errors occur, the classes are executed by the JVM [33].

To execute a Java program, the interpreter is given the name of the main class in the program. The byte code class file for this class is then searched for in the file system. For each class A that is loaded into memory, the class loader determines the classes that are used by A. If these classes are not already present in memory, they must also be loaded. This action is performed recursively until all the classes used by the program are present in memory. The classes are then checked by the byte code verifier [33].

2.5. Byte Code Verifier

The problem with distributing programs across a network, such as the Internet, is that the recipient may not be able to trust the program. The program may corrupt the user's system, either accidentally through poor programming, or deliberately, in the case of viruses. To stop this, Java byte codes are checked by the verifier before they are executed.

The "virtual hardware" of the Java Virtual Machine can be divided into four basic parts: the registers, the stack, the garbage-collected heap, and the method area. These parts are abstract, just like the machine they compose, but they must exist in some form in every JVM implementation. The size of an address in the JVM is 32 bits. The JVM can, therefore, address up to 4 gigabytes (2 to the power of 32 bytes) of memory, with each memory location containing one byte. Each register in the JVM stores one 32-bit address.
The stack, the garbage-collected heap, and the method area reside somewhere within the 4 gigabytes of addressable memory. The exact location of these memory areas is a decision for the implementor of each particular JVM. The method area, because it contains byte codes, is aligned on byte boundaries. The stack and garbage-collected heap are aligned on word (32-bit) boundaries [33].

2.6. Supported Data Types

The JVM operates on two kinds of types: primitive types and reference types [32]. There are, correspondingly, two kinds of values that can be stored in variables, passed as arguments, returned by methods, and operated upon: primitive values and reference values [32].

Nearly all type checking is expected to have been done before run time, typically by a compiler (e.g. javac), so it does not have to be repeated by the JVM as each instruction executes [32]. Instead, the instruction set of the Java virtual machine distinguishes its operand types using instructions intended to operate on values of specific types. For example, iadd, ladd, fadd and dadd are all JVM instructions that add two numeric values and produce numeric results, but each is specialised for its operand type: int, long, float and double, respectively.

Objects are either dynamically allocated class instances or arrays. A reference to an object is considered to have the JVM type reference, and references can be thought of as pointers to objects [32]. More than one reference to an object may exist. Objects are always operated on, passed, and tested via values of type reference [32].

2.7. JVM Registers

The JVM has a program counter and three registers that manage the stack. It has few registers because the byte code instructions of the JVM operate primarily on the stack. This stack-oriented design helps keep the JVM's instruction set and implementation small.

The JVM uses the program counter, or pc register, to keep track of where in memory it should be executing instructions. The other three registers (the optop register, the frame register and the vars register) point to various parts of the stack frame of the currently executing method. The stack frame of an executing method holds the state (local variables, intermediate results of calculations, etc.) for a particular invocation of the method [51].

2.8. Method Area

The method area is where the byte codes reside. The program counter always points to (contains the address of) some byte in the method area. The program counter is used to keep track of the thread of execution. After a byte code instruction has been executed, the JVM sets the program counter to the address of the instruction that immediately follows it, unless the executed instruction specifically demanded a jump [51].
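The interplay between the program counter, the operand stack and the one-byte opcodes can be sketched with a toy interpreter. This is an illustration only and not how a real JVM is implemented: it supports just six instructions, with no frames or verification. The opcode values are the real ones from the JVM instruction set, while the class name `TinyStackMachine` and the sample program are invented for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyStackMachine {
    // Real JVM opcode values for a handful of instructions.
    static final int ICONST_0 = 0x03, ICONST_2 = 0x05, BIPUSH = 0x10,
                     IADD = 0x60, GOTO = 0xa7, IRETURN = 0xac;

    // Sample program (offsets on the left):
    //   0: bipush 40   2: goto +4 (to offset 6)   5: iconst_0 (skipped)
    //   6: iconst_2    7: iadd                    8: ireturn
    static final byte[] SAMPLE = {
        BIPUSH, 40, (byte) GOTO, 0, 4, ICONST_0, ICONST_2, IADD, (byte) IRETURN
    };

    static int run(byte[] code) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next opcode in the code array
        while (true) {
            int opcode = code[pc] & 0xff;
            switch (opcode) {
                case ICONST_0: operandStack.push(0); pc += 1; break;
                case ICONST_2: operandStack.push(2); pc += 1; break;
                case BIPUSH:   // one operand byte follows the opcode
                    operandStack.push((int) code[pc + 1]); pc += 2; break;
                case IADD:
                    operandStack.push(operandStack.pop() + operandStack.pop());
                    pc += 1; break;
                case GOTO: {   // signed 16-bit offset, relative to the goto opcode
                    int offset = (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
                    pc += offset;
                    break;
                }
                case IRETURN:  return operandStack.pop();
                default:
                    throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(SAMPLE));
    }
}
```

Every instruction except goto advances the program counter to the instruction that immediately follows it. The goto replaces that default with a jump, which is the behaviour that jump-based obfuscation relies on.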
2.9. Java Stack

The Java stack is used to store parameters for and results of byte code instructions, to pass parameters to and return values from methods, and to keep the state of each method invocation. The state of a method invocation is called its stack frame. The vars, frame, and optop registers point to different parts of the current stack frame.

There are three sections in a Java stack frame: the local variables, the execution environment, and the operand stack. The local variables section contains all the local variables being used by the current method invocation. It is pointed to by the vars register. The execution environment section is used to maintain the operations of the stack itself. It is pointed to by the frame register. The operand stack is used as a work space by byte code instructions. It is here that the parameters for byte code instructions are placed, and results of byte code instructions are found. The top of the operand stack is pointed to by the optop register.

The execution environment is usually sandwiched between the local variables and the operand stack. The operand stack of the currently executing method is always the topmost stack section, and the optop register therefore always points to the top of the entire Java stack [51].

2.10. Garbage-Collected Heap

The heap is where the objects of a Java program reside [51]. Programmers can allocate memory for an object using the new operator [51]. The Java language does not allow allocated memory to be freed directly. Instead, the runtime environment keeps track of the references to each object on the heap and automatically frees the memory occupied by objects that are no longer referenced, a process called garbage collection [51]. See [53] and [66] for more details.

2.11. Frames

A frame is used to store data and partial results [32]. A new frame is created each time a method is invoked and is destroyed when its method invocation completes [32].

2.12. Java Instruction Set

For details about the JVM instruction set, see [32] Chapter 6 p.171, [50] and [53]. For more details about the JVM, see [2], [3], [37], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65] and [66].
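The lifetime of frames can be illustrated with a small model in which each method invocation pushes a frame holding its own local variables and operand stack, and pops it on return. This is a simplified sketch under invented names (`FrameSketch`, `Frame`, `sumTo`). It omits the execution environment section and does not reflect the memory layout of any real JVM.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class FrameSketch {
    /** One frame per method invocation: local variables plus an operand stack. */
    static final class Frame {
        final String method;
        final int[] localVariables;
        final Deque<Integer> operandStack = new ArrayDeque<>();

        Frame(String method, int... localVariables) {
            this.method = method;
            this.localVariables = localVariables;
        }
    }

    static final Deque<Frame> javaStack = new ArrayDeque<>();
    static int deepest = 0; // largest number of frames alive at the same time

    /** Computes 0 + 1 + ... + n recursively, modelling frame creation and destruction. */
    static int sumTo(int n) {
        Frame frame = new Frame("sumTo", n);
        javaStack.push(frame);                       // frame created on invocation
        deepest = Math.max(deepest, javaStack.size());

        int result;
        if (frame.localVariables[0] == 0) {
            result = 0;
        } else {
            // Partial results live on this invocation's own operand stack.
            frame.operandStack.push(frame.localVariables[0]);
            frame.operandStack.push(sumTo(frame.localVariables[0] - 1));
            result = frame.operandStack.pop() + frame.operandStack.pop();
        }

        javaStack.pop();                             // frame destroyed on return
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(3) + " computed with at most " + deepest + " frames");
    }
}
```

Each recursive call works only with its own frame, so the partial sums of the outer invocations are untouched while the inner ones run.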

3. Class File Format

This section describes the class file format.

3.1. Introduction to Class File Format

The class file format defines the structure of files that a Java Virtual Machine (JVM) can run. A class file contains everything a JVM needs to know about one Java class or interface [52]. It consists of a stream of 8-bit bytes. All 16-bit, 32-bit, and 64-bit quantities are constructed by reading in two, four, and eight consecutive 8-bit bytes, respectively. Multibyte data items are always stored in big-endian order, where the high bytes come first ([32] p.93).

The length of many parts of a class file cannot be predicted before loading, as each class or interface contains a variable number of fields and methods [52]. The class file format handles this by prefacing the actual information with its size or length. This way, when the class is being loaded by the JVM, the size of variable-length information is read first. Once the JVM knows the size, it can correctly read in the actual information [52]. Information about the many parts of the class file is generally written to the class file with no space or padding between consecutive pieces of information, keeping the size of the file to a minimum so that class files can travel across networks more easily [52].

The order of class file components is strictly defined so that JVMs know what to expect, and where to expect it, when loading a class file [52]. The major components of a class file are (in order of appearance): magic number, minor and major version numbers, constant pool count, constant pool, access flags, this class, super class, interfaces, fields, methods, and attributes [52], [2]. The constant pool, interfaces, fields, methods and attributes components are each preceded by a count of the structures that they contain [32].

3.2. Magic Number

The first four bytes make up the magic number, whose value is always 0xCAFEBABE [52]. The magic number identifies the file as conforming to the class file format ([32] p.94).

3.3. Major Number, Minor Number

The second four bytes of the class file contain the minor and major version numbers (two bytes each, minor first). These numbers identify the version of the class file format to which a particular class file adheres and allow the JVM to verify that the class file is loadable [52].

3.4. Constant Pool

After the constant pool count is the constant pool itself. This is a table of structures representing various string constants, class and interface names, final variable values, variable names and types, method names and signatures, and other constants that are referred to within the class file structure and its substructures ([32] p.95, [52]), not unlike a symbol table in compiler design terminology ([1] p.60-62). A method signature is its return type and set of argument types [52].

The constant pool is organised as an array of variable-length elements. Each constant occupies one element in the array. Throughout the class file, constants are referred to by the integer index that indicates their position in the array. The initial constant has an index of one, the second constant has an index of two, etc. The constant pool array is preceded by its array size, so the JVM will know how many constants to expect when loading the class file.

Each element of the constant pool starts with a one-byte tag specifying the type of constant at that position in the array. Once a JVM has read and interpreted this tag, it knows what follows the tag. For example, if a tag indicates the constant is a string, the JVM expects the next two bytes to be the string length. Following this two-byte length, the JVM expects to find length number of bytes, which make up the characters of the string [52].

3.5. Access Flags

The next two bytes represent the access flags, which indicate whether this file defines a class or an interface, whether the class or interface is public or abstract, and (if it is a class and not an interface) whether the class is final [52].

3.6. This Class

The next two bytes represent the this class component, an index into the constant pool [52]. The constant pool entry at this index has two parts: a one-byte tag (which indicates that this element contains information about a class or interface) and a two-byte name index (which refers to a string constant containing the name of the class or interface) [52].

3.7. Super Class

The next two bytes represent the super class component [52]. For a class, the super class is either zero or an index into the constant pool ([32] p.97). If it is zero, this class file must represent the class Object, the only class or interface without a direct superclass ([32] p.97). Otherwise, the constant pool entry at this index names the super class from which this class descends [52]. For an interface, the super class is an index into the constant pool, and the entry at this index must represent the class Object [32].

3.8. Interfaces

The next two bytes give the interfaces count, which is followed by an array whose entries are indexes into the constant pool. The entries at these indexes represent the interfaces implemented by the class [52].
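The fixed-size fields at the start of a class file (the magic number, the two version numbers and the constant pool count) can be read with a few lines of code. The sketch below is a minimal illustration that stops before the constant pool itself. The class name `ClassFileHeader` is invented, and the sample bytes are hand-made for the example rather than taken from a real class file (version 45.3 is the version written by early JDK compilers, and the constant pool count of 16 is arbitrary).

```java
import java.nio.ByteBuffer;

final class ClassFileHeader {
    // Hand-made sample: magic number, minor version 3, major version 45,
    // constant pool count 16. A real class file would continue from here.
    static final byte[] SAMPLE = {
        (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 3, 0, 45, 0, 16
    };

    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads the first ten bytes of a class file; throws if the magic number is wrong. */
    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes);   // ByteBuffer is big-endian by default
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = in.getShort() & 0xffff;            // two-byte unsigned quantities
        int major = in.getShort() & 0xffff;
        int constantPoolCount = in.getShort() & 0xffff;
        return new ClassFileHeader(minor, major, constantPoolCount);
    }

    public static void main(String[] args) {
        ClassFileHeader header = parse(SAMPLE);
        System.out.println("version " + header.majorVersion + "." + header.minorVersion
                + ", constant pool count " + header.constantPoolCount);
    }
}
```

To inspect a real class, the same method could be given the bytes of a compiled .class file read from disk. Parsing the remaining components requires walking the variable-length constant pool entries one tag at a time, as described above.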

3.9. Fields

After the fields count is the fields component. This is an array of variable-length structures, one for each field. Each structure reveals information about one field, such as the field's name, type, and, if it is a final variable, its constant value. Some information is contained in the structure itself, and some is contained in constant pool locations pointed to by the structure. The only fields that appear in the list are those that were declared by the class or interface defined in the file; no fields inherited from super classes or superinterfaces appear in the list [52].

3.10. Methods

Following the method count is the methods component, which is an array of variable-length structures, one for each method. The structure for each method contains several pieces of information about the method, including the method descriptor (its return type and argument list), the number of stack words required for the method's local variables, the maximum number of stack words required for the method's operand stack, a table of exceptions caught by the method, the byte code sequence, and a line number table [52].

3.11. Attributes

Following the attributes count is the attributes component, which is an array of variable-length structures, one for each attribute. These attributes give general information about the particular class or interface defined by the file. The JVM will silently ignore any attributes that it does not recognise [52].

3.12. Class File Descriptors

A descriptor is a string representing the type of a field or method. Descriptors are represented in the class file format using UTF-8 strings ([32] p.110, 111) and thus may be drawn, where not further constrained, from the entire Unicode character set ([32] p.99).

Field Descriptors

A field descriptor represents the type of a class, instance, or local variable. It is a series of characters generated by the grammar in Figure 4.1.

    MethodDescriptor    ::= ( ParameterDescriptor* ) ReturnDescriptor
    ParameterDescriptor ::= FieldType
    ReturnDescriptor    ::= FieldType | V
    FieldDescriptor     ::= FieldType
    ComponentType       ::= FieldType
    FieldType           ::= BaseType | ObjectType | ArrayType
    BaseType            ::= B | C | D | F | I | J | S | Z
    ObjectType          ::= L <classname> ;
    ArrayType           ::= [ ComponentType

    Terminals     = B, C, D, F, I, J, S, Z, L, V, ;, [, (, )
    Non-Terminals = MethodDescriptor, ParameterDescriptor, ReturnDescriptor, FieldDescriptor, ComponentType, FieldType, BaseType, ObjectType, ArrayType

Figure 4.1: BNF grammar for class file descriptors ([32] p.102)

The characters of BaseType, the L and ; of ObjectType, and the [ of ArrayType are all ASCII characters. The <classname> represents a fully qualified class or interface name. For historical reasons it is encoded in internal form ([32] p.101). In this internal form, the ASCII periods ('.') that normally separate the identifiers that make up the fully qualified name are replaced by ASCII forward slashes ('/'). For example, the normal fully qualified name of class Thread is java.lang.Thread. In the form used in descriptors in the class file format, a reference to the name of class Thread is implemented using a CONSTANT_Utf8_info structure representing the string "java/lang/Thread" ([32] p.99).
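The conversion between the normal and the internal form of a class name can be sketched in a few lines of Java. The class and method names below are hypothetical and serve only to illustrate the period-to-slash substitution described above.

```java
class InternalNames {

    /** Converts a fully qualified name such as java.lang.Thread to internal form. */
    static String toInternalForm(String qualifiedName) {
        return qualifiedName.replace('.', '/');
    }

    /** Converts an internal name such as java/lang/Thread back to the normal form. */
    static String fromInternalForm(String internalName) {
        return internalName.replace('/', '.');
    }

    public static void main(String[] args) {
        String internal = toInternalForm(Thread.class.getName());
        System.out.println(internal);                   // java/lang/Thread
        System.out.println(fromInternalForm(internal)); // java.lang.Thread
    }
}
```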

    BaseType character   Type        Interpretation
    B                    byte        signed byte
    C                    char        Unicode character
    D                    double      double-precision floating-point value
    F                    float       single-precision floating-point value
    I                    int         integer
    J                    long        long integer
    L<classname>;        reference   an instance of class <classname>
    S                    short       signed short
    Z                    boolean     true or false
    V                    void        void (methods only)
    [                    reference   one array dimension
    [[                   reference   two array dimensions

Table 4.1: The interpretation of the class file descriptor types ([32] p.101)

For example, the descriptor of an instance variable of type int is simply I. The descriptor of an instance variable of type Object is Ljava/lang/Object; [32]. The descriptor of an instance variable that is a multidimensional int array (int array[][][]) is [[[I ([32] p.101).

Method Descriptors

A method descriptor represents the parameters that the method takes and the value that it returns ([32] p.102). It consists of zero or more parameter descriptors followed by a return descriptor. A parameter descriptor represents a parameter passed to a method. A return descriptor represents the type of the value returned from a method ([32] p.102). Method descriptors are generated by the grammar in Figure 4.1. For example, the method descriptor for the method Object method(int i, double d, Thread t) is (IDLjava/lang/Thread;)Ljava/lang/Object;. Note that internal forms of the fully qualified names of Thread and Object are used in the method descriptor ([32] p.102).
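The field and method descriptor rules above can be checked with a small Java sketch. DescriptorDemo and its two helper methods are hypothetical names written for this illustration; the final cross-check uses the JDK's own `java.lang.invoke.MethodType`, which can produce the same descriptor string.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {

    /** Builds a descriptor for a runtime type, following the FieldType rule of Figure 4.1. */
    static String fieldDescriptor(Class<?> type) {
        if (type.isArray()) {
            return "[" + fieldDescriptor(type.getComponentType());
        }
        if (type == byte.class)    return "B";
        if (type == char.class)    return "C";
        if (type == double.class)  return "D";
        if (type == float.class)   return "F";
        if (type == int.class)     return "I";
        if (type == long.class)    return "J";
        if (type == short.class)   return "S";
        if (type == boolean.class) return "Z";
        if (type == void.class)    return "V"; // only valid as a return descriptor
        return "L" + type.getName().replace('.', '/') + ";";
    }

    /** Builds a method descriptor: ( ParameterDescriptor* ) ReturnDescriptor. */
    static String methodDescriptor(Class<?> returnType, Class<?>... parameterTypes) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> parameter : parameterTypes) {
            sb.append(fieldDescriptor(parameter));
        }
        return sb.append(')').append(fieldDescriptor(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(fieldDescriptor(int.class));        // I
        System.out.println(fieldDescriptor(Object.class));     // Ljava/lang/Object;
        System.out.println(fieldDescriptor(int[][][].class));  // [[[I

        // Object method(int i, double d, Thread t)
        String byHand = methodDescriptor(Object.class, int.class, double.class, Thread.class);
        System.out.println(byHand);  // (IDLjava/lang/Thread;)Ljava/lang/Object;

        // Cross-check against the descriptor string produced by the JDK.
        String byJdk = MethodType
                .methodType(Object.class, int.class, double.class, Thread.class)
                .toMethodDescriptorString();
        System.out.println(byHand.equals(byJdk));
    }
}
```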

4. Code Obfuscation Techniques

This section describes techniques implemented by obfuscators. It defines what an obfuscating transformation is, in terms of the quality of the transformation and the type of data that it targets.

Obfuscation Transformation

An obfuscation transformation is a transformation that alters the program code (usually the byte codes of a class file in the case of Java obfuscation) by removing or adding code with the aim of making decompilation (and deobfuscation) more difficult ([8] p.3, [34]).

Definition

In [8], an obfuscation transformation is defined as one that changes a program $P$ into $P'$, such that both $P$ and $P'$ have the same observable behaviour, regardless of any side effects such as more memory usage or degraded performance ([8] p.6). Furthermore, the following conditions must hold: if $P$ fails to terminate or terminates with an error condition, then $P'$ may or may not terminate; otherwise, $P'$ must terminate and produce the same output as $P$ ([8] p.7).

Quality

We define the quality of an obfuscation transformation as a combination of four measures, viz. potency, resilience, stealth and cost: $T_{qual}(P)$, the quality of a transformation $T$, is defined as the combination of the potency, resilience, [stealth] and cost of $T$: $T_{qual}(P) = (T_{pot}(P), T_{res}(P), T_{ste}(P), T_{cost}(P))$ ([8] p.9). A transformation of high quality will aim to have high potency, resilience and stealth, and low cost.

Potency

The potency of an obfuscation transformation is a measure of how much the obfuscated program differs from, or is more complex than, the original program.

Definition

Let $T$ be a [behaviour-conserving] transformation, such that $P \xrightarrow{T} P'$ transforms $P$ into a target program $P'$. Let $E(P)$ be the complexity of $P$. $T_{pot}(P)$, the potency of $T$ with respect to a program $P$, is a measure of the extent to which $T$ changes the complexity of $P$. It is defined as

$$T_{pot}(P) = \frac{E(P')}{E(P)} - 1.$$

$T$ is a potent obfuscating transformation if $T_{pot}(P) > 0$ ([8] p.7).

Measure Scale

In the Software Complexity Metrics branch of Software Engineering, measures of complexity have been derived from theoretical and empirical studies of programs ([8] p.7). Using these measures, we can formulate statements such as: if programs $P$ and $P'$ are identical except that $P'$ contains more of property $q$ than $P$, then $P'$ is more complex than $P$. Given such a statement, we can attempt to construct a transformation which adds more of the $q$-property to a program, knowing that this is likely to increase its obscurity ([8] p.7). The table in Appendix B provides an overview of some of the more popular software complexity measures ([8] p.8, [12]).

In [5], potency is measured on the following scale:

    low, medium, high (from low potency to high potency)

Figure 5.1: Potency scale of an obfuscation transformation

In order for $T$ to be a potent obfuscating transformation, it should:

- Increase overall program size ($\mu_1$) and introduce new classes and methods ($\mu_{7a}$).
- Introduce new predicates ($\mu_2$) and increase the nesting level of conditional and looping constructs ($\mu_3$).
- Increase the number of method arguments ($\mu_5$) and inter-class instance variable dependencies ($\mu_{7d}$).
- Increase the height of the inheritance tree ($\mu_{7b}$, $\mu_{7c}$).
- Increase long-range variable dependencies ($\mu_4$).

([8] p.7)

Resilience

The resilience of an obfuscation transformation is a measure of how difficult it is for someone to undo the obfuscation, in terms of the amount of time required for a programmer to construct a deobfuscator (programmer effort) and the execution time and space required by the deobfuscator to effectively reduce the potency of the transformation (deobfuscator effort) ([8] p.8). The main difference between the potency and the resilience of a transformation is that the potency attempts to confuse a human reader, whereas the resilience attempts to confuse a deobfuscator ([8] p.9).

Definition

Let $T$ be a [behaviour-conserving] transformation, such that $P \xrightarrow{T} P'$ transforms $P$ into a target program $P'$. Let $E(P)$ be the complexity of $P$. $T_{res}(P)$ is the resilience of $T$ with respect to a program $P$. $T_{res}(P) = \text{one-way}$ if information is removed from $P$ such that $P$ cannot be reconstructed from $P'$. Otherwise, $T_{res}(P) = \mathrm{Resilience}(T_{\text{Deobfuscator effort}}, T_{\text{Programmer effort}})$, where Resilience is the function defined in the matrix in [the diagram below] ([8] p.9).

Measure Scale

In [8], resilience is measured on the following scale:

    trivial, weak, strong, full, one-way (from low resilience to high resilience)

Figure 5.2: Resilience scale of an obfuscation transformation ([8] p.9)

Transformations that are the most resilient are described as one-way, i.e. they can never be undone ([8] p.9). This is typically because they remove information from the program that was useful to the human programmer, but which is not necessary in order to execute the program correctly. Other transformations typically add useless information to the program that does not change its observable behaviour, but which increases the information load on a human reader. These transformations can be undone with varying degrees of difficulty ([8] p.9).
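The potency formula and the Resilience function can be sketched together in Java. The class QualityMeasures, its enum names and the sample complexity values are hypothetical; the matrix entries are copied from Figure 5.3 of this report, and the complexity measure $E$ is left to the caller since the definition does not fix a particular metric.

```java
class QualityMeasures {

    /** Programmer effort, expressed as the scope of the transformation. */
    enum Scope { LOCAL, GLOBAL, INTER_PROCEDURAL, INTER_PROCESS }

    enum DeobfuscatorEffort { POLYNOMIAL_TIME, EXPONENTIAL_TIME }

    enum Resilience { TRIVIAL, WEAK, STRONG, FULL, ONE_WAY }

    /** T_pot(P) = E(P') / E(P) - 1; the transformation is potent if the result is > 0. */
    static double potency(double complexityOriginal, double complexityObfuscated) {
        return complexityObfuscated / complexityOriginal - 1.0;
    }

    // Rows follow Scope order, columns follow DeobfuscatorEffort order (Figure 5.3).
    private static final Resilience[][] MATRIX = {
        { Resilience.TRIVIAL, Resilience.WEAK   },  // LOCAL
        { Resilience.WEAK,    Resilience.STRONG },  // GLOBAL
        { Resilience.STRONG,  Resilience.FULL   },  // INTER_PROCEDURAL
        { Resilience.FULL,    Resilience.FULL   },  // INTER_PROCESS
    };

    /** One-way if information was removed, otherwise a lookup in the matrix. */
    static Resilience resilience(boolean informationRemoved,
                                 Scope scope,
                                 DeobfuscatorEffort effort) {
        if (informationRemoved) {
            return Resilience.ONE_WAY;
        }
        return MATRIX[scope.ordinal()][effort.ordinal()];
    }

    public static void main(String[] args) {
        // Hypothetical metric values: E(P) = 10 and E(P') = 15.
        System.out.println(potency(10, 15));  // 0.5
        System.out.println(resilience(false, Scope.GLOBAL, DeobfuscatorEffort.EXPONENTIAL_TIME)); // STRONG
        System.out.println(resilience(true, Scope.LOCAL, DeobfuscatorEffort.POLYNOMIAL_TIME));    // ONE_WAY
    }
}
```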
Figure 5.3 shows that deobfuscator effort is classified as either polynomial time or exponential time, whereas programmer effort is measured as a function of the scope of the transformation ([8] p.8). This is based on the intuition that it is easier to construct counter-measures against an obfuscating transformation that only affects a small part of a procedure, than against one that may affect an entire program ([8] p.8). The scope of a transformation is defined using terminology borrowed from code optimisation theory: $T$ is a local transformation if it affects a single basic block ([1] p.528) of a control flow graph (CFG) ([1] p.532), it is global if it affects an entire CFG, it is inter-procedural if it affects the flow of information between procedures, and it is an inter-process transformation if it affects the interaction between independently executing threads of control ([8] p.9).

    Programmer effort      Deobfuscator effort
    (scope)                Polynomial time    Exponential time
    Inter-process          full               full
    Inter-procedural       strong             full
    Global                 weak               strong
    Local                  trivial            weak

Figure 5.3: Resilience scale of an obfuscation transformation in terms of programmer effort and deobfuscator effort ([8] p.9)

Stealth

The stealth of an obfuscation transformation is a measure of how well the changes to the code are hidden after obfuscation. Obfuscated code that blends in well with the original code would be difficult for a reverse engineer to find and would therefore prove difficult to deobfuscate ([9] p.4). However, if the transformation introduces new code that differs wildly from what is in the original program, it will be easy for a reverse engineer to spot ([10] p.3).

Definition

Let $T$ be a [behaviour-conserving] transformation and $Q$ be a program. $P_s(Q)$ is the set of language features used by $Q$, while $P_s(T)$ is the set of language features introduced by $T$. $T_{ste}(Q)$ is the stealth of $T$ when it is applied to $Q$:

$$T_{ste}(Q) = \begin{cases} 1.0, & \text{if } |P_s(T)| = 0 \\[4pt] \dfrac{|P_s(T) \setminus P_s(Q)|}{|P_s(T)|}, & \text{otherwise} \end{cases}$$

([33] p.23)

Measure Scale

In [33], stealth is measured on the following scale:

    unstealthy, moderate, stealthy (from low stealth to high stealth)

Figure 5.4: Stealth scale of an obfuscation transformation

If $T_{ste}(Q)$ is close to 1, then $T$ is considered to be stealthy. Conversely, if $T_{ste}(Q)$ is close to 0, then $T$ is unstealthy ([33] p.23).

Cost

The cost of an obfuscation transformation is a measure of the execution time/space overhead that it incurs on an obfuscated application ([8] p.9). This includes any changes in the file size or any degradation in performance (e.g. the obfuscated program runs slower than the original program) ([8] p.9).

Definition

Let $T$ be a [behaviour-conserving] transformation, such that $P \xrightarrow{T} P'$ transforms $P$ into a target program $P'$. $T_{cost}(P)$ is the extra execution time/space of $P'$ compared to $P$:

$$T_{cost}(P) = \begin{cases} \text{dear} & \text{if executing } P' \text{ requires exponentially more resources than } P \\ \text{costly} & \text{if executing } P' \text{ requires } O(n^p),\ p > 1, \text{ more resources than } P \\ \text{cheap} & \text{if executing } P' \text{ requires } O(n) \text{ more resources than } P \\ \text{free} & \text{if executing } P' \text{ requires } O(1) \text{ more resources than } P \end{cases}$$

([8] p.9)

Measure Scale

In [8], the cost is measured on the following scale:

    free, cheap, costly, dear (from low cost to high cost)

Figure 5.5: Cost scale of an obfuscation transformation

4.2. Types of Obfuscation

In [8], obfuscation transformations are classified by the types of source code objects that they target. There are four basic types: layout, data, control and preventive transformations.

Layout Transformations

Layout transformations affect the layout of the program code. Information that is unnecessary to the execution of the program, such as identifier names and comments, is altered [34].

Changing Formatting

The first transformation removes the source code formatting information sometimes available in Java class files. This is a one-way transformation because once the original formatting is gone it cannot be recovered; it is a transformation with low potency, because there is very little semantic content in formatting, and no great confusion is introduced when that information is removed; finally, this is a free transformation since the space and time complexity of the application is not affected ([8] p.10).

Scrambling Identifier Names

Scrambling identifier names is also a one-way and free transformation ([8] p.10). However, it has medium potency as identifiers contain a great deal of pragmatic information ([8] p.10).

Removing Comments

Removing comments is also one-way and free, but it has high potency, as the comments contain information that greatly eases the understanding of the code; without comments the code will be much harder to understand ([8] p.30).

Data Transformations

Data transformations change the data structures of the program code ([34], [8] p.17).

Data Storage

There is usually a natural way to store a particular data item in a program, e.g. a local integer variable would be preferable as an iteration variable for iteration through the elements of an array ([8] p.17). While other variable types are possible, they would be less natural and probably less efficient ([8] p.17). Data storage obfuscation affects how data is stored in memory. For example, a local variable can be converted into a global one [34]. There are a number of simple storage transformations that promote variables from a specialised storage class to a more general class. Their potency and resilience are generally low, but used in conjunction with other transformations they can be quite effective ([8] p.18).

Another data storage obfuscation technique is to convert static data into procedural data, as static data contains much pragmatic information that is useful to a reverse engineer ([8] p.18). A simple way of obfuscating a static string is to convert it into a program that produces the string ([8] p.18). The potency, resilience and cost of this type of transformation depend on the complexity of the string generation function ([8] p.30).

Data Encoding

Data encoding obfuscation affects how the stored data is interpreted [34] by selecting unnatural encodings for common data types ([8] p.17). Figure 5.6 gives an example in which an integer variable is replaced by a simple encoding function.

    Before                      After
    int i = 1;                  int i = 11;
    while (i < 1000)            while (i < 8003)
        ... A[i] ...;               ... A[(i-3)/8] ...;
        i++;                        i += 8;

Figure 5.6: Data encoding obfuscation in which i is replaced by 8 * i + 3 [34]

There will be a trade-off between resilience and potency on one hand and cost on the other. A simple encoding function such as the one above will add little extra execution time but can be deobfuscated using common compiler analysis techniques ([8] p.18). Boolean variables and other variables of restricted range can be split into two or more variables.
The potency, resilience and cost of this transformation all grow with the number of variables into which the original variable is split ([8] p.18). The resilience can be further enhanced by implementing algorithms in the obfuscated application that construct the look-up tables at run time ([8] p.18).
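The two data encoding ideas above can be tried out in a short Java sketch. The first pair of methods is a runnable version of Figure 5.6, with the elided loop body filled in by a simple sum purely for demonstration. The second pair shows one possible way of splitting a boolean into two variables; the XOR scheme and all names in DataEncodingDemo are assumptions made for this illustration, not a scheme prescribed by [8].

```java
class DataEncodingDemo {

    /** Loop of Figure 5.6 before obfuscation; summing A[i] stands in for the elided body. */
    static long sumPlain(int[] a) {
        long sum = 0;
        int i = 1;
        while (i < 1000) {
            sum += a[i];
            i++;
        }
        return sum;
    }

    /** The same loop after replacing i by 8 * i + 3. */
    static long sumEncoded(int[] a) {
        long sum = 0;
        int i = 11;                  // 8 * 1 + 3
        while (i < 8003) {           // 8 * 1000 + 3
            sum += a[(i - 3) / 8];   // decode before use
            i += 8;                  // one step of the original variable
        }
        return sum;
    }

    /** Hypothetical split of a boolean v into (p, q) such that v == p XOR q. */
    static boolean[] split(boolean v, boolean mask) {
        return new boolean[] { mask, v ^ mask };
    }

    /** Recovers the original boolean from its two parts. */
    static boolean join(boolean p, boolean q) {
        return p ^ q;
    }

    public static void main(String[] args) {
        int[] a = new int[1000];
        for (int k = 0; k < a.length; k++) {
            a[k] = k;
        }
        System.out.println(sumPlain(a));    // 499500
        System.out.println(sumEncoded(a));  // 499500

        boolean[] parts = split(true, true);
        System.out.println(join(parts[0], parts[1]));  // true
    }
}
```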

Data Aggregation

Data aggregation obfuscation alters how data is grouped together [34]. Some aggregation transformations merge two or more scalar variables into one variable ([8] p.19). This transformation has weak resilience, as a deobfuscator only needs to examine the set of arithmetic operations being applied to a particular variable in order to guess that it actually consists of two merged variables ([8] p.19). The transformation also has low potency and free cost ([8] p.30).

Other transformations restructure arrays by splitting an array into several sub-arrays, merging two or more arrays into one array, folding an array (increasing the number of dimensions) or flattening an array (decreasing the number of dimensions) ([8] p.20). The potency of these transformations depends on the extent to which the arrays in question are transformed ([8] p.21). However, they have weak resilience and free cost (except for folding, which has cheap cost as this transformation is more complicated than the others) ([8] p.30).

According to metrics $\mu_{7b}$ and $\mu_{7c}$ (Appendix B, [8] p.8), the complexity of a class grows with its depth in the inheritance hierarchy and the number of its direct descendants ([8] p.21). The complexity of a class can be increased either by splitting up the class or by inserting a new bogus class. Splitting up a class has low resilience, as a deobfuscator can simply merge the classes together to get the original one. The resilience of bogus class insertion depends on the number of new classes and the increase in the depth of the inheritance hierarchy tree ([8] p.30). Both transformations have medium potency and free cost.

Data Ordering

Data ordering obfuscation changes how data is ordered [34]. Programmers tend to organise their source code to maximise its locality. The idea is that a program is easier to read and understand if two items that are logically related are also physically close in the source text.
This kind of locality works on every level of the source ... [all] kinds of spatial locality can provide useful clues to a reverse engineer ([8] p.16). It is therefore useful to randomise the order of declarations in the source application, particularly the order of methods and instance variables within classes and formal parameters within methods ([8] p.21). In many cases it will also be possible to reorder the elements within an array. Simply put, we provide an opaque encoding function $f(i)$ which maps the $i$-th element of the original array to its new position in the reordered array ([8] p.21).
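Array reordering through a mapping function $f(i)$ can be sketched as follows. The modular mapping, the constants STRIDE and OFFSET and the class name ArrayReorderDemo are assumptions made for this illustration; the mapping is a simple permutation that stands in for the opaque function, which in a real obfuscator would presumably be harder to analyse.

```java
import java.util.Arrays;

class ArrayReorderDemo {

    // Hypothetical constants; STRIDE must be coprime with the array length
    // so that f is a permutation of the indexes 0..n-1.
    private static final int STRIDE = 7;
    private static final int OFFSET = 3;

    private static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Maps index i of the original array to its position in the reordered array. */
    static int f(int i, int n) {
        return (i * STRIDE + OFFSET) % n;
    }

    /** Builds the reordered array by storing element i at position f(i). */
    static int[] reorder(int[] original) {
        int n = original.length;
        if (n > 0 && gcd(STRIDE, n) != 1) {
            throw new IllegalArgumentException("array length must be coprime with " + STRIDE);
        }
        int[] reordered = new int[n];
        for (int i = 0; i < n; i++) {
            reordered[f(i, n)] = original[i];
        }
        return reordered;
    }

    /** Every access A[i] in the original program becomes an access through f. */
    static int get(int[] reordered, int i) {
        return reordered[f(i, reordered.length)];
    }

    public static void main(String[] args) {
        int[] original = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 };
        int[] reordered = reorder(original);
        System.out.println(Arrays.toString(reordered));
        System.out.println(get(reordered, 4) == original[4]);
    }
}
```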

1 The Java Virtual Machine

1 The Java Virtual Machine 1 The Java Virtual Machine About the Spec Format This document describes the Java virtual machine and the instruction set. In this introduction, each component of the machine is briefly described. This

More information

Code Obfuscation. Mayur Kamat Nishant Kumar

Code Obfuscation. Mayur Kamat Nishant Kumar Code Obfuscation Mayur Kamat Nishant Kumar Agenda Malicious Host Problem Code Obfuscation Watermarking and Tamper Proofing Market solutions Traditional Network Security Problem Hostile Network Malicious

More information

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.

More information

Chapter 7D The Java Virtual Machine

Chapter 7D The Java Virtual Machine This sub chapter discusses another architecture, that of the JVM (Java Virtual Machine). In general, a VM (Virtual Machine) is a hypothetical machine (implemented in either hardware or software) that directly

More information

University of Twente. A simulation of the Java Virtual Machine using graph grammars

University of Twente. A simulation of the Java Virtual Machine using graph grammars University of Twente Department of Computer Science A simulation of the Java Virtual Machine using graph grammars Master of Science thesis M. R. Arends, November 2003 A simulation of the Java Virtual Machine

More information

Habanero Extreme Scale Software Research Project

Habanero Extreme Scale Software Research Project Habanero Extreme Scale Software Research Project Comp215: Java Method Dispatch Zoran Budimlić (Rice University) Always remember that you are absolutely unique. Just like everyone else. - Margaret Mead

More information

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program CS 2112 Lecture 27 Interpreters, compilers, and the Java Virtual Machine 1 May 2012 Lecturer: Andrew Myers 1 Interpreters vs. compilers There are two strategies for obtaining runnable code from a program

More information

Implementation of an Obfuscation Tool for C/C++ Source Code Protection on the XScale Architecture *

Implementation of an Obfuscation Tool for C/C++ Source Code Protection on the XScale Architecture * Implementation of an Obfuscation Tool for C/C++ Source Code Protection on the XScale Architecture * Seongje Cho, Hyeyoung Chang, and Yookun Cho 1 Dept. of Computer Science & Engineering, Dankook University,

More information

C Compiler Targeting the Java Virtual Machine

C Compiler Targeting the Java Virtual Machine C Compiler Targeting the Java Virtual Machine Jack Pien Senior Honors Thesis (Advisor: Javed A. Aslam) Dartmouth College Computer Science Technical Report PCS-TR98-334 May 30, 1998 Abstract One of the

More information

The Java Virtual Machine (JVM) Pat Morin COMP 3002

The Java Virtual Machine (JVM) Pat Morin COMP 3002 The Java Virtual Machine (JVM) Pat Morin COMP 3002 Outline Topic 1 Topic 2 Subtopic 2.1 Subtopic 2.2 Topic 3 2 What is the JVM? The JVM is a specification of a computing machine Instruction set Primitive

More information

- Applet java appaiono di frequente nelle pagine web - Come funziona l'interprete contenuto in ogni browser di un certo livello? - Per approfondire

- Applet java appaiono di frequente nelle pagine web - Come funziona l'interprete contenuto in ogni browser di un certo livello? - Per approfondire - Applet java appaiono di frequente nelle pagine web - Come funziona l'interprete contenuto in ogni browser di un certo livello? - Per approfondire il funzionamento della Java Virtual Machine (JVM): -

More information

Obfuscation: know your enemy

Obfuscation: know your enemy Obfuscation: know your enemy Ninon EYROLLES neyrolles@quarkslab.com Serge GUELTON sguelton@quarkslab.com Prelude Prelude Plan 1 Introduction What is obfuscation? 2 Control flow obfuscation 3 Data flow

More information

Glossary of Object Oriented Terms

Glossary of Object Oriented Terms Appendix E Glossary of Object Oriented Terms abstract class: A class primarily intended to define an instance, but can not be instantiated without additional methods. abstract data type: An abstraction

More information

Java Application Developer Certificate Program Competencies

Java Application Developer Certificate Program Competencies Java Application Developer Certificate Program Competencies After completing the following units, you will be able to: Basic Programming Logic Explain the steps involved in the program development cycle

More information

The Java Virtual Machine and Mobile Devices. John Buford, Ph.D. buford@alum.mit.edu Oct 2003 Presented to Gordon College CS 311

The Java Virtual Machine and Mobile Devices. John Buford, Ph.D. buford@alum.mit.edu Oct 2003 Presented to Gordon College CS 311 The Java Virtual Machine and Mobile Devices John Buford, Ph.D. buford@alum.mit.edu Oct 2003 Presented to Gordon College CS 311 Objectives Review virtual machine concept Introduce stack machine architecture

More information

Java Interview Questions and Answers

Java Interview Questions and Answers 1. What is the most important feature of Java? Java is a platform independent language. 2. What do you mean by platform independence? Platform independence means that we can write and compile the java

More information

02 B The Java Virtual Machine

02 B The Java Virtual Machine 02 B The Java Virtual Machine CS1102S: Data Structures and Algorithms Martin Henz January 22, 2010 Generated on Friday 22 nd January, 2010, 09:46 CS1102S: Data Structures and Algorithms 02 B The Java Virtual

More information

picojava TM : A Hardware Implementation of the Java Virtual Machine

picojava TM : A Hardware Implementation of the Java Virtual Machine picojava TM : A Hardware Implementation of the Java Virtual Machine Marc Tremblay and Michael O Connor Sun Microelectronics Slide 1 The Java picojava Synergy Java s origins lie in improving the consumer

More information

Lecture 12: Software protection techniques. Software piracy protection Protection against reverse engineering of software

Lecture 12: Software protection techniques. Software piracy protection Protection against reverse engineering of software Lecture topics Software piracy protection Protection against reverse engineering of software Software piracy Report by Business Software Alliance for 2001: Global economic impact of software piracy was

More information

Java Programming. Binnur Kurt binnur.kurt@ieee.org. Istanbul Technical University Computer Engineering Department. Java Programming. Version 0.0.

Java Programming. Binnur Kurt binnur.kurt@ieee.org. Istanbul Technical University Computer Engineering Department. Java Programming. Version 0.0. Java Programming Binnur Kurt binnur.kurt@ieee.org Istanbul Technical University Computer Engineering Department Java Programming 1 Version 0.0.4 About the Lecturer BSc İTÜ, Computer Engineering Department,

More information

Java 6 'th. Concepts INTERNATIONAL STUDENT VERSION. edition

Java 6 'th. Concepts INTERNATIONAL STUDENT VERSION. edition Java 6 'th edition Concepts INTERNATIONAL STUDENT VERSION CONTENTS PREFACE vii SPECIAL FEATURES xxviii chapter i INTRODUCTION 1 1.1 What Is Programming? 2 J.2 The Anatomy of a Computer 3 1.3 Translating

More information

Computing Concepts with Java Essentials

Computing Concepts with Java Essentials 2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. Computing Concepts with Java Essentials 3rd Edition Cay Horstmann

More information

Software Protection through Code Obfuscation

Software Protection through Code Obfuscation Software Protection through Code Obfuscation Dissertation submitted in partial fulfillment of the requirements for the degree of Master of Technology, Computer Engineering by Aniket Kulkarni Roll No: 121022016

More information

Moving from CS 61A Scheme to CS 61B Java

Moving from CS 61A Scheme to CS 61B Java Moving from CS 61A Scheme to CS 61B Java Introduction Java is an object-oriented language. This document describes some of the differences between object-oriented programming in Scheme (which we hope you

More information

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner. Handout 1 CS603 Object-Oriented Programming Fall 15 Page 1 of 11 Handout 1 Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner. Java

More information

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts

More information

DATA OBFUSCATION. What is data obfuscation?

DATA OBFUSCATION. What is data obfuscation? DATA OBFUSCATION What data obfuscation? Data obfuscations break the data structures used in the program and encrypt literals. Th method includes modifying inheritance relations, restructuring arrays, etc.

More information

Hardware/Software Co-Design of a Java Virtual Machine

Hardware/Software Co-Design of a Java Virtual Machine Hardware/Software Co-Design of a Java Virtual Machine Kenneth B. Kent University of Victoria Dept. of Computer Science Victoria, British Columbia, Canada ken@csc.uvic.ca Micaela Serra University of Victoria

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine

More information

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C Embedded Systems A Review of ANSI C and Considerations for Embedded C Programming Dr. Jeff Jackson Lecture 2-1 Review of ANSI C Topics Basic features of C C fundamentals Basic data types Expressions Selection

More information

Stack Allocation. Run-Time Data Structures. Static Structures

Stack Allocation. Run-Time Data Structures. Static Structures Run-Time Data Structures Stack Allocation Static Structures For static structures, a fixed address is used throughout execution. This is the oldest and simplest memory organization. In current compilers,

More information

Chapter 1. Dr. Chris Irwin Davis Email: cid021000@utdallas.edu Phone: (972) 883-3574 Office: ECSS 4.705. CS-4337 Organization of Programming Languages

Chapter 1. Dr. Chris Irwin Davis Email: cid021000@utdallas.edu Phone: (972) 883-3574 Office: ECSS 4.705. CS-4337 Organization of Programming Languages Chapter 1 CS-4337 Organization of Programming Languages Dr. Chris Irwin Davis Email: cid021000@utdallas.edu Phone: (972) 883-3574 Office: ECSS 4.705 Chapter 1 Topics Reasons for Studying Concepts of Programming

More information

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz 2007 03 16

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz 2007 03 16 Advanced compiler construction Michel Schinz 2007 03 16 General course information Teacher & assistant Course goals Teacher: Michel Schinz Michel.Schinz@epfl.ch Assistant: Iulian Dragos INR 321, 368 64

More information

Code Obfuscation Literature Survey

Code Obfuscation Literature Survey Code Obfuscation Literature Survey Arini Balakrishnan, Chloe Schulze CS701 Construction of Compilers, Instructor: Charles Fischer Computer Sciences Department University of Wisconsin, Madison December

More information
