Lecture 12: Software protection techniques
- Software piracy protection
- Protection against reverse engineering of software

Lecture topics
- Software piracy protection
- Protection against reverse engineering of software

Software piracy
- Report by the Business Software Alliance for 2001:
  - The global economic impact of software piracy was $11 billion
  - About 40% of commercial software in use is pirated
  - The study included 85 countries and was based on tracking 26 popular business applications
  - Top offenders (percentage of software in use that is pirated): Vietnam 94%, China 92%, Indonesia 88%, Ukraine 87%, Russia 87%

What can software companies do to prevent software piracy?
- Ultimately, not a whole lot, for mainstream software
- A determined attacker can reverse engineer binary code and bypass protection mechanisms
- Protection mechanisms that can be used:
  - License keys
  - License files
  - CDs, floppies
  - Special-purpose dongles
  - Code encryption
  - Application server model

License keys
- Typical use:
  - Encrypt a unique string to obtain a key
  - Require the user to enter the key (e.g., during installation)
  - When a key is entered, decrypt it and compare the result to the original string
- Weaknesses:
  - Key minting: if the encryption key is stored somewhere in the code, an attacker can extract it and generate new valid keys
  - Digital signatures help somewhat: the code then contains only the public verification key, which cannot be used to mint keys
  - Reverse engineering: an attacker finds the code that checks the key and removes it
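The encrypt/decrypt/compare key scheme above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not a recommended design: the AES-128 key, the cipher mode (`AES/ECB/PKCS5Padding`), the class name `LicenseKeys`, and the sample customer string are all hypothetical. The embedded key is deliberately visible in the code because it is exactly what a key-minting attacker would extract.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

class LicenseKeys {
    // Hypothetical 16-byte AES-128 key compiled into the program. Anyone who
    // extracts it from the binary can mint valid keys ("key minting").
    private static final SecretKeySpec EMBEDDED_KEY =
            new SecretKeySpec("0123456789abcdef".getBytes(StandardCharsets.US_ASCII), "AES");

    // Vendor side: encrypt the unique string (e.g., a customer ID) to obtain the key.
    static String mintKey(String uniqueString) {
        try {
            // Deterministic mode chosen only to keep keys short in this illustration.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, EMBEDDED_KEY);
            byte[] ct = cipher.doFinal(uniqueString.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(ct);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Program side: decrypt the entered key and compare it with the original string.
    static boolean checkKey(String enteredKey, String expectedString) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, EMBEDDED_KEY);
            byte[] pt = cipher.doFinal(Base64.getDecoder().decode(enteredKey));
            return Arrays.equals(pt, expectedString.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return false; // bad Base64, wrong length, or bad padding: not a valid key
        }
    }

    public static void main(String[] args) {
        String key = mintKey("customer-42");
        System.out.println(key);
        System.out.println(checkKey(key, "customer-42"));         // true
        System.out.println(checkKey("not-a-key", "customer-42")); // false
    }
}
```

Note that `checkKey` returns a single boolean: an attacker who finds this call in the binary can simply patch it to always return `true`, which is the reverse engineering weakness listed above.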

License files
- Used like license keys, but a license file contains more information, usually something specific to the user
- Can be used to grant temporary licenses: the file stores expiration dates
- Can be used to enable only certain features of the software:
  - The file identifies these features
  - The software may consult the license file every time a particular feature is requested
- License files are typically digitally signed; the public key is embedded in the software code
- Attacks:
  - Changing the system clock to extend a temporary license
  - Reinstalling the system when the license expires
  - Reverse engineering

License servers
- Used in networked environments to serve multiple installations of the software
- E.g., a floating license may limit the number of concurrent users to 10: when a user starts the software, it obtains a run token from the server; when the user exits, the token is returned
- FlexLM is a popular commercial license server
- Problems:
  - Usability (for the organization as a whole)
  - Reverse engineering
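A digitally signed license file with an expiration date and a feature list, as described above, might be checked along the lines of this sketch. Assumptions: RSA-2048 with `SHA256withRSA` from the standard `java.security` API, and a hypothetical `key=value` file format with `expires` and `features` fields; `SignedLicense` and its methods are illustrative names only. The expiration check trusts whatever date it is given, which mirrors the clock-changing attack.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

class SignedLicense {
    // Vendor side: only the public half of this key pair ships inside the program.
    static KeyPair newVendorKeys() {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
            kpg.initialize(2048);
            return kpg.generateKeyPair();
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    // Vendor side: sign the text of the license file.
    static byte[] sign(String licenseText, PrivateKey vendorKey) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(vendorKey);
            s.update(licenseText.getBytes(StandardCharsets.UTF_8));
            return s.sign();
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    // Program side: verify the signature, then check expiration and the requested feature.
    static boolean featureEnabled(String licenseText, byte[] signature, PublicKey embeddedKey,
                                  String feature, LocalDate today) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(embeddedKey);
            s.update(licenseText.getBytes(StandardCharsets.UTF_8));
            if (!s.verify(signature)) return false;   // the file was edited
        } catch (GeneralSecurityException e) {
            return false;                             // e.g., a malformed signature
        }
        Map<String, String> fields = new HashMap<>();
        for (String line : licenseText.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) fields.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        // 'today' normally comes from the system clock, which is why turning the clock back works.
        if (!fields.containsKey("expires") || today.isAfter(LocalDate.parse(fields.get("expires"))))
            return false;
        for (String f : fields.getOrDefault("features", "").split(","))
            if (f.trim().equals(feature)) return true;
        return false;
    }

    public static void main(String[] args) {
        KeyPair vendor = newVendorKeys();
        String license = "user=alice\nexpires=2030-12-31\nfeatures=print,export\n";
        byte[] sig = sign(license, vendor.getPrivate());
        LocalDate day = LocalDate.of(2025, 6, 1);
        System.out.println(featureEnabled(license, sig, vendor.getPublic(), "export", day)); // true
        System.out.println(featureEnabled(license, sig, vendor.getPublic(), "admin", day));  // false
        String edited = license.replace("2030", "2099");
        System.out.println(featureEnabled(edited, sig, vendor.getPublic(), "export", day));  // false
    }
}
```

Editing the file breaks the signature, and the private key never ships, so users cannot mint their own licenses; the check itself can still be removed by reverse engineering.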

Challenge-based license schemes
- The software issues challenges to the user, and the user has to respond correctly
- E.g., with information available in the documentation
- Problems:
  - Usability
  - Users can share the challenge secrets
  - Reverse engineering

CDs and floppies
- Part of the system's code can be placed on a removable disk, so the software can only run if the disk is present
- Popular for games
- What about CD burning?
  - Macrovision SafeDisk: store a key on the CD in a way that a typical CD writer won't duplicate
- Again, reverse engineering

Special-purpose dongles
- A dongle is a hardware device that connects to a computer port and carries some information
- Can be used to store code or keys used by the software
- Problems:
  - Expensive
  - User-unfriendly
  - Reverse engineering

Code encryption
- Software code is stored on the disk in encrypted form
- It is decrypted right before it has to run and encrypted again after it finishes running
- Problems:
  - Computationally expensive
  - Key distribution and storage
  - At some point the code is in unencrypted form, and an attacker can intercept it at this point
- Possible solutions:
  - Execute-only memory (XOM): contains code that can be executed but not viewed
  - Dedicated cryptographic hardware

Trusted Computing Platform Alliance
- Goal: define specifications for a hardware-assisted, OS-based, trusted subsystem that will become an integral part of personal computing platforms
- Relies on public-key cryptography and infrastructure
- Secure storage, trusted paths within the system, security co-processor
- It is far from clear that this will succeed
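To make the code encryption idea concrete without loading real machine code, the sketch below uses a tiny postfix-arithmetic program as a hypothetical stand-in for "code". It is stored AES-GCM encrypted, decrypted right before it runs, and the plaintext buffer is wiped afterwards. The class name, the toy interpreter, and generating the key at run time are all assumptions; the last one sidesteps the key distribution and storage problem rather than solving it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class EncryptedCode {
    // In a real product this key would have to be distributed and stored somewhere.
    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    // Code "on disk": a 12-byte IV followed by the AES-GCM ciphertext of the program text.
    static byte[] encrypt(String program, SecretKey key) {
        try {
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ct = c.doFinal(program.getBytes(StandardCharsets.UTF_8));
            byte[] out = Arrays.copyOf(iv, iv.length + ct.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return out;
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    // Decrypt right before running, execute, then wipe the plaintext. The stored
    // ciphertext is never modified, so "encrypting again" amounts to dropping the plaintext.
    static long run(byte[] stored, SecretKey key) {
        byte[] plain = null;
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, stored, 0, 12));
            plain = c.doFinal(stored, 12, stored.length - 12);
            // While this executes, the code is in plaintext in memory: the attacker's window.
            // (The String copy made here cannot even be wiped in Java.)
            return interpret(new String(plain, StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            if (plain != null) Arrays.fill(plain, (byte) 0);
        }
    }

    // Toy "machine": postfix arithmetic, e.g. "2 3 + 4 *".
    static long interpret(String program) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String tok : program.trim().split("\\s+")) {
            if (tok.equals("+")) stack.push(stack.pop() + stack.pop());
            else if (tok.equals("*")) stack.push(stack.pop() * stack.pop());
            else stack.push(Long.parseLong(tok));
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] stored = encrypt("2 3 + 4 *", key);
        System.out.println(run(stored, key)); // 20
    }
}
```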

Application server model
- Do not give the software code to end users; run the code on a trusted server instead
- Problems:
  - Performance
  - Scalability
  - Cost

Software aging
- A radical approach that relies on periodic updates of the software
- Each update is done in a way that makes older versions barely usable, e.g., by using incompatible file formats
- A software pirate will be forced to provide his/her customers with frequent updates, which makes the pirate easier to catch
- Cryptographic techniques can be used to ensure that older versions cannot use data produced by newer versions
- Problems:
  - Frequent updates are inconvenient, although they can be automated to a large degree
  - Sharing data depends on everyone having an up-to-date version
  - Not applicable to all domains: may work well for Microsoft Word, but not for single-user games
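The lecture does not say which cryptographic technique software aging would use; the sketch below shows one possible construction, which is an assumption. Release keys form a hash chain, key(v - 1) = SHA-256(key(v)), so a newer release can derive every older key and read older files, while an older release cannot compute a newer key. `LATEST`, the vendor seed, and the file layout (version, IV, AES-GCM ciphertext) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class VersionedFormat {
    static final int LATEST = 10; // hypothetical number of releases planned in advance

    // Apply SHA-256 'steps' times; hashing key(v) once yields key(v - 1).
    static byte[] hashSteps(byte[] key, int steps) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] k = key;
            for (int i = 0; i < steps; i++) k = md.digest(k);
            return k;
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    // Vendor side only (the seed never ships): key(v) = SHA-256^(LATEST - v + 1)(seed).
    static byte[] releaseKey(byte[] vendorSeed, int version) {
        return hashSteps(vendorSeed, LATEST - version + 1);
    }

    static SecretKeySpec aesKey(byte[] k) { return new SecretKeySpec(k, 0, 16, "AES"); }

    // File layout: version (4 bytes) | IV (12 bytes) | AES-GCM ciphertext.
    static byte[] save(byte[] myKey, int myVersion, byte[] data) {
        try {
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, aesKey(myKey), new GCMParameterSpec(128, iv));
            byte[] ct = c.doFinal(data);
            return ByteBuffer.allocate(4 + 12 + ct.length).putInt(myVersion).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    // Returns null if this version cannot read the file.
    static byte[] load(byte[] myKey, int myVersion, byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int fileVersion = buf.getInt();
        // Explicit check for clarity; even if patched out, the wrong key makes decryption fail.
        if (fileVersion > myVersion) return null;
        byte[] iv = new byte[12];
        buf.get(iv);
        byte[] ct = new byte[buf.remaining()];
        buf.get(ct);
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            byte[] fileKey = hashSteps(myKey, myVersion - fileVersion);
            c.init(Cipher.DECRYPT_MODE, aesKey(fileKey), new GCMParameterSpec(128, iv));
            return c.doFinal(ct);
        } catch (GeneralSecurityException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] seed = "hypothetical vendor seed".getBytes(StandardCharsets.UTF_8);
        byte[] k3 = releaseKey(seed, 3), k5 = releaseKey(seed, 5);
        byte[] doc = "report".getBytes(StandardCharsets.UTF_8);
        System.out.println(load(k5, 5, save(k3, 3, doc)) != null); // true: newer reads older
        System.out.println(load(k3, 3, save(k5, 5, doc)) != null); // false: older cannot read newer
    }
}
```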

So, protecting against piracy is difficult because of reverse engineering

Reverse engineering
- Reverse engineering is the process of understanding the purpose and function of a software program from its code
- Illegal reverse engineering is harmful:
  - Intellectual property theft
  - Illegal alteration of software functionality
  - Theft of security-sensitive information embedded in the program code
- Modern high-level languages make reverse engineering easy:
  - Scripting languages often have no compiled form
  - Java bytecodes are high-level
- A number of automated reverse engineering tools are widely available; for Java: Mocha, Jad, Soot

Tamper-proofing
- Software tamper-proofing techniques generally check whether (a part of) the software has been modified and, if so, refuse to run it
- The basic techniques include computing checksums and checking timestamps
- Problem: reverse engineering attacks!

Guards (Chang and Atallah)
- Each guard is code that performs a small check
- A large number of guards is created for a program
- The guards are inter-related:
  - They cover overlapping portions of code
  - They cover other guards
- If some guards are removed, other guards will likely detect that
- A tool based on this technique is marketed by Arxan
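The checksum and guard ideas above can be illustrated with a toy model in which the "program" is a byte array and each guard is a CRC32 checksum over one or more byte ranges. This is a simplification for illustration, not Chang and Atallah's actual construction or Arxan's product; the 64-byte layout and the guard ranges are hypothetical. Every guard's own code region is covered by at least one other guard, so patching out a guard is detected by the guards that cover it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

class GuardNetwork {
    static final class Guard {
        final String name;
        final int[][] ranges;   // half-open [from, to) byte ranges this guard checks
        long expected;          // checksum recorded when the program was protected
        boolean removed;        // set when an attacker has patched this guard out

        Guard(String name, int[][] ranges) { this.name = name; this.ranges = ranges; }

        long checksum(byte[] image) {
            CRC32 crc = new CRC32();
            for (int[] r : ranges) crc.update(image, r[0], r[1] - r[0]);
            return crc.getValue();
        }
    }

    // Hypothetical 64-byte layout: program code in [0, 40); the code of
    // guards A, B, C lives in [40, 48), [48, 56), [56, 64).
    static List<Guard> protect(byte[] image) {
        List<Guard> guards = List.of(
                new Guard("A", new int[][] {{0, 30}, {56, 64}}),  // program code + guard C
                new Guard("B", new int[][] {{20, 48}}),           // overlaps A; covers guard A
                new Guard("C", new int[][] {{30, 56}}));          // rest of code; guards A and B
        for (Guard g : guards) g.expected = g.checksum(image);
        return guards;
    }

    // Every guard that still runs re-checks its ranges; returns the names of failing guards.
    static List<String> failing(byte[] image, List<Guard> guards) {
        List<String> failed = new ArrayList<>();
        for (Guard g : guards)
            if (!g.removed && g.checksum(image) != g.expected) failed.add(g.name);
        return failed;
    }

    public static void main(String[] args) {
        byte[] image = new byte[64];
        for (int i = 0; i < image.length; i++) image[i] = (byte) (i * 7 + 1);
        List<Guard> guards = protect(image);

        byte[] patched = image.clone();
        patched[25] ^= 1;                               // tamper with program code
        System.out.println(failing(patched, guards));   // [A, B]

        byte[] noGuardA = image.clone();
        noGuardA[40] = 0;                               // patch out guard A's code...
        guards.get(0).removed = true;                   // ...so guard A no longer runs
        System.out.println(failing(noGuardA, guards));  // [B, C]
    }
}
```

CRC32 is used only to keep the sketch short; any single-byte change inside a guarded range is always detected by it, but a real guard would also hide its expected value and its response to failure.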

Tamper-proofing (cont.)
- Checking intermediate program results:
  - E.g., it is known that variable j has to be positive at a specific point in the program
  - Insert a check that makes the program fail if the value of j is negative or zero
  - Related to the technique of assertions in software quality
- Problems:
  - An unexpected value can be caused by a bug, not by tampering
  - The program is not able to fail gracefully
  - Performance hit if the number of checks is large
  - Difficult to automate
  - There is no guarantee that, after tampering, the program will always produce an invalid intermediate result
  - Also, tampering may be detected when it's too late

Software obfuscation
- A necessary component of reverse engineering is understanding how the code works
- By making code difficult to understand, reverse engineering may be made uneconomical
- Obfuscation techniques obscure program code; program functionality has to remain unchanged
- Obfuscations should:
  - Make code more difficult to understand by manual inspection
  - Be impossible or hard to reverse using automated tools
  - Be stealthy, i.e., look similar to the surrounding code
  - Have low overheads, i.e., not slow the program down too much or require much more memory to run
- Obfuscations are also important for protecting other software protection mechanisms

Existing obfuscation techniques
- Layout obfuscating transformations:
  - Comments are removed
  - Line delimiters are removed
  - Identifiers are scrambled
- Data obfuscating transformations:
  - Splitting and merging variables and arrays
  - Reordering elements in arrays
  - Converting static data into functions
- Control obfuscating transformations:
  - Inlining methods
  - Outlining statements
  - Unrolling loops
  - Reordering expressions and statements
  - Inserting irrelevant code
- All existing techniques are low-level and ignore design issues

Idea
- In object-oriented programming, design is represented largely by the decomposition into classes
- Scramble this decomposition!
- Design obfuscations (Sosonkin, Naumovich, Memon)

[Figure: classes Class1 to Class6 of the original design and obfuscated classes Class123, Class41, Class51]
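Several of the low-level layout, data, and control transformations listed above can be shown on one tiny Java method. The obfuscated version below was written by hand (no particular tool is implied) and combines scrambled identifiers, a split variable, a reordered array, static data converted into a function, and irrelevant code guarded by an always-false condition (commonly called an opaque predicate). The method, its constants, and the class name are hypothetical.

```java
class ObfuscationDemo {
    // ---- Original code ----
    static final int[] WEIGHTS = {3, 1, 4, 1};
    static final int BONUS = 10;

    static int score(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * WEIGHTS[i % WEIGHTS.length];
        return sum + BONUS;
    }

    // ---- Obfuscated by hand: same input/output behavior ----
    static final int[] W = {1, 4, 1, 3};               // reordered array: W[(i + 3) % 4] == WEIGHTS[i]

    static int b() { return "obfuscate!".length(); }   // static data turned into a function (returns 10)

    static int s(int[] a) {                             // scrambled identifiers (layout obfuscation)
        int x = 0, y = 0;                               // variable 'sum' split into x + y
        for (int i = 0; i < a.length; i++) {
            if ((i * i + i) % 2 != 0) x -= 1000;        // irrelevant code: i * i + i is always even
            int t = a[i] * W[(i + 3) % 4];
            if (i % 2 == 0) x += t; else y += t;
        }
        return x + y + b();
    }

    public static void main(String[] args) {
        int[] input = {2, 7, 1, 8};
        System.out.println(score(input) + " " + s(input)); // 35 35
    }
}
```

The two methods compute the same value for every input, which is the basic requirement that functionality remains unchanged; what differs is how much a reader has to work to see it.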

OO design obfuscations
- Class coalescing: several classes in the original program are replaced with one class
- Class splitting: a single class in the original program is replaced with a number of new classes
- Interfacification: a number of new lightweight types are created to obscure the places where a specific class is used

Class coalescing
Original classes:
  Car: -int id; +int getId()
  PersonalCar: +PersonalCar(); +Person getOwner()
  Truck: -double capacity; +Truck(double capacity); +double getCapacity()
  Bus: -int capacity; +Bus(int capacity); +int getCapacity()
Coalesced class:
  ObfuscatedCar: -int id; -double capacity1; -int capacity2; +ObfuscatedCar(double cap1, int cap2); +Person getOwner(); +double getCapacity1(); +int getCapacity2()
Client code before: Truck truck = new Truck(3.5); truck.getCapacity()
Client code after: ObfuscatedCar car = new ObfuscatedCar(3.5, 14); car.getCapacity1()
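The coalescing example above, reduced to runnable Java. Only `Truck`, `Bus`, and `ObfuscatedCar` are kept; `Car`, `PersonalCar`, `Person`, and the `id` field are omitted for brevity, and the method bodies are assumed, since the slides show only member lists. The value 14 is the second constructor argument from the slide's client code.

```java
class CoalescingDemo {
    // ---- Original design: two separate classes ----
    static class Truck {
        private final double capacity;
        Truck(double capacity) { this.capacity = capacity; }
        double getCapacity() { return capacity; }
    }

    static class Bus {
        private final int capacity;
        Bus(int capacity) { this.capacity = capacity; }
        int getCapacity() { return capacity; }
    }

    // ---- After coalescing: one class carries the state of both ----
    static class ObfuscatedCar {
        private final double capacity1;   // formerly Truck.capacity
        private final int capacity2;      // formerly Bus.capacity
        ObfuscatedCar(double cap1, int cap2) { capacity1 = cap1; capacity2 = cap2; }
        double getCapacity1() { return capacity1; }
        int getCapacity2() { return capacity2; }
    }

    public static void main(String[] args) {
        Truck truck = new Truck(3.5);                     // client code before
        ObfuscatedCar car = new ObfuscatedCar(3.5, 14);   // after; a former Truck client ignores capacity2
        System.out.println(truck.getCapacity() == car.getCapacity1()); // true
    }
}
```

After the transformation, the distinction between trucks and buses no longer appears in the type structure, so a reader of decompiled code has to reconstruct it from how each client uses the fields.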

Class splitting
Original class:
  Truck: -double capacity; -int numberCylinders; +Truck(double capacity); +double getCapacity(); +int getNumberCylinders(); +double getMaxWeight()
Split classes (C2 extends C1):
  C1: -double capacity; +C1(double capacity); +double getCapacity()
  C2: -int numberCylinders; +C2(double capacity); +int getNumberCylinders(); +double getMaxWeight()
Client code before: Truck truck = new Truck(3.5); truck.getCapacity()
Client code after: C1 car = new C2(3.5); car.getCapacity()

Interfacification
New interfaces I1, I2, I3 declare, between them: +double getCapacity(); +int getNumberCylinders(); +double getMaxWeight()
Class Truck keeps the same members and implements the new interfaces:
  Truck: -double capacity; -int numberCylinders; +Truck(double capacity); +double getCapacity(); +int getNumberCylinders(); +double getMaxWeight()
Client code before: Truck truck = new Truck(3.5); truck.getCapacity()
Client code after: I3 car = new Truck(3.5); ((I1) car).getCapacity()
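A runnable sketch of both transformations above. The member lists and client code follow the slides; the value of `numberCylinders`, the formula in `getMaxWeight()`, and the assignment of exactly one method to each of the interfaces `I1`, `I2`, `I3` are assumptions made only so that the code compiles and runs.

```java
class SplitAndInterfaceDemo {
    // ---- Class splitting: Truck's members distributed over C1 and its subclass C2 ----
    static class C1 {
        private final double capacity;
        C1(double capacity) { this.capacity = capacity; }
        double getCapacity() { return capacity; }
    }

    static class C2 extends C1 {
        private final int numberCylinders = 6;                  // hypothetical value
        C2(double capacity) { super(capacity); }
        int getNumberCylinders() { return numberCylinders; }
        double getMaxWeight() { return getCapacity() * 1.5; }   // hypothetical formula
    }

    // ---- Interfacification: Truck unchanged, but referenced through new interfaces ----
    interface I1 { double getCapacity(); }        // assumed split of methods over I1..I3
    interface I2 { int getNumberCylinders(); }
    interface I3 { double getMaxWeight(); }

    static class Truck implements I1, I2, I3 {
        private final double capacity;
        private final int numberCylinders = 6;                  // hypothetical value
        Truck(double capacity) { this.capacity = capacity; }
        public double getCapacity() { return capacity; }
        public int getNumberCylinders() { return numberCylinders; }
        public double getMaxWeight() { return capacity * 1.5; } // hypothetical formula
    }

    public static void main(String[] args) {
        C1 car1 = new C2(3.5);                   // client code after class splitting
        System.out.println(car1.getCapacity());
        I3 car2 = new Truck(3.5);                // client code after interfacification
        System.out.println(((I1) car2).getCapacity());
    }
}
```

In both cases the declared type seen at the call site (`C1`, `I3`) no longer names the class whose behavior is actually used, which is what makes the design harder to recover.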

[Figure: experimental data for class coalescing]

[Figure: experimental data for class splitting]

[Figure: experimental data for interfacification]

Software watermarking
- When everything else fails
- Watermarking is commonly used for proving the authenticity of physical objects
- Also used in digital media and hardware to prove ownership
- Can be applied to software code: it should be possible to show in a court of law who the rightful owner of the software is
- Software watermarks should be stealthy, encode enough data, and not increase resource requirements too much
- They should be resilient to different types of attacks:
  - Removal
  - Distortion
  - Adding a second watermark
- Fingerprinting is a related mechanism: like watermarking, but with different data embedded in different copies of the software

Types of software watermarks
- Static: stored in the program executable, object code, or source code
  - Static data watermarks, e.g., store a copyright string as a static class field
  - Static code watermarks use redundancy: when something can be done in many different ways, do it in one specific way
    - E.g., if two adjacent statements are independent, they can appear in arbitrary order; always put them in lexicographic order
- Dynamic: stored in the execution state of the program
  - Easter egg watermarks: some very unusual input produces an identification of ownership
  - Data structure watermarks: embed a message in the dynamic state of the program (e.g., using object references)
  - Execution trace watermarks: usually embedded as a statistical property, e.g., some constraint on the use of registers when the program executes on some unusual input

Watermarking instruction group frequency
- A static code watermark, applied to assembly and machine code
- Select some commonly occurring groups of instructions and count the number of their occurrences in the code
- Add redundant code so that the counts are distributed in a seemingly random but controlled fashion
- Decompilation is likely to defeat this technique
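The redundancy-based static code watermark described above (ordering independent adjacent statements) can be extended from a single fixed convention to a multi-bit mark: each pair of independent statements carries one bit, with lexicographic order meaning 0 and reversed order meaning 1. This bit encoding, the class `OrderingWatermark`, and treating statements as plain strings are assumptions for illustration; deciding which statements really are independent is assumed to have been done already.

```java
import java.util.ArrayList;
import java.util.List;

class OrderingWatermark {
    // Embed one bit per pair of independent adjacent statements:
    // lexicographic order encodes 0, reversed order encodes 1.
    static List<String> embed(List<String[]> independentPairs, String bits) {
        if (bits.length() > independentPairs.size())
            throw new IllegalArgumentException("not enough independent pairs for the watermark");
        List<String> code = new ArrayList<>();
        for (int i = 0; i < independentPairs.size(); i++) {
            String a = independentPairs.get(i)[0], b = independentPairs.get(i)[1];
            if (a.equals(b)) throw new IllegalArgumentException("identical statements carry no bit");
            String lo = a.compareTo(b) < 0 ? a : b;
            String hi = lo.equals(a) ? b : a;
            boolean one = i < bits.length() && bits.charAt(i) == '1'; // unused pairs encode 0
            code.add(one ? hi : lo);
            code.add(one ? lo : hi);
        }
        return code;
    }

    // Recover the first n bits by checking the order of each pair.
    static String extract(List<String> code, int n) {
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < n; i++)
            bits.append(code.get(2 * i).compareTo(code.get(2 * i + 1)) > 0 ? '1' : '0');
        return bits.toString();
    }

    public static void main(String[] args) {
        // Each pair is assumed to be independent (no data or side-effect dependencies).
        List<String[]> pairs = List.of(
                new String[] {"x = 1;", "y = 2;"},
                new String[] {"b = f();", "a = g();"},
                new String[] {"n++;", "m--;"});
        List<String> code = embed(pairs, "101");
        code.forEach(System.out::println);
        System.out.println(extract(code, 3)); // 101
    }
}
```

The program's behavior is identical for every possible watermark, which is what makes this kind of mark stealthy; it is also why an attacker who reorders independent statements can erase it (a distortion attack).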

Single procedure watermarking
- A static code watermark for Java
- A method in one of the program classes is created for watermarking purposes
- The method contains standard Java bytecodes but does not do anything useful except encoding the watermark

Dynamic data structure watermarking
- Encode a watermark using object references; a number of different encodings are possible
- The watermark is constructed for a special input
- The watermark is demonstrated using either:
  - A debugger
  - A special-purpose tool that can examine the run-time state of the program
- An important part is identifying the start point of the data structure that encodes the watermark
- Improvement: use existing program data structures and add extra fields to them to encode watermarks
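The dynamic data structure watermark can be sketched with one possible way of encoding a number in object references: a circular linked list in which each node's extra `digit` reference points d steps ahead along the list to encode digit d. This particular encoding, the trigger string, and all names are hypothetical. The structure is built only for the special input, and extraction assumes the header node is already known, which is exactly the start-point identification problem mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class GraphWatermark {
    static final class Node {
        Node next;   // spine: the nodes form a circular list starting at a header node
        Node digit;  // encodes one digit d by pointing d steps ahead along the spine
    }

    static final String SPECIAL_INPUT = "hypothetical-secret-input"; // hypothetical trigger
    static Node watermarkRoot;  // keeps the structure reachable for a debugger or tool

    // Called with each program input; only the special input builds the watermark.
    static void onInput(String input) {
        if (SPECIAL_INPUT.equals(input)) watermarkRoot = embed(2024, 4);
    }

    // Encode a non-negative value in the given base as a circular list of references.
    static Node embed(long value, int base) {
        List<Integer> digits = new ArrayList<>();
        do { digits.add(0, (int) (value % base)); value /= base; } while (value > 0);
        while (digits.size() < base - 1) digits.add(0, 0);  // leading zeros keep every digit < list length
        int n = digits.size() + 1;                           // header + one node per digit
        Node[] nodes = new Node[n];
        for (int i = 0; i < n; i++) nodes[i] = new Node();
        for (int i = 0; i < n; i++) nodes[i].next = nodes[(i + 1) % n];
        for (int i = 1; i < n; i++) nodes[i].digit = nodes[(i + digits.get(i - 1)) % n];
        return nodes[0];
    }

    // Walk the spine from the header, counting how far each digit reference points.
    static long extract(Node header, int base) {
        long value = 0;
        for (Node cur = header.next; cur != header; cur = cur.next) {
            int d = 0;
            for (Node p = cur; p != cur.digit; p = p.next) d++;
            value = value * base + d;
        }
        return value;
    }

    public static void main(String[] args) {
        onInput("hello");
        System.out.println(watermarkRoot == null);      // true: ordinary input, no watermark
        onInput(SPECIAL_INPUT);
        System.out.println(extract(watermarkRoot, 4));  // 2024
    }
}
```

Because every digit is smaller than the list length, each digit reference lands on a distinct node within one lap of the list, so counting steps recovers the digit unambiguously.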