DATA OBFUSCATION. What is data obfuscation?




Data obfuscations break up the data structures used in a program and encrypt its literals. The methods include modifying inheritance relations, restructuring arrays, and so on. Because they thoroughly change the data structures of a program, they make the obfuscated code so complicated that recreating the original source code becomes impractical.

Data storage obfuscations change the type of storage for variables. One example is converting a local variable into a global variable. The obfuscator ensures that different methods use the variable at different times, but that no two of them use it at the same time.

A data encoding obfuscation changes the way a program interprets stored data. For example, you can replace every expression that initializes an index variable i with the expression 8*i+3. When the code needs to use the index value, the obfuscator inserts the expression (i-3)/8. Finally, instead of incrementing the variable by one, you add eight to it. In effect, the obfuscation scales and offsets the index and computes the real index only when it is about to be used.

A data aggregation obfuscation alters how data is grouped together in memory. An example is turning a 2D array into a 1D array or vice versa. The basic idea is to change the familiar conceptual mapping into a less common in-memory representation, so that it is more difficult for a person to understand your algorithms. For example, a chessboard is often modeled in a program as a matrix, but changing it to a one-dimensional array works just as well for the CPU.

A data ordering obfuscation changes how data is ordered. In C-based languages, it is common to see the ith element of a collection of data accessed by indexing to position i in an array. A data ordering obfuscation would instead determine the index of the data in the array by calling some function f(i). Again, this simply rearranges the storage of information in a way that less closely matches the normal conceptual model.
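The encoding and aggregation ideas above can be sketched in C. This is only an illustration under stated assumptions: the function names (encode_index, decode_index, sum_with_encoded_index, board_get, board_set) are hypothetical and not taken from any particular obfuscator; only the expressions 8*i+3 and (i-3)/8 and the chessboard example come from the text.

```c
/* Encoding obfuscation: the index i is stored as 8*i + 3. */
int encode_index(int i) { return 8 * i + 3; }

/* The real index is recovered only at the point of use. */
int decode_index(int e) { return (e - 3) / 8; }

/* Sum the first n elements of a[] while the loop counter stays encoded. */
int sum_with_encoded_index(const int *a, int n)
{
    int total = 0;
    /* e starts at encode_index(0) == 3 and advances by 8 instead of 1. */
    for (int e = encode_index(0); e < encode_index(n); e += 8)
        total += a[decode_index(e)];
    return total;
}

/* Aggregation obfuscation: an 8x8 chessboard kept in a flat 64-cell array. */
enum { BOARD_SIZE = 8 };

int board_get(const int *flat, int row, int col)
{
    return flat[row * BOARD_SIZE + col];
}

void board_set(int *flat, int row, int col, int piece)
{
    flat[row * BOARD_SIZE + col] = piece;
}
```

A real obfuscator would inline these expressions rather than call named helpers, since helper names like these would give the scheme away.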

Understanding a simple algorithm, such as sorting the elements of an array, is easy. Applying a simple data transformation to such an algorithm can make the code hard to understand. We will apply data transformations to the following piece of code:

    for(i=0;i<10;i++)
        for(j=i;j<10;j++)
            if(a[j]>a[i])
                swap(a[i],a[j]);

Aggregation

The first data transformation we would like to discuss is restructuring arrays. Arrays can be split, merged, folded or flattened. We will merge two or more arrays into one. Applying this transformation to our example forces the attacker to evaluate the details of the algorithm if he wants to understand it. The test and swap lines are transformed into the following piece of code, assuming that a is the array on the odd indices of the interleaved array:

    if(a[2*j+1]>a[2*i+1])
        swap(a[2*j+1],a[2*i+1]);

Finding similar transformations for arrays is not hard, and neither is implementing them in the right tool. As it is already difficult to get type information in TXL, this data transformation is impossible to apply there in a safe way. For example, modifying a data structure requires the location of every instance of that data structure. On a parse tree this is non-trivial, as the same name might be used in different scopes for different data structures. While the parse tree does contain sufficient information to deduce the type of a data structure, it is more straightforward to perform this on an intermediate representation which contains a symbol table.
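As a rough sketch of the merge described above, the following C code interleaves two arrays and runs the example sort on the odd slots only. The names interleave, sort_odd_slots, swap_int and ARRAY_LEN are assumptions made for illustration, as is the choice to put the second array b on the even indices.

```c
enum { ARRAY_LEN = 10 };

/* Hypothetical helper: C has no built-in swap for array elements. */
void swap_int(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Merge two arrays into one: b on the even indices, a on the odd indices.
   merged must have room for 2 * ARRAY_LEN elements. */
void interleave(const int *a, const int *b, int *merged)
{
    for (int k = 0; k < ARRAY_LEN; k++) {
        merged[2 * k]     = b[k];
        merged[2 * k + 1] = a[k];
    }
}

/* The example sort, rewritten so that it touches only the odd slots.
   Like the original loop, it leaves the a values in descending order. */
void sort_odd_slots(int *merged)
{
    for (int i = 0; i < ARRAY_LEN; i++)
        for (int j = i; j < ARRAY_LEN; j++)
            if (merged[2 * j + 1] > merged[2 * i + 1])
                swap_int(&merged[2 * i + 1], &merged[2 * j + 1]);
}
```

The values of b on the even indices are untouched by the sort, so a reader has to work out which half of the merged array each access refers to.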

Ordering

An obfuscation transformation which reorders arrays is not difficult in SUIF either. A symbol table is at our disposal, so each pointer to the array is known, which makes finding all accesses to the array straightforward. The indices used to access the array can be changed by a function mapping the original position i to its new position in the reordered array. The test and swap lines of our example are changed into the following piece of code, which no longer orders the array in memory as the original program did. However, all indices in the program are changed in the same way, so the resulting code stays functionally equivalent to the original:

    if(a[f(j)]>a[f(i)])
        swap(a[f(i)],a[f(j)]);

Storage and encoding

Data flow optimizations such as common subexpression elimination and constant propagation are able to undo very trivial data obfuscations. For example, when the constant 10 is split into the subexpression 2+8, constant propagation will undo this transformation. Nontrivial data obfuscations such as those shown above survive the compilation process, because these transformations change the context of the program. Since a compiler has only optimizing transformations at its disposal, it is unable to undo such context-changing data transformations. On the other hand, variable splitting is a deoptimizing transformation, and applying it should take into account the optimizations performed by the compiler.

We had a look at binary obfuscators and found that no non-trivial data transformations were implemented. Only trivial data transformations such as constant splitting are implemented at the binary level, and without further obfuscation a subsequent optimization run could remove them. It is not astonishing that binary obfuscators contain only trivial data transformations, as the types of data structures are lost during compilation. Passing extra information to do such transformations at the binary level is feasible, but intensive and rather artificial if these transformations can be applied at the source code level and afterwards survive the compiler optimizations.
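To make the reordering idea concrete, here is a minimal C sketch. The mapping f(i) = (3*i + 7) mod 10 is an arbitrary permutation of the positions 0..9 chosen for illustration; it is an assumption, not the mapping used by any tool mentioned in the text. The names LEN, swap_cells and sort_reordered are likewise hypothetical.

```c
enum { LEN = 10 };

/* Assumed index mapping: a permutation of 0..9, because 3 and 10
   share no common factor. */
int f(int i)
{
    return (3 * i + 7) % LEN;
}

/* Hypothetical helper that exchanges two array cells. */
void swap_cells(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* The example sort with every array access routed through f().
   Logical position k lives in physical cell a[f(k)]. */
void sort_reordered(int *a)
{
    for (int i = 0; i < LEN; i++)
        for (int j = i; j < LEN; j++)
            if (a[f(j)] > a[f(i)])
                swap_cells(&a[f(i)], &a[f(j)]);
}
```

After the call, reading a[f(0)], a[f(1)], ... yields the values in descending order, while the physical cells a[0], a[1], ... appear shuffled. The program stays correct only if every other access to a goes through f as well.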

Why would you want to merely obfuscate data, rather than use a strong encryption algorithm? A good example would be an audit report on a medical system. This report may be generated for an external auditor and contain sensitive information. The auditor will be examining the report for information that indicates possible cases of fraud or abuse. Assume that management has required that names, Social Security Numbers and other personal information should not be available to the auditor except on an as-needed basis. The data needs to be presented to the auditor, but in a way that allows the examination of all data, so that patterns in the data may be detected.

Encryption would be a poor choice in this case, as the data would be rendered into byte values outside the range of printable ASCII characters. This would be impossible to read. A better choice might be to obfuscate the data with a simple substitution cipher. While this is not considered encryption, it may be suitable for this situation.

When the auditor finds a possible case of abuse, he will need the real name and SSN of the party involved. He could obtain these by calling a customer service representative at the insurance company that supplied the report and asking for the real information. The obfuscated data is read to the customer service rep, who then enters it into an application that supplies the real data. The importance of using pronounceable characters becomes very clear. Strong encryption would render this impossible.

Here is some simple example code to do the obfuscation:

    create or replace package obfs
    is
       function obfs( clear_text_in in varchar2 ) return varchar2;
       pragma restrict_references( obfs, WNPS, WNDS );

       function unobfs( obfs_text_in in varchar2 ) return varchar2;
       pragma restrict_references( unobfs, WNPS, WNDS );
    end obfs;
    /

    create or replace package body obfs
    is
       -- character i of xlate_from is replaced by character i of xlate_to
       xlate_from varchar2(62) := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
       xlate_to   varchar2(62) := 'nopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklm';

       function obfs( clear_text_in in varchar2 ) return varchar2
       is
       begin
          return translate( clear_text_in, xlate_from, xlate_to );
       end obfs;

       function unobfs( obfs_text_in in varchar2 ) return varchar2
       is
       begin
          return translate( obfs_text_in, xlate_to, xlate_from );
       end unobfs;
    end obfs;
    /

Note that xlate_to, as given, maps some characters to the same target (for example, both 0 and a become n), so unobfs reliably recovers the original only for digit strings such as SSNs.

Here is some sample output:

    SSN         OBFS SSN
    ----------  ----------
    540407786   srnrnuuvt
    542800170   srpvnnoun
    542802063   srpvnpntq
    541466830   srorttvqn

As you can see, it wouldn't be very difficult to decipher this scheme given enough data. A somewhat more effective method involves chopping the text into segments and rearranging it as well as obfuscating it. Below is some sample output from this algorithm.

    SSN         OBFS
    ----------  ----------
    540407786   &24B23B&Z
    542800170   -4B*23&&&
    542802063   -4Z&23-&_
    541466830   *2_423ZZ&

While this is still not encryption, this data would be more difficult to decipher without the key. Source code for this in PL/SQL is available at the URL provided at the end of this article.

Another way to hide sensitive data is through masking. This is different from the previous example in that the clear text cannot be reconstructed from the displayed data. This is useful in situations where it is only necessary to display a portion of the data. A good case for this method is the receipts printed at gas stations and convenience stores. When a purchase is made with a credit card, the last 4 digits of the credit card number are often displayed as clear text, while the rest of the credit card number is masked with a series of X's.

    Slop 'n Slurp 1 Stop Shop
    5/25/2000 8:53 P.M.

    Football Burrito      1      2.49    2.49
    Premium Gasoline   12.5     1.699   21.24
                                        =====
                                        23.73

This method can also be used for reports where the person reading the report requires only a portion of the sensitive data. This method is also commonly used for the account numbers on printed transactions from ATMs.
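For readers who prefer C, the substitution and masking techniques from this section can be sketched as follows. The two translation tables are copied from the PL/SQL package; everything else (the helper translate_in_place, the in-place signatures of obfs and unobfs, and mask_all_but_last) is an assumption made for illustration. The helper mimics the first-match behaviour of the SQL TRANSLATE function for equal-length tables.

```c
#include <string.h>

/* Tables copied from the PL/SQL package body. */
static const char XLATE_FROM[] =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
static const char XLATE_TO[] =
    "nopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklm";

/* Replace each character of text that occurs in from with the character
   at the same position in to. The first match wins; characters that do
   not occur in from are left unchanged. */
void translate_in_place(char *text, const char *from, const char *to)
{
    for (; *text != '\0'; text++) {
        const char *hit = strchr(from, *text);
        if (hit != NULL)
            *text = to[hit - from];
    }
}

/* Substitution: reversible for digit strings such as SSNs. */
void obfs(char *text)   { translate_in_place(text, XLATE_FROM, XLATE_TO); }
void unobfs(char *text) { translate_in_place(text, XLATE_TO, XLATE_FROM); }

/* Masking: overwrite everything except the last keep characters with 'X'.
   Unlike obfs(), this cannot be undone from the displayed value. */
void mask_all_but_last(char *number, size_t keep)
{
    size_t len = strlen(number);
    if (len <= keep)
        return;
    for (size_t i = 0; i < len - keep; i++)
        number[i] = 'X';
}
```

Calling obfs on the string 540407786 gives srnrnuuvt, matching the first row of the sample output, and unobfs turns it back. Calling mask_all_but_last with keep set to 4 on a made-up 16-digit number leaves only its last four digits readable.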