SADFE 2015
Proceedings of the 10th International Conference on Systematic Approaches to Digital Forensic Engineering
Carsten Rudolph, Nicolai Kuntze, Barbara Endicott-Popovsky, Antonio Maña
Editors:
Carsten Rudolph, Monash University, Melbourne, Victoria, Australia
Nicolai Kuntze, Huawei European Research Center, Frankfurt am Main, Germany
Barbara Endicott-Popovsky, University of Washington, Seattle, WA, USA
Antonio Maña, University of Malaga, Malaga, Spain

Proceedings of the 10th International Conference on Systematic Approaches to Digital Forensic Engineering (SADFE 2015)
ISBN:
Safe Society Labs (Spain)

Copyright remains with the authors of each publication. Authors retain the right to reproduce, distribute, display, adapt and perform their own work for any purpose. The proceedings of the SADFE 2015 conference are published by Safe Society Labs as open access, and licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Typeset & Cover Design: Hristo Koshutanski (Safe Society Labs)
Preface

This volume constitutes the proceedings of the 10th International Conference on Systematic Approaches to Digital Forensic Engineering (SADFE 2015). Over the years, SADFE has been a venue that established new interdisciplinary relations and connections, and it has been the source of new initiatives and collaborations. One example of such an activity was the 2014 Dagstuhl Seminar "Digital Evidence and Forensic Readiness", with participants from 4 continents.

This year, the SADFE steering committee took two risks. Most importantly, it is the first SADFE since 2007 that is not co-located with another event. Second, it is the first SADFE in Europe, highlighting the necessity of international co-operation in the area of digital forensics. Nevertheless, SADFE will continue to have the character of a workshop: single-track, so that all participants share the same information, with sufficient time and space for interaction and discussions.

In response to the 2015 SADFE call for papers, 39 submissions from 16 different countries on 5 continents were received and reviewed. Of the papers submitted, 18 were accepted for presentation at the conference; of those, 12 were selected for publication in the Journal of Digital Forensics, Security and Law. The program also included keynote talks by Michael M. Losavio on "Smart Cities, Digital Forensics and Issues of Foundation and Ethics" and by Klaus Walker on "The careless application of digital evidence in German criminal proceedings". In addition, a panel on the topic of "Digital Forensics: Future Challenges for Security Forces and Government Agencies" was held, with the participation of representatives from law enforcement agencies from around the world, including the Netherlands, the UK, the United Arab Emirates and Spain.

Many people contributed to the organisation and preparation of this conference, including the program committee and the SADFE steering committee. A special thanks goes to the host and General Chair, Antonio Maña. He took care of countless tasks, including the overall organisation of the conference, the SADFE 2015 website, publication and proceedings, venue, social events, final program, and many others. SADFE 2015 would have been impossible without his commitment and experience. Last, but certainly not least, thanks go to all the authors who submitted papers and all the attendees. We hope this year's program will once again stimulate exchange and discussions beyond the conference, and we look forward to the next 10 years of SADFE.

September 2015
Carsten Rudolph, Nicolai Kuntze, Barbara Endicott-Popovsky
Program Co-chairs, SADFE 2015
Organization

Steering Committee:
Deborah Frincke (Co-Chair), Department of Defense, USA
Ming-Yuh Huang (Co-Chair), Northwest Security Institute, USA
Michael Losavio, University of Louisville, USA
Alec Yasinsac, University of South Alabama, USA
Robert F. Erbacher, Army Research Laboratory, USA
Wenke Lee, Georgia Institute of Technology, USA
Barbara Endicott-Popovsky, University of Washington, USA
Roy Campbell, University of Illinois, Urbana-Champaign, USA
Yong Guan, Iowa State University, USA

General Chair:
Antonio Maña, University of Malaga, Spain

Program Committee Co-Chairs:
Carsten Rudolph, Huawei European Research Center, Germany
Nicolai Kuntze, Huawei European Research Center, Germany
Barbara Endicott-Popovsky, University of Washington, USA

Publication Chair:
Ibrahim Baggili, University of New Haven, USA

Publicity Chair Europe:
Joe Cannataci, University of Malta, Malta

Publicity Chair North America:
Dave Dampier, Mississippi State University, USA

Publicity Chair Asia:
Ricci Ieong, University of Hong Kong, Hong Kong
Program Committee:
Sudhir Aggarwal, Florida State University, USA
Galina Borisevitch, Perm State University, Russia
Frank Breitinger, University of New Haven, USA
Joseph Cannataci, University of Groningen, Netherlands
Long Chen, Chongqing University of Posts and Telecommunications, China
Raymond Choo, University of South Australia, Australia
K.P. Chow, University of Hong Kong, Hong Kong
David Dampier, Mississippi State University, USA
Hervé Debar, France Telecom R&D, France
Barbara Endicott-Popovsky, University of Washington, USA
Robert Erbacher, Northwest Security Institute, USA
Xinwen Fu, UMass Lowell, USA
Simson Garfinkel, Naval Postgraduate School, USA
Brad Glisson, University of Glasgow, UK
Lambert Großkopf, Universität Bremen, Germany
Yong Guan, Iowa State University, USA
Barbara Guttman, National Institute of Standards and Technology, USA
Brian Hay, University of Alaska Fairbanks, USA
Jeremy John, British Library, UK
Ping Ji, John Jay College of Criminal Justice, USA
Andrina Y.L. Lin, Ministry of Justice Investigation Bureau, Taiwan
Pinxin Liu, Renmin University of China Law School, China
Michael Losavio, University of Louisville, USA
David Manz, Pacific Northwest National Laboratory, USA
Nasir Memon, Polytechnic Institute of New York University, USA
Mariofanna Milanova, University of Arkansas at Little Rock, USA
Carsten Momsen, Leibniz Universität Hannover, Germany
Kara Nance, University of Alaska Fairbanks, USA
Ming Ouyang, University of Louisville, USA
Gilbert Peterson, Air Force Institute of Technology, USA
Slim Rekhis, University of Carthage, Tunisia
Golden Richard, University of New Orleans, USA
Corinne Rogers, University of British Columbia, Canada
Ahmed Salem, Hood College, USA
Viola Schmid, Technische Universität Darmstadt, Germany
Clay Shields, Georgetown University, USA
Vrizlynn Thing, Institute for Infocomm Research, Singapore
Sean Thorpe, Faculty of Engineering and Computing, University of Technology, Jamaica
William (Bill) Underwood, Georgia Institute of Technology, USA
Wietse Venema, IBM T.J. Watson Research Center, USA
Hein Venter, University of Pretoria, South Africa
Xinyuan (Frank) Wang, George Mason University, USA
Kam Woods, University of North Carolina, USA
Yang Xiang, Deakin University, Australia
Fei Xu, Institute of Information Engineering, Chinese Academy of Sciences, China
Alec Yasinsac, University of South Alabama, USA
SM Yiu, Hong Kong University, Hong Kong
Wei Yu, Towson University, USA
Nan Zhang, George Washington University, USA
Sponsoring Institutions:
Safe Society Labs, S.L.
The University of Malaga
Journal of Digital Forensics, Security and Law
Table of Contents

UFORIA - A Flexible Visualisation Platform for Digital Forensics and E-Discovery
  Arnim Eijkhoudt, Sijmen Vos, Adrie Stander
Dynamic Extraction of Data Types in Android's Dalvik Virtual Machine
  Paulo R. Nunes de Souza, Pavel Gladyshev
Chip-off by Matter Subtraction: Frigida Via
  David Billard, Paul Vidonne
The EVIDENCE Project: Bridging the Gap in the Exchange of Digital Evidence Across Europe
  Maria Angela Biasiotti, Mattia Epifani, Fabrizio Turchi
A Collision Attack on Sdhash Similarity Hashing
  Donghoon Chang, Somitra Kr. Sanadhya, Monika Singh, Robin Verma
An Empirical Study on Current Models for Reasoning About Digital Evidence
  Stefan Nagy, Imani Palmer, Sathya Chandran Sundaramurthy, Xinming Ou, Roy Campbell
Data Extraction on MTK-based Android Mobile Phone Forensics
  Joe Kong
Open Forensic Devices
  Lee Tobin, Pavel Gladyshev
A Study on Adjacency Measures for Reassembling Text Files
  Alperen Şahin, Hüsrev T. Sencar
An Integrated Audio Forensic Framework for Instant Message Investigation
  Yanbin Tang, Zheng Tan, K.P. Chow, S.M. Yiu
Project Maelstrom: Forensic Analysis of the BitTorrent-powered Browser
  Jason Farina, M-Tahar Kechadi, Mark Scanlon
Factors Influencing Digital Forensic Investigations: Empirical Evaluation of 12 Years of Dubai Police Cases
  Ibtesam Al Awadhi, Janet C Read, Andrew Marrington, Virginia N. L. Franqueira
PLC Forensics Based on Control Program Logic Change Detection
  Ken Yau, Kam-Pui Chow
Forensic Acquisition of IMVU: A Case Study
  Robert van Voorst, M-Tahar Kechadi, Nhien-An Le-Khac
Cyber Black Box/Event Data Recorder: Legal and Ethical Perspectives and Challenges with Digital Forensics
  Michael Losavio, Pavel Pastukov, Svetlana Polyakova
Tracking and Taxonomy of Cyberlocker Link Sharers Based on Behavior Analysis
  Xiao-Xi Fan, Kam-Pui Chow
Exploring the Use of PLC Debugging Tools for Digital Forensic Investigations on SCADA Systems
  Tina Wu, Jason R.C. Nurse
The Use of Ontologies in Forensic Analysis of Smartphone Content
  Mohammed Alzaabi, Thomas Martin, Kamal Taha, Andy Jones
UFORIA - A FLEXIBLE VISUALISATION PLATFORM FOR DIGITAL FORENSICS AND E-DISCOVERY

Arnim Eijkhoudt & Sijmen Vos
Amsterdam University of Applied Sciences, Amsterdam, The Netherlands
[email protected], [email protected]

Adrie Stander
University of Cape Town, Cape Town, South Africa
[email protected]

ABSTRACT

With the current growth of data in digital investigations, one solution for forensic investigators is to visualise the data for the detection of suspicious activity. However, this process can be complex and difficult to achieve, as there are few tools available that are simple and can handle a wide variety of data types. This paper describes the development of a flexible platform, capable of visualising many different types of related data. The platform's back and front end can efficiently deal with large datasets and support a wide range of MIME types that can be easily extended. The paper also describes the development of the visualisation front end, which offers flexible, easily understandable visualisations of many different kinds of data and data relationships.

Keywords: cyber-forensics, e-discovery, visualisation, cyber-security, computer forensics, digital forensics, big data, data mining

1. INTRODUCTION

With the growth of data that can be encountered in digital investigations, it has become difficult for investigators to analyse the data in the time available for an investigation. As stated by Teerlink & Erbacher (2006), "a great deal of time is wasted by analysts trying to interpret massive amounts of data that isn't correlated or meaningful without high levels of patience and tolerance for error". Data visualisation might help to solve this problem, as the human brain is much faster at interpreting images than textual descriptions. The brain can also examine graphics in parallel, where it can only process text serially (Teerlink & Erbacher, 2006).

According to Garfinkel (2010), existing tools use the standard WIMP model (Window, Icon, Menu, Pointing device). This model is poorly suited to representing large amounts of forensic data in an efficient and intuitive way. Research must improve forensic tools to integrate visualisation with automated analysis, allowing investigators to interactively guide their investigations (Garfinkel, 2010).

Many computer forensic tools are not ideally suited for identifying correlations among data, or for finding and visually presenting groups of facts that were previously unknown or unnoticed. These limitations of digital forensic tools are similar to those in the forensic analysis of logs in network forensics. For example, logs residing in routers, webservers and web proxies are often manually examined, which is a time-consuming and error-prone process (Fei, 2007). Similar considerations apply to e-mail analysis as well.

Another issue with current tools is that they do not always scale well and will likely have problems dealing with the growth of data in digital investigations (Osborne, Turnbull, & Slay, 2010). Currently, there are few affordable tools suited to
and available for these use-cases or situations. Additionally, the available tools tend to be complex, requiring extensive training and configuration in order to be used efficiently.

Investigative data visualisation is used to assist viewers with little to no understanding of the subject matter in reconstructing a crime or item and in understanding what is being presented, for example an investigator who is not familiar with a particular scenario. On the other hand, analysis visualisations can be used to review data and to assess competing scenario hypotheses for investigators who do have an understanding of the subject matter (Schofield & Fowle, 2013).

A timeline is a valuable form of visualisation, as it greatly assists a digital forensic investigator in proving or disproving a hypothetical model proposed for the investigation. A timeline can also provide support for the mandate the digital forensic investigator received prior to commencing the investigation (Ieong, 2006). Interaction between role players can normally also be shown in network diagrams, so that the combination of a timeline and a network diagram can generally answer many 'who' and 'when' questions. The aspects of 'what' and 'where' can often be answered by examining the contents of evidence items, such as e-mails or the positional data of mobile phone calls. It is therefore important to be able to display the details of data with ease as well.

This paper describes the development of a flexible platform, Uforia (Universal Forensic Indexer and Analyser), that can be used to visualise many different types of data and data relations in an easy and fast way. The platform consists of two sections, a back end and a front end, and is based on readily available open-source technologies. The back end is used to pre-process the data in order to speed up the indexing and visualisation process handled by the front end. The resulting product is a simple and extremely flexible tool, which can be used for many types of data with little or no configuration. Very little training is needed to use Uforia, making it accessible and usable for forensic investigators without a background in digital investigations or systems, such as auditors.

2. ADVANTAGES

Uforia offers many advantages, the first of which is its very low cost. A second advantage is that the system scales well due to its use of multiprocessing and distributed technologies such as ElasticSearch, so that extremely large numbers of artefacts can be handled in a very short time. The processing of the Enron e-mail set, without attachments, typically takes less than ten minutes to complete on contemporary consumer-grade hardware. This pre-processing step also ensures that little to no processing needs to be done at the time of visualisation.

Thirdly, Uforia's development focused heavily on making it as user- and developer-friendly as possible. Many forensic tools need a substantial amount of training and configuration to accomplish meaningful tasks. As this makes such systems difficult and expensive to use and develop for, it was considered paramount during Uforia's continued development to address these issues.
Although a full UX study has not been conducted yet, the UI and feature set were developed using mock-ups and feedback from UX and graphical designers, as well as potential users from several fields of expertise, such as process, compliance and risk auditors, forensic investigators and law enforcement officers, where none of the participants were given prior usage instructions.

Another advantage is the extreme flexibility of the system. It is very easy to add new modules, e.g. for handling new MIME types, as the programming of such a module can normally be accomplished in a very short time using simple Python programming. Additionally, the front end is completely web-based, and no special software needs to be installed to use it. This, combined with its adherence to common web design and UX standards, suggests that even novice users can achieve meaningful results with little to no training.

3. BACK END

3.1 START-UP PHASE

Uforia's back end is used to process the files containing the data that will eventually be indexed and used in the visualisation process.
The back end's first step is to create a MySQL table for the files. This table contains all metadata common to any digital file, as well as calculated metadata (such as NIST hashes). A second database table is then generated, containing information about the supported MIME types. This table is built by looking at a configurable directory containing the modules for the MIME types that can be handled by the system. Every module that can handle a specific MIME type is identified and added to this table, so that the table eventually maps each supported MIME type to zero, one or more module handlers (a 1:n key/value relation). The module handlers are themselves stored as key/value pairs, with their original names as keys to matching unique table names. A table is then created for each module, so that Uforia can store the processed data returned by each particular module in its own table.

Modules are self-contained files and extremely easy to develop. They only require the structure of their database table, stored as a simple Python comment line starting with # TABLE: in the particular module, and a predefined process function which returns the array of data to be stored.

3.2 PROCESSING

Once all tables are created, the processing of the files that need to be analysed can start. The first step is to build a list of the files involved; the location to scan is read from the configuration file. Once this list is completed, every file in the list is processed: the MIME type of the file is determined, and then the relevant processing modules (0, 1, ..., n) are called to process the file. The results returned by each module are then stored in the database table that was generated earlier for that particular module.

When Uforia encounters a container format, it deals with it efficiently by recursively calling itself. For instance, the Outlook PST module will unpack encountered PST files to a temporary directory and then call Uforia recursively for that temporary location. The unpacked individual e-mails are then automatically picked up by the normal e-mail module and processed accordingly. Uforia can also deal efficiently with flat-file database(-like) formats by having modules return their results as a multi-dimensional array; Uforia's database engine turns these into multiple-row inserts into the appropriate modules' tables. Examples of modules that deal with flat files in this fashion are the modules that handle mobile phone data (CSV format) and the simple PCAP-file parser.

Due to its highly threaded operation, the back end can pre-process large volumes of data efficiently in relatively little time. Once the processing steps are completed, the stored data is transferred from the back-end storage in JSON format to the ElasticSearch engine for use by the visualisation front end.
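To illustrate the module convention described above, the following is a minimal sketch of what a handler module could look like. It is an illustration only: the column layout, the target file type and the exact process() signature are assumptions for this sketch, not Uforia's actual interface.

# A minimal, hypothetical Uforia-style module for plain-text files.
# The database table layout is declared in a comment, as described:
# TABLE: content:LONGTEXT, linecount:INT, sha1:VARCHAR(40)

import hashlib

def process(fullpath):
    # Read the file and return one row of processed data, in the
    # order declared by the # TABLE: comment above.
    with open(fullpath, 'rb') as f:
        data = f.read()
    text = data.decode('utf-8', errors='replace')
    return [text,                             # content
            text.count('\n') + 1,             # linecount
            hashlib.sha1(data).hexdigest()]   # sha1

A flat-file module would instead return a list of such rows, which the database engine then turns into a multiple-row insert, as described above.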
4. FRONT END

The front end uses ElasticSearch, AngularJS and D3.js for the visualisation and administration interface. The first step during the visualisation process is to select, in the admin interface, the modules or file types that need to be visualised. The next step is to select (and possibly group any identical) fields that need to be indexed by the ElasticSearch engine. The administration interface will hint at similar field names in other supported data types to allow for the merging of data types into one searchable set. This makes it possible to correlate the timing of, for example, cell phone calls and e-mails.

During or after the indexing and storing in ElasticSearch, one or more visualisations must then be assigned to the mapping in the admin interface. This also includes specifying the fields that should be laid out on the visualisation's axes. The data in ElasticSearch can then be searched and visualised, even if the indexing process has not been completed yet. Because the front end uses ElasticSearch, searches are fast and highly scalable. Only when full detail views of selected evidence items are necessary does the underlying back-end database need to be accessed.
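As a sketch of the kind of query such a front end issues against ElasticSearch, consider the following; the index name, field name and search terms are assumptions for illustration, not Uforia's actual schema, and the client call uses the classic elasticsearch-py search API.

from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')

# One 'contains' and one 'omits' condition on a merged, indexed field.
query = {
    'query': {
        'bool': {
            'must':     [{'match': {'content': 'trading'}}],
            'must_not': [{'match': {'content': 'newsletter'}}],
        }
    },
    'size': 100,
}

results = es.search(index='evidence_email', body=query)
for hit in results['hits']['hits']:
    print(hit['_id'], hit['_score'])

Because ElasticSearch answers such queries from its own index, the back-end MySQL database is only touched when the detail view of a hit is requested.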
5. USER INTERFACE

The interface is designed with the goal of optimizing user-friendliness and ease of understanding. The user interface sports a 'responsive design', with UI elements automatically resizing and repositioning themselves for different screen sizes, such as those of laptops, tablets and mobile phones, as can be seen in Figure 1.

Figure 1: Mobile Interface

A typical search proceeds as follows:

1) The user selects an 'evidence type', which is the name used for the collection as it was generated in the admin interface.
2) Uforia then loads the module fields that have been indexed for that evidence type, e.g. 'Content' for e-mails or documents.
3) The user selects whether the field should 'contain' or 'omit' the information in the last field.
4) Finally, the user selects one of the visualisations that have been assigned to the evidence type.
5) Uforia will now render the requested information using the selected visualisation, with some of the visualisations offering additional manipulation (such as a network graph).

Lastly, all visualisations have one or more 'hot zones' where the user can click through to bring up a detailed view of the selected evidence item(s).

6. EXAMPLES

This section shows an example of how Uforia can be used to quickly determine the e-mail contacts of suspects. Although space in this paper is limited, similar scenarios can be recreated for other data types.

Figure 2 shows a network graph derived from a sample set of PST files, where the content was searched for the words 'investigate', 'books', 'suspect' or 'trading'. The graph indicates which individuals communicated about these words, with the size of each node indicating the amount of communication received. This immediately reveals the links between several possible suspects, including one whose PST mailbox was not included in the dataset processed by Uforia.

Figure 2: Network Graph
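The idea behind the network graph of Figure 2 can be sketched in a few lines: aggregate the sender/recipient pairs of the matching e-mails into a weighted directed graph, sizing each node by the communication it received. The sketch below, using the networkx library with made-up addresses, illustrates the principle only; Uforia's actual front end renders the graph in the browser with D3.js.

import networkx as nx

# (sender, recipient) pairs of e-mails whose content matched the
# search terms; addresses are invented for illustration.
matching_mails = [
    ('alice@example.com', 'bob@example.com'),
    ('carol@example.com', 'bob@example.com'),
    ('alice@example.com', 'bob@example.com'),
]

G = nx.DiGraph()
for sender, recipient in matching_mails:
    if G.has_edge(sender, recipient):
        G[sender][recipient]['weight'] += 1
    else:
        G.add_edge(sender, recipient, weight=1)

# Node size in Figure 2 corresponds to communication received,
# i.e. the weighted in-degree of each node.
for node in G.nodes():
    print(node, G.in_degree(node, weight='weight'))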
Another example is creating a timeline, as seen in Figure 3, to determine when messages were sent and which were sent around the time of the possible transgression. It is easy to determine the times of the messages by hovering over the intersections on the timeline, and to investigate the original e-mails by clicking on the intersections (see Figure 4).

Figure 3: Timeline

The timeline visualisation can handle multiple items, such as calls from a large number of mobile phones. Figure 4 shows anonymised data from a real case, illustrating how contacts and times can easily be determined. The horizontal axis indicates the flow of time, while the graph nodes and coloured lines indicate the moments of contact between two phone numbers. By clicking on the intersections, the original data can once again be displayed.

Figure 4: Mobile Phone Timeline

7. CONCLUSION

Uforia shows that it is possible to create a simple, user-friendly product that is nevertheless powerful enough to use in the most demanding investigations. It is easy to extend when new MIME types are encountered or new features are needed. Uforia was tested on a number of real-life scenarios, and in all cases it was able to produce results in a fast and efficient way, requiring hardly any operator training. In conclusion, Uforia is a fast, flexible and low-cost solution for investigating large volumes of data.

REFERENCES

Fei, B. K. (2007). Data Visualisation in Digital Forensics. Master's dissertation, University of Pretoria, Pretoria, South Africa.

Garfinkel, S. L. (2010). Digital forensics research: The next 10 years. Digital Investigation.

Ieong, R. S. (2006). FORZA - Digital forensics investigation framework that incorporate legal issues. Digital Investigation(3).

Osborne, G., Turnbull, B., & Slay, J. (2010). The Explore, Investigate and Correlate (EIC) conceptual framework for digital forensics information visualisation. International Conference on Availability, Reliability and Security.

Schofield, D., & Fowle, K. (2013). Visualising forensic data: Evidence (Part 1). Journal of Digital Forensics, Security and Law, 8(1).

Teerlink, S., & Erbacher, R. F. (2006). Foundations for visual forensic analysis. 7th IEEE Workshop on Information Assurance. West Point, NY: IEEE.
DYNAMIC EXTRACTION OF DATA TYPES IN ANDROID'S DALVIK VIRTUAL MACHINE

Paulo R. Nunes de Souza, Pavel Gladyshev
Digital Forensics Investigation Research Laboratory, University College Dublin, Ireland

ABSTRACT

This paper describes a technique to acquire statistical information on the types of the data objects that go into volatile memory. The technique was designed to run on Android devices and was tested in an emulated Android environment. It consists of inserting code into the Dalvik interpreter so that, at execution time, every data value that goes into memory is logged along with its type. At the end of our tests we produced probability distribution information that allowed us to distinguish memory values between reference types (Class, Exception, Object, String), Float and Integer types. The results show that this technique could be used to identify data objects of interest in an emulated environment, assisting in the interpretation of volatile memory evidence extracted from real devices.

Keywords: Android, Dalvik, memory analysis.

1. INTRODUCTION

In digital forensic investigations, it is sometimes necessary to analyse and interpret raw binary data fragments extracted from the system memory, pagefile, or unallocated disk space. Even if the precise data format is not known, the expert can often find useful information by looking for human-readable ASCII strings, URLs, and easily identifiable binary data values such as Windows FILETIME timestamps and SIDs. Figure 1 shows an example of a memory dump, where a FILETIME timestamp can be easily seen (a sequence of 8 random-looking binary values ending in 01). To date, the bulk of digital forensic research has focused on the Microsoft Windows platform; this paper describes a systematic experimental study to find (classes of) easily identifiable binary data values on the Android platform.

Figure 1: Hexadecimal view of a memory dump

2. BACKGROUND

Traditional digital forensics relies on evidence found in persistent storage. This is mainly due to the need for both sides of the litigation to reproduce and verify every forensic finding. The persistent storage can be forensically copied, providing a controllable way to repeat the analysis and arrive at the same results. An alternative is to combine traditional forensics with so-called live forensics. Live forensics relies on evidence found in volatile memory to draw conclusions. This type of evidence features a lesser level of control and repeatability compared with traditional evidence. On the other hand, live evidence may unravel key information for the progress of a case. However, the question regarding the reliability of live evidence remains, mainly at two moments: the memory acquisition and the memory analysis.

On the memory acquisition front, law enforcement agencies and researchers are working to establish standard procedures. These procedures can be based on physical or logical extraction. Physical extraction may require disassembling the device or the use of JTAG, as done by Breeuwsma
[2006]. Logical extraction can be more diverse: it can interact with the system with user privileges, as done by Yen et al. [2009]; it can gain system privileges through a kernel module, as done by Sylve et al. [2012]; it can even use a virtual machine layer to have free access to the memory, as done by Guangqi et al. [2014], among others. Regardless of the extraction method, there will be the need to analyse the extracted data.

One challenge faced when analysing a memory dump is that application data is stored in memory following the algorithms of the program owning that memory space. Given the variety of software running on today's devices, the task of interpreting a device's extracted memory is complex. Researchers are tackling this challenge from different angles. Volatility [2015] provides a customizable way to identify kernel data structures from memory dumps; Lin et al. [2011] used graph-based signatures to identify kernel data structures; Hilgers et al. [2014] use the Volatility framework to identify structures beyond the kernel ones, identifying static classes in the Android system.

A deeper memory analysis tool that would consistently interpret data structures from application software has not yet been developed. In-depth memory analysis is normally done on an ad-hoc basis, interpreting the memory dump in the light of the reverse-engineered application source code, as done by Lin [2011]. A broader approach, not dependent on the application's source code, could be powerful for deep memory analysis. Such an approach would have advantages and disadvantages. As an advantage, it could be used in situations where the source code is unknown, unavailable, or legally disallowed from being reverse engineered. On the other hand, without the source code to deterministically assert the meaning of each memory cell, the method would need to take a probabilistic approach. The foundation for such an approach is a probabilistic understanding of memory data associated with their respective types. This paper uses the Android OS as the environment to present a technique for gathering memory information associated with its type, making it possible to have a probabilistic understanding of that data.

3. ANDROID STRUCTURE

The Android OS is an operating system based on Linux, with extensions and modifications, maintained by Google. The OS was designed to run on a large variety of devices sharing some common characteristics [Ehringer, 2010]: (1) limited RAM; (2) little processing power; (3) no swap space; (4) powered by battery; (5) diverse hardware; (6) sandboxed application runtime.

Figure 2: Architecture of Android OS

To provide a system that could run on such diverse and resource-limited devices, a multi-layered OS was built (Figure 2). The 5 layers are: (1) Linux kernel; (2) Hardware Abstraction Layer (HAL); (3) Android runtime and native libraries; (4) Android framework; (5) Applications.

The Android OS is a hybrid of compiled and interpreted system. The boundary between compiled and interpreted execution is the Android runtime. The versions of Android used in our experiments (android-4.3 r1 and android-4.3 r2.1) feature the Dalvik Virtual Machine (Dalvik VM) in the runtime package. All programs running in the layers underneath the Dalvik VM are compiled, and all programs running in the layers above the Dalvik VM are interpreted.
The Dalvik VM hosts programs that were written in Java syntax, compiled to an intermediary code level called bytecode, and then packed to be loaded into Dalvik. When the software is launched inside the Dalvik VM, each line of bytecode is interpreted into machine code, normally for the ARM architecture.
The Dalvik VM is implemented as a register-based virtual machine. This means that the instructions operate on virtual registers, those virtual registers being memory positions in the host device. The instruction set provided by the Dalvik VM consists of a maximum of 256 instructions, some of which are currently unused. Part of the used instructions is type-specific; those are the instructions chosen for collecting data and type information.

The Dalvik VM instruction set is grouped into categories:
- binop/lit8: binary operations receiving as one of the arguments a literal of 8 bits;
- binop/lit16: binary operations receiving as one of the arguments a literal of 16 bits;
- binop/2addr: binary operations with only two registers as arguments, the result being stored in the first register provided;
- binop: binary operations with three registers as arguments, two source registers and one destination register;
- unop: unary operations with two registers as arguments, one source register and one destination register;
- staticop: operations that act on static object fields;
- instanceop: operations that act on instance object fields;
- arrayop: operations that act on array fields;
- cmpkind: operations that perform a comparison between two floating point or long values;
- const: operations that move a given literal to a register;
- move: operations that move the content of a register to another register.

Each of those categories has a number of instructions specifically designed to operate on a given data type. The whole instruction set distinguishes 12 data types, namely: (1) Boolean; (2) Byte; (3) Char; (4) Class; (5) Double; (6) Exception; (7) Float; (8) Integer; (9) Long; (10) Object; (11) Short; (12) String.

4. MODULAR INTERPRETER (MTERP)

As the Android OS is open source, the source code of the OS [Google, 2015], including the Dalvik VM, is available to be downloaded and modified. By inspecting the Dalvik VM source code in detail, it was possible to identify the interpreter (located in /android/dalvik/vm/mterp in the Android source tree) as a strong candidate to host the data-collecting code. The features that best suit our needs are: (1) there is a different entry for each bytecode instruction, called an opcode; (2) several of the opcodes of the Dalvik VM are type-related. Therefore, it is a good point to place the code designed to collect the data, relating the values and types that go into memory.

Even though the Dalvik interpreter is conceptually the central point through which every single line of Dalvik bytecode should pass, there is one exception. The Android OS features an optimization element called Just In Time (JIT) compilation that can bypass the Dalvik interpreter [Google, 2010]. The JIT compiler is designed to identify the most demanded tracks of code that run on the Dalvik VM. Once identified, those tracks are compiled and, the next time they are demanded, the JIT calls the compiled track instead of calling the interpreter. This way, the code we use to collect our data would not be executed and the collected data would not be accurate.
In our tests, the JIT compiler would skip, on average, 26.5% of the type-bearing instructions during the Android booting process (Table 1): of the 3,643,739 type-bearing instructions logged with the JIT disabled, only 2,676,540 were logged with it enabled, a difference of roughly 26.5%.

Table 1: Number of instructions logged during the Android booting process

JIT configuration     # of instructions logged
WITH_JIT = true       2,676,540
WITH_JIT = false      3,643,739

To avoid this source of error, it was necessary to deactivate the JIT compiler on our test Android OS. The Android system contains an environment variable WITH_JIT that is used to deploy an Android system with or without JIT. In order to deactivate Just In Time compilation, we edited the makefile Android.mk (located in /android/dalvik/vm in the Android source tree) and forced WITH_JIT to be set to false.

Having deactivated the JIT, it is necessary to insert the logging code into the interpreter. The interpreter source code is put together in a modular fashion, and for this reason it is called the modular interpreter (mterp). For each target architecture variant there is a configuration file in the mterp folder.
The configuration defines, for each Dalvik VM instruction, which version of the ARM architecture will be used and where the corresponding source code is located. In order to log all the designated instructions, several ARM source code files, scattered in the mterp folder, need to be edited accordingly, and any extra subroutine can be inserted in the file footer.s. After all the code is edited, it is required to run a script called rebuild.sh, located in the mterp folder, which deploys the interpreter to /android/dalvik/vm/mterp/out. Finally, the Android system, which will contain the modified interpreter, needs to be built.

When executing the deployed Android OS, the data extraction takes place. The extracted data is stored in a single file with one entry per line, as shown in Listing 1. The key information in each entry is in the two last columns, containing the type and the hexadecimal value stored in memory.

Listing 1: Unprocessed log sample

D(285:298) Object = <0x41a1fc68>
D(285:298) Int = <0x >
D(285:298) Object = <0x41a1fc68>
D(285:298) Int = <0x00011db5>
D(285:298) Byte = <0x2f>
D(285:298) Int = <0x >
D(285:298) Int = <0x f>
D(285:298) Char = <0x2f>

Having this file, we process it to separate one data type per file and to exclude any extra information apart from the hexadecimal value, as depicted in Figure 3.

Figure 3: Log processing (Android emulator -> mterp.log -> extraction -> log processing -> Boolean.log, Byte.log, ..., String.log)

Summing up, to extract the memory values associated with their respective types we needed to:
- deactivate the JIT compiler of the Android OS;
- inject code into the Dalvik interpreter to log the type and value on each interpreted type-bearing instruction;
- run the adjusted Android OS to collect data in the logs;
- process the logged data.

The deactivation of the JIT compiler and the modification of the Dalvik interpreter code, as expected, generated an execution overhead. Considering the average booting time, the logging procedure seems to have affected the response time more than the JIT deactivation. Table 2 shows the average booting times with and without JIT, as well as with and without the logging code.

Table 2: Average booting time in seconds

                      Log = off    Log = on
WITH_JIT = true       62s          2176s
WITH_JIT = false      62s          3026s
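A minimal sketch of this log-processing step is shown below: it splits entries of the form shown in Listing 1 into one file of hexadecimal values per type. The entry format assumed here is inferred from the sample, and details of the real logs may differ.

import re
from collections import defaultdict

# Matches entries such as: D(285:298) Int = <0x00011db5>
ENTRY = re.compile(r'D\(\d+:\d+\)\s+(\w+)\s*=\s*<(0x[0-9a-fA-F]+)>')

values = defaultdict(list)
with open('mterp.log') as log:
    for line in log:
        match = ENTRY.search(line)
        if match:
            typename, hexval = match.groups()
            values[typename].append(hexval)

# One output file per data type, holding only the hex values.
for typename, hexvals in values.items():
    with open(typename + '.log', 'w') as out:
        out.write('\n'.join(hexvals) + '\n')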
5. RESULTS

Having all the processed logs, it was possible to extract statistical information from them. Table 3 shows in what proportion each type appears in the logs. The table makes clear that the Int type prevails over the other types, with 54.3% of the appearances. Other types with a rather common rate of occurrence are Byte (8.17%), Char (13.19%) and Object (24.00%). The remaining types each account for less than 1% of the total.

Table 3: Proportion of each type in the logs

Type        # of occurrences    % of total
Bool        6,…                 < 1%
Byte        297,…               8.17%
Char        444,…               13.19%
Class       1,…                 < 1%
Double      …                   < 1%
Exception   …                   < 1%
Float       6,…                 < 1%
Int         1,978,…             54.3%
Long        7,…                 < 1%
Object      874,…               24.00%
Short       3,…                 < 1%
String      22,…                < 1%
Total       3,643,739           100%

At this point, the 32-bit types are highlighted. They are: (1) Class; (2) Exception; (3) Float; (4) Integer; (5) Object; (6) String. Each of those six types has its own probability distribution of values, plotted in Figure 4. From the distributions it is possible to spot the similarity among four of the types: Class, Exception, Object and String. All four have a predominant peak a little after the same value (compare the Object values in Listing 1, around 0x41a1fc68). This similarity can be explained by the fact that those four types are indeed references, and therefore pointers to a memory address. Looking only at the values in that region, the Float type could be confused with the reference ones, because it also displays a peak there, albeit a much broader one; moreover, it has a second, lower peak around 0xc… . The Int type displays occurrences along the whole spectrum of values, featuring two more relevant peaks: one around 0x00000000 and the other around 0xffffffff. Those two peaks can be explained by a greater occurrence of integers with small absolute values, positive and negative respectively.

Figure 4: Probability distribution of values by 32-bit type (log scale)
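The per-type probability distributions of Figure 4 can be approximated from the per-type log files by binning the 32-bit value space. The following is a minimal sketch, assuming one hexadecimal value per line as produced by the processing step above (the bin count is an arbitrary choice for illustration):

def distribution(path, bins=256):
    # Probability of a logged 32-bit value falling into each of
    # 'bins' equal-width bins over the range 0x00000000-0xffffffff.
    counts = [0] * bins
    total = 0
    width = 2 ** 32 // bins
    with open(path) as f:
        for line in f:
            value = int(line.strip(), 16) & 0xFFFFFFFF
            counts[value // width] += 1
            total += 1
    return [c / float(total) for c in counts]

# Reference types (Class, Exception, Object, String) should show one
# sharp peak, while Int spreads across the whole range with mass near
# 0x00000000 and 0xffffffff, as described in the text.
probs = distribution('Int.log')
print(max(range(len(probs)), key=probs.__getitem__))  # modal bin index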
6. CONCLUSION

This paper explained a technique to capture memory data along with the corresponding data type in an emulated Android OS. The technique required deactivating the optimization process called Just In Time compilation and modifying the interpreter ARM code. It creates an expected overhead in the Android execution time; as the technique is only designed to run in emulated Android, this overhead is not an issue. The technique allowed us to collect important statistical information that made it possible to distinguish memory values between reference types (Class, Exception, Object, String), Float and Integer types. Beyond this specific test case, this technique could be used to build a statistical data corpus of Android memory content. This data corpus may become a tile in the work of paving the ground for the development of a consistent deep memory analysis tool.

7. ACKNOWLEDGEMENTS

This work was supported by research grants (BEX 9072/13-6) from Science Without Borders, implemented by the CAPES Foundation, an agency under the Ministry of Education of Brazil.

REFERENCES

M.F. Breeuwsma. Forensic imaging of embedded systems using JTAG (boundary-scan). Digital Investigation, 3(1):32-42, 2006.

David Ehringer. The Dalvik virtual machine architecture, 2010.

Google. Google I/O: A JIT compiler for Android's Dalvik VM. Google Developers, May 2010. Accessed 6th March.

Google. Android source code repository, 2015. Accessed 11th February.

Liu Guangqi, Wang Lianhai, Zhang Shuhui, Xu Shujiang, and Zhang Lei. Memory dump and forensic analysis based on virtual machine. In Mechatronics and Automation (ICMA), 2014 IEEE International Conference on, Aug 2014.

C. Hilgers, H. Macht, T. Muller, and M. Spreitzenbarth. Post-mortem memory analysis of cold-booted Android devices. In IT Security Incident Management & IT Forensics (IMF), 2014 Eighth International Conference on, pages 62-75, May 2014.

Zhiqiang Lin. Reverse Engineering of Data Structures from Binary. PhD thesis, CERIAS, Purdue University, West Lafayette, Indiana, August 2011.

Zhiqiang Lin, Junghwan Rhee, Xiangyu Zhang, Dongyan Xu, and Xuxian Jiang. SigGraph: brute force scanning of kernel data structure instances using graph-based signatures. In 18th Annual Network & Distributed System Security Symposium Proceedings, 2011.

Joe Sylve, Andrew Case, Lodovico Marziale, and Golden G. Richard. Acquisition and analysis of volatile memory from Android devices. Digital Investigation, 8(3-4), 2012.

Volatility. The Volatility framework, 2015. Accessed 18th March.

Pei-Hua Yen, Chung-Huang Yang, and TaeNam Ahn. Design and implementation of a live-analysis digital forensic system. In Proceedings of the 2009 International Conference on Hybrid Information Technology, ICHIT '09, New York, NY, USA, 2009. ACM.
CHIP-OFF BY MATTER SUBTRACTION: FRIGIDA VIA

David Billard (1), Paul Vidonne (2)
(1) University of Applied Sciences in Geneva, Switzerland, [email protected]
(2) LERTI, France, [email protected]

ABSTRACT

This work introduces a previously unpublished technique for extracting data from flash memory chips, especially from Ball Grid Array (BGA) components. The technique does not need any heating of the chip component, as opposed to infrared or hot-air de-soldering. In addition, it avoids the need to re-ball the BGA in case of balls missing or left in the wrong place. It thus enhances the quality and integrity of the data extraction. However, the technique is destructive for the device motherboard and has limitations when the memory chip content is encrypted. The technique works by subtracting matter by micro-milling, without heating. It has been used extensively in about fifty real cases for more than one year. It is named frigida via, as opposed to the calda via of infrared heating.

Keywords: chip-off forensics, data extraction, BGA, data integrity preservation, micro-milling, infrared heating.

1. INTRODUCTION

Forensics laboratories are daily facing the challenge of extracting data from embedded or small-scale digital devices. In the best case, the devices are already known to commercial vendors of extraction tools and a proven method is available to the practitioner. In most cases, the devices are unknown, or broken, and then begins the laborious search for a method to extract data from the device without jeopardizing the judicial value of the hypothetical concealed evidence.

When no software-based method exists, the de-soldering of the chip holding the data is undertaken. The chip is often a flash memory component, increasingly of Ball Grid Array (BGA) technology. The de-soldering, even when routinely executed, is not error-free and induces heavy stress on the component. Furthermore, the control of the heating is based on temperature probes, which are not always accurate enough. This leads to chips being heated too much or being torn off. In the first case, the data content may be altered, or even destroyed on occasion. In the second case, some balls of the BGA will stay on the motherboard and the practitioner will have to re-ball the chip in order to extract data using a BGA reader.

As an example, the BGA component shown in figure 1 comes from a cell phone motherboard. The labeling on the chip is very clear: it's a NAND chip, and the edges of the chip are sharp.

Figure 1: BGA from a cell phone motherboard

The chip has been heated using infrared and the result is shown in figure 2. The component changed color (no more labeling is visible) and the edges are blurred. The ball grid is also a bit wavy: the heating
has a dramatic effect on the component. However, the component is still readable and data can be extracted. The ruler (in millimeters) has been added to give the reader a better idea of the component's size.

Figure 2: Heated BGA recto and verso

In this paper we propose a new method for taking BGA chips off a motherboard without heating them. In fact, instead of taking the chip off, we remove the motherboard from under the chip. We use micro-milling technology and subtract matter from the motherboard on the other side of the chip, until we reach the ball grid. The process is constantly monitored and controlled, and it stops when reaching the balls. A result of this process is shown below.

The Micron chip presented in figure 3 is still attached to the motherboard. The labeling is clear, and the edges of the chip are sharp.

Figure 3: Micron BGA on the motherboard

Once the milling process is done, the chip labeling is still as clear on the recto, and the grid balls are all present on the verso, as shown in figure 4. Since no heating has been applied, the chip content has been spared any stress and is intact.

Figure 4: Milled Micron BGA recto and verso

We have been using and refining this technique for about one year on fifty real cases. We had an issue with only one particular case, which is presented later in this work.

The paper is organized as follows: section 2 is a review of the literature about data extraction from flash components; section 3 presents the principle of the milling process, the machine and the interaction with precision bar turning; section 4 lists some lessons learned in using this technique compared to infrared heating and presents a comparative table of pros and cons.

2. RELATED WORKS

An extensive literature exists about extracting data from flash (or EEPROM) memory chips. Most of this literature assumes that the device is in working order. For instance, (Breeuwsma, 2006) addresses the use of JTAG (boundary-scan) in order to bypass or trick the processor or the memory controller. In (Sansurooah, 2009), the author addresses the use of flasher tools to load a bootloader into the device memory; this bootloader is designed to gain access to low-level memory management, thus enabling the reading of all memory blocks.

Some papers, like (Fiorillo, 2009), use hot-air de-soldering to compare the content of flash memory chips before and after some writing of data. In (Willassen, 2005), several ways of de-soldering chips are mentioned, all based on heating the component (hot air, infrared, ...). In a remarkable presentation, (van der Knijff, 2007) gives an overview of most techniques for chip-off and JTAG access.

Commercial products like (Cellebrite, 2015) or (Microsystemation, 2015) are based on several techniques for gaining access to low-level memory. Although these tools are not suited for chip-off, they provide the ability to decode memory dumps extracted from flash memory chips.

To our knowledge, the memory reading of broken or dismantled digital devices is done either by heating
and chip-off, or sometimes by entirely reconstructing the device around the flash memory. Our paper brings a previously unpublished approach, requiring no heating, thus enhancing the integrity and quality of the data extraction. It is especially designed for broken devices but also works for running devices, with some limitations, discussed later in this work.

3. SUBTRACTING MATTER

3.1 PRINCIPLE

The aim of the technique is to subtract matter around the component. For a BGA component, this amounts to obliterating the motherboard and its other components, leaving the BGA component alone. The technique can be summarized in the following steps:

1. Localization step: since the motherboard is milled at its verso, just under the memory chip, the cutting tool has to be directed to the location of the chip while the chip is hidden by the motherboard. It is therefore necessary to locate the chip on the verso side of the motherboard by measuring distances from the board sides to the chip sides on the recto side, and then using these measurements to draw the shape of the chip on the verso of the motherboard. Figure 5 presents a photograph of the drawing of the shape of the chip on the verso of the motherboard.

Figure 5: Localization step (drawing the chip shape on the verso)

2. Revolving step: turning the BGA component over, still attached to its part of the motherboard, in order to have the motherboard facing up (and thus the component facing down).

3. Peeling step: using a milling cutter to cut the motherboard, layer by layer, until just short of the grid balls. Sometimes this also means cutting layers of the BGA component itself, when the grid balls are lightly encased in the chip. Figure 6 presents a photograph of the milling cutter sawing through the motherboard until the grid balls are exposed.

Figure 6: Peeling step (milling to the grid balls)

For this milling step, it is of utmost importance that the milling cutter head and the motherboard be perfectly aligned at 90 degrees. Even a very small angle deviation may lead to a catastrophic bite of the milling cutter into the BGA component. In that case, the component may be utterly destroyed.

4. Cleansing step: removing the last bits of motherboard layer and epoxy that may still adhere to the grid balls.

Once those steps are finished, there is no need to re-ball the component, since no ball has been lost. The component can be used straight away in a flash reader, provided that the practitioner has the right pinout module.

The upper image in figure 7 represents a sectional view of a BGA, taken from (Guenin, 2002). The lower image represents the working of the milling cutter, subtracting the motherboard and leaving the grid balls exposed.
Figure 7: Process illustrated (sectional views of a BGA soldered to the motherboard, and detached by milling)

3.2 VARIANT

In some cases, in particular when the processor and memory are stacked one on top of the other, the motherboard has to be cut all around the component before the localization step, either by drilling holes close to the four sides (like old-fashioned stamps) or by drilling one hole and using a fretsaw all around the BGA component. This operation is called the punching step, and figure 8 presents a photograph of this step.

Figure 8: Punching step (separating the component from the others)

3.3 MACHINE

The machine used for the milling is a standard precision micro-milling machine from Proxxon (Proxxon, 2015). It must be capable of 0.05 millimeter steps (0.002 inch) with a rotating speed varying from 5,000 to 20,000 rpm (revolutions per minute). The milling cutters usually have a diameter between 1 and 3 millimeters (0.04 to 0.12 inch). A watchmaker-grade magnifier, or a digital magnifier, is needed to control and verify the peeling step.

3.4 PRECISION BAR TURNING

The idea to implement this frigida via technique comes from interaction with specialists in precision bar turning. These people are specialized in manufacturing tiny pieces of hardware, like the gear wheels one can find in mechanical watches, or complex components made of special alloys used in space satellites.

We were facing more and more devices inaccessible to investigation due to their poor condition: a cell phone with a bullet hole, a GPS unit retrieved from a sunken boat, or a tablet barely surviving a plane crash. Using commercial tools or flash boxes was not an option, and infrared heating would add stress to components already subjected to heavy stress. Therefore, instead of thinking like repair firms, whose job is to detach an object in order to repair it or analyze the failure of the whole device, we thought about isolating the memory from its external surroundings. In other words: obliterating the surrounding area, in order to leave the component exposed.

One of the first cases prompting us to use milling was the investigation of a cell phone retrieved after a car chase between the police and three drug dealers. The motherboard was badly damaged and we feared that using infrared on the memory chip might further damage the chip. After extensive testing on spare devices, the milling process was applied to the device remnants and information was successfully extracted.

4. LESSONS LEARNED AND METHOD COMPARISON

4.1 ENCRYPTION

The technique explained in this paper has to be used with prudence when dealing with encrypted devices. In a real case about narcotics, a BlackBerry 9720 was seized. It had a keyboard lock that the owner was not willing to give up. The frigida via was successfully used, and figure 9 presents the recto and verso images of the SKhynix chip.
Figure 9: Milled SKhynix BGA recto and verso

But after reading the chip, it appeared that all the component content was encrypted. Finally, after some weeks, the password was supplied. Unfortunately, this password alone was not sufficient to decrypt the content: it must be used in conjunction with some hardware information contained in other components of the motherboard. Thus, even with the password, the memory remains encrypted.

4.2 PROCESS DURATION & COMPARISON

The milling technique takes between thirty minutes and one hour, depending on the quality of the motherboard. Namely, if the motherboard is flat, without any deformation, it takes less than thirty minutes; if the motherboard has been retrieved after a helicopter crash, it takes about one hour. Once the chip is off the motherboard, it is immediately available for reading, and the first attempt at seating it in the reader socket is usually successful.

The infrared (or hot-air) method is usually shorter in time for the chip-off, thirty minutes being the upper limit of the process. However, the process can be impeded in many ways. First, the chip can lose grid balls during the process, some of them staying attached to the motherboard. After cooling the chip, many tries are needed to find which grid balls are missing, and additional time is needed to re-ball the chip, even if not all the grid balls need to be present, only the useful ones. The heating process also leaves residues of matter that have to be scraped off using toothbrushes or special treatment. Then several tries are also needed to place the chip correctly into the reader socket, since the edges of the chip are no longer rectilinear. Furthermore, the epoxy layer between the chip and the motherboard can glue the chip to the motherboard, even if the grid balls are melted. We did not find out whether the epoxy glues the chip and the motherboard together at heating time or whether this happens during the assembly of the motherboard. In such a case, even heavy heating cannot de-solder the chip, and will more likely destroy the content of the component.

In table 1 we summarize the main differences between calda via and frigida via.

Table 1: Comparison Infrared vs Milling

Calda via: Infrared          Frigida via: Milling
                             Same process duration
Heat damage                  No heat applied
Re-balling necessary         No need of re-balling
Extensive cleansing          Light cleansing
Re-soldering possible        No re-soldering

Table 1 shows the most obvious differences between infrared and milling. But even if milling seems superior to infrared in many respects, we still use both techniques on cases. The choice of the technique to apply is dictated by several factors, among which:

1. the availability of the machines;
2. the risk of finding encrypted data linked to hardware components;
3. the risk of damaging the chip by heating;
4. the likelihood of epoxy gluing the memory chip and the motherboard;
5. the training of the practitioner.

When facing a chip-off, we apply a risk-based decision matrix in order to decide between calda and frigida via.

5. CONCLUSION

In this paper, we present a new technique for extracting data from flash memory chips, especially from Ball Grid Array (BGA) components. This technique, called frigida via (or milling), is complementary to infrared or hot-air chip-off processes and offers many new possibilities.

Instead of relying on the heating of the solder of the BGA component, in the hope that the component
5. CONCLUSION

In this paper we present a new technique for extracting data from flash memory chips, especially from Ball Grid Array (BGA) components. This technique, called frigida via (or milling), is complementary to infrared or hot-air chip-off processes and offers many new possibilities. Instead of relying on heating the solder of the BGA component in the hope that the component will detach cleanly from the motherboard, the technique presented in this paper relies on removing the motherboard from under the component. The motherboard is milled away under the chip until the grid balls are exposed. At the end of the process, the chip is freed from the motherboard and can be placed in a reader socket for further analysis.

Since this technique does not require any heating of the chip, as opposed to infrared or hot-air de-soldering, it avoids inadvertent degradation of the memory. The component may already be weakened by external causes, or simply of fragile design, and heating it, even with careful temperature control, may lead to the destruction of the memory content. The frigida via is therefore more respectful of data integrity, since it does not impose additional stress on the memory chip, and the quality of the data extraction is enhanced. In addition, the frigida via avoids the need to re-ball the BGA when grid balls are missing in the wrong places. It also eliminates the problem of epoxy gluing the memory chip to the motherboard in some devices.

However, this technique is destructive for the device motherboard, and re-soldering of the chip component is impossible. That impossibility is a severe limitation when the memory content is encrypted by a combination of a password and hardware-related information. The technique works and has been used in about fifty real cases over more than one year.

REFERENCES

Breeuwsma, I. M. (2006). Forensic imaging of embedded systems using JTAG (boundary-scan). Digital Investigation, 3(1).

Cellebrite. (2015). UFED mobile forensics.

Fiorillo, S. (2009, December). Theory and practice of flash memory mobile forensics. Proceedings of the 7th Australian Digital Forensics Conference.

Guenin, B. (2002, February). The many flavors of ball grid array packages. Electronics Cooling.

Microsystemation. (2015). XRY mobile forensics.

Proxxon. (2015). Precision lathe and milling systems.

Sansurooah, K. (2009, December). A forensics overview and analysis of USB flash memory devices. Proceedings of the 7th Australian Digital Forensics Conference.

van der Knijff, R. (2007). 10 good reasons why you should shift focus to small scale digital device forensics.

Willassen, S. Y. (2005). Forensic analysis of mobile phone internal memory.
26 THE EVIDENCE PROJECT: BRIDGING THE GAP IN THE EXCHANGE OF DIGITAL EVIDENCE ACROSS EUROPE Maria Angela Biasiotti, Mattia Epifani, Fabrizio Turchi Institute of Legal Information Theory and Techniques of the Italian National Research of Council Florence, Italy, ABSTRACT Based upon the assumption that the very nature of data and information held in electronic form makes it easier to manipulate than traditional forms of data, that all legal proceedings rely on the production of evidence in order to take place and that electronic evidence is no different from traditional evidence in that is necessary for the party introducing it into legal proceedings, to be able to demonstrate that it is no more and no less than it was, when it came into their possession the EVIDENCE Project aims at providing a road map (guidelines, recommendations, technical standards) for realising the missing Common European Framework for the systematic and uniform application of new technologies in the collection, use and exchange of evidence. This road map incorporating standardized solutions aims at enabling all involved stakeholders to rely on an efficient regulation, treatment and exchange of digital evidence, having at their disposal as legal/technological background a Common European Framework allowing them to gather, use and exchange digital evidences according to common standards, rules, practises and guidelines. EVIDENCE activities will also aim at enabling the implementation of a stable network of experts in digital forensics communicating and exchanging their opinions and contributing as well to the building up of a stable communication channel between the public and the private sectors dealing with electronic evidence. Keywords: digital evidence, digital evidence exchange, metadata, formal languages. 1. THE CONTEXT All legal proceedings rely on the production of evidence in order to take place. Electronic Evidence is no different from traditional evidence in that is necessary for the party introducing it into legal proceedings, to be able to demonstrate that it is no more and no less than it was, when it came into their possession. In other words, no changes, deletions, additions or other alterations have taken place. The very nature of data and information held in electronic form makes it easier to manipulate than traditional forms of data. When acquired and exchanged integrity of the information must be maintained and proved. Legislations on criminal procedures in many European countries were enacted before these technologies appeared, thus taking no account of them and creating a scenario where criteria are 25 different, uncertain, regulations are not harmonized and aligned and therefore exchange among EU Member States jurisdictions and at transnational level is very hard to be realized. What is missing is a Common European Framework to guide policy makers, law enforcement agencies and judges when dealing with digital evidence treatment and exchange. The EVIDENCE project interpreted this request by defining it as: the need for a common background for all actors involved in the Electronic Evidence lifecycle: Policy makers, LEAs, Judges and Lawyers; the need for a common legal layer devoted to the he regulation of Electronic Evidence in Courts the need for standardized procedures in the use, collection and exchange of Electronic
27 Evidence (across EU member States). In response to the above needs and gaps the EVIDENCE project aims at providing a Road Map (guidelines, recommendations, technical standards) for realizing the missing Common European Framework for the systematic and uniform application of new technologies in the collection, use and exchange of evidence. This Road Map incorporating standardized solutions would enable policy maker to realize an efficient regulation, treatment and exchange of digital evidence, LEAs as well as judges/magistrates and prosecutors and lawyers practising in the criminal field to have at their disposal as legal/technological background a Common European Framework allowing them to gather, use and exchange digital evidences according to common standards and rules. In order to produce this common, unique European way/ approach to the treatment and exchange of electronic evidence, the EVIDENCE project has identified as relevant the following steps: Developing a common and shared understanding on what electronic evidence is and which are the relevant concepts of electronic evidence in involved domains and related fields (digital forensic, criminal law, criminal procedure, criminal international cooperation); Detecting which are rules and criteria utilized for processing electronic evidence in EU Member States, and eventually how is the exchange of evidence regulated; Detecting of the existence of criteria and standards for guaranteeing reliability, integrity and chain of custody requirement of electronic evidence in the EU Member States and eventually in the exchange of it; Defining operational and ethical implications for Law Enforcement Agencies all over Europe; Defining implications on data Privacy issues; Identifying and developing technological functionalities for a Common European Framework in gathering and exchanging electronic evidence; Seizing the EVIDENCE market. The project is now at its halfway mark and step are completed whilst step are on the way to produce final assessment. 2. PRELIMINARY REMARKS ON THE CONCEPT OF ELECTRONIC EVIDENCE Before going for any kind of classification the very first issue at stake has been to set the right scenario and to fix the range and scope of the categorization task with respect to the Project aims and goals. In this sense, our aim is to develop a framework for the application of new technologies in the collection, use and exchange of evidence between Courts of the EU Member states. So, the main keywords to be considered are: Source of Evidence, Authenticity, Evidence, ICT and Exchange. The use of ICT associated with evidence is often described utilizing two main expressions: Electronic Evidence and Digital Evidence. Is the first one different from the second or are they just synonyms? We know for sure that both electronic and digital evidence originate from the so called sources of evidence and that there is a specific need to carry on a forensics analysis in order to identify the evidence itself. We are also aware of the fact that these sources might be electronic, or non electronic and that in the latter case it can acquire the status of digital/electronic evidence if digitized. The analysis of the most significant sources of information demonstrated that there is no uniform use of the terms that identify this domain. Indeed, both digital evidence and electronic evidence are accepted terms in the scientific community. 
For instance, the international standard ISO/IEC 27037, Guidelines for identification, collection, acquisition and preservation of digital evidence, prefers the term digital evidence, because it refers to data that is already in a digital format and does not cover the conversion of analogue data into digital form. On the other hand, authoritative sources such as the Council of Europe have opted for the term electronic evidence in the recently published Electronic Evidence Guide (Council of Europe, 2013).
28 Moreover there are many different definitions of Electronic/Digital Evidence, each of them highlighting some, but not all, essential features. The following are the main definition we have collected/analysed so far (Mason, 2012): any data stored or transmitted using a computer that support or refute a theory of how an offense occurred or that address critical elements of the offense such as intent or alibi (Carrier, 2006); digital evidence is any data stored or transmitted using a computer that support or refute a theory of how an offense occurred or that address critical elements of the offense such as intent or alibi (Casey, 2011). None of the above cited definitions of digital evidence or electronic evidence matched our needs, therefore we finally decided to adopt the following original definition: Electronic Evidence is any data resulting from the output of an analogue device and/or a digital device of potential probative value that are generated by, processed by, stored on or transmitted by any electronic device. Digital evidence is that electronic evidence which is generated or converted to a numerical format. Therefore, the EVIDENCE Project activities are based upon its own core definition, capable, in our opinion to catch all various sides, challenges of Electronic Evidence, relying on its very general abstraction level. Based upon this definition our statement is that within the Electronic Evidence category both those evidence that are born digital and not born digital but that may have become such during their life-cycle are to be included. As a matter of fact electronic evidence and digital evidence in our conceptualization do coincide (see Figure 1). Therefore, we will assume that semantically speaking Electronic Evidence is the broader class including both those records born digital as well as those ones not born digital but digitized afterwards. Once the digitization process has been carried out the Evidence becomes electronic even if it was originally non electronic or analogical. Figure 1 depicts the relationship between the Electronic Evidence and the other forms in which it may appear, with a specific focus on: Figure 1: From Sources of Evidence to Electronic Evidence 27
29 Not Electronic items that should be digitized, and therefore are afterwards treated as they were born-digital, once the authenticity is assured as related to the original one; Electronic items - some sort of analogical form, which, as in the case of the Not Electronic items, should be digitized. In the same Figure 1 it is to be noted that: Arrows represent the process of transformation needed to generate the transition from Non Electronic or from Analogical to Digital items. Lines show that no process is needed and that the evidence is per se electronic. Of course the transition from Analogical or Not Electronic to Electronic is not an essential step; it may happen but is not mandatory. In this way we can include every type of evidence present in paper documents, objects, court hearings with witnesses and other, that, due to the increasing use of ICT, are frequently objects of digitization. Therefore, we prefer to use the term Electronic evidence that in our opinion comprises a larger range of items/potential evidence. 3. ELECTRONIC EVIDENCE LIFE CYCLE Starting from the relevant concepts extracted both manually and semi-automatically, this step of the project was focused on the identification and classification of the building blocks of the conceptual model oriented to the description of the Electronic Evidence domain. The structuring is mainly based upon the electronic evidence lifecycle as described in Figure 2. Having clarified starting point of the conceptualization and the choice of the term preferred for the categorization, it is worthwhile to describe which is the flow to which actions are referred in the digital forensics domain. Therefore a brief description of the digital forensics procedures will outline the process used to manage electronic evidence. The very first milestone starts with an incident, an unlawful criminal, civil or commercial act, and sets the scene for the electronic evidence life-cycle scenario. Indeed an artefact or a record enters into the forensic process only if an incident forces it to do so. Otherwise, for all of its natural lifespan the artefact or record will remain outside the forensic process and thus forensically irrelevant though it may continue to be very relevant to its user or owner. Figure 2: Electronic Evidence management timeline/life cycle 28
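The relationship depicted in Figure 1 can be made concrete with a minimal data model. The sketch below is an illustration of ours; the class and field names are not part of the EVIDENCE categorization.

    # Minimal illustrative model of the Figure 1 classification: a source of
    # evidence is either born digital or must be digitized (with its
    # authenticity assured) before it can be treated as electronic evidence.
    from dataclasses import dataclass

    @dataclass
    class SourceOfEvidence:
        description: str
        born_digital: bool            # e.g. a log file vs. a paper document

    @dataclass
    class ElectronicEvidence:
        source: SourceOfEvidence
        digitized: bool               # True if a digitization step was applied
        authenticity_assured: bool    # digitized items must be tied to the original

    def to_electronic(source, authenticity_assured=True):
        if source.born_digital:
            return ElectronicEvidence(source, digitized=False, authenticity_assured=True)
        # non-electronic or analogue items become electronic only through digitization
        return ElectronicEvidence(source, digitized=True,
                                  authenticity_assured=authenticity_assured)

    print(to_electronic(SourceOfEvidence("signed paper contract", born_digital=False)))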
The phases we have taken into consideration are chiefly based on already existing investigative process models and on ISO/IEC 27043, which represents a point of reference, with the aim of creating a harmonized model on the basis of other existing models. The digital evidence management timeline/life cycle consists of the following main phases regarding the handling of electronic evidence, starting from the incident event:

Case Preparation: the first step of the digital evidence management timeline; it comprises organizational, technical and investigative aspects.

Evidence Identification: the step consisting of examining and studying the crime scene in order to preserve, as much as possible, the original state of the digital/electronic devices that are going to be acquired.

Evidence Handling: the step where it is defined which specific standard procedures are to be followed, based on the kind of device being handled.

Evidence Classification: the step consisting of identifying the main features and the status of the device, taking notes about case ID, evidence ID, seizure place/date/made by, evidence type, picture, status, etc.

Evidence Acquisition: one of the most critical phases within the digital evidence handling process: the forensics specialist must take care of the potential digital evidence in order to preserve its integrity during the following processes, up to the presentation before a court.

Evidence Analysis: a process heavily affected by the kind of case under investigation, the type of evidence to be handled and the features of each piece of evidence to be examined (e.g. installed operating system, type of file system, etc.).

Evidence Reporting: one of the most critical steps. After completing the identification, acquisition and analysis activities, digital evidence specialists have to complete their job by producing a report describing all the activities carried out and the outcomes achieved. The report must contain enough detail to allow the specialists to testify before a court relying on that document alone.

The investigation process model depicted in Figure 2 represents a simplified view of the whole process, because some concurrent processes have not been represented, such as obtaining authorization, documentation, managing the information flow, preserving the chain of custody and preserving digital evidence. Furthermore, it is not a strictly sequential flow: it may be circular at some points and may loop back to earlier steps. For example:

The analysis can reveal that some referenced data sources have not been acquired.

During the acquisition phase it might be necessary to reconsider the acquisition plan to include more data sources.

During presentation some questions may arise requiring further analysis in order to provide satisfactory answers.

More and more evidence may be generated in the course of most court hearings, with witnesses being recorded and their testimony entered into the official court record, irrespective of whether a case is criminal or civil. Furthermore, in our specific view, once the reporting phase is accomplished, the electronic evidence may enter the scenario of Electronic Evidence Exchange. In this case the further step dedicated to the Presentation may take place before a national court or before another EU Member State.
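As a compact illustration of this non-sequential flow, the sketch below lists the phases above together with the two example back-transitions mentioned in the text; it is a simplification of ours, not a normative model.

    # Phases of the electronic evidence life cycle described above, with the
    # example back-transitions from the text (analysis -> acquisition,
    # presentation -> analysis). Illustrative only.
    PHASES = [
        "case_preparation", "identification", "handling", "classification",
        "acquisition", "analysis", "reporting", "presentation",
    ]

    BACK_TRANSITIONS = {
        "analysis": ["acquisition"],      # missing data sources discovered
        "presentation": ["analysis"],     # questions raised before the court
    }

    def next_steps(phase):
        i = PHASES.index(phase)
        forward = PHASES[i + 1:i + 2]     # the next phase, if any
        return forward + BACK_TRANSITIONS.get(phase, [])

    print(next_steps("analysis"))         # ['reporting', 'acquisition']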
Figure 3: Overview of the exchange of data between legal authorities

Figure 3 outlines, at a high level of description, the exchange process that takes place between the requesting and requested legal authorities involved in the case after the analysis or interpretation is completed.

4. MID-TERM RESULTS

In order to produce the Road Map, a specific set of objectives has been considered essential and a group of mid-term results has been achieved.

4.1 ELECTRONIC EVIDENCE DOMAIN CATEGORIZATION

Within the activities carried out in the Categorization work package 6, a common and shared understanding has been developed of what electronic evidence is and which concepts of electronic evidence are relevant in the involved domains and related fields, such as digital forensics, criminal law, criminal procedure and international cooperation in criminal matters. A mind map representation of the whole categorization is publicly available online.

4.2 LEGAL ISSUES PRELIMINARY RESULTS

One of the main goals of the project, addressed by the Legal Issues work package 7, is the identification of a legal framework in the EU Member States governing the implementation of new technologies in processing evidence, including trans-border exchange. Some general considerations have emerged from a pilot comparative study:

There is no comprehensive international or European legal framework relating to e-evidence, only a few relevant legal instruments (e.g. the Cybercrime Convention);

Although some regulation exists at national level, rules vary considerably even among countries with similar legal traditions (e.g. on admissibility issues);

National criminal laws have gradually evolved, through interpretation and through amendments to existing norms, so as to apply (also) to e-evidence;

The knowledge and expertise of the actors involved in the handling of e-evidence have increased, but specific standards are still missing;

Several national data protection laws have been modified as a consequence of the introduction of anti-terrorism measures;

The different laws and practices of Member States contribute to creating a situation of legal and practical uncertainty.

6 The activities have been developed by the CNR-ITTIG (Italy) and CNR-IRPPS (Italy), partners of the Evidence project.
7 The activities have been developed by the University of Groningen (The Netherlands), partner of the Evidence project.
4.3 DATA PROTECTION ISSUES

Another crucial goal of the project, addressed by the Data Protection Issues work package 8, is the identification of data protection issues and remedies regarding the process of gathering and using electronic evidence. The following general considerations have been determined:

Secondary law: there is no regulation in force addressing data protection issues related to the collection of electronic evidence;

Conventions: the Cybercrime Convention contains procedural regulations on the collection of electronic evidence and data protection safeguards; the European Convention on Mutual Assistance in Criminal Matters addresses the exchange of evidence;

Art. 82(2) TFEU: the EU has a legal competence to harmonise particular aspects of criminal procedure law, such as admissibility, which includes rules on the means of collecting electronic evidence; this competence could be used to set up a minimum standard of privacy safeguards to be established in relation to the use of certain means of collecting electronic evidence.

Moreover, in most domestic legal frameworks rather few, and not necessarily sufficient and/or congruent, privacy safeguards related to electronic evidence exist. Examples include:

Procedural law, structure and rules: very few definitions of electronic evidence exist;

Cross-border scenarios and international law: in cloud computing environments, legal issues are not sufficiently, or not at all, addressed by law;

Investigative measures: existing rules often apply both to physical and to electronic evidence;

Admissibility: not regulated specifically.

4.4 DIGITAL FORENSICS TOOLS CATALOGUE

Starting from the digital evidence life cycle shown in Figure 2, standards already exist for many of the phases depicted. In particular, for the acquisition and investigative processes, ISO/IEC 27043 and related ISO/IEC standards represent points of reference. While composing the overview of existing standards for the handling of electronic evidence, within the activities related to the Standards Issues work package 9, a large number of digital forensics tools were gathered and a Digital Forensics Tools Catalogue was created, covering tools for the Acquisition and Analysis phases as described, at different levels of detail, by the ISO/IEC standards mentioned above. The Catalogue represents an overview of forensics tools for handling digital evidence that are generally accepted in the EU Member States. The Catalogue, in its current version 1.0 dated February 2015, comprises a large number of tools divided into two main branches, Acquisition and Analysis, and is available online.

4.5 MARKET SIZE MAP OF ACTORS

Another relevant goal of the project, addressed by the Market Size work package 10, is the identification and classification of the main types of actors involved in the "social arena" of electronic evidence. There are two types of actors having a direct interest in electronic evidence:

Process Actors: public and private actors involved in handling the electronic evidence;

Context Actors: actors providing technical solutions and assistance in this field.

8 The activities have been developed by the Leibniz Universität Hannover (Germany), partner of the Evidence project.
9 The activities have been developed by the CNR-ITTIG (Italy), partner of the Evidence project.
10 The activities have been developed by the Laboratory of Citizenship Sciences (Italy), partner of the Evidence project.
Furthermore, there are nine typological areas of Process Actors, in turn comprising a total of 40 types of actors: Public law enforcement and intelligence
agencies (e.g. law enforcement officers, detectives, intelligence agencies);

Actors of the legal criminal trial (e.g. judges, prosecutors, lawyers, etc.);

Notaries;

Public register actors (e.g. business register actors, civil acts register actors, land register actors, etc.);

Forensic examiners (e.g. fraud examiners, forensic laboratory staff members, Digital Evidence First Responders, etc.);

Private investigators;

Hardware producers (e.g. hardware producers for computer forensics, for mobile forensics, etc.);

Technology/software producers (e.g. software houses that produce complete commercial toolkits for forensic analyses, or that make software for specific commercial analyses, etc.);

Service providers (e.g. major consulting firms, associated professional studios, etc.).

Finally, ten typological areas of Context Actors, in turn containing twenty-six types of actors, can be enumerated:

Specialized international organizations (e.g. UN agencies concerned with justice and technological innovation, etc.);

Law-making bodies (e.g. European organizations, national governments);

Technological innovation actors linked to the Internet (e.g. Internet service providers, cloud technology providers);

Legal and forensic associations and networks (e.g. general legal and forensic associations and networks, associations and networks concerned with issues linked to new technologies);

Research bodies, associations and networks (e.g. organizations and associations concerned with the Internet and ICT, academic institutions concerned with ICT, etc.);

Actors involved in the field of human rights (e.g. civil rights organizations, privacy protection organizations, etc.);

The media (e.g. traditional and social media, etc.);

Enterprises interested in the proper functioning of justice (e.g. individual firms, business associations);

Transnational projects (e.g. digital forensics research projects and training);

Other actors collecting evidence (e.g. public and private actors that collect data / potential evidence).

5. ELECTRONIC EVIDENCE EXCHANGE STATUS QUO OVERVIEW

As far as the Exchange process (see Figure 3) is concerned, no standard has been published or proposed; moreover, the exchange represents one of the essential points of the EVIDENCE Project, which aims to facilitate and foster the exchange between different authorities and across the EU Member States. The project aims at defining functional specifications for exchanging digital evidence in such a way that, no matter what forensic tool is used by an examiner, the results of his or her examination can be verified by another examiner independently of the tool being used, as long as the tools are comparable in specification and function. On the basis of the information gathered so far, it seems that, at the moment, in cross-border criminal cases, cooperation is mostly based upon international agreements or letters rogatory to the foreign court. Independently of the legal framework identified by the EU Member States, the cooperation is mostly human-based: the electronic evidence exchange is carried out between judicial stakeholders, from a source EU authority to another judicial authority in the target EU Member State. This approach is similar across countries and, at first glance, the exchange does not appear to be based on any electronic means at all.
In most cases the forensic copy of the original source of evidence is exchanged: a judicial or police authority from EU Member State A (the requesting authority) asks an authority in EU Member State B (the requested authority) to generate a forensic copy, based on mutual trust between the two competent authorities. Later, the exchange of the forensic copy is carried out by human means: the authority from
country A instructs someone to collect the copy, or the copy is delivered by a secure courier to the requesting authority. In any case, it has to be emphasized that no electronic means is involved in the exchange process. To facilitate human cooperation, institutions such as Eurojust, Europol and Interpol have put in place systems or platforms in order to communicate and share relevant information.

There are two different levels of cross-border cooperation:

the judicial cooperation, based almost exclusively on the regular international procedures for mutual assistance in criminal matters, which are regulated by strict procedures, time-consuming and unpredictable, but are the only way to exchange evidence;

the investigative cooperation, simpler and quicker, but used only for operational or technical information and coordination activities.

During investigations there may be an information exchange that cannot be used during the trial beyond the pleading stage. In many cases judicial authorities act relying on international agreements established through Eurojust to coordinate investigations and prosecutions between the EU Member States when dealing with cross-border crime.

The exchange of electronic evidence should take place in a secure environment, relying on a service for exchanging the evidence in a secure manner. In order to achieve this goal, such a service will rely on digital certificates in order to certify the ownership of a public key. This would allow any judicial authority (relying party) to rely upon signatures or assertions made with the private key that corresponds to the certified public key.

6. ELECTRONIC EVIDENCE EXCHANGE: EXISTING PLATFORMS

There are already existing platforms for information exchange but, for confidentiality reasons, it has been almost impossible to collect detailed information about their architecture and the kind of information exchanged. The most important system in the evidence exchange is SIENA, which stands for Secure Information Exchange Network Application. It is a secure communication system managed by Europol, dedicated to the EU law enforcement community and based on the Universal Message Format (UMF) standard. SIENA is used for exchanging personal information related to the crime areas within the mandate of Europol, including EU restricted information.

7. ELECTRONIC EVIDENCE EXCHANGE: PROPOSED STANDARDS

The requirement for a standard language able to represent a broad range of forensic information and processing results has become an increasing need within the forensics community. For electronic evidence exchange a similar need has to be addressed, even though the aim of the exchange may concern different issues, for example malware analysis, the exchange of relevant artifacts, or the comparison of tool results. Research activities conducted in this field have led to the development and proposal of several languages. CybOX (Cyber Observable eXpression) is one of the most important languages recently proposed. It has been devised by Mitre.org along with other related languages, such as CAPEC (Common Attack Pattern Enumeration and Classification), STIX (Structured Threat Information eXpression) and TAXII (Trusted Automated eXchange of Indicator Information).
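The certificate-based service mentioned in Section 5 can be illustrated with a minimal sketch in which the sending authority signs the hash of the forensic copy and the relying party verifies it. This is an illustration of ours using the Python cryptography package; it is not the architecture of SIENA or of any platform foreseen by the EVIDENCE project.

    # Minimal sketch of integrity protection for an exchanged forensic copy:
    # the sender signs the SHA-256 digest of the image with its private key,
    # the relying party verifies it with the certified public key.
    import hashlib
    from cryptography.hazmat.primitives.asymmetric import ed25519

    # Key pair of the sending authority; in practice the public key would be
    # bound to that authority by an X.509 certificate issued by a trusted CA.
    private_key = ed25519.Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    evidence_image = b"...raw bytes of the forensic copy..."   # placeholder content
    digest = hashlib.sha256(evidence_image).digest()
    signature = private_key.sign(digest)

    # The receiving authority recomputes the digest and verifies the signature;
    # verify() raises an exception if the data or the signature was altered.
    public_key.verify(signature, hashlib.sha256(evidence_image).digest())
    print("signature verified")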
The use of standard languages for information exchange has been dealt with in recent scientific contributions published in 2014 by the European Union Agency for Network and Information Security (ENISA), in particular Actionable Information for Security Incident Response and Standards and Tools for Exchange and Processing of Actionable Information. Another relevant resource is a recent document (Casey, 2014) that proposed DFAX (Digital Forensic Analysis eXpression), which leverages CybOX for representing the technical information.
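To make the idea of structured exchange metadata concrete, the fragment below shows the kind of fields such a package might carry. The field names and values are illustrative choices of ours and do not follow the actual DFAX or CybOX schemas.

    # Illustrative metadata accompanying an exchanged piece of electronic
    # evidence. Field names are hypothetical; a real exchange would follow a
    # standard representation such as DFAX/CybOX.
    import json

    evidence_package = {
        "case_id": "EX-2015-001",                       # hypothetical identifiers
        "evidence_id": "MS-B-0042",
        "requesting_authority": "Member State A, examining magistrate",
        "requested_authority": "Member State B, national police",
        "source_device": {"type": "mobile phone", "seizure_date": "2015-03-02"},
        "image": {"format": "raw", "size_bytes": 15728640,
                  "sha256": "0e5751c0..."},             # truncated example digest
        "chain_of_custody": [
            {"action": "acquisition", "timestamp": "2015-03-02T10:15:00Z"},
            {"action": "transfer",    "timestamp": "2015-03-05T09:00:00Z"},
        ],
    }

    print(json.dumps(evidence_package, indent=2))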
35 and organized crime especially in investigative case where time is crucial. Furthermore, when it comes to Electronic Evidence Exchange, a group of questions are to be born in mind: What information should be exchanged? When may the exchange take place? How the information could be exchanged, even taking into consideration security issues? Which kind of stakeholders are involved? The present situation raises three main issues: exchange evidence procedures may be slow. This aspect must be especially born in mind in investigative cases where time is crucial for fighting against serious cross-border and organized crime; exchange evidence procedures may involve big expenses, such in case of travelling abroad to take the original/copy source of evidence to be handled; Judicial and Police authorities must invest lots of money to keep up with the development of forensics technology. In order to address the issues a possible solution could be using a cloud environment, centralized or distributed, for exchanging/sharing evidence where the users could be competent authorities (e.g. judicial, police, etc.) but private subjects as well. This platform could speed up the exchange procedures and it could avoid, except for special cases, travelling abroad to take the original source of evidence. Moreover, through a digital platform, a wider cooperation could be put in place and, for example, specific technical support could be requested through the same digital platform, from a police authority to another located in a different EU member state. A more developed technological cooperation among the involved authorities could optimize costs and better distribute resources. 9. CONCLUSIONS At the moment, there is no standard for the exchange and it is mostly human based. Only in case of data held by third-parties there is a wellestablished cooperation between judicial authorities and Internet Service Providers (ISP). In this context the exchange is managed through platforms provided by ISPs via web. This scenario may pose serious issues: exchange evidence procedures may be slow: it must be especially born in mind in investigative cases where time is crucial for fighting against serious cross-border and organized crime; exchange evidence procedures may involve big expenses, such as in the case of traveling abroad to take the original/copy source of evidence to be handled; Judicial and Police authorities must invest lots of money to keep up with the development of forensics technology: expenses related to software updating and keeping up personnel competencies; exchange desperately needs trusted procedures and environments between involved stakeholders So the way forward for the electronic evidence exchange would be introducing a cloud environment to be used from judicial and police authorities and by private stakeholders in order to speed up the process, optimize costs and foster a more developed cooperation and trust among the involved competent authorities. Moreover, using this platform could be possible to carry out an electronic evidence exchange using specific meta data along with the data related to the source of evidence. This meta data, expressed in an open standard language could describe the digital evidence in a unique way and be used by software companies/producers to represent the widest range of forensic information and forensic processing results in order to share structured information between independent tools and organizations. REFERENCES Carrier, B. (2006). 
Hypothesis-Based Approach to Digital Forensic Investigations. Center for Education and Research in Information Assurance and Security. Purdue University. Casey, E. (2011). Digital Evidence and Computer Crime. Forensic Science, Computers, and the Internet. Elsevier, Third Edition. Casey, E., Back, G., Barnum, S. (2015). Leveraging CybOX to standardize representation and 34
36 exchange of digital forensic information. Digital Investigation, 12S, Elsevier. Council of Europe. (2013). Electronic Evidence Guide. Retrieved on February 2015 from rime/cybercrime/documents/electronic%20evid ence%20guide/default_en.asp Daniel, L., Daniel, L. (2011). Digital Forensics for Legal Professionals. Syngress Media Inc. ISO/IEC (2012). Guidelines for identification, collection, acquisition and preservation of digital evidence. Retrieved on March 2015 from atalogue_detail.htm?csnumber=44381 ISO/IEC (2015). Incident investigation principles and processes. Retrieved on March 2015 from atalogue_detail.htm?csnumber=44407 Garfinkel, S. L. (2012). Digital forensics XML and the DFXML toolset. Digital Investigation. Elsevier. Mason, S. (2012). Electronic Evidence, third edition. LexisNexis Butterworths. Peterson, G., Sujeet, S. (2012). Advances in Digital Forensics VIII, Editors: Peterson, Gilbert, Shenoi. Springer. 35
A COLLISION ATTACK ON SDHASH SIMILARITY HASHING

Donghoon Chang, Somitra Kr. Sanadhya, Monika Singh, Robin Verma
Indraprastha Institute of Information Technology Delhi (IIIT-D), India

ABSTRACT

Digital forensic investigators can take advantage of tools and techniques that have the capability of finding similar files out of the thousands of files up for investigation in a particular case. Finding similar files can significantly reduce the volume of data that needs to be investigated. Sdhash is a well-known fuzzy hashing scheme used for finding similarity among files; it produces a similarity score on a scale of 0 to 100. In a prior analysis of sdhash, Breitinger et al. claimed that 20% of the contents of a file can be modified without influencing the final sdhash digest of that file. They suggested that the file can be modified in certain regions, termed gaps, and yet the sdhash digest will remain unchanged. In this work, we show that their claim is not entirely correct. In particular, we show that even if 2% of the file contents in the gaps are changed randomly, then the sdhash gets changed with probability close to 1. We then provide an algorithm to modify the file contents within the gaps such that the sdhash remains unchanged even when the modifications are about 12% of the gap size. On the attack side, the proposed algorithm can deterministically produce collisions by generating many different files corresponding to a given file with a maximal similarity score of 100.

Keywords: Fuzzy hashing, similarity digest, collision, anti-forensics.

1. INTRODUCTION

The modern world has been turning increasingly digital: conventional books have been replaced by e-books, letters have been replaced by e-mails, paper photographs have been replaced by digital images, and compact audio and video cassettes have been replaced by MP3 and MP4 CDs/DVDs. Due to the decreasing cost of storage devices and their ever increasing size, people tend to store several (maybe slightly different) versions of a file. In case a person is suspected of some illegal activity, security agencies typically seize their digital devices for investigation. Manual forensic investigation of enormous volumes of data is hard to complete in a reasonable amount of time. Therefore, it may be helpful for an investigator to reduce the data under investigation by eliminating similar files from the suspect's hard disk. On the other hand, in some situations, the investigator might be interested in looking only at files similar to a given file in order to investigate modifications to that file.

Most forensic software packages contain tools which check for similarity between files. Automatic filtering is normally done by measuring the amount of correlation between files. However, correlation methods do not work well if the adversary deliberately modifies the file in such a manner that the correlation value becomes very low. For example, a C program can be modified by changing the names of variables, writing looping constructs in a different way, adding comments, etc. Ideally, an investigator would like to efficiently know the percentage change between two versions of a file so that he can concentrate on files which are slightly different from a desired file. Using a Cryptographic Hash Function (CHF) as a digest of the file does not work in this situation, as even a single bit change in the file content
is expected to modify the entire digest randomly by the application of a CHF.

Approximate Matching is a technique for finding similarity among given files, typically by assigning a similarity score. An approximate matching technique can be characterized into one of the following categories: Bytewise Matching, Syntactic Matching and Semantic Matching (Breitinger, Guttman, McCarrin, & Roussev, 2014). Bytewise Matching relies on the byte sequence of the digital object without considering the internal structure of the data object; these techniques are known as fuzzy hashing or similarity hashing. Syntactic Matching relies on the internal structure of the data object; it is also called Perceptual Hashing or Robust Hashing. Semantic Matching relies on the contextual attributes of the digital objects.

Sdhash, proposed by Roussev (Roussev, 2010a) in 2010, is one of the most widely used fuzzy hashing schemes. It is used as a third-party module in the popular forensic toolkit Autopsy/The Sleuth Kit and in another toolkit, BitCurator. Breitinger et al. analyzed sdhash in (Breitinger, Baier, & Beckingham, 2012; Breitinger & Baier, 2012) and commented that approximately 20% of the input bytes do not influence the similarity digest; thus it is possible to make undiscovered modifications within gaps. In this work, we show that this claim is not entirely correct. We show that if data within the gaps is randomly modified, then the digest changes even when the modifications are only about 2% of the gap size. After that we propose an algorithm which can generate multiple files having an sdhash similarity score of 100 with respect to a given file, by modifying up to 12% of the gap size. The proposed algorithm can also be used as an anti-forensic mechanism that defeats the purpose of a digital forensic investigation by filtering out similar files from a given storage medium. An attacker could generate multiple dissimilar files corresponding to a particular file with a 100% matching sdhash digest using our technique.

The rest of the paper is organized as follows. We discuss related literature in Section 2. Notations and definitions used in the paper are provided in Section 3. The sdhash scheme is explained in Section 4 and existing analysis of the scheme is presented in Section 5. Section 6 contains our analysis and attack on sdhash, followed by our proposed algorithm. Finally, we conclude the paper in Sections 7 and 8 by proposing solutions to mitigate our attack on sdhash.

2. RELATED WORK

The first fuzzy hashing technique, Context Triggered Piecewise Hashing (CTPH), was proposed by Kornblum (Kornblum, 2006) in his tool named ssdeep. The CTPH scheme is based on the spamsum algorithm proposed by Andrew Tridgell (Tridgell, 2002) for spam detection. The ssdeep tool computes a digest of the given file by first dividing the file into several chunks and then concatenating the least significant 6 bits of the hash value of each chunk. A hash function named FNV is used to compute the hash of each chunk. Chen et al. (Chen & Wang, 2008) and Seo et al. (Seo, Lim, Choi, Chang, & Lee, 2009) proposed some modifications to ssdeep to improve its efficiency and security. Baier et al. (Baier & Breitinger, 2011) presented a thorough security analysis of ssdeep and showed that it does not withstand an active adversary for blacklisting and whitelisting. Roussev et al. (Roussev, 2009, 2010a) proposed a new fuzzy hashing scheme called sdhash.
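As a toy contrast between a cryptographic digest and a piecewise digest, the sketch below hashes fixed-size chunks and keeps 6 bits per chunk. The real CTPH scheme in ssdeep uses a rolling hash to choose context-triggered chunk boundaries and the FNV hash function, so this sketch is a simplification of ours, not the ssdeep algorithm.

    # Toy contrast between a cryptographic hash and a piecewise digest:
    # a single-bit change flips the whole SHA-256 digest, but leaves almost
    # all of the per-chunk 6-bit digests unchanged.
    import hashlib

    def piecewise_digest(data, chunk=64):
        out = []
        for i in range(0, len(data), chunk):
            h = int.from_bytes(hashlib.sha256(data[i:i + chunk]).digest(), "big")
            out.append(h & 0x3F)          # keep 6 bits per chunk, as a toy
        return bytes(out)

    a = b"A" * 4096
    b = bytearray(a)
    b[100] ^= 0x01                        # flip a single bit

    print(hashlib.sha256(a).hexdigest()[:16])         # completely different...
    print(hashlib.sha256(bytes(b)).hexdigest()[:16])  # ...from this one
    same = sum(x == y for x, y in zip(piecewise_digest(a), piecewise_digest(bytes(b))))
    print(f"{same}/{len(piecewise_digest(a))} chunk digests unchanged")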
The basic idea of sdhash scheme is to identify statistically improbable features based on the entropy of consecutive 64 byte sequence of file data (which is called a feature ) in order to generate the final hash digest of the file. Breitinger et al. (Breitinger & Baier, 2012) showed some weaknesses in sdhash and presented improvements to the scheme. Detailed security and implementation analysis of sdhash was done in (Breitinger et al., 2012) by the same authors. This work uncovered several implementation bugs and showed that it is possible to beat the similarity score by tampering a given file without changing the perceptual behavior of this file (e.g. image files look almost same despite the tampering). 37
3. NOTATIONS

The following notations are used throughout this work:

D denotes the input data object of N bytes, D = B_0 B_1 B_2 ... B_{N-1}, where B_i is the i-th byte of D.

f_k is an L-byte subsequence of consecutive bytes of the data object D, termed the k-th feature of the data object. In the sdhash implementation L = 64, so f_k = B_{k+0} B_{k+1} B_{k+2} ... B_{k+63}, where 0 ≤ k < n and n is the total number of features of the data object D. Thus n = N − L + 1.

H(X) represents the entropy of a random variable X.

H_max(X) represents the maximum entropy of a random variable X.

H_min(X) represents the minimum entropy of a random variable X.

H_norm(X) denotes the normalized entropy of a random variable X.

nbf_k denotes the next byte of feature f_k of data object D.

R_prec,D(f_k) denotes the precedence rank of feature f_k of data object D.

R_pop,D(f_k) denotes the popularity score of feature f_k of data object D.

bf denotes a bloom filter of 256 bytes; #bf represents the number of features within the bloom filter bf, and |bf| denotes the number of bits set to one within bf.

t denotes a threshold (sdhash uses t = 16).

SF_score(bf_1, bf_2) represents the similarity score of bloom filters bf_1 and bf_2.

4. DESCRIPTION OF SDHASH

We now describe the working of sdhash using the notation defined in Section 3. Given a data object D of length N bytes (B_0 B_1 B_2 ... B_{N-1}), a feature f_k is a subsequence of L (= 64) consecutive bytes of D, that is f_k = B_{k+0} B_{k+1} B_{k+2} ... B_{k+63}, where 0 ≤ k < n and n = N − L + 1. In order to generate the sdhash fingerprint, the first step is to calculate the normalized entropy of each feature. The entropy of a random variable X with probability distribution P_X over an alphabet \Sigma is defined as

H(X) = -\sum_{x \in \Sigma} P[X = x] \log_2 P[X = x]

The entropy of X attains its maximum value if P[X = x] = 1/|\Sigma| for all x \in \Sigma, that is, if all possibilities for X are equiprobable; this maximum value of the entropy is H_max(X) = \log_2 |\Sigma|. Similarly, the entropy is minimum if there exists an x \in \Sigma with P[X = x] = 1; hence H_min(X) = 0. The entropy of a random variable thus ranges between 0 and \log_2 |\Sigma|. The normalized entropy of a random variable X is defined as H_norm(X) = H(X) / H_max(X), and it ranges between 0 and 1.

The random variable in the context of a feature f_k is the next byte of the feature, represented as nbf_k. In the sdhash implementation, \Sigma is the set of all 256 possible byte values. The probability distribution of nbf_k is defined, for all x \in \Sigma, as

P[nbf_k = x] = |\{ j : B_{k+j} = x, 0 \le j < 64 \}| / 64

where f_k = B_{k+0} B_{k+1} B_{k+2} ... B_{k+63}. The entropy of nbf_k is

H(nbf_k) = -\sum_{x \in \Sigma} P[nbf_k = x] \log_2 P[nbf_k = x]

with H_max(nbf_k) = \log_2 |\Sigma| = 8 and H_min(nbf_k) = 0. The normalized entropy of nbf_k is H(nbf_k) / H_max(nbf_k) = H(nbf_k) / 8, which ranges between 0 and 1. It is scaled up to the range 0 to 1000 and represented by H_norm(nbf_k):

H_norm(nbf_k) = 1000 \cdot H(nbf_k) / 8
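A direct transcription of these definitions (L = 64, byte alphabet of size 256, scaling to the range 0 to 1000) might look as follows; this is an illustration of ours, not the sdhash source code.

    # Normalized entropy of the "next byte" distribution of a 64-byte feature,
    # following the definitions above; the result is scaled to the range 0..1000.
    import math
    from collections import Counter

    L = 64  # feature length in bytes

    def feature_entropy(feature: bytes) -> float:
        assert len(feature) == L
        counts = Counter(feature)
        return -sum((c / L) * math.log2(c / L) for c in counts.values())

    def normalized_entropy(feature: bytes) -> int:
        return int(1000 * feature_entropy(feature) / 8)   # H_max = log2(256) = 8

    print(normalized_entropy(bytes(range(64))))   # 64 distinct bytes -> 750
    print(normalized_entropy(b"\x00" * 64))       # constant feature -> 0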
After calculating the normalized entropy of each feature, a precedence rank is assigned to each feature of the data object D, based on the empirical observation of the probability density function of the normalized entropy over an experimental data set. Let Q be an experimental data set of q data objects D_1 D_2 D_3 ... D_q of the same type and the same size. Here the random variable is the normalized entropy of the next byte of a feature over the data objects of the set Q, represented as nenfd_Q. Let A be the set of integers from 0 to 1000, i.e. {0, 1, 2, ..., 1000}. For a \in A,

P[nenfd_Q = a] = |\{ (i, k) : H_norm(nbf^i_k) = a, 0 \le k < n, 0 \le i < q \}| / (q \cdot n)

where q is the number of data objects in the set Q, nbf^i_k is the next byte of feature f_k of data object D_i, 0 ≤ i < q, 0 ≤ k < n, and H_norm(nbf^i_k) is the normalized entropy of nbf^i_k. Each D_i consists of n features. A characteristic probability distribution can be found for each type of data object (i.e. doc, html, gz, etc.).

Figure 1: Empirical probability density function for an experimental data set of doc files, taken from (Roussev, 2009, 2010a)

Based on the probability distribution, each element of the set A (all possible outcomes) is now assigned a rank. Let t_a = Pr[nenfd_Q = a] for all a ∈ A, where A is the set of integers {0, 1, 2, 3, ..., 1000}: t_0 = Pr[nenfd_Q = 0], t_1 = Pr[nenfd_Q = 1], ..., t_1000 = Pr[nenfd_Q = 1000]. We assign a rank r_i to each t_i as follows: r_i = 1000 if t_i is the largest, and r_i = 0 if t_i is the smallest. Now each feature f_k of D is assigned a precedence rank R_prec,D(f_k) as follows: for every feature f_k of D,

R_prec,D(f_k) = r_i, where Pr[nenfd_Q = H_norm(nbf_k)] = t_i

where D is the given data object, n is the number of features of data object D and 0 ≤ k < n. The data type of the data object D and of the data objects D_i (0 ≤ i < q) of the set Q is the same, and H_norm(nbf_k) is the normalized entropy of the next byte of feature f_k of data object D. Essentially, the least common f_k gets the lowest rank whereas the most common one is assigned the highest rank.

Now, based on the precedence rank, each feature f_k is assigned a popularity score denoted by R_pop,D(f_k). A non-zero popularity score of a feature f_k of a data object D indicates that there are (R_pop,D(f_k) + W − 1) W-neighboring features of f_k such that the precedence rank of each such left neighboring feature f_i is greater than the precedence rank of f_k, the precedence rank of each such right neighboring feature f_j is greater than or equal to the precedence rank of f_k (with i < k, j > k), and the number of such f_i plus the number of such f_j equals (R_pop,D(f_k) + W − 1).

W-neighboring features of feature f_k: a feature f_nb is called a W-neighboring feature of f_k if k − W < nb < k or k < nb < k + W. If k − W < nb < k, the feature f_nb is called a left neighboring feature of f_k; if k < nb < k + W, it is called a right neighboring feature of f_k.

R_pop,D(f_k), for 0 ≤ k < n, is calculated as follows. Initialize R_pop,D(f_k) = 0 for each 0 ≤ k < n and consider a window of size W (= 64). For every position of the sliding window, the leftmost feature f_k with the lowest R_prec,D(f_k) is taken and the value of R_pop,D(f_k) is incremented by 1. The window is then slid by one feature and the same steps are repeated, (n − W) times in total. Fig. 2 shows an example of the R_pop,D(f_k) calculation for a data object D with n = 18.
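The popularity-score calculation just described can be summarised in a few lines. This sketch follows the textual description above (window of W features, leftmost feature with the lowest precedence rank), not the C++ implementation reproduced in Listing 1.

    # Popularity scores from precedence ranks, as described above: for each
    # window of W consecutive features, increment the popularity score of the
    # leftmost feature holding the lowest precedence rank in that window.
    def popularity_scores(prec_ranks, W=64):
        n = len(prec_ranks)
        pop = [0] * n
        for start in range(n - W + 1):
            window = prec_ranks[start:start + W]
            lowest = min(window)
            pop[start + window.index(lowest)] += 1   # index() picks the leftmost
        return pop

    # Small example with n = 18 features and W = 4 (cf. the n = 18 example of Fig. 2)
    ranks = [900, 120, 400, 400, 50, 800, 50, 300, 990, 10, 10, 700, 640, 5, 5, 5, 850, 2]
    print(popularity_scores(ranks, W=4))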
Figure 2: Popularity rank calculation, from (Roussev, 2009, 2010a)

Now the features with R_pop,D(f_k) ≥ t (a threshold; the sdhash implementation uses t = 16) are selected. The selected features are the features least likely to occur in any data object; they are called Statistically Improbable Features, and they are used to generate the fingerprint of the data object D. Let {f_s0, f_s1, ..., f_sx} be the selected features, where 0 < x < n and n is the total number of features of the data object D. The SHA-1 hash of each selected feature is calculated. The resulting 160-bit hash is then split into 5 chunks of 32 bits, and the least significant 11 bits of each chunk are used to address a bit in the bloom filter array. The sdhash implementation uses a 256-byte (2^11-bit) bloom filter with a maximum of 128 elements per filter (i.e. 5 bits per feature, hence 640 bits per bloom filter).

The similarity between two different sdhash digests is defined through the number of overlapping bits of the corresponding bloom filters. Let bf_1 and bf_2 be two bloom filters. Then the similarity between the two bloom filters, represented as SF_score(bf_1, bf_2), is defined as

SF_score(bf_1, bf_2) = 0, if e \le C
SF_score(bf_1, bf_2) = \lfloor 100 \cdot (e - C) / (E_max - C) \rfloor, otherwise

where e = |bf_1 \cap bf_2| is the number of overlapping bits and C is a cut-off on the number of bits that may overlap by chance, defined as C = \alpha (E_max - E_min) + E_min for a fixed cut-off fraction \alpha, with E_min and E_max the minimum and maximum numbers of overlapping bits expected by chance, respectively.
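The mapping of a selected feature into the bloom filter and the raw bit-overlap count can be sketched directly from this description. The byte order used when splitting the SHA-1 digest and the score normalization via C, E_min and E_max are left out, so this is an illustration of ours rather than the reference implementation.

    # Insert a selected 64-byte feature into a 2048-bit (256-byte) bloom filter:
    # SHA-1 -> five 32-bit chunks -> least significant 11 bits of each chunk
    # address one bit. overlap() returns the raw overlap e between two filters.
    import hashlib

    def insert_feature(bloom: bytearray, feature: bytes) -> None:
        digest = hashlib.sha1(feature).digest()            # 160 bits
        for i in range(5):                                 # five 32-bit chunks
            chunk = int.from_bytes(digest[4 * i:4 * i + 4], "little")
            bit = chunk & 0x7FF                            # least significant 11 bits
            bloom[bit // 8] |= 1 << (bit % 8)

    def overlap(bf1: bytes, bf2: bytes) -> int:
        return sum(bin(a & b).count("1") for a, b in zip(bf1, bf2))

    bf1, bf2 = bytearray(256), bytearray(256)
    insert_feature(bf1, bytes(range(64)))
    insert_feature(bf2, bytes(range(64)))   # identical feature -> identical bits
    print(overlap(bf1, bf2))                # 5, or fewer if the 11-bit indices collide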
5. EXISTING RESULTS

An implementation and security analysis of sdhash has been carried out by Breitinger et al. (Breitinger et al., 2012). Two of the implementation bugs mentioned in (Breitinger et al., 2012), the window size bug and the leftmost bug, still exist in the latest version 3.4 of the sdhash implementation. Listing 1 shows the code containing these bugs. At line 13 there is an error in the first condition that causes incorrect identification of the minimum precedence rank R_prec,D(f_k), referred to as the window size bug; this error can be removed by replacing the first condition of the while loop with chunk_ranks[i+pop_win-1] >= min_rank. There is another error in the if conditions at lines 14-15 and 26-27, referred to as the leftmost bug: if two features f_i and f_j have equal precedence ranks (R_prec,D(f_i) = R_prec,D(f_j)) and are the lowest within a popularity window, these conditions cause the selection of the rightmost feature, which contradicts the proposed sdhash scheme. According to the proposed sdhash scheme (Roussev, 2010b), the leftmost feature with the lowest precedence rank should be selected. In order to mitigate this bug, lines 14-15 and 26-27 should be removed from the code. We corrected the above-mentioned bugs in the provided sdhash version 3.4, and the corrected code is used to carry out the experiments reported in Section 6.

In the same work (Breitinger et al., 2012), the authors indicated that undiscovered modifications to the input of sdhash are possible; however, details of how to achieve this are not provided. An undiscovered modification means that the input can be modified without influencing the final sdhash digest. Therefore, two or more files can generate the same sdhash digest, which is called a collision in terms of cryptographic hash functions; such files are called colliding files in the rest of this paper. The possibility of collisions violates one of the basic properties of similarity preserving hash functions, called Coverage (Breitinger & Baier, 2012).

Every byte of the input is expected to influence the final sdhash digest. Breitinger et al. (Breitinger & Baier, 2012) have statistically shown that 20% of the input bytes are not part of any selected feature. So, these bytes are not expected to influence the final sdhash digest, and are referred to as the gap. Table 1 shows statistics for both the original sdhash code and the improved code (after correcting the bugs discussed above).

Table 1: Different statistics on sdhash, from (Breitinger & Baier, 2012)
                          Average (improved)   Average (original)
1  filesize*              428,...              ...,912
2  gaps count
3  min gap*
4  max gap*
5  avg gap*
6  ratio to file size     20.65%               21.21%

1  void
2  sdbf::gen_chunk_scores( const uint16_t *chunk_ranks, const uint64_t chunk_size, uint16_t *chunk_scores, int32_t *score_histo) {
3      uint64_t i, j;
4      uint32_t pop_win = config->pop_win_size;
5      uint64_t min_pos = 0;
6      uint16_t min_rank = chunk_ranks[min_pos];
7
8      memset( chunk_scores, 0, chunk_size*sizeof( uint16_t));
9      if (chunk_size > pop_win) {
10         for( i=0; i<chunk_size-pop_win; i++) {
11             // try sliding on the cheap
12             if( i>0 && min_rank>0) {
13                 while( chunk_ranks[i+pop_win] >= min_rank && i<min_pos && i<chunk_size-pop_win+1) {
14                     if( chunk_ranks[i+pop_win] == min_rank)
15                         min_pos = i+pop_win;
16                     chunk_scores[min_pos]++;
17                     i++;
18                 }
19             }
20             min_pos = i;
21             min_rank = chunk_ranks[min_pos];
22             for( j=i+1; j<i+pop_win; j++) {
23                 if( chunk_ranks[j] < min_rank && chunk_ranks[j]) {
24                     min_rank = chunk_ranks[j];
25                     min_pos = j;
26                 } else if ( min_pos == j-1 && chunk_ranks[j] == min_rank) {
27                     min_pos = j;
28                 }
29             }
30             if( chunk_ranks[min_pos] > 0) {
31                 chunk_scores[min_pos]++;
32             }
33         }
34         // Generate score histogram (for b-sdbf signatures)
35         if( score_histo) {
36             for( i=0; i<chunk_size-pop_win; i++)
37                 score_histo[chunk_scores[i]]++;
38         }
39     }
40 }

Listing 1: sdbf_core.cc from sdhash 3.4

6. OUR CONTRIBUTION

The purpose of fuzzy hashing or similarity hashing schemes is to filter similar or correlated files corresponding to a given file that an investigator needs to examine. These schemes reduce the search space and the corresponding manual effort of analysis for the investigator. The process of filtering files by matching them against a set of files already known to be bad is called blacklisting. We propose a scheme that can generate multiple similar files corresponding to a given file with a similarity score of 100 for the sdhash similarity hashing. The scheme shows a weakness of the sdhash algorithm that an attacker could exploit to confuse and delay the investigative process. An example attack scenario is explained in the following paragraph.

Let us suppose a scenario where a suspected person X has accessed and downloaded some proprietary images from a commercial website A while she is logged in as a registered user. X, as an anonymous owner, runs a parallel website B that hosts content from the original website, available for free, from a hosting location in some different part of the world. She intends to popularize her website B to get a large viewership
43 that might attract web advertisers to put their ads on the website. A consistent viewership over a period would result in high chances of advertisement hits and consequently monetary returns for X. She would recover the membership cost gradually while the rest of the revenue is profit. The original website A eventually comes to know about the existence of website B which is hosting their proprietary content. Since the owner of the domain name is registered as anonymous on records, the only way to track her is her IP address. Fortunately, the country where the website is hosted follows anti-piracy and Intellectual property protection laws. The physical location of systems on which the data of website B is stored can be determined. X uploads content downloaded from original website after putting a watermark of his own website on each image. The use of cryptographic hash functions is ruled out in that case and investigators would need a similarity digest algorithm, possibly sdhash to find the files. Here, in this condition if X has any time to prepare herself for such an investigation, she could use our tool to generate multiple similar files, with same metadata, corresponding to each file. The approach is definitely heavy on storage but can help X in increasing the e ort of the investigation by forcing the investigators to analyze the files manually. Secondly, the investigation process could also be confused as by X s claim that she is innocent and it is a work of someone else who has access to her system or even a malware. In both the cases, investigation e ort is increased many folds. Moreover, the primary purpose of a similarity digest to help investigators quickly filter out files of interest is defeated. Breitinger et al. in (Breitinger et al., 2012) mentioned that 20% of the input data can be modify without influencing the final sdhash digest. We used two approaches to verify the number of undiscovered modification within gaps. These are (1) Random modification and (2) Deliberate Modification. In the random modification approach, gap bytes are filled with randomly chosen ASCII characters. Our experiments on text files show that random modification of only 2% of the gap bytes influences the sdhash digest with probability close to 1. In the second approach of Deliberate modification we propose an algorithm for careful modifications in order to increase the available bytes for modification within gaps. Experimental analysis of the proposed algorithm shows that by using this algorithm, around 12% of the gap bytes can be modified with maximal similarity score of Random Modification We randomly choose several byte positions within the gap and modify each with a randomly chosen ASCII character to find the maximum number of random modifications within the gap that do not influence the sdhash digest of the entire document. We performed experiments on a data set of 50 text files of variable size from the T5-corpus dataset. We found that even one byte of random modification within the gap would influence the sdhash digest with an average probability of 0.22, and the modification of all bytes in the 20% gap will impact the final hash digest with probability 1. So, we focused on finding the minimum number of modifications that would influence the final sdhash digest with probability 1. We started with single byte modifications and generated more than 5000 files with only one byte tampering and evaluated its influence the hash digest. 
We gradually increased the number of modifications until the hashes of all 5000 files were influenced. We found that a random modification of only around 2% of the gap bytes influences the sdhash digest of each randomly generated file; this corresponds, on average, to 0.42% of the respective file size. Experimental results for a small sample of 8 files are given in Table 2. As described in Section 3, only the selected features (statistically improbable features) participate in the generation of the final similarity digest. Therefore, gaps (the data bytes which are not part of any selected feature) are expected not to influence the final hash digest.
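As a minimal illustration of one round of this experiment, the sketch below (our own code, not part of the sdhash code base) replaces k randomly chosen gap bytes with random printable ASCII characters; the gap offsets are assumed to have been computed beforehand from the feature-selection pass, and the sdhash digests of the original and the variant are compared externally.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

std::vector<uint8_t> randomly_modify_gaps(std::vector<uint8_t> data,
                                          std::vector<size_t> gap_offsets,
                                          size_t k, std::mt19937 &rng) {
    // Pick k distinct gap positions at random.
    std::shuffle(gap_offsets.begin(), gap_offsets.end(), rng);
    k = std::min(k, gap_offsets.size());
    std::uniform_int_distribution<int> printable(32, 126); // printable ASCII
    for (size_t i = 0; i < k; ++i) {
        uint8_t replacement;
        do {
            replacement = static_cast<uint8_t>(printable(rng));
        } while (replacement == data[gap_offsets[i]]); // force a real change
        data[gap_offsets[i]] = replacement;
    }
    return data;
}

Sweeping k upward and re-hashing each variant reproduces the search for the smallest fraction of gap bytes whose modification changes the digest with probability close to 1.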
Table 2: Minimum number of random modifications that modify the final sdhash digest with probability 1

S.No.      File size  Gap         Random Modification
           (in KB)    (in bytes)  Bytes   Gap%    File%
1          …          …           …       …       3%
2          …          …           …       …       0.21%
3          …          …           …       …       0.14%
4          …          …           …       …       0.07%
5          …          …           …       …       0.03%
6          …          …           …       …       0.00%
7          …          …           …       …       0.00%
8          …          …           …       …       0.00%
On an avg                                 …       0.42%

* This file is not from the T5-corpus database.

However, as we showed in the experiments, these bytes do influence the sdhash digest. This happens because each feature in the sdhash construction is highly correlated with its neighbors: each feature differs from its left and right neighbor by only one byte. For example, let D be a data object under investigation with the following byte sequence and features:

D:   B_0 B_1 B_2 B_3 B_4 B_5 ... B_63 B_64 B_65 B_66 B_67 ... B_N
f_0: B_0 B_1 B_2 B_3 B_4 B_5 ... B_63
f_1: B_1 B_2 B_3 B_4 B_5 B_6 ... B_64
f_2: B_2 B_3 B_4 B_5 B_6 B_7 ... B_65
...
f_n: B_(N-63) B_(N-62) ... B_(N-2) B_(N-1) B_N

where N is the number of bytes in the data object D and n is the number of features in D (n = N - L + 1). Each byte is part of at least one and at most L (i.e., 64) features. Each byte B_k, except the first L-1 and the last L-1 bytes (L <= k <= N-L+1), is part of exactly L features. A change in any byte B_k is reflected in a change in features f_(k-L+1) to f_k, which may lead to a change in the precedence ranks R_prec,D(f_(k-L+1)) to R_prec,D(f_k). A change in the rank of any feature R_prec,D(f_k) is reflected in the popularity scores of the features of D, which may affect the list of selected features. Any modification of the list of selected features leads to changes in the final hash digest.

6.2 Deliberate Modification

The experimental results from Section 6.1 show that the entire 20% gap of a file cannot be modified by random modification. We now propose an algorithm that performs careful modifications in order to increase the number of changes within the gaps while still ensuring no change in the similarity digest.

6.2.1 Algorithm Description

As discussed in Section 6.1, a modification of any byte B_k will influence the rank of all features containing B_k. This might cause changes in the list of selected statistically improbable features. In the sdhash construction, the leftmost feature with the lowest rank gets selected in a popularity window. If the rank of a feature is the leftmost lowest in t (the threshold) or more popularity windows, it is selected as a statistically improbable feature. These selected statistically improbable features participate in the computation of the final sdhash digest.

Let D be a data object with f_S1 and f_S2 as two consecutive statistically improbable features:

features: f_0 f_1 f_2 ... f_S1 ... f_(S1+63) ... f_S2 ... f_(n-1)
bytes:    B_0 B_1 B_2 ... B_S1 ... B_(S1+63) ... B_(S2-1) B_S2 ... B_N

where

f_S1: B_S1 B_(S1+1) B_(S1+2) ... B_(S1+L-1)
f_S2: B_S2 B_(S2+1) B_(S2+2) ... B_(S2+L-1)

Data bytes B_(S1+64) to B_(S2-1) are not part of any selected feature. The aim is to modify these bytes in such a way that the modified features never get selected over f_S1 and f_S2. For every data byte B_k, where S1+L <= k <= S2-1, a specific value among all possible ASCII characters satisfying the following two conditions is chosen:

1. R_prec,D(f'_j) > R_prec,D(f_S2) AND R_prec,D(f'_j) >= R_prec,D(f_S1)
2. R_prec,D(f'_j) >= R_prec,D(f_j)

where (k-L+1) <= j <= k and (S1+L) <= k <= S2-1, and f'_j is the modified feature f_j obtained as the result of the modification of byte B_k.
The above two conditions ensure that every modified feature f'_j has a rank R_prec,D(f'_j) greater than the rank of the right selected statistically improbable feature (j < S2), i.e., R_prec,D(f_S2). At the same time,
R_prec,D(f'_j) is greater than or equal to the rank of the left selected (j > S1) statistically improbable feature, i.e., R_prec,D(f_S1). It can be equal to this value because, even if two features have equal rank, the leftmost feature always gets selected. Ultimately, no other feature gets selected over either of the two statistically improbable features.

The above-mentioned conditions are not enough if (S2 - 1) - (S1 + L) >= t, where L is the feature length and t is the threshold. Even if each modification satisfies both conditions, new features may still get selected. The reason is that if the distance between two selected features is more than L + t, then after modification the rank of some modified feature may become a local minimum among its t or more neighbors. Since t is the threshold for a feature to get selected, such a feature may get selected as a statistically improbable feature and hence may influence the final sdhash digest. In this case, it needs to be verified that no modification causes any change in the list of selected statistically improbable features. To mitigate this problem, after modification of the gap bytes under consideration, the popularity score R_pop,D of all the features of D is calculated. If any feature f'_j obtains a popularity score R_pop,D(f'_j) > t, then all the previous modifications are discarded. The gaps between each adjacent pair of selected improbable features are modified in the same manner.

Algorithms 1 and 2 generate multiple colliding files corresponding to a given data object with maximal similarity score. Each execution of Algorithm 1 produces a different file with different modifications and a different number of modifications. Therefore, we can generate up to 256^G different files with maximal similarity corresponding to a given file, where G denotes the total number of gap bytes in the data object. The attacker can easily confuse the investigator by generating a huge number of files corresponding to a malicious or desired file. Since our current implementation is focused on text files, we have chosen characters only from the set of 95 printable ASCII characters (char 32 to char 126). The maximum number of files that can be generated is then 95^G, which is sufficiently large even for G = 2.

Table 3: Number of modifications with maximal similarity score through the proposed algorithm

S.No.      File size  Gap         Modification
           (in KB)    (in bytes)  Bytes   Gap%    File%
1          …          …           …       …       5.99%
2          …          …           …       …       2.41%
3          …          …           …       …       2.13%
4          …          …           …       …       1.50%
5          …          …           …       …       1.09%
6          …          …           …       …       3.73%
7          …          …           …       …       0.56%
8          …          …           …       …       0.88%
On an avg                                 …       2.28%

* This file is not from the T5-corpus database.

We ran the proposed algorithms on the same data set of 50 text files that was used for our earlier random experiment. We found that around 12% of the gap bytes can be modified with a maximal similarity score of 100 using the proposed algorithm. This is a huge improvement over the random modification case, where even 2% of the gap bytes cannot be modified without changing the final sdhash digest. Experimental results for a small sample of 8 files are presented in Table 3.

7. COUNTERMEASURES

In order to reduce the amount of undiscovered modifications, we propose the following two mitigations.

7.1 Minimization of the popularity score threshold

Decreasing the popularity score threshold used in the selection of statistically improbable features will increase the number of selected features. This, in turn, will reduce the number of gap bytes that could be modified without affecting the final sdhash digest.
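A minimal sketch of the effect of this countermeasure, reusing the scoring model of Listing 1 (the helper itself is ours, not part of sdhash): counting the features whose popularity score exceeds t shows directly how lowering t enlarges the selected set and thereby shrinks the gap regions available for tampering.

#include <cstddef>
#include <cstdint>

// Count the features that would be selected as statistically improbable
// for a given threshold t (same selection test as in the sdhash code).
size_t count_selected_features(const uint16_t *chunk_scores,
                               size_t n_features, uint16_t t) {
    size_t selected = 0;
    for (size_t i = 0; i < n_features; ++i)
        if (chunk_scores[i] > t)
            ++selected;
    return selected;
}

Evaluating this helper for t = 16 (the default) and for smaller values of t on the same chunk_scores array quantifies the trade-off between gap reduction and digest growth.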
7.2 Bit-level feature formation

In the sdhash scheme, each feature differs from its neighboring features by one byte. Therefore, the attacker has 2^8 = 256 possible choices for modifying a feature without influencing its neighboring features.
Algorithm 2 Byte Modification algorithm

1: buffer ▷ input data object
2: indx ▷ index of the selected feature
3: lst_indx ▷ index of the last selected feature
4: RANK(buffer, i) ▷ function that returns the rank of the i-th feature of data object buffer
5: SCORE(buffer, i, j) ▷ function that calculates the popularity scores of the i-th to j-th features of data object buffer and returns them as an array
6: flag ▷ a boolean variable
7: rank_indx ▷ unsigned int variable for the rank of the selected feature
8: rank_lst_indx ▷ unsigned int variable for the rank of the last selected feature
9: rank_k, rank_i ▷ unsigned int; temporary variables
10: procedure MODIFY_BYTES(buffer, indx, lst_indx, pop_win_size)
11:     buffer_copy ← buffer ▷ create one copy of the data object
12:     rank_indx ← RANK(buffer, indx) ▷ rank of the selected feature of buffer
13:     rank_lst_indx ← RANK(buffer, lst_indx) ▷ rank of the last selected feature of buffer
14:     for i ← indx-1 downto lst_indx+1 do ▷ run through all intermediate bytes between the two selected features, byte by byte
15:         ch ← buffer[i] ▷ ch is a char variable
16:         rank_i ← RANK(buffer_copy, i) ▷ rank of the i-th feature of the unmodified buffer
17:         for j ← 0 to 255 do ▷ run through all ASCII values 0 to 255 until all conditions are satisfied
18:             temp ← rand() mod 256
19:             buffer[i] ← temp ▷ the i-th byte is replaced by the randomly chosen ASCII char temp
20:             flag ← true
21:             if RANK(buffer, i) > rank_indx AND RANK(buffer, i) > rank_lst_indx AND RANK(buffer, i) ≥ rank_i then ▷ the rank of a modified feature must exceed the ranks of the selected neighboring features
22:                 for k ← i-(w-1) downto lst_indx do ▷ run through the features that contain the i-th byte; w is the feature (window) size
23:                     rank_k ← RANK(buffer_copy, k) ▷ rank of the k-th feature of the unmodified buffer
24:                     if RANK(buffer, k) ≤ rank_indx OR RANK(buffer, k) < rank_lst_indx OR RANK(buffer, k) < rank_k then
25:                         flag ← false
26:                         break ▷ leave the current loop and check other values of j
27:                     end if
28:                 end for
29:             else
30:                 flag ← false
31:                 break ▷ keep the i-th byte and move on to check the next byte
32:             end if
33:         end for
34:         if flag == false then
35:             buffer[i] ← ch ▷ reset the i-th character to its actual value
36:         end if
37:     end for
38:     score ← SCORE(buffer, lst_indx+1, indx-1)
39:     high ← false
40:     for x ← 0 to (indx - lst_indx - 1) do
41:         if score[x] > 16 then
42:             high ← true; break
43:         end if
44:     end for
45:     if high == true then
46:         for z ← (indx-1) downto lst_indx do
47:             buffer[z] ← buffer_copy[z] ▷ revert all the changes
48:         end for
49:     end if
50: end procedure
If each neighboring feature differed by only 1 bit (in place of the original one byte), the number of possible choices available to the attacker would be reduced from 256 to 2. Hence, the probability of modifying each bit without affecting the final hash would also be reduced substantially. However, this would increase the number of features, and hence the number of selected features, thereby causing some loss in efficiency. An increase in the number of selected improbable features will not only increase the computation time, it will also increase the size of the final sdhash digest.

Algorithm 1

1: buffer ▷ data object
2: chunk_size ▷ size of the data object
3: chunk_scores ▷ array with the score of each feature of the data object
4: pop_win_size ▷ window size: default is 64
5: t ▷ threshold: default is 16
6: indx ▷ index of the selected feature
7: lst_indx ▷ index of the last selected feature: initialized with 0
8: for i ← 0 to chunk_size - pop_win_size do ▷ run through the input byte by byte
9:     if chunk_scores[i] > t then ▷ f_i is a selected feature
10:        indx ← i
11:        MODIFY_BYTES(buffer, indx, lst_indx, pop_win_size) ▷ processing is in Algorithm 2
12:        lst_indx ← indx
13:    end if
14: end for

8. CONCLUSION

Currently, sdhash is one of the most widely used byte-wise similarity hashing schemes. It is possible to make undiscovered modifications to a file and yet obtain exactly the same sdhash digest. We have proposed a novel approach that performs the maximum number of byte modifications while retaining a maximal similarity score of 100. We have also provided a method for carrying out an anti-forensic attack in order to confuse or delay the investigation process.

REFERENCES

Baier, H., & Breitinger, F. (2011). Security aspects of piecewise hashing in computer forensics. In IT Security Incident Management and IT Forensics (IMF), 2011 Sixth Intl. Conference on.

Breitinger, F., & Baier, H. (2012). Properties of a similarity preserving hash function and their realization in sdhash. In Information Security for South Africa, Johannesburg, 2012 (pp. 1-8).

Breitinger, F., Baier, H., & Beckingham, J. (2012). Security and implementation analysis of the similarity digest sdhash. In First International Baltic Conference on Network Security & Forensics (NeSeFo).

Breitinger, F., Guttman, B., McCarrin, M., & Roussev, V. (2014). Approximate matching: definition and terminology. URL nist.gov/publications/drafts/ /sp draft.pdf.

Chen, L., & Wang, G. (2008). An efficient piecewise hashing method for computer forensics. In Knowledge Discovery and Data Mining, First Intl. Workshop on.

Kornblum, J. (2006). Identifying almost identical files using context triggered piecewise hashing. Digital Investigation, 3.

Roussev, V. (2009). Building a better similarity trap with statistically improbable features. In System Sciences, 42nd Hawaii Intl. Conference on (pp. 1-10).

Roussev, V. (2010a). Data fingerprinting with similarity digests. In Advances in Digital Forensics VI.

Roussev, V. (2010b). Data fingerprinting with similarity digests. In Advances in Digital Forensics VI.

Seo, K., Lim, K., Choi, J., Chang, K., & Lee, S. (2009). Detecting similar files based on hash and statistical analysis for digital forensic investigation. In 2nd International Conference on Computer Science and its Applications.

Tridgell, A. (2002). Spamsum readme.
AN EMPIRICAL STUDY ON CURRENT MODELS FOR REASONING ABOUT DIGITAL EVIDENCE

Stefan Nagy 1, Imani Palmer 1, Sathya Chandran Sundaramurthy 2, Xinming Ou 2, Roy Campbell 1

1 Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL 61801, USA
2 Department of Computing and Information Sciences, Kansas State University, 234 Nichols Hall, Manhattan, KS 66506, USA

ABSTRACT

The forensic process relies on the scientific method to scrutinize recovered evidence that either supports or negates an investigative hypothesis. Currently, the analysis of digital evidence remains highly subjective and dependent on the forensic practitioner. Digital forensics is in need of a deterministic approach to obtain the most judicious conclusions from evidence. The objective of this paper is to examine current methods of digital evidence analysis. It describes the mechanisms by which these processes may be carried out and discusses the key obstacles presented by each. Lastly, it concludes with suggestions for further improvement of the digital forensic process as a whole.

Keywords: digital evidence, forensic reasoning, evidence reliability, digital forensics

1. INTRODUCTION

As the use and complexity of digital devices continue to rise, the field of digital forensics remains in its infancy. The investigative process is currently faced with a variety of problems, ranging from the limited number of skilled practitioners to the difficulty of interpreting different forms of evidence. Investigators are challenged with leveraging recovered evidence to find a deterministic cause and effect. Without reliable scientific analysis, judgments made by investigators can easily be biased, inaccurate and/or unprovable. Conclusions drawn from digital evidence can vary widely due to differences in the respective forensic systems, models, and terminology. This persistent incompatibility severely impacts the reliability of investigative findings as well as the credibility of the forensic analysts. Evidence reasoning is a fundamental part of investigative efficacy; however, the digital forensic process currently lacks the scientific rigor necessary to function in this capacity.

This paper presents an overview of several recent methods that propose a deterministic approach to reasoning about digital evidence. Section 2 examines past discussion of the digital forensic process. Section 3 discusses the application of differential analysis. In Section 4, we review several popular probabilistic reasoning models. Section 5 discusses the formalization of event reconstruction. In Section 6, we consider a model that combines probabilistic reasoning with event reconstruction. Lastly, Section 7 presents our conclusions and suggestions for additions to the field.

2. BACKGROUND

The standard for the admissibility of evidence stems from the Daubert trilogy, which establishes the requirements of relevancy and reliability [25]. NIST describes the general phases of the forensic process as collection, examination, analysis and reporting [23]. Formalization is necessary to ensure consistent repeatability for all investigative scenarios. In recent years, the literature has addressed the need for formalization of the digital forensic process, but has primarily focused on evidence collection and preservation [2]. Ieong [24] highlights the need for an explicit, unambiguous representation of knowledge and observations.
While a pedagogical investigative framework exists, there is yet to be a congruous system for digital evidence reasoning within the examination and analysis phases. Currently, digital forensic analysts use a variety of methods to develop conclusions about recovered evidence, yet the results are often marred by conflicting bias or shrouded in a veil of uncertainty. There have been
numerous proposed reasoning frameworks, typically relying on applied mathematics, statistics and probabilities, as well as logic. However, before we can employ any particular methodology, there is a need to examine, review and explore all options in order to carry out the investigative process with the utmost precision. As the context of an investigation is expanded, so too does the difficulty of identifying noise [9].

3. DIFFERENTIAL ANALYSIS

Differential analysis is described as a method of data comparison used for reporting differences between two digital objects. Historically, it has been part of computer science for quite some time: Unix's diff command was implemented in the early 1970s and is commonly used for fast comparison of binary and text files [3]. Continued advancements in hashing and metadata have since paved the way for more thorough differential analysis. It is flexible and adaptable to nearly all types of digital objects; Windows Registry hives, binary files, and disk images can all be compared for evidence of modification or tampering [4]. Non-forensic applications include the security procedures of operating systems, such as Windows' use of file signatures to verify the integrity of downloaded driver packages [5]. Modern investigative tools such as EnCase [6], FTK [7] and SleuthKit [8] have incorporated modules for streamlining differential analysis of collected evidence, although each requires significant training to become competent with the software features.

Garfinkel et al. [3] formalize a model for differential analysis in the context of digital evidence: two collected objects, a baseline object and a final object, are compared for evidence of modification both before and after events of interest. Ideally, the process will highlight the most significant changes made from baseline to final, assuming those transformations resulted from actions taken by the suspect in question. In this context, differential analysis is often used to detect malware as well as file and registry modifications [3].

While the strategy of differential analysis is fundamentally the same regardless of which system level is being examined, each level possesses a certain degree of noise. In discussing differential analysis, we will define noise as information resulting from the comparison between baseline and final that is wholly irrelevant to the investigation.

Figure 1. Knowledge management understanding hierarchy [9].

A potential form of noise presents itself as benign modifications made to digital objects resulting from the normal operation of a system. For example, an investigator may wish to examine the presence of a suspicious binary on a particular system that is part of an enterprise network. The investigator selects a disk image of an identical, unmodified system from the same enterprise network to serve as the baseline for comparison. Differential analysis may reveal that the image of the system in question is incredibly anomalous compared to the baseline. This could potentially lead to the injudicious assumption that the most anomalous system is the most malicious [4], when in reality the anomalies might only be the result of benign modifications arising from differences in installed software. While files at the kernel level are generally protected from tampering, files in user directories are much more vulnerable to modification. Although noise is often assumed to be unintentional, it is very possible that it could be inserted on purpose.
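To make the baseline/final comparison concrete, the sketch below (our illustration, not Garfinkel et al.'s tool) diffs two directory snapshots and reports added, changed and removed files by raw content comparison; a production tool would compare cryptographic hashes and metadata instead, and would filter known-benign changes to suppress noise.

#include <filesystem>
#include <fstream>
#include <iostream>
#include <iterator>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Read every regular file under root into a map: relative path -> content.
static std::map<std::string, std::string> snapshot(const fs::path &root) {
    std::map<std::string, std::string> files;
    for (const auto &entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream in(entry.path(), std::ios::binary);
        files[fs::relative(entry.path(), root).string()] =
            std::string(std::istreambuf_iterator<char>(in),
                        std::istreambuf_iterator<char>());
    }
    return files;
}

int main(int argc, char **argv) {
    if (argc != 3) {
        std::cerr << "usage: diffsnap <baseline-dir> <final-dir>\n";
        return 1;
    }
    const auto baseline = snapshot(argv[1]);
    const auto final_ = snapshot(argv[2]);
    for (const auto &[path, bytes] : final_) {
        auto it = baseline.find(path);
        if (it == baseline.end())
            std::cout << "ADDED   " << path << "\n";
        else if (it->second != bytes)
            std::cout << "CHANGED " << path << "\n";
    }
    for (const auto &kv : baseline)
        if (!final_.count(kv.first))
            std::cout << "REMOVED " << kv.first << "\n";
    return 0;
}

Run against an unmodified baseline image and the image under investigation, every line of output is a candidate artifact; the investigator's remaining task is exactly the noise-filtering problem discussed above.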
When dealing with instances of steganography, differential analysis compares objects that are known to be hiding information with those that are not. Fiore [10] describes a framework by which selective redundancy removal can be used to prepare HTML files for carrying out linguistic steganography. Since the information is being hidden through the otherwise normal process of HTML file optimization,
differential analysis will appear to reveal only benign occurrences, such as differences in HTML tag styling. Future research is needed to expand metrics for identifying and accounting for different forms of noise in digital evidence.

Mead [1] explains the National Software Reference Library's effort to create a library of hashes of commercial software packages. By combining hashing with differential analysis, investigators can narrow the scope of inquiry by cross-referencing evidence with a database of known hash values. Eliminating evidence that matches existing hashes can reduce the amount of noise arising from benign objects, which is commonly problematic when dealing with larger systems, and better isolates the few remaining questionable objects. Further improvement of such databases, robust hashing algorithms, and perhaps a formal technique would be of benefit to investigators.

4. PROBABILISTIC MODELS

Conventional forensic analysis has long included models of statistical inference to assess the degree of certainty with which hypotheses and corresponding evidence can be causally linked [11]. This causal linkage is expressed as follows: if a cause C is responsible for effect E, and E has been observed, then C must have occurred [12]. For example, researchers know that the probability of two identical DNA fingerprints belonging to two different individuals is close to one in one billion [13]. If holding an item leaves fingerprints on it, and fingerprints found on the weapon at a murder scene match the suspect's own, then investigators can conclude there is over 99% certainty that the suspect held that weapon. Because criminal investigations are ultimately abductive, probabilistic techniques have become widely accepted in the forensic reasoning process [14] [12].

4.1 CLASSICAL PROBABILITY

Several recent criminal investigations have seen classical probability used to reason about contradicting scenarios regarding the presence of incriminating digital evidence. Examining two cases originating in Hong Kong, Overill et al. [15] reasoned about the likelihood that the respective defendants intentionally downloaded various forms of child pornography versus accidentally downloading it among other benign content. In each case, the amount of child pornography seized was very small compared to the total amount of miscellaneous benign content, and in both instances it was found to have been downloaded over a long period of time. In each case, it was determined that the probability of unintentionally downloading a small amount of child pornography is significantly below 10% [15]. While this method can indeed provide a quantitative assessment of the likelihood of guilt, it is limited to investigations where only a few characteristics of the evidential traces are known. In both examples above, the defendants pleaded guilty, and thus metadata was disregarded [15]. It was assumed that the incriminating files had been downloaded over long periods of time, but had metadata been collected, the original hypothesis might have changed entirely. An example would be the offending content being timestamped to a one-hour browsing period, thus invalidating the original hypothesis of accidental download. The growing importance of preserving metadata creates the need for probabilistic models that can integrate it into reasoning.

4.2 BAYESIAN NETWORKS

In the last decade, Bayesian inference has gained popularity in the scientific community.
Unlike frequentist inference, which reasons with the frequencies of past events, Bayesian inference reasons with subjective belief estimations and allows room for new evidence to revise these beliefs [12]. Kwan et al. [14] introduced the idea of reasoning about digital evidence in the form of Bayesian networks: directed acyclic graphs whose leaf nodes represent observed evidence and whose interior nodes represent unobserved causes. The root node represents the central hypothesis to which all unobserved causes serve as sub-hypotheses. The model uses Bayes' theorem to determine the conditional probability of hypothesis H given evidence E:

P(H|E) = P(H) P(E|H) / P(E)

where P(E) is the prior probability of evidence E; P(H) is the prior probability of H when no evidence exists; and P(H|E) is the posterior probability that H has occurred when E is detected.
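A worked numeric instance of this update may help; the figures below are illustrative assumptions only, standing in for the survey-derived probabilities discussed in the text.

#include <iostream>

int main() {
    // H: "the seized computer was used to send the file";
    // E: a matching log entry. All three inputs are assumed values.
    const double p_h    = 0.5; // prior P(H), before any evidence
    const double p_e_h  = 0.9; // likelihood P(E|H)
    const double p_e_nh = 0.2; // likelihood P(E|not H)
    // Total probability of the evidence:
    // P(E) = P(E|H)P(H) + P(E|not H)P(not H).
    const double p_e = p_e_h * p_h + p_e_nh * (1.0 - p_h);
    // Bayes' theorem: P(H|E) = P(H) P(E|H) / P(E).
    std::cout << "P(H|E) = " << p_h * p_e_h / p_e << "\n"; // ~0.818
    return 0;
}

Observing the matching log entry thus raises the belief in H from 0.5 to roughly 0.82; in a full network, this posterior would in turn propagate to the sub-hypotheses beneath the root.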
Figure 2. Bayesian network connections: (a) serial; (b) diverging; (c) converging [14].

The construction of a Bayesian model begins with the definition of a root hypothesis. An example would be: "The seized computer was used to send this malicious file." The possible states of the hypothesis (Yes, No, and Uncertain) are assigned equal probabilities. As more evidence is discovered, sub-hypotheses and their corresponding probabilities are added beneath the root hypothesis. The process is repeated until refinement produces a most likely hypothesis. However, Bayesian networks are dependent on the assignment of prior probabilities to posterior evidence [14]. In scenarios where uncertainty is present, fuzzy logic methodology is incorporated to quantify likelihood as a value between 1 (absolute truth) and 0 (false) [16]. The case study presented in [14] based its prior probabilities on the results of questionnaires sent to several law enforcement agencies. Since human-computer interactions are non-deterministic, there is no systematic way to reason about posterior evidential probabilities with complete certainty; conditional probabilities inferred from the demonstrably normal behavior of one network might differ from those of another. Discrepancies in prior evidential probabilities can significantly impact the overall outcome of the Bayesian network, and thus there is difficulty in soundly applying this method to digital forensic investigations.

4.3 DEMPSTER-SHAFER THEORY

One of the limiting factors of using Bayesian analysis in security is that it requires the assignment of prior and conditional probabilities for the nodes in the reasoning model. Oftentimes, these numbers are very hard to obtain. For example, how does one compute the prior probability of a particular registry key being modified? As another example, how does one compute the conditional probability of a particular registry key being modified given that the malware did not gain privileged access? Bayesian analysis works very well when the reasoning structure is well known and the probabilities are easy to obtain. In the real world, it is very hard to obtain those numbers, and there is a high degree of uncertainty in the obtained evidence. Dempster-Shafer theory (DST) is a reasoning technique that provides a way to encode uncertainty more naturally [17]. In contrast to Bayesian analysis, DST does not require one to provide a prior probability for the hypothesis of interest. DST also does not require the use of conditional probabilities, thus addressing the other major limitation of Bayesian analysis techniques.

The presence of certain evidence during forensic analysis does not necessarily indicate malicious activity. For example, a change in a registry key could be due either to malware or to a benign application. There is always a degree of uncertainty in the obtained evidence at any given stage of the forensic analysis process. DST enables one to account for this uncertainty by assigning a number to a special state of the evidence: "don't know". For example, a sequence of registry key modifications might indicate that malware of a specific family has been downloaded. Based on empirical evidence, let us assume one believes that with 10% confidence. A probabilistic interpretation would then mean that one believes there is a 90% chance that the malware was not downloaded, which is not intuitive. When using DST, one would instead assign 10% to the hypothesis that the malware was downloaded and 90% to the hypothesis "I am not sure".
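The sketch below (our illustration) encodes the malware example as Dempster-Shafer mass functions and combines two bodies of evidence with Dempster's rule; the 10%/90% split is the paper's example, while the second evidence source and its 30%/70% masses are hypothetical.

#include <cstdio>
#include <map>

// Subsets of the frame {downloaded, clean} encoded as bitmasks:
// 1 = {downloaded}, 2 = {clean}, 3 = {downloaded, clean} = "don't know".
using Mass = std::map<unsigned, double>; // focal set -> belief mass

// Dempster's rule of combination: multiply the masses of every pair of
// focal sets, accumulate the product on their intersection, and
// renormalise by the non-conflicting mass (1 - K).
Mass combine(const Mass &m1, const Mass &m2) {
    Mass out;
    double conflict = 0.0;
    for (const auto &[a, ma] : m1)
        for (const auto &[b, mb] : m2) {
            const unsigned common = a & b;        // set intersection
            if (common == 0) conflict += ma * mb; // contradictory pair
            else             out[common] += ma * mb;
        }
    for (auto &[s, m] : out) m /= (1.0 - conflict); // renormalise
    return out;
}

int main() {
    // Registry-key sequence: 10% belief in "downloaded", 90% "don't know".
    const Mass m1{{1, 0.1}, {3, 0.9}};
    // A hypothetical second artefact: 30% / 70%.
    const Mass m2{{1, 0.3}, {3, 0.7}};
    for (const auto &[s, m] : combine(m1, m2))
        std::printf("m(subset %u) = %.2f\n", s, m); // m({downloaded}) = 0.37
    return 0;
}

Note that the combined mass on "downloaded" (0.37) rises above either individual belief, while the remaining 0.63 stays honestly assigned to "don't know" rather than to "clean".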
One can explain the difference between DST and probability theory using a coin toss example. When tossing a coin with unknown bias, probability theory will assign a probability value of 0.5 to both of the outcomes, Head and Tail. This representation does not capture the inherent uncertainty in the outcome. DST, on the other hand, will assign 0 to the outcomes {Head} and {Tail} while assigning a value of 1 to the set {Head, Tail}. This exactly captures the reasoning process of a human: when you toss a coin with unknown bias, the only thing you are sure about is that the outcome could be either Head or Tail. In general, when calculating
the likelihood of a hypothesis, DST allows the admission of ignorance about the confidence of the evidence. DST provides rules for combining multiple pieces of evidence to calculate the overall belief in the hypothesis. The challenge of using DST is analogous to that of Bayesian analysis, though much reduced, in that no prior values have to be assigned to the evidence.

5. EVENT RECONSTRUCTION MODELS

The ability to reconstruct events is of great importance to the digital forensic process. Al-Kuwari and Wolthusen [18] proposed a general framework to reconstruct missing parts of a target trace, which can be used in various areas of an investigation. This algorithm graphs a multi-modal scenario, determining all possible routes connecting the gaps of a specific trace. Additional information may be included in the graph and marked appropriately. The broadcast algorithm used to determine all possible routes may require exponential time, suggesting that the search area should be bounded [18]. This approach relies on a specific target and would best be used to determine whether an attack on a system occurred. However, the approach poses problems for the algorithm if a specific target is not identified. Event reconstruction is not unique to digital forensics, and the ability to apply existing techniques could yield effective results.

5.1 FINITE STATE MACHINES

Modern computer systems are often modeled as a series of finite states, graphically presented as a Finite State Machine (FSM). It is expressed as the quintuple M = (Q, Σ, δ, s0, F), where:

- Q is the finite, non-empty set of machine states
- Σ is the finite, non-empty alphabet of event symbols
- δ: Q × Σ → Q is the transition function mapping events between machine states in Q for each event symbol in Σ
- s0 ∈ Q is the starting state of the machine
- F ⊆ Q is the set of final machine states

Nodes represent possible system states, and arrows represent transitions between states [19].

Gladyshev and Patel [20] introduced a formalization of this model into digital forensics. By back-tracing event states, investigators are presented with a reconstruction of events and can thus select the timeline most relevant to the available evidence. For finite state machine models to perform accurate, comprehensive event reconstruction, investigators must be able to account for all possible system states. Complex events, such as those resulting from advanced persistent threats, are incredibly difficult to analyze. In addition, changing factors such as software updates may affect the resulting machine states. Carrier [19] proposes the development of a central repository for hosting information about machine events. Likening it to existing forensic databases on gun cartridges, an exhaustive, continuously updated library of system events would be of invaluable aid to investigators performing event reconstruction. However, an investigator may wish to explore other characteristics of events, such as the odds of a particular investigative hypothesis, or the real-time distributions of reconstructed events. To compute answers to such questions, the formalization of event reconstruction must be extended with additional attributes that describe statistical and real-time properties of the system and incident [20].
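As a minimal sketch of the back-tracing step (a toy machine of our own devising, not Gladyshev and Patel's formalism in full), the program below enumerates every run of a given length that ends in an observed state.

#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

using State = std::string;
using Event = std::string;

// Transition function delta: (state, event) -> state, for a toy machine.
const std::map<std::pair<State, Event>, State> delta = {
    {{"idle", "open"}, "viewing"},
    {{"viewing", "edit"}, "modified"},
    {{"viewing", "close"}, "idle"},
    {{"modified", "save"}, "saved"},
};

// Extend runs backwards from the observed state; when the requested depth
// is reached, print the reconstructed start state and the chronological
// event sequence leading to the observation.
void backtrace(const State &target, int steps, std::vector<Event> &trail) {
    if (steps == 0) {
        std::cout << "start " << target << ", events:";
        for (auto it = trail.rbegin(); it != trail.rend(); ++it)
            std::cout << " " << *it;
        std::cout << "\n";
        return;
    }
    for (const auto &[key, dst] : delta)
        if (dst == target) {
            trail.push_back(key.second); // event that led into dst
            backtrace(key.first, steps - 1, trail);
            trail.pop_back();
        }
}

int main() {
    std::vector<Event> trail;
    backtrace("saved", 3, trail); // all length-3 runs ending in "saved"
    return 0;
}

The exponential growth of such enumerations on realistic machines is precisely why the exhaustive state libraries discussed above, and bounds on the search, are needed.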
6. COMBINING PROBABILITY WITH EVENT RECONSTRUCTION

Attack graphs are typically used for intrusion analysis, where each path represents a unique method of intrusion by a malicious actor. It is possible to use attack graph techniques in the event reconstruction process. Attack graphs are directed graphs where nodes represent pre- and post-conditions of machine events, and directed edges are conditions met between these nodes; the root node represents the singular event of interest to which all other nodes serve as precursors [21]. While attack graphs are helpful in identifying mechanisms of intrusion, their lack of any probabilistic inference hinders their usefulness in quantitative evidential reasoning. Investigators presented with attack graphs must select the most probable attack scenarios, but there are currently no clear metrics for assessing likelihood. To address this, Xie et al. [22] combined attack graphs with
Bayesian networks. By transferring attack graphs into acyclic Bayesian networks, this method utilizes conditional probability tables for nodes with parents, and prior probabilities for nodes without parents. As in regular Bayesian networks, this approach relies on the investigator supplying accurate conditional and prior probabilities for each event. Estimating prior probabilities has traditionally relied on feedback from the community in the form of surveys. This becomes incredibly difficult as scale increases; a large attack graph would require the investigator to survey and obtain probability information for every unique event, making analysis costly.

7. FUTURE DIRECTION AND CONCLUSIONS

Evidence reasoning models are an important part of the forensic process. Unlike the traditional forensic sciences, digital forensics deals almost exclusively with objects of a nondeterministic nature; there is great difficulty in analyzing and scrutinizing digital evidence. Fundamental flaws hinder current evidence analysis models in their ability to accurately assess the likelihood of crime occurrence. Furthermore, conclusions based on probabilities complicate explanations in the courtroom, as demonstrated in the legal arguments surrounding Shonubi I-V [26]. These flaws must be identified and understood to avoid the possibility of injudicious assumptions resulting from the forensic process.

Differential analysis of digital evidence becomes difficult when the scope of the investigation is widened; unintentional noise in the form of benign modifications may lead to dubious conclusions about system integrity. Furthermore, recent obfuscation techniques have successfully averted detection by traditional methods. Event reconstruction models are limited in their ability to provide investigators with clear attack scenarios, because they rely on the exhaustive identification of possible machine states, and there is yet to be a resource providing such information. Probabilistic reasoning models rely on prior probabilities known to the investigator, which have so far mainly been determined by surveying others in the field. Besides the obvious expenditure of time and effort in conducting such surveys, it is reckless to underestimate the potential for entropy and to assume that small samples of observed probabilities hold true for all investigations. It can be concluded that each of these techniques is only applicable to a small niche of forensic scenarios.

The increasing rate of software development places a burden on forensic examiners to keep up with the latest software packages, both commercial and free. Each of the models discussed in this paper lacks a comprehensive database of information with which to conduct analysis with the highest accuracy. We highlight the need for a community-driven, continuously updated catalogue of file hashes, machine states, and probability metrics for use in forensic analysis. The changing nature of technology and software necessitates that researchers and law enforcement collaborate to ensure the digital forensic process is as reliable as possible.

8. ACKNOWLEDGEMENT

This research is partially supported by the National Science Foundation under Grants No and . Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. This material is based on research sponsored by the Air Force Research Laboratory and the Air Force Office of Scientific Research, under agreement number FA . The U.S.
Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.

REFERENCES

[1] Mead, S. Unique File Identification in the National Software Reference Library. Digit. Investig. 3, 3 (September 2006).

[2] Stallard, T., Levitt, K. Automated Analysis for Digital Forensic Science: Semantic Integrity Checking. In Proceedings of the 19th Annual Computer Security Applications Conference (ACSAC '03). IEEE Computer Society, Washington, DC, USA.

[3] Garfinkel, S., Nelson, A., Young, J. A General Strategy for Differential Forensic Analysis. In
Digital Forensics Research Workshop 2012, August 2012, pages S50-S59.

[4] Gielen, M.W. Prioritizing Computer Forensics Using Triage Techniques. University of Twente.

[5] Microsoft Windows. Microsoft.

[6] EnCase. Guidance Software.

[7] Forensic Toolkit (FTK). AccessData.

[8] The Sleuth Kit. Carrier, B.

[9] Nunamaker, N.J.J., Romano, J., Briggs, R. A Framework for Collaboration and Knowledge Management. In Proceedings of the 34th Annual Hawaii International Conference on System Sciences, January 2001.

[10] Fiore, U. Selective Redundancy Removal: A Framework for Data Hiding.

[11] Overill, R.E., Silomon, J.A.M. Digital Meta-Forensics: Quantifying the Investigation. In Proceedings of the Fourth International Conference on Cybercrime Forensics Education and Training, 2010.

[12] Huygen, P.E.M. Use of Bayesian Belief Networks in Legal Reasoning. In 17th BILETA Annual Conference, Amsterdam, 2002.

[13] Overill, R.E. Quantifying Likelihood in Digital Forensics Investigations. Journal of Harbin Institute of Technology, Vol. 21, No. 6, 2014.

[14] Kwan, M., Kam-Pui Chow, Law, F., Lai, P. Reasoning About Evidence Using Bayesian Networks. In Advances in Digital Forensics IV, Fourth Annual IFIP WG 11.9 Conference on Digital Forensics, Kyoto University, Kyoto, Japan, January 28-30, 2008.

[15] Overill, R.E., Silomon, J.A.M., Kam-Pui Chow, Tse, H. Quantification of Digital Forensic Hypotheses Using Probability Theory. In Systematic Approaches to Digital Forensic Engineering (SADFE), 2013 Eighth International Workshop on, pp. 1-5, Nov. 2013.

[16] Stoffel, K., Cotofrei, P., Han, D. Fuzzy Methods for Forensic Data Analysis. In Soft Computing and Pattern Recognition (SoCPaR), 2010 International Conference of, pp. 23-28, 7-10 Dec. 2010.

[17] Shafer, G. Probability Judgment in Artificial Intelligence and Expert Systems. Statistical Science, Vol. 2, No. 1 (Feb. 1987).

[18] Al-Kuwari, S., Wolthusen, S.D. Fuzzy Trace Validation: Toward an Offline Forensic Tracking Framework. In Systematic Approaches to Digital Forensic Engineering (SADFE), IEEE Sixth International Workshop on, pages 1-4. IEEE, 2011.

[19] Carrier, B. A Hypothesis-based Approach to Digital Forensic Investigations. Purdue University, 2006.

[20] Gladyshev, P., Patel, A. Finite State Machine Approach to Digital Event Reconstruction. Digit. Investig., 1(2), June 2004.

[21] Liu, C., Singhal, A., Wijesekera, D. Using Attack Graphs in Forensic Examinations. In Availability, Reliability and Security (ARES), 2012 Seventh International Conference on, pp. 596-603, Aug. 2012.

[22] Xie, P., Li, J.H., Ou, X., Liu, P., Levy, R. Using Bayesian Networks for Cyber Security Analysis. In Dependable Systems and Networks (DSN), 2010 IEEE/IFIP International Conference on, pp. 211-220, June-July 2010.

[23] NIST. Guide to Integrating Forensic Techniques into Incident Response.

[24] Ricci S. C. Ieong. FORZA - Digital forensics investigation framework that incorporates legal issues. Digit. Investig. 3 (September 2006).

[25] Vickers, A. Leah. "Daubert, Critique and Interpretation: What Empirical Studies Tell Us About the Application of Daubert." USFL Rev. 40 (2005): 109.

[26] Izenman, J.A. Introduction to Two Views on the Shonubi Case. Temple University.
DATA EXTRACTION ON MTK-BASED ANDROID MOBILE PHONE FORENSICS

Joe Kong
MPhil Student in Computer Science
The University of Hong Kong

ABSTRACT

In conducting criminal investigations it is quite common that forensic examiners need to recover evidentiary data from smartphones used by offenders. However, examiners encounter difficulties in acquiring a complete memory dump from MTK Android phones, a popular brand of smartphones, due to a lack of technical knowledge of the phone architecture and because system manuals are not always available. This research performs tests to capture data from MTK Android phones by applying selected forensic tools and compares their effectiveness by analyzing the extracted results. It is anticipated that a generic extraction tool, once identified, can be used on different brands of smartphones equipped with the same CPU chipset.

Keywords: Mobile forensics, MTK Android phones, Android forensics, physical extraction, flash memory, MT6582.

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
OPEN FORENSIC DEVICES

Lee Tobin, Pavel Gladyshev
Digital Forensics Investigation Research Laboratory, University College Dublin, Ireland

ABSTRACT

Cybercrime has been a growing concern for the past two decades. What used to be the responsibility of specialist national police has become routine work for regional and district police. Unfortunately, funding for law enforcement agencies is not growing as fast as the amount of digital evidence. In this paper, we present a forensic platform that is tailored for cost-effectiveness, extensibility, and ease of use. The software for this platform is open source and can be deployed on practically all commercially available hardware devices, such as standard desktop motherboards or embedded systems such as the Raspberry Pi and Gizmosphere's Gizmo board. A novel user interface was designed and implemented, based on Morphological Analysis.

Keywords: Forensic device, open source, write-blocker, forensic imaging, morphological analysis, user interface design.

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
A STUDY ON ADJACENCY MEASURES FOR REASSEMBLING TEXT FILES

Alperen Şahin, Hüsrev T. Sencar
TOBB University of Economics and Technology, Ankara, Turkey

ABSTRACT

Recovery of fragmented files relies on the ability to accurately evaluate the adjacency of two fragments. Text-based files typically organize data in a very weakly structured manner; therefore, fragment reassembly remains a challenging task. In this work, we evaluate existing adjacency measures that can be used for reassembling fragmented text files. Our results show that the individual performances of existing measures are far from adequately addressing this need. We then introduce a new approach that attempts to exploit the limited structural characteristics of text files, which utilize constructs for the description, presentation, and processing of file data. Our approach builds a statistical model of the ordering of file-type-specific constructs and incorporates this information into adjacency measures for more reliable fragment reassembly. Results show that reassembly accuracy increases significantly with this approach.

Keywords: File carving, text files, fragmentation, file reassembly.

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
AN INTEGRATED AUDIO FORENSIC FRAMEWORK FOR INSTANT MESSAGE INVESTIGATION

Yanbin Tang, Zheng Tan, K.P. Chow, S.M. Yiu
Department of Computer Science, The University of Hong Kong, China
{ybtang, ztan, chow,

ABSTRACT

Voice chat in instant message (IM) apps is getting popular. A huge amount of manpower is required to listen to, analyze, and identify relevant chat files of IM apps in a forensic investigation. This paper proposes a semi-automatic integrated framework to deal with audio forensic investigation for IM apps by applying modern technologies. The main objective is to reduce the amount of manpower required in the investigation. This is the first work that applies speech-to-text technology to the forensics of voice chat in IM apps. Both text and audio features are extracted to reconstruct the dialog conversation. Experiments with real case data show that the framework is promising. The framework is able to translate a dialog into readable text and improve efficiency during an investigation with the reconstructed conversation.

Keywords: Audio, voice chat, instant message, smartphone, digital forensics.

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
PROJECT MAELSTROM: FORENSIC ANALYSIS OF THE BITTORRENT-POWERED BROWSER

Jason Farina, M-Tahar Kechadi, Mark Scanlon
School of Computer Science, University College Dublin, Ireland

ABSTRACT

In April 2015, BitTorrent Inc. released their distributed peer-to-peer powered browser, Project Maelstrom, into public beta. The browser facilitates a new alternative website distribution paradigm to the traditional HTTP-based, client-server model. This decentralised web is powered by each of the users accessing each Maelstrom-hosted website. Each user shares their copy of the website with other new visitors to the website. As a result, a Maelstrom-hosted website cannot be taken offline by law enforcement or any other parties. Due to this open distribution model, a number of interesting censorship, security and privacy considerations are raised. This paper explores the application, its protocol, the sharing of Maelstrom content, and its new visitor-powered web-hosting paradigm.

Keywords: Project Maelstrom, BitTorrent, Decentralised Web, Alternative Web, Browser Forensics.

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
FACTORS INFLUENCING DIGITAL FORENSIC INVESTIGATIONS: EMPIRICAL EVALUATION OF 12 YEARS OF DUBAI POLICE CASES

Ibtesam Al Awadhi, Janet C Read
University of Central Lancashire, School of Computing, Engineering and Physical Sciences, Preston, UK
{IAlawadhi, JCRead}@uclan.ac.uk

Andrew Marrington
Zayed University, College of Technological Innovation, Dubai, UAE
[email protected]

Virginia N. L. Franqueira
University of Derby, College of Engineering and Technology, Derby, UK
[email protected]

ABSTRACT

In digital forensics, the person-hours spent on an investigation are a key factor which needs to be kept to a minimum whilst also paying close attention to the authenticity of the evidence. The literature describes the challenges behind increasing person-hours and identifies several factors which contribute to this phenomenon. This paper reviews these factors and demonstrates that they do not wholly account for increases in investigation time. Using real case records from the Dubai Police, an extensive study explains the contribution of other factors to the increase in person-hours. We conclude this work by emphasizing several factors affecting the person-hours, in contrast to what most of the literature in this area proposes.

Keywords: Cyber forensics, Digital forensics, Empirical data, Forensic investigation, Dubai Police.

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
PLC FORENSICS BASED ON CONTROL PROGRAM LOGIC CHANGE DETECTION

Ken Yau, Kam-Pui Chow
University of Hong Kong, Hong Kong, China

ABSTRACT

A Supervisory Control and Data Acquisition (SCADA) system is an automated industrial control system built with multiple Programmable Logic Controllers (PLCs). A PLC is a special form of microprocessor-based controller with a proprietary operating system. Due to the unique architecture of PLCs, traditional digital forensic tools are difficult to apply. In this paper, we propose a program called Control Program Logic Change Detector (CPLCD); it works with a set of Detection Rules (DRs) to detect and record undesired incidents that interfere with the normal operation of a PLC. In order to prove the feasibility of our solution, we set up two experiments for detecting two common PLC attacks. Moreover, we illustrate how CPLCD and the network analyzer Wireshark can work together to perform a digital forensic investigation on a PLC.

Keywords: PLC Forensics, SCADA Security, Ladder Logic Programming

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
FORENSIC ACQUISITION OF IMVU: A CASE STUDY

Robert van Voorst
National Police of the Netherlands, Rotterdam, Netherlands
[email protected]

M-Tahar Kechadi, Nhien-An Le-Khac
University College Dublin, Dublin 4, Ireland
{tahar.kechadi,an.lekhac}@ucd.ie

ABSTRACT

There are many applications available for personal computers and mobile devices that facilitate users in meeting potential partners. There is, however, a risk associated with the level of anonymity in using instant message applications, because there exists the potential for predators to attract and lure vulnerable users. Today, Instant Messaging within a Virtual Universe (IMVU) combines custom avatars, chat or instant message (IM), community, content creation, commerce, and anonymity. IMVU is also being exploited by criminals to commit a wide variety of offenses. However, there is very little research on the digital forensic acquisition of IMVU applications. In this paper, we first discuss the challenges of IMVU forensics. We present a forensic acquisition of an IMVU 3D application as a case study. We also describe and analyse our experiments with this application.

Keywords: Instant Messaging, forensic acquisition, Virtual Universe 3D, forensic process, forensic case study

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
CYBER BLACK BOX/EVENT DATA RECORDER: LEGAL AND ETHICAL PERSPECTIVES AND CHALLENGES WITH DIGITAL FORENSICS

Michael Losavio
University of Louisville, Department of Criminal Justice, Louisville, Kentucky, U.S.A.

Pavel Pastukov
Perm State National Research University, Department of Criminal Procedure and Criminalistics, Perm, Russian Federation

Svetlana Polyakova
Perm State National Research University, Department of English Language and Intercultural Communication, Perm, Russian Federation

ABSTRACT

With ubiquitous computing and the growth of the Internet of Things, there is a vast expansion in the deployment and use of event data recording systems in a variety of environments. From the ships' logs of antiquity through the evolution of personal devices for recording personal and environmental activities, these devices offer rich forensic and evidentiary opportunities that smash against rights of privacy and personality. The technical configurations of these devices provide for a greater scope of sensing, interconnection options for local, near, and cloud storage of data, and the possibility of powerful analytics. This creates the unique situation of near-total data profiles on the lives of others. We examine the legal and ethical issues of such systems in the American and transnational environments.

Keywords: event, data, recorder, legal, ethical, privacy

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
TRACKING AND TAXONOMY OF CYBERLOCKER LINK SHARERS BASED ON BEHAVIOR ANALYSIS

Xiao-Xi Fan and Kam-Pui Chow
The University of Hong Kong, Department of Computer Science, Hong Kong
{xxfan,

ABSTRACT

The growing popularity of cyberlocker services has had a significant impact on the Internet: they are considered one of the biggest contributors to global Internet traffic, and much of the traded content is estimated to be illegal. Due to the anonymity property of cyberlockers, it is difficult for investigators to track user identity directly on the cyberlocker site. In order to find the potential relationships between cyberlocker users, we propose a framework to collect cyberlocker-related data from public forums, where cyberlocker users usually distribute cyberlocker links for others to download and where identity information can be gathered easily. Different kinds of sharing behaviors of forum users are extracted to build a profile, which is then analyzed with statistical techniques. The experiment results demonstrate that the framework can effectively detect profiles with similar behaviors for identity tracking and produce a taxonomy of forum users to provide insights for investigating cyberlocker-based piracy.

Keywords: identity tracking, taxonomy, user profiling, behavior analysis, cyberlocker, piracy

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
EXPLORING THE USE OF PLC DEBUGGING TOOLS FOR DIGITAL FORENSIC INVESTIGATIONS ON SCADA SYSTEMS

Tina Wu, Jason R.C. Nurse
Cyber Security Centre, Department of Computer Science, University of Oxford, Oxford, United Kingdom
{tina.wu,jason.nurse

ABSTRACT

The Stuxnet malware attack has provided strong evidence for the need to develop a forensic capability to aid in thorough post-incident investigations. Current live forensic tools are typically used to acquire and examine memory from computers running either Windows or Unix. This makes them incompatible with the embedded devices found on SCADA systems, which have their own bespoke operating systems. Currently, only a limited number of forensic tools have been developed for SCADA systems, with no development of tools to acquire the program code from PLCs. In this paper, we explore this problem with two main hypotheses in mind. Our first hypothesis was that the program code is an important forensic artefact that can be used to determine an attacker's intentions. Our second hypothesis was that PLC debugging tools can be used for forensics, to facilitate the acquisition and analysis of the program code from PLCs. With direct access to the memory addresses of the PLC, PLC debugging tools have promising functionalities for forensic use, such as the Snapshot function that allows users to take values directly from the memory addresses of the PLC, without vendor-specific software. As a case example, we focus on PLC Logger as a forensic tool to acquire and analyse the program code on a PLC. Using these two hypotheses, we developed two experiments. The results from Experiment 1 provided evidence to indicate that it is possible to acquire the program code using PLC Logger and to identify the attacker's intention; therefore, our hypothesis was accepted. In Experiment 2, we used an existing Computer Forensics Tool Testing (CFTT) framework by NIST to test PLC Logger's suitability as a forensic tool to analyse and acquire the program code. Based on the experiment's results, this hypothesis was rejected, as PLC Logger had failed half of the tests. This suggests that PLC Logger in its current state has limited suitability as a forensic tool, unless its shortcomings are addressed.

Keywords: PLC Debugging, Program Code, SCADA, Digital Forensics, NIST, PLCs, Attackers.

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
THE USE OF ONTOLOGIES IN FORENSIC ANALYSIS OF SMARTPHONE CONTENT

Mohammed Alzaabi, Thomas Martin, Kamal Taha, Andy Jones
Khalifa University of Science, Technology and Research, Sharjah, UAE

ABSTRACT

Digital forensics investigators face a constant challenge in keeping track of evolving technologies such as smartphones. Analyzing the contents of these devices to infer useful information is becoming more time-consuming as the volume and complexity of data increase. Typically, such analysis is undertaken by a human, which makes it dependent on the experience of the investigator. To overcome such impediments, an automated technique can be utilized to aid the investigator to quickly and efficiently analyze the data. In this paper, we propose F-DOS, a set of ontologies that models smartphone content for the purpose of forensic analysis. F-DOS can form a knowledge management component in a forensic analysis system. Its importance lies in its ability to encode the semantics of smartphone content using concepts and their relationships, which are modeled by F-DOS.

Keywords: Digital Forensics, Forensic Analysis, Ontology.

Full version of this paper is published in the Journal of Digital Forensics, Security and Law (
Carsten Rudolph, Nicolai Kuntze, Barbara Endicott-Popovsky, Antonio Maña
Proceedings of the 10th International Conference on Systematic Approaches to Digital Forensic Engineering

The SADFE series comprises the successive editions of the International Conference on Systematic Approaches to Digital Forensic Engineering. Now in its tenth edition, SADFE has established itself as the premier conference for researchers and practitioners working on systematic approaches to digital forensic engineering. SADFE 2015, the tenth International Conference on Systematic Approaches to Digital Forensic Engineering, was held in Malaga, Spain, from September 30 to October 2, 2015.

Digital forensic engineering and the curation of digital collections in cultural institutions face pressing and overlapping challenges related to provenance, chain of custody, authenticity, integrity, and identity. The generation, analysis and sustainability of digital evidence require innovative methods, systems and practices, grounded in solid research and an understanding of user needs. The term digital forensic readiness describes systems that are built to satisfy the need for secure digital evidence. SADFE 2015 investigates requirements for digital forensic readiness as well as methods, technologies, and building blocks for digital forensic engineering. Digital forensics at SADFE addresses a variety of goals, including criminal and corporate investigations, data records produced by calibrated devices, and the documentation of individual and organizational activities. Another focus is on the challenges brought about by globalization and digital applications that cross legal jurisdictions. We believe digital forensic engineering is vital to security, the administration of justice and the evolution of culture.

ISBN: