COMPUTER FORENSIC

Ibrahim Khoury, Eralda Caushaj

ABSTRACT

Computer forensics is the process of using scientific knowledge to collect, analyze and present digital evidence in court. The success of a computer forensic examination depends on the ability to examine large amounts of data in a timely manner in search of important evidence during crime investigations. Limitations in time and resources, both computational and human, have a negative impact on the results obtained. Thus, better use of the available resources is necessary, beyond the capabilities of the currently used forensic tools. Improving the current computer forensic software requires the use of Artificial Intelligence tools.

1. INTRODUCTION

Computer forensics has recently gained significant popularity with many local law enforcement agencies. It is currently employed in fraud, theft, drug enforcement and almost every other enforcement activity [1, 2]. Computer forensics involves the preservation, identification, extraction and documentation of digital evidence held on magnetically, optically or electronically stored media [3]. Forensic techniques for computers and mobile devices are not as advanced as the more mature, mainstream forensic techniques used by law enforcement, such as blood typing, ballistics, fingerprinting and DNA testing [9]. This immaturity is attributable to the fast pace of technological change.

We have recently been researching a wide area of forensic computing, including email, mobile and computer forensics, but in this paper we focus on offline and network analysis, as well as on Artificial Intelligence tools in computer forensics. Section 2 covers the origin of the term computer forensics and analyzes what is considered a forensic and a non-forensic tool. Section 3 focuses on offline analysis, and section 4 on network analysis and the EnCase Enterprise software.
Artificial Intelligence in computer forensics and the MADIK tool are covered in section 5. Neural networks and support vector machines are introduced in section 6.

2. ORIGIN OF THE TERM

In the late 1980s, early law enforcement practitioners used the term computer forensics to refer to the process of examining standalone computers for digital evidence of crime. Some researchers have argued that forensic computing is a more accurate term, because digital evidence is increasingly captured from objects not commonly thought of as computers, such as mobile devices. In computer forensics, the term forensics implies the use of tools to present some aspect of evidence not available through standard observation.

2.1 NON FORENSIC-TOOLS

Standard file copy programs and routines that search for text are not considered forensic tools [1]. Processes that merely transform information in some fashion cannot be considered forensic operations either. Encryption, data compression and other types of encoding fall into this group: they only transform the same evidence into a different form and do not uncover new evidence.

2.2 FORENSIC-TOOLS

Reconstructing files by uncovering patterns of bytes, or obtaining data from a microscopic view of a medium's magnetic domains, are suitable candidates for forensic research [1].

3. OFFLINE ANALYSIS

Investigators at a crime scene must follow a protocol that includes the following steps. Taking pictures of the outside and the inside of the computer comes first, as evidence of its physical situation. The investigator should also determine whether a destructive program is running. Offline analysis becomes possible once the investigator powers down the computer and removes it from the network. Creating an exact physical copy of the evidence is the next step in acquiring digital evidence.
According to Kruse and Heiser, this copy is called a bit-stream image or forensic image [3]. Forensic images are important for several reasons. Courts look favorably upon forensic images because they demonstrate that all of the evidence was captured. Authentication is the next step in computer forensics: it is important to authenticate that the copy of the evidence is exactly the same as the original. Analyzing the data and presenting it in court in an acceptable format is the last step.
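The authentication step described above is commonly carried out by comparing cryptographic digests of the original medium and of its copy. The following is only a minimal sketch of that idea, not the procedure of any particular forensic tool. It assumes both the original and the forensic image are readable as ordinary files; the function names hash_file and copy_is_authentic are hypothetical, and the default of SHA-256 is our assumption (MD5 can be selected through the algorithm parameter).

```python
import hashlib


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks.

    Reading in chunks keeps memory use constant, which matters for
    disk images that are far larger than the available RAM.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_authentic(original_path, copy_path, algorithm="sha256"):
    """Hypothetical helper: True when both files have the same digest."""
    return hash_file(original_path, algorithm) == hash_file(copy_path, algorithm)
```

In practice the digest of the original would be recorded at acquisition time and the image re-hashed whenever its integrity needs to be demonstrated; a single changed byte in the copy yields a different digest.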
[Figure 1: Computer Forensic steps. Phases shown: Administrative Consideration, System Preservation, Evidence Acquisition, Comparison, Evidence Examination, Physical Presentation. Steps shown: policy and procedure development; determine if a destructive program is running; turn off the computer; make a digital copy of the original hard disk; authenticate that the copy of the evidence is exactly the same as the original; analyze the digital copy; documenting and reporting.]

Figure 1 illustrates all the above steps. A digital copy of the hard disk can be made as follows [3]:

- Use a write blocker: a hardware mechanism that allows reading from, but not writing to, the hard disk. A write blocker is usually needed when imaging with a Windows-based application, because Windows automatically mounts the hard drive as read + write, so files on the hard drive could be changed. In Linux it is not necessary, because the hard drive can be mounted manually as read only.
- Create a bit-stream image of the original hard disk that contains all the physical and deleted files.

The digital copy of the hard drive can be analyzed as follows:

Mounting the image: In order to access the file system of the hard drive we must mount the forensic image. Mounting the disk or image makes a file system available to the operating system's kernel. Once the image is mounted, any tool needed to work with the files (search, view, sort, print, etc.) can be used. The forensic image should be mounted in read-only mode so that it is not changed.

Hash analysis (MD5Deep): Files in a hash set typically fall into one of two categories: known or notable. Known files are ones that can be ignored, such as typical system files (iexplore.exe, winword.exe, etc.), whereas notable files are ones that have been identified as illegal or inappropriate, such as child pornography. Hash analysis compares the hashes of the files with a set of hashes of files of known content.
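The hash analysis just described can be sketched as a lookup of each file's MD5 digest in two hash sets. This is an illustrative sketch only and does not reproduce MD5Deep or any other tool; the function names md5_of_bytes and classify_by_hash, the "unknown" label for files in neither set, and the choice to check the notable set first are all our assumptions.

```python
import hashlib


def md5_of_bytes(data):
    """Return the MD5 hex digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def classify_by_hash(file_hashes, known_hashes, notable_hashes):
    """Label each file as 'notable', 'known' or 'unknown'.

    file_hashes    -- dict mapping a file name to its MD5 hex digest
    known_hashes   -- set of digests of files that can be ignored
    notable_hashes -- set of digests of files flagged as illegal or
                      inappropriate
    The notable set is checked first (an assumption of this sketch), so
    a digest present in both sets is never silently ignored.
    """
    labels = {}
    for name, digest in file_hashes.items():
        if digest in notable_hashes:
            labels[name] = "notable"
        elif digest in known_hashes:
            labels[name] = "known"
        else:
            labels[name] = "unknown"
    return labels
```

Files labeled known can be filtered out of later searches, which is what reduces the search space, while files labeled notable are brought to the investigator's attention.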
Signature analysis: an automated procedure for comparing the header or footer of a file with its file extension. A file signature is a header or footer (or both) within a file that indicates the application associated with that type of file. File signatures are useful for evaluating whether a suspect is attempting to hide files by changing their extension.

Email search: Email can be an important source of evidence for many types of investigations. To conduct an email investigation, the mailbox files must be located. The mailbox locations differ depending on the version of Windows and the application used, so different directory paths should be considered.

File type search: Law enforcement might want to find all graphics files on a subject's hard drive that contains over 500,000 files. An effective search in this case is to look for files with the appropriate graphics file extensions, such as jpg or gif.

Keyword search: Searching for specific keywords within the forensic image is done after the investigator reduces the search space by identifying and filtering known files. Suspect files can also be identified via signature analysis.

Web based email: The most common webmail services, Yahoo Mail, Hotmail and Gmail, are a good source for investigation. Webmail messages are stored in HTML format with the extension html or htm and are thus readable with any web browser (Mozilla, IE). Messages that are downloaded from or uploaded to the Web are stored in the Temporary Internet Files folder.

Cookies: pieces of information generated by a Web server and stored on the user's computer, ready for future access. Cookies are embedded in the HTML information flowing back and forth between the user's computer and the servers [4]. Most users may be unaware that these cookies are being placed on their hard drives.
The cookies directory contains the individual cookies as well as an index.dat file that holds the activity records for each of the cookies in the directory. These files can be a valuable source of information during investigations.

Swap file: virtual memory that is used as an extension of the computer system's RAM. Even if a file was forensically deleted from the hard drive, the swap file can still contain evidence that was previously removed. If the swap file itself was forensically deleted, the data within it is unrecoverable, although copies may still be found in unallocated or slack space. The Windows swap file is win386.swp or pagefile.sys, depending on the Windows version.

Deleted files: Deleted files are very important evidence. Investigators try to retrieve them by first looking at the INFO2 file, which tracks important information about deleted files. The FAT (File Allocation Table) is also a good source for finding deleted files.

Temporary files: Many Windows applications create temporary files that are usually written to the hard drive. Investigators can retrieve those files from the hard disk even if the original files are overwritten, for example ~homework1.tmp for Homework1.doc.

Print spool files: When a file is printed, an enhanced metafile (EMF) is written to the hard disk. Investigators can recover those files from the hard disk even if the suspect deletes the original file.

4. NETWORK ANALYSIS

Network forensics is the study of analyzing network activity in order to discover the source of security policy violations or information assurance breaches [5]. Capturing network activity for forensic analysis is simple in theory, but far from trivial in practice [6]. Not all the information captured or recorded will be useful for analysis. Identifying key features that reveal information deemed worthy of further intelligent analysis is a problem of great interest to researchers in the field. Network analysis focuses on the packets captured by intrusion detection systems. Investigators analyze the network packets and activities to find evidence. An Intrusion Detection System (IDS) is a device (or application) that monitors network and/or system activities for malicious activities or policy violations. IDSs are the most important source of information for investigators in network attacks.
Intrusion Detection Systems can detect various types of attacks [3]:

- Denial of service attack: a class of attacks in which an attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine [3].
- Port scan: the use of a software application designed to probe a network host for open ports.
- Viruses: a virus is a small piece of software that piggybacks on real programs.
- Eavesdropping: secretly gaining unauthorized access to confidential communications.
- Spoofing attack: a situation in which one person or program successfully masquerades as another by falsifying data, thereby gaining an illegitimate advantage.
- Smurf attack: a way of generating significant computer network traffic on a victim network.

A range of information can be retrieved from an IDS:

- Source and destination IP addresses
- Source and destination DNS names
- Source and destination ports
- Type of attack
- The original packets

To examine offline and network analysis in more detail, we introduce the EnCase Enterprise tool, which is widely used in federal agencies for investigation purposes. EnCase Enterprise is used in a network, whereas the previous version, EnCase, was used only for standalone computers.

4.1. EnCASE ENTERPRISE

EnCase Enterprise software is used to investigate networked environments. It allows investigators to securely investigate multiple machines simultaneously, at the disk and memory levels, without taking computers offline. EnCase provides the following functionalities as defined by NIST (National Institute of Standards and Technology) [7]:

Immediate response capability: EnCase Enterprise can conduct immediate forensic analysis of any system on a WAN (wide area network) without disrupting operations. This capability enables many federal agencies to better identify incidents as they occur.
Initial system snapshot: one of the most important features of EnCase Enterprise. For any compromised system on the WAN, a snapshot of all the key volatile and binary data can be quickly obtained.

Analyze live systems with minimal invasiveness: EnCase can analyze live systems in a forensically sound manner without being visible to the user or the attacker.

Volatile data acquisition and analysis: EnCase Enterprise can capture and examine the volatile data from several systems at once, such as open ports or files, running processes and the live registry. The software does this remotely, without disrupting the system being investigated.

Forensic hard drive data acquisition: EnCase Enterprise is capable of obtaining complete and accurate forensic images of hard drives. It can create images of a local drive, and with EnCase Enterprise we can also create images of any computer on the WAN.

Computer forensic analysis: Besides disk imaging, EnCase provides industry-leading computer forensic analysis capability. It includes all the functions specified by NIST, as follows:

- Identifying and recovering file fragments, hidden and deleted files, and directories from any location
- Examining file structures
- Displaying the contents of all graphic files
- Performing complex searches
- Graphically displaying the acquired drive's directory structure
- Generating reports

Establish a proper chain of custody with a message digest hash algorithm: The EnCase acquisition process features an integrated process to establish a proper chain of custody, including the secure generation of an MD5 hash for the forensic image and CRCs for every 32K of data for authentication.

Log file acquisition and analysis: Log files are very important in terms of the information they can provide to investigators. Skilled attackers know the importance of log files and may delete them in an attempt to cover their tracks, but EnCase supports the collection, parsing and analysis of those files.

Ability to correlate multiple time zones of acquired media: EnCase is designed to support the analysis and correlation of dates and times originating in different time zones.

Validated computer forensic technology via courts and independent testing: It is crucial that federal agencies use forensic technologies that meet the legal requirements for the admission of computer evidence. EnCase is widely accepted by the courts in appellate and trial court decisions [8].

5. ARTIFICIAL INTELLIGENCE IN COMPUTER FORENSIC

The success of computer forensic examinations depends on the ability to examine large amounts of data in a timely manner in search of important evidence during crime investigations. Limitations in time and resources, both computational and human, have a negative impact on the results obtained. Improving the current computer forensic software requires the use of Artificial Intelligence tools. The MADIK (Multi-Agent Digital Investigation Toolkit) tool is used for offline analysis; neural networks and SVMs (Support Vector Machines) are used in network analysis [6].

5.1. MADIK (MULTI-AGENT DIGITAL INVESTIGATION TOOLKIT)

MADIK is a multiagent system used to assist the computer forensic expert in examinations. Figure 2 presents the architecture of MADIK, which is divided into four layers: the strategic, tactical, operational and specialist levels. The strategic manager receives the requests for investigation cases and distributes them to the tactical managers. The tactical manager assigns each piece of evidence that belongs to its case to one of its operational managers. The operational manager finds the appropriate specialized agents to examine the evidence received from its manager. It has an important role in the architecture, because it determines which specialized agent will be employed. The system is composed of a set of ISAs (Intelligent Software Agents) that perform different analyses on the digital evidence related to a case in a distributed manner [10]. In MADIK, each ISA contains a set of rules and a knowledge base, both based on the experience of an expert in a certain kind of investigation. MADIK has six specialized intelligent agents, as follows [11]:

HashSetAgent: calculates the MD5 hash of a file and compares it with its knowledge base, which contains sets of files known to be ignorable or important.
FilePathAgent: keeps in its knowledge base a collection of folders commonly used by applications that may be of interest to the investigation, such as P2P (peer-to-peer), VoIP and instant messaging applications.

FileSignatureAgent: examines the file headers (the first 8 bytes of the file) to determine whether they match the file extension.

TimelineAgent: examines dates of creation, access and modification to determine events such as system and software installation, backups, web browser usage and other activities, some of which can be relevant to the investigation.

WindowsRegistryAgent: examines files related to the Windows registry and extracts valuable information such as the system installation date, time zone configuration and removable media information.
KeywordAgent: searches for keywords. Regular expressions are used to extract information from files, such as credit card numbers, URLs or e-mail addresses.

[Figure 2: MADIK's architecture]

The different agents can diverge in their decisions, which causes a conflict on the blackboard that must be resolved by the operational manager.

6. NEURAL NETWORK AND SVM WITH IDS

The first step in using a neural network or an SVM is feature selection, an important issue in network forensics, because eliminating useless features enhances the accuracy of detection while speeding up the computation, and so improves the overall performance of the detection mechanism. In cases where there are no useless features, concentrating on the most important ones may well improve the time performance of the detection mechanism without affecting the accuracy of detection in statistically significant ways [7]. The use of SVMs for network analysis has the following benefits:

- Fast results
- Reduced size of data
- Elimination of human analysis

6.1 NEURAL NETWORK

An artificial neural network is a collection of highly interconnected processing elements that transform a set of inputs into a set of desired outputs. The result of the transformation is determined by the characteristics of the elements and the weights associated with the interconnections among them. A neural network conducts an analysis of the information and provides a probability estimate that it matches the data it has been trained to recognize. The neural network initially gains its experience by being trained with both the input and the output of the desired problem. A neural network is used to classify intrusion detection data as important or unimportant. Investigators then look closely at the important data, so the neural network reduces the search space. Although neural networks are a very successful tool for classifying data, many tests show that support vector machines are more effective, with a reported accuracy of 99%.

6.2 SUPPORT VECTOR MACHINE

Support vector machines are learning machines that plot the training vectors in a high-dimensional feature space, labeling each vector by its class. SVMs classify data by determining a set of support vectors, which are members of the set of training inputs that outline a hyperplane in the feature space [6]. SVMs provide a generic mechanism to fit the surface of the hyperplane to the data through the use of a kernel function. The user may provide a function (e.g., linear, polynomial or sigmoid) to the SVM during the training process, which selects support vectors along the surface of this function. The number of free parameters used in an SVM depends on the margin that separates the data points, but not on the number of input features.

There are several reasons to use SVMs for intrusion detection data. The first is speed: because real-time performance is of primary importance to intrusion detection systems, any classifier that can potentially run fast is worth considering. The second is scalability: SVMs are relatively insensitive to the number of data points, and the classification complexity does not depend on the dimensionality of the feature space, so they can potentially learn a larger set of patterns and scale better than neural networks. Once the data is classified into two classes, a suitable optimizing algorithm can be used, if necessary, for further feature identification, depending on the application.

7. CONCLUSION

Computer forensics is the process of using scientific knowledge to collect, analyze and present digital evidence in court. Investigators must follow a strict procedure in order to obtain digital evidence. In offline and network analysis, limitations in time and resources, both computational and human, have a negative impact on the results obtained.
The use of artificial intelligence tools in computer forensics has proved very helpful. Multiagent systems are used in offline analysis, while neural networks and SVMs are used in network forensics.

8. REFERENCES

[1] Bhanu Prakash Battula, KeziaRani, Satya Prasad, T. Sudha, Techniques in Computer Forensics: A Recovery Perspective, Volume 3, Issue 2, Pages 27-35, March/April 2009, ISSN (Online): 1985-2320

[2] Michael Yip, Signature analysis and Computer Forensics, School of Computer Science, University of Birmingham, December 2008

[3] J. Philip Craiger, Computer Forensics Procedures and Methods, Handbook of Information Security, John Wiley & Sons

[4] The Cookie Concept: http://www.cookiecentral.com/c_concept.htm {accessed August 5, 2010}

[5] Eoghan Casey, Network traffic as a source of evidence: tool strengths, weaknesses, and future needs, Digital Investigation, Volume 1, Issue 1, February 2004, Pages 28-43

[6] Srinivas Mukkamala, Andrew H. Sung, Identifying Significant Features for Network Forensic Analysis Using Artificial Intelligent Techniques, International Journal of Digital Evidence, Winter 2003, Volume 1, Issue 4

[7] NIST Computer Security Incident Handling Guide, January 2004

[8] State v. Cook, 2002-Ohio-4812, 2002 WL 31045293 (Appellate court expressly validates the authenticity of an EnCase image); Williford v. State, 2004 WL 67560 (Tex.App.-Eastland) (EnCase validated under Frye/Daubert standard).

[9] Ibrahim Baggili, Ashwin Mohan, and Marcus Rogers, SMIRK: SMS Management and Information Retrieval Kit, Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, 2010

[10] Bruno W. P. Hoelz, Célia G. Ralha, Rajiv Geeverghese and Hugo C. Junior, A Cooperative Multi-Agent Approach to Computer Forensics, 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

[11] Bruno W. P. Hoelz, Célia Ghedini Ralha, Rajiv Geeverghese, Artificial Intelligence Applied to Computer Forensics, ACM Symposium on Applied Computing, 2009