Privacy Scrubber: Means to Obfuscate Personal Data from Benign Application Leakage

Michael Walker
Youngstown State University
One University Plaza
Youngstown, OH 44555 USA
1.877.468.6978
mawalker@my.ysu.edu

ABSTRACT
A wide range of personal information is distributed over the Internet by benign software applications. These applications have access to the user name, the host name, a list of components in and attached to the computer, and many other pieces of information that can be used for tracking or profiling purposes. These benign applications can and do send such private information not only to their developers, but also to marketing and tracking services. We attempt to locate the most commonly leaked information and to create a means to obscure this information when an attempt to read it is made. A program that we wrote, called Privacy Scrubber, intercepts access to the personal data being requested by the leaking software. After interception, a randomized value is returned to the requesting application. A key issue is to prevent the leakage of personal information without altering how the leaking software runs. To do this, Privacy Scrubber has to return data that is syntactically valid for the information requested, but randomized to protect privacy. This project has two benefits. One, it identifies some of the information most commonly leaked by benign software applications. Two, it proposes techniques to intercept and replace the leaked information without altering the execution of the benign software. We also write software to implement these techniques and evaluate its effectiveness with frequently used benign applications. This initial project is not meant to be complete in its protection of private information, nor is it meant to defend against all the benign software applications that exist.
Privacy Scrubber is a starting point for a scalable and extensible framework that protects a person's private information from benign software leakage without causing the application to perform abnormally.

Categories and Subject Descriptors
D.4.0 [Operating Systems]: Security and Protection - access controls, information flow controls, invasive software

General Terms
Security

Keywords
Windows System Programming, Windows Hooks, Windows API, Filter, Personal Information, Web Tracking

Ben Christian
Youngstown State University
One University Plaza
Youngstown, OH 44555 USA
1.877.468.6978
bochristen@my.ysu.edu

Min Gyung Kang
Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213 USA
1.412.268.2000
mgkang@gmail.com

1. INTRODUCTION
With over 1 billion personal computers in the world [1] and just below 90% of them running a version of Microsoft's Windows operating system [2], there is a distinct market for software applications that are not classified as malware or viruses but still gain access to the personal information of those using the computer. This includes any program whose publisher simply wishes to increase the product's profitability. These benign applications fall into many categories, such as instant messengers, productivity tools, and anything else commonly used by nontechnical end users. In a past audit, many of the most commonly downloaded programs from Download.com leaked personal information, including host name, user name, geographic coordinates derived from the Internet Protocol address, and other identifiable personal information [3].

The goal of our project is threefold. First, to create a systematic methodology for testing benign applications that harvest personal information and then leak it over the Internet. Second, to create a program that can filter applications' attempts to access personal information when they are going to leak the resulting information.
Finally, to verify that our filtering software protects personal information without altering the execution of the target benign application.

The motivation for creating a standard means of testing benign applications, and a means of filtering any that leak personal information, is to increase individuals' privacy. Not only does this protect against single instances of personal data being leaked, but it also protects information that might be used to profile or identify a user on the Internet.

Our approach has two distinct parts: detecting the leakage of personal information and defending against it. We created a standard installation method for testing the benign applications. We then captured the registry access and network traffic of the installed applications. After capturing the raw data on what each program accesses, we used automatic comparison programs to parse the logs and create human-readable output. After locating applications that leak personal information, we use our software suite to create filters that can be loaded to filter the offending applications on any computer running our protection software.

We make two contributions in this paper. The first is a set of techniques that automate, as much as possible, the process of locating leaked personal information. The second is an examination of current methods of filtering the Windows API and an evaluation of which would be most beneficial in reaching our goal of filtering private information without altering the execution of the leaking benign application.
The rest of the paper is organized as follows. Problem Details discusses the issues we wish to address with our research. Methodology describes the steps we took to work towards a possible solution to the problem. Evaluation discusses the results of our project and the strengths and drawbacks of our approach. Conclusion discusses the overall results of our research. Future Work describes future directions of research. Acknowledgments and references follow.

2. PROBLEM DETAILS
Personal information leaked onto the Internet can cause several problems for the person it belongs to. The information can be used to build a more complete Internet profile of the user and the machine they use. It can also be used to enable Internet tracking and individualized marketing aimed at the user. It can further help someone attempting to steal an individual's identity.

A person's login name, or other private information such as the model of processor in the machine, might not seem easy to access. However, we show that a very simple test application with very few lines of code can access this and more. Programs such as the software Coupons.com requires for printing coupons have been accessing, and possibly leaking, this personal information onto the Internet. A single person reverse engineered the printer software and found the following:

"[I] determined that Coupons.com retrieves a wide variety of sensitive Windows registry keys and computer configuration settings including Windows Product ID, Windows CD key, motherboard serial number, and hard drive serial number. These numbers serve to identify a specific individual computer, and these numbers persist over the lifetime of a computer." [4]

Collecting this much detail about both the hardware and the software of someone's Windows system creates an extreme potential for profiling and tracking on the Internet.
We see a need for filters to be applied to benign applications so that they cannot access information of the kind the Coupons.com printer software does when there is a possibility of that information being leaked outside the computer.

3. METHODOLOGY
This section describes the steps we took to determine which programs were leaking personal information and to create the means to filter those programs.

3.1 Finding Leaking Applications
To create a standardized way to test benign applications for leaked personal information, we used VMware, a virtual machine product, to create an image of a Windows XP SP3 installation with Wireshark, HTTPAnalyzer and Process Monitor installed and running. Wireshark describes itself as the world's foremost network protocol analyzer [5]. HTTPAnalyzer allows you to monitor, trace, debug and analyze HTTP/HTTPS traffic in real time [6]. HTTPAnalyzer intercepts HTTP traffic before it reaches Windows' SSL encryption. Process Monitor is a program from Microsoft that provides an advanced monitoring tool for Windows, showing real-time file system, Registry and process/thread activity [7]. We use it to log the registry accesses our targeted program makes. We focused our research on the Registry because of the ease of access and the wealth of information in it. We dropped all records that were not created by the benign application's processes, so as to reduce the size of the log files and to focus the saved records. We made separate logs for installation and first run. We then stored all the logs outside the virtual machine and restored the state of VMware to the point before the installation of the tested benign application, so that every application would have the same installation conditions. We tested the top 20 programs from Download.com, several of the top instant messaging clients, and commonly used software such as iTunes.
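The comparison this setup is meant to enable, finding values that a program read from the registry and that later appear in its network traffic, could be sketched roughly as follows. This is an illustrative Python sketch and not the script used in this project. The record shape `RegistryRead`, the function names, and the rule for skipping short or all-zero values are all assumptions, and real Process Monitor or HTTPAnalyzer exports would first have to be parsed into these structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryRead:
    """One registry read taken from a process-activity log (hypothetical shape)."""
    key_path: str
    value: str


def is_trivial(value, min_length=4):
    """Treat short or all-zero values as too generic to match reliably.

    Assumption of this sketch: such values would match traffic by coincidence.
    """
    stripped = value.strip()
    return len(stripped) < min_length or set(stripped) <= {"0"}


def find_leaks(reads, http_payloads):
    """Return (key_path, value, payload_index) for every registry value that
    also appears, case-insensitively, inside a captured HTTP payload."""
    leaks = []
    seen = set()
    for read in reads:
        if is_trivial(read.value):
            continue
        needle = read.value.strip().lower()
        for index, payload in enumerate(http_payloads):
            hit = (read.key_path, needle, index)
            if needle in payload.lower() and hit not in seen:
                seen.add(hit)
                leaks.append((read.key_path, read.value.strip(), index))
    return leaks


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample_reads = [
        RegistryRead(r"HKLM\Example\ComputerName", "LAB-PC-01"),
        RegistryRead(r"HKLM\Example\Flags", "00000000"),
    ]
    sample_payloads = [
        "GET /ping?x=00000000",
        "POST /login host=lab-pc-01&v=00000000",
    ]
    for key_path, value, index in find_leaks(sample_reads, sample_payloads):
        print(f"{value!r} read from {key_path} appears in payload {index}")
```

A plain substring match like this only catches values sent unmodified; a value that is hashed or otherwise encoded before transmission would not be found.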
After collecting all the logs from the targeted software applications, we used a script of our own to parse the logs and locate information that was read from the registry and then transmitted over the Internet. We can use these results to create filters for applications, so as to protect users who have these programs installed.

3.2 Building the Filtering Suite
We divided the work of building the filter into three parts: determining how to filter the benign application, designing the filtering program, and finally developing the filtering program.

3.2.1 Determining How to Filter
Windows allows Windows API calls to be intercepted through a mechanism called hooking. Hooking is enabled on all current installations of Windows operating systems. However, clear documentation and guides on how to accomplish it have not been updated frequently on the Internet since the early 2000s. The technique is well known to the developers of viruses, key loggers, malware and other malicious software, who use the same mechanism to record personal information or to manipulate the system as they wish. New information is not readily available for those wishing to learn more about the subject.

The first obstacle to overcome in building the suite was determining which mechanism we would implement. The three main categories of hooking, the term Microsoft uses for capturing Windows API calls, are self hooking, user level hooking, and kernel level hooking. Self hooking is when a program filters itself. This method is of no use for our suite's intended goals. Kernel level hooking captures all system calls and therefore can cause extreme system slowdown. Not only that, but if any errors occur in the injection or filtering of the API calls, the Windows operating system can become non-functional. The last option is user level hooking, also called user level injection.
This is when a program can inject code into another program with the same user rights.
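The idea common to these hooking approaches, redirecting a call so that filtering code runs in place of or around the real function, can be illustrated with a small simulation. The sketch below is plain Python and does not touch any Windows mechanism. The `Program` class, its `imports` table, and the helper names are hypothetical, and `GetUserName` and `GetTickCount` are used only as dictionary labels for stand-in functions.

```python
def real_get_user_name():
    """Stand-in for an API call that returns private data."""
    return "alice"


def real_get_tick_count():
    """Stand-in for an API call that returns nothing private."""
    return 12345


class Program:
    """A toy 'program' that reaches its APIs only through a lookup table."""

    def __init__(self):
        self.imports = {
            "GetUserName": real_get_user_name,
            "GetTickCount": real_get_tick_count,
        }

    def run(self):
        user = self.imports["GetUserName"]()
        ticks = self.imports["GetTickCount"]()
        return f"user={user} ticks={ticks}"


def install_hook(program, name, make_replacement):
    """Swap one table entry for a wrapper and return the original entry
    so that the hook can be removed again."""
    original = program.imports[name]
    program.imports[name] = make_replacement(original)
    return original


def scrub_user_name(original):
    """Build a replacement that still calls the original function but
    returns a masked value of the same length."""
    def hooked():
        return "x" * len(original())
    return hooked


if __name__ == "__main__":
    program = Program()
    print(program.run())
    saved = install_hook(program, "GetUserName", scrub_user_name)
    print(program.run())
    program.imports["GetUserName"] = saved
    print(program.run())
```

The program's own code is never modified; only the table entry it looks up changes, which is why the unhooked call keeps its normal behavior.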
After determining the best level of injection, we looked for the best method of actually injecting the filtering code into the leaking benign program. We determined that the optimal method was import address table (IAT) altering. Windows executables have a standardized format called the Portable Executable (PE) file format, shown in Figure 1. The Import Table is a list of functions that Windows must translate to the addresses of the corresponding functions in external dynamic-link libraries (DLLs). By altering the IAT we can link to our own DLL and thereby enable filtering of Windows API function calls.

Figure 1. The Windows PE file format.

After determining that our suite would insert its filtering code by this method, we found a previously written suite that fulfilled all the requirements we had arrived at to this point, so we extended this code to meet our needs [8].

3.2.2 Design of the Filtering Suite
The hook system that we chose is outdated, contains many deprecated functions, and had to be edited to compile in current development environments. After altering the code to the point where it compiled properly, ignoring warnings about deprecated function calls, we had a semi-complete framework on which to build our filtering suite. The suite has four programs, including a test application for development purposes. The other three are a device driver, a hook server, and the Hook DLL. The device driver captures the creation of new processes. The hook server receives the notification when a new process is created and then alters the IAT of the filtered program to include the address of the Hook DLL. The Hook DLL is where the actual filtering takes place. This is the code that is inserted into the benign application. The code has to be in a DLL because of Microsoft's decision to only allow DLLs to be inserted into other applications. A copy of the Hook DLL's code is created in a thread separate from that of the process it is being attached to. Because the filtering code is in a separate thread, there is a race condition in which the first portion of the filtered application can run before the filtering thread starts executing.

3.2.3 Implementing Filtering
Each of the three portions of the Windows hooking suite (the device driver, the hook server and the Hook DLL) needed to be updated so that it would compile properly. Each of these projects had over 100 deprecated function names in its files. Also, the headers were malformed and needed to be corrected before compilation would complete properly. The most important improvement we made to the suite was to implement the filter, filterlist and filterfileaccess classes and insert them into the design of the suite's framework. This is a key addition to the previous project because previously the filter DLL could only statically filter Windows API calls. This allows a high
degree of customization and allows different programs to be filtered in different ways.

We created three classes to be included in the code of the Hook DLL, shown in Figure 2. The first is filter, a container for each filter object's private information, with get_ and set_ methods for accessing that information. The next class is filterlist, which creates a data structure to store and quickly search through the filters. The third class is filterfileaccess, which allows the saving of a configuration file containing filters represented as ASCII strings. Finally, we also created a test application that accesses several key registry values containing the user name, host name, and processor specifications. It also uses the GetVersion and GetSystemTime API calls to get the version of Windows running and the current date.

4. EVALUATION
For the identification of benign applications that leak personal information onto the Internet, we were able to create a script that parses both Process Monitor's and HTTPAnalyzer's logs. This script properly identifies values that were read from the registry and then transmitted by HTTP, with or without SSL encryption. The script can produce some false positives for values of all zeros, which are both read from the registry and transmitted but might be coincidental. The script does properly detect when an application accesses the registry for personal information and then transmits it across the Internet.

The test application that we created was successfully filtered when Privacy Scrubber was activated. Our application accessed several unique or semi-unique locations in the Registry, and for each one that we created a filter for, the values were filtered and the application displayed our desired output instead of the original values.

Figure 3. The test application.

The only value that we do not filter in the test application, shown in Figure 3, is the second one, which asks for the VendorIdentifier. For the rest we use different filtering methods.
We can use static filters hard coded into the DLL, as with the first and fifth tests. We do not filter the second test, to show that we are not altering every registry call. The third and sixth tests are dynamically loaded filters with a static string set as the output replacement. The fourth test uses a dynamic filter with the output set to be randomized for each access.

Out of the programs that we tested, only the iTunes software read a value from the registry and then transmitted it. It reads the registry at four locations, listed in Table 1, multiple times each, and then transmits the host name when a user logs into the online storefront from inside the iTunes client.

Table 1. Registry locations read by iTunes.
HKLM\System\CurrentControlSet\Control\ComputerName\ActiveComputerName\ComputerName
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\hostname
HKLM\System\CurrentControlSet\Control\ComputerName\ComputerName\ComputerName
HKCU\Software\Microsoft\Windows Media\WMSDK\General\ComputerName

It appears at first glance that common software applications are less commonly accessing personal information from the registry and then transmitting it across the Internet. However, this does not mean that the programs we tested are not accessing personal information and transmitting it. It is possible that they already have anti-monitoring measures built in, such as reading registry values and then encrypting them inside the program before they are passed to the Windows API for Internet access.

There are some limitations to what Privacy Scrubber is able to protect against. First, it does not account for any possible anti-IAT-redirection methods. It also has an issue where, if an API call is made near the beginning of an application's execution, that call will sometimes not be filtered. We assume this is due to a race condition between the process's main thread and the thread containing the Hook DLL. This will need to be looked into further.
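The distinction drawn in the tests above between a filter that returns a fixed string and one that returns a randomized value could be modeled roughly as follows. This is a Python sketch for illustration only and not the suite's own code. The `Filter` and `FilterList` classes only loosely mirror the filter and filterlist classes named in the paper, and the mode names, method names, and character-by-character randomization rule are assumptions.

```python
import random
import string
from dataclasses import dataclass


def randomize_like(value, rng):
    """Replace ASCII letters with random letters and digits with random
    digits, keeping length, case, and punctuation, so the result has the
    same shape as the original value."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


@dataclass
class Filter:
    key: str               # registry value the filter applies to
    mode: str              # "static" or "random" (names assumed here)
    replacement: str = ""  # only used by static filters

    def apply(self, original, rng):
        if self.mode == "static":
            return self.replacement
        if self.mode == "random":
            return randomize_like(original, rng)
        raise ValueError(f"unknown filter mode: {self.mode}")


class FilterList:
    """Looks filters up by key; registry key names are case-insensitive."""

    def __init__(self, filters=()):
        self._by_key = {f.key.lower(): f for f in filters}

    def filter_value(self, key, original, rng=None):
        rng = rng or random.Random()
        found = self._by_key.get(key.lower())
        # Values without a filter pass through unchanged.
        return original if found is None else found.apply(original, rng)


if __name__ == "__main__":
    # Hypothetical keys and values, for illustration only.
    filters = FilterList([
        Filter(r"HKLM\Example\UserName", "static", "nobody"),
        Filter(r"HKLM\Example\HostName", "random"),
    ])
    print(filters.filter_value(r"HKLM\Example\UserName", "alice"))
    print(filters.filter_value(r"HKLM\Example\HostName", "LAB-pc-01"))
    print(filters.filter_value(r"HKLM\Example\Vendor", "GenuineIntel"))
```

Keeping the shape of the value is one simple way to aim for output that stays syntactically valid for the requesting application; whether a given application accepts such a value would still have to be tested.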
Our current screening process lacks enough control to be very useful for extensive testing without tedious changing of registry values by hand. This is why we believe that using an extended version of Privacy Scrubber to also test software might provide a better platform for finding applications that violate our desired prohibition on leaked personal information.

5. CONCLUSION
Our project achieved three goals. The first was to create a filter standard that can be used to filter many (if not all) Windows API calls. The second was to create a method for testing when an application accesses personal information through the Windows API and then transmits that data over the Internet. We accomplished this by using two commercially available programs and a custom parsing script to compare the logs. The third goal was to determine the best way to filter access to the Windows API and to develop a way of implementing it.
6. FUTURE WORK
The research here has laid the groundwork for future research in two areas: improving and testing Privacy Scrubber further, and adapting the Privacy Scrubber suite to detect leaked personal information for the purpose of creating filters and auditing software.

6.1 Further Privacy Scrubber
Privacy Scrubber's code base needs to be rewritten with current development practices and without the large number of deprecated functions that it currently contains. We can also optimize the code of the injected filter DLL where it creates the list of filters loaded into that instance of the DLL. Another issue to address is the race condition that occurs at the beginning of the filtered application's execution.

6.2 Detecting Benign Application Leaks
We could investigate whether adapting Privacy Scrubber would allow us to decrease the number of tools required for detecting benign applications that leak personal information. We could implement the functionality of both Process Monitor and HTTPAnalyzer in Privacy Scrubber. We could also write the output of both into the same logging format. This could both speed up and simplify the parsing portion of the process. Parsing the logs could also be incorporated into Privacy Scrubber.

7. ACKNOWLEDGMENTS
Our thanks to the TRUST program for providing us with an opportunity to participate in this research. We also wish to thank the University of California, Berkeley for hosting us during our stay here and providing us with such a productive work environment. Min Gyung Kang and Dr. Dawn Song both deserve our thanks for providing us with such wonderful mentoring and giving us direction for our research. Also, special thanks to Dr. Kristen Gates, Beatriz Lopez-Flores and Sheila Humphreys for their support and mentorship in preparing for graduate studies.

8. REFERENCES
[1] Gartner.com. Press Release. http://www.gartner.com/it/page.jsp?id=703807
[2] ComputerWorld.com. Windows market share dives below 90% for first time. http://www.computerworld.com/s/article/9121938/windows_market_share_dives_below_90_for_first_time
[3] Jung, J., Sheth, A., Greenstein, B., Wetherall, D., Maganis, G., and Kohno, T. 2008. Privacy oracle: a system for finding application leaks with black box differential testing. In Proceedings of the 15th ACM Conference on Computer and Communications Security (Alexandria, Virginia, USA, October 27-31, 2008). CCS '08. ACM, New York, NY, 279-288. DOI= http://doi.acm.org/10.1145/1455770.1455806
[4] benedelman.org. A Closer Look at Coupons.com. http://www.benedelman.org/news/082807-1.html
[5] Wireshark.org. About. http://www.wireshark.org/about.html
[6] IEinspector.com. HTTP Analyzer 5. http://www.ieinspector.com/httpanalyzer/
[7] Microsoft.com. Process Monitor. http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
[8] CodeProject.com. API hooking Revealed. http://www.codeproject.com/kb/system/hooksys.aspx