A Museum of API Obfuscation on Win32

Masaki Suenaga
Senior Software Engineer

Contents

Abstract
File Image vs. Memory Image
API Analysis
Generating Memory Dumps
Runtime API Address Resolution
Basic API Obfuscation
Advanced API Obfuscation
Conclusion
References

Abstract

Antivirus software vendors attempt to identify threats by unpacking suspicious samples and hence aim to produce as many unpackers as possible. When characteristic portions of a successfully unpacked sample are identified, the sample can be tagged and detection added. This procedure is commonly used for variants of well-known malware families and does not require the analysis of Windows API calls made by the sample.

In contrast, in-depth analysis of packed threats requires knowledge of the API functions called during execution. When a sample cannot be unpacked, memory dumps may be used to provide insight into its behavior. Such dumps are frequently detailed enough to allow analysts to ascertain whether a particular sample is malicious, although they may not be suitable for analysis using software disassemblers; executable section headers need to be adjusted in order for the software to resolve API calls and avoid problems that may be caused by incorrect file alignments.

API calls may be obfuscated in a number of ways, with some examples being:

- Setting up multiple memory segments containing jump relays
- Copying initial instructions from API functions into malware modules and subsequently jumping into the body of the API
- Copying API instructions interleaved with redundant instructions

Analysis may be made more difficult when these methods are mixed or of a scope that makes the use of manual tools unwieldy. This paper seeks to detail methods that may be used to obfuscate API calls and the tools and techniques that may be used to resolve them.

File Image vs. Memory Image

The Portable Executable (PE) format is the file format for executables used in 32-bit and 64-bit versions of Windows (Windows 95, 98, Me, NT 4.0, 2000, Server 2003, XP, Vista, and so on). PE files are not loaded into memory as they appear on disk; instead, they are loaded according to the information in the PE file header. Unlike in MS-DOS, segment registers are not used to relocate code and data; sections are used instead to hold program code, variables, constants and resource data. Where possible, the PE file format stores these sections in a packed state, holding information as to how much memory will be required as opposed to the actual data itself. Also unlike MS-DOS, Windows assigns a 2GB address space to each newly loaded program.

The Role of the Loader

When a PE-format file is to be executed, the file is passed to the CreateProcess() API function and statically imported DLL files are loaded (EXE and DLL files are both in PE format and are identical apart from a single-bit flag). This initialization is performed by the OS component called the loader, detailed in this section.

The loader reads the PE file header and copies it to the 32-bit ImageBase address taken from the header; if this address is in use, the loader determines an alternative address. It then reads each section and copies each one to memory. Two distinct alignments may be specified: file alignment and section alignment. If these alignments are different, the loader copies each section to memory according to the section alignment, which in most cases results in the memory layout being different from that of the PE file.

If the loader uses a different ImageBase address from that specified in the header (as is often the case with .dll files), address relocation must be performed.
Relative jumps and calls do not need to be changed, but all absolute addresses (calculated during the linking stage of program compilation) must be adjusted by taking the difference between the preferred ImageBase and the chosen ImageBase and altering the addresses accordingly.

Nearly all programs import DLLs of some kind. Regular Windows executables are unable to issue int instructions to request system services (as was possible under MS-DOS) and must make API calls in order to perform useful work. The loader loads the statically imported DLLs, resolves the APIs and writes the addresses of the API functions to memory. This process is described in more detail in the following section.

With these processes complete, the process memory layout differs from that of the PE file in terms of alignment, section length, address relocation and API resolution. If program execution has begun, the variable data section will also be tainted.

API Address Resolution in the Loader

API calls in programs built with Microsoft Visual Studio are compiled to call [offset32], or to mov reg32, [offset32] followed by call reg32. For example (note that addresses will vary):

    TranslateMessage(&msg);

will be compiled and linked as:

    lea eax, [ebp-20h]
    push eax
    call [01001270h]

or:

    mov edi, [01001270h]
    lea eax, [ebp-20h]
    push eax
    call edi
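
As a rough illustration of how a tool might locate call sites of the first form, the Python sketch below scans raw code bytes for the opcode pair FF 15, which encodes call dword ptr [imm32] on 32-bit x86, and reports the absolute address used as the operand. The function name and the naive byte scan are assumptions made for this example; a real tool would use a proper disassembler, because the same byte pair can also occur inside other instructions or in data.

```python
import struct

def find_indirect_calls(code: bytes):
    """Return (offset, slot_address) pairs for each FF 15 imm32 pattern.

    Naive byte scan: it may report false positives where FF 15 happens
    to appear inside another instruction or in data.
    """
    hits = []
    pos = code.find(b"\xff\x15")
    while pos != -1 and pos + 6 <= len(code):
        (slot,) = struct.unpack_from("<I", code, pos + 2)
        hits.append((pos, slot))
        pos = code.find(b"\xff\x15", pos + 1)
    return hits

# Encoding of: lea eax, [ebp-20h] / push eax / call [01001270h]
sample = bytes.fromhex("8d45e050ff1570120001")
for offset, slot in find_indirect_calls(sample):
    print(f"call at +{offset:#x} reads its target from {slot:#010x}")
```

The reported slot address is the location that the loader is expected to fill with the address of the API function.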

The address of TranslateMessage() in the example above is stored at the virtual address 01001270h. When program execution begins, the correct address of TranslateMessage() is written to these four bytes of memory; the PE file contains dummy values. This API resolution is done by the loader before the program's code starts. In order to achieve this, the loader traverses the import directory table (pointed to by the PE header), which specifies which DLLs to load, which API functions to search for and where to write the API function addresses.

IDA

Interactive Disassembler (IDA) is a software tool used by many virus analysts. IDA is able to report the names of API functions when it finds call instructions leading to Import Address Table (IAT) entries. In other words, IDA is unable to resolve API names if the IAT is not found as specified in the PE header, which is why memory dumps are often unsuitable for analysis with IDA.

Development Environments and API Calls

Differences in the PE files produced by different development environments can aid the virus analyst. PE files built with Microsoft Visual C usually make API calls using call dword ptr [IAT entry]. Programs written in C++ do the same when calling Windows APIs, but some encoding is applied when calling C++ runtime libraries. For example, the new operator will be compiled to call dword ptr [??2@YAPAXI@Z], where ??2@YAPAXI@Z maps to void * __cdecl operator new(unsigned int). The name of the function is difficult to ascertain by eye, but IDA is able to perform the decoding and reports operator new when this entry is encountered.

Unfortunately, MFC methods such as the destructor CWnd::~CWnd() (which may be expressed as ??1CWnd@@UAE@XZ) are not imported by name. Instead, a predetermined ordinal number is used, which is 818 within mfc42.dll in this case. Because the program simply calls the 818th function of mfc42.dll, it is impossible to retrieve the function name from this information alone.
When IDA detects mfc42.dll and imports by ordinal number, it is able to display the API function names using a preconfigured internal list; however, IDA cannot resolve the virtual address 0x73D31828 in a memory dump unless it is linked to the 818th export of mfc42.dll.

Programs written in C and Delphi and built using Borland products call inside the runtime libraries and then jump to API functions. For example, a call to GetSystemMetrics() will be compiled to call near ptr j_GetSystemMetrics, whereas j_GetSystemMetrics will be jmp dword ptr [GetSystemMetrics], which resides in the IAT and should be resolved by the loader. IDA labels the entry in the IAT as __imp_GetSystemMetrics and the called address as GetSystemMetrics(), thus aiding the process of analysis.

Programs written in Visual Basic import VB runtime functions via the IAT, but Windows APIs are imported in a slightly different way. The typical method is as follows:

    00408094: db 'urlmon',0
    004080A0: db 'URLDownloadToFileA',0
    004080B4: dd 408094          ; offset of 'urlmon'
    004080B8: dd 4080A0          ; offset of 'URLDownloadToFileA'
    004080BC: dd 040000
    004080C0: dd 4092D8          ; offset of a structure

    004080CC: URLDownloadToFileA proc near
              mov eax, dword_4092E0          ; initially zero
              or eax, eax
              jz short 4080D7                ; If not yet resolved, call VB Runtime library.
              jmp eax
    004080D7: push 4080B4
              mov eax, offset DllFunctionCall ; jmp [__imp_DllFunctionCall]
              jmp eax

When URLDownloadToFileA() is first called, a check is made as to whether URLDownloadToFileA() has already been resolved. If not, control is passed to DllFunctionCall() in the VB Runtime library, which resolves the address of the API function (caching it for the next time URLDownloadToFileA() is called) and calls the function itself. As API references will not be resolved until call-time, memory dumps of programs using the above method for API calls are likely to be incomplete. Fortunately for the virus analyst, the names of the API functions appear in ASCII text.

API Analysis

Analysis of API calls may not be necessary when analyzing variants of familiar threat families, as core functionality is likely to remain chiefly the same across the board, with the differences being the network addresses, URLs, file names and various other strings compiled into the code. Knowing which API functions are called by an unfamiliar sample, though, is important when attempting to decide whether or not it is malicious. If the API functions called remain a mystery, the presence of certain email addresses, URLs, game-related strings and keystroke logging routines would tend to indicate that a particular sample might be a Trojan horse that attempts to steal information related to online games; however, without the keystroke logging code, such a sample could itself be an online game, and as such API analysis is often required.

API Call Analysis vs. API Call Monitoring

Some security products monitor the API calls made while a program is executing; by evaluating the call sequences, the software can attempt to determine whether a process is malicious or benign. Products such as these often hook the API entry points; some techniques used to avoid detection are detailed in Operating System Interface Obfuscation and the Revealing of Hidden Operations [1].

In contrast, the virus analyst often has only a memory dump to work from, and needs to know what API functions may be called by a sample, i.e. a superset of those that have been called. API call monitoring techniques can only report the calls actually made during a given run.

Avoiding Guesswork

In addition to the classification of samples into malicious and non-malicious groups, it is also the job of the malware analyst to come to understand, by whatever means, the behavior of a sample. If the API functions called by a sample cannot be deduced, it may be possible to make educated guesses based on the presence of certain parameters and strings. Some API functions such as RegOpenKey() require obvious parameters (for instance, 0x80000001, which is HKEY_CURRENT_USER), but others such as GetLocalTime() do not. String parameters may also be good hints as to which API function is being called; for example, RegOpenKey() may take Software\Microsoft\Windows\CurrentVersion. API functions taking arguments other than strings are harder to guess, and ambiguity is introduced.

When possible, guesswork is to be avoided during detailed analysis. Following the operation of the loader, the IAT will contain the virtual addresses of all imported API functions; for example, the address 0x77D16017 in the IAT refers to the GetSystemMetrics() API function in certain versions of Windows XP. The obfuscation of API calls hides this information and makes analysis much more difficult.

Motivation for API Obfuscation

Ambiguity is introduced if strings are encrypted and API calls are obfuscated; the analysis of such a sample is likely to be a more complex and lengthy task than if no such obfuscation were present. Even if an unknown proprietary packer is used (with no corresponding unpacker available to the analyst), the memory dump of a running executable can be used during analysis.
Just-in-time string decryption, code obfuscation and API call obfuscation are all techniques used by malware authors to hide their intentions, but API call obfuscation also has legitimate uses.
An online game in which experience points are calculated by the client application as opposed to on the server is one example of a legitimate use of API obfuscation. Underhanded players may seek to analyze the client program and overwrite certain memory locations in order to change values or inject custom code. The game provider may decide to obfuscate the API calls made by the client program in order to prevent this kind of analysis and hence minimize cheating.

API call obfuscation may also be used to protect code from scrutiny when proprietary algorithms are used, for instance in the case of product registration keys, encryption algorithms and routines that are trade secrets. Meaningless and/or redundant API calls may also be inserted in order to make analysis more difficult. Although intended for legitimate use, commercial packers used to obfuscate program code may also be used by malware authors.

Motivation for Overcoming API Obfuscation

It is the job of the malware analyst to provide his or her customers with information that may be used to mitigate threats; sample analysis can provide mitigation information such as TCP/IP ports to close, IP addresses or domains to block at the firewall, registry entries to set or other system changes to make.

Malware analysts are also called upon to create free removal tools for certain viruses. Deep analysis of a sample is required for tools such as these to comprehensively and effectively clean all traces of a threat from a compromised computer for all possible differences in system setup and locale.

Tools for Overcoming API Obfuscation

As previously mentioned, IDA is a disassembly tool used by many malware analysts; it supports a C-like internal scripting language, IDC script. IDC script allows the user to automate repetitive tasks, perform lookups and change the way in which information is displayed, including changing the label names on specific addresses.
The following sections will discuss the creation of a tool to de-obfuscate the API calls made by a program and generate an IDC script to perform the address renaming within the disassembler environment.

Generating Memory Dumps

Resolving API Function Names Without the Import Table

Provided that the addresses of the API functions called by a program have been resolved (either by the loader or by the program itself), the function names are relatively easy to look up and it is not necessary to use the import table. For example, if the instruction call [01001480h] appears in a memory dump, the tool may read 77D16017h from this address and then search all memory blocks for the module in which this address appears; it will soon find a match in user32.dll. After parsing the export table of user32.dll, the tool can resolve this address to the GetSystemMetrics() API function.

Trusting the Import Table

The import table from a given memory dump cannot necessarily be trusted; malware authors may remove the import table from memory or may even deliberately construct a fake import table to mislead the analyst (and his or her tools) and hence make the task at hand more difficult. The import table should therefore be erased from the memory dump.

Adjusting Image Base and Section Tables

IDA is designed to disassemble PE programs in file format, and as such considers information written in the PE header to be correct. If the ImageBase in the header appears as 10000000h, IDA shows the assembler code at this location without taking into account any relocation that might have occurred at load-time, which is especially common with DLLs. If a DLL whose ImageBase is 10000000h is relocated to 12000000h, all operands and data structures that use absolute addresses will be updated to point to the correct virtual addresses. The relocation is performed by the loader, but the ImageBase information in the PE header is not updated in memory. This discrepancy will cause IDA to function incorrectly, especially when strings are involved. Any tool for resolving API names from memory dumps must therefore take relocation into account.

Differences in alignment between file and memory must also be considered; for example, a given executable may have a file alignment of 200h and a memory alignment of 1000h, meaning that the first section can begin at a file offset of 200h (or 400h, 600h, etc.), but it is loaded at 1000h (or 2000h, 3000h, etc.) from the header position in memory. When IDA reads a PE file as input it calculates addresses and offsets based on file alignment, which results in a difference of 0E00h (or 0C00h, 0A00h, etc.) in addresses for this example.

IDA also examines the raw data offset of each section, which reflects the file alignment. For example, if the raw data offset of a section is 1200h and the virtual address of the section is 2000h, IDA adjusts every address in the section by 0E00h, effectively mapping the data at file offset 1200h to virtual address 2000h. When a memory dump is being used, however, this section data will already have been copied by the loader; a further addition of 0E00h will produce anomalous results. In order to avoid this, the tool must adjust each section's raw address to match the virtual address of the section.

Recreating a Missing Header

Once an executable is running, the PE header of the .exe file is no longer required; indeed, some programs overwrite their own PE header structure in memory. In some cases the structure is truncated, while in others it is overwritten with data that may cause IDA to fail. Even if the PE header is zeroed out or filled with junk data, the tool should be able to create a new catch-all PE header with a single flat section table that covers the entire block of memory used by the executable.
This means that section information will no longer be required to analyze a memory dump.

Searching Hidden Modules

We refer to EXE and DLL programs as modules. An EXE is loaded first, followed by the DLLs specified in its import table; if one of the DLLs requires another DLL to run, this will also be loaded, and so on. The modules are loaded to the appropriate memory block positions, with each block holding a PE header or a section. The space for the process stack also occupies a memory block. When an API function to allocate memory is called, new blocks are assigned as required.

During analysis, certain modules can be enumerated using the EnumProcessModules() and GetModuleInformation() API calls; only modules managed by the OS (using a reference counter that is incremented on load and decremented when freed) can be examined in this way.

Except on systems supporting the NX (No eXecute) bit, Windows does not require a memory block containing executable code to be registered as executable with the OS. This means that code in any portion of memory can execute if pointed to by the instruction pointer. This system provides a great deal of flexibility when designing program architecture but also room to exploit stack and buffer overflows (as the instruction pointer can be manipulated to point even to a block of memory that has been allocated from the heap).

Traditional packers overwrite the packed program code with the unpacked code in place during the unpack operation; many others call the VirtualAlloc() or GlobalAlloc() API functions to allocate memory, store the unpacked code in the newly allocated block and perform a jump to run the code from there. VirtualAlloc() is generally preferred to GlobalAlloc() because the memory address can be specified by way of a parameter, which means that the need to perform address relocation on the unpacked code is obviated if a memory block can be allocated at the desired address.
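
Because unpacked code may sit in an allocated block that the OS module list never mentions, a tool can scan every dumped block for the signatures of a PE image. The Python sketch below is a minimal illustration under stated assumptions: the dump is modelled as a hypothetical dictionary mapping base addresses to block contents, the function names are invented for this example, and the block contents are fabricated test data. It only finds images whose headers are still intact, so it does not help when a program has overwritten its own PE header.

```python
import struct

def looks_like_pe(block: bytes) -> bool:
    """True if the block starts with an MZ header that points at a PE signature."""
    if len(block) < 0x40 or block[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", block, 0x3C)
    return block[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"

def find_hidden_modules(blocks: dict, known_bases: set):
    """Return bases of PE-like blocks that the OS module list does not mention."""
    return sorted(base for base, data in blocks.items()
                  if base not in known_bases and looks_like_pe(data))

def make_fake_image(e_lfanew: int = 0x80) -> bytes:
    """Build a minimal stand-in for a mapped PE image (test data only)."""
    image = bytearray(0x200)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, e_lfanew)
    image[e_lfanew:e_lfanew + 4] = b"PE\x00\x00"
    return bytes(image)

blocks = {
    0x00400000: make_fake_image(),                 # main EXE, known to the OS
    0x00370000: bytes.fromhex("b876b4e777ffe0"),   # small code stub, not a module
    0x00B50000: make_fake_image(),                 # image unpacked into allocated memory
}
print([hex(b) for b in find_hidden_modules(blocks, known_bases={0x00400000})])
```

In practice the set of known bases would come from the module enumeration described above, and the block contents from the memory dump itself.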

Occasionally it is necessary to search all memory blocks to find a hidden module.

Resolving Names of API Calls made by Injected Threads

A program (or more strictly, a process) cannot usually access the address space of another, as was possible with MS-DOS; a process cannot simply write to an area of memory in which, say, Internet Explorer is running, as the virtual address mappings used by each process will differ. The API function VirtualAllocEx(), however, allows a block of memory belonging to a different process to be allocated; code can then be copied into this area and executed by way of the CreateRemoteThread() function. Code injected into Internet Explorer in this way is, to all intents and purposes, part of Internet Explorer itself, and as such is able to operate with the same privileges as the original program. In the case of Internet Explorer this is likely to mean being able to bypass any firewall present and access the Internet on port 80.

Injected threads generally do not have an import table unless they are DLLs; they may instead have their own API resolution routines. Alternatively, the calling thread may pass the API function addresses to the injected thread by way of a parameter. It is not a trivial task to write a generic tool to resolve the addresses of API calls made by injected threads.

Other Memory Blocks

As previously discussed, program code can reside in allocated memory blocks and on the stack as well as in EXE and DLL modules. Other blocks may also be used to hold data or code that relays instructions from the application program to an OS module in order to obfuscate the API calls made. This means that the entirety of memory accessible by a program is required during analysis.

Runtime API Address Resolution

There are two main types of API obfuscation. In the first type, all API function addresses are resolved before the main routine of the program begins. In the second, API function addresses are resolved individually at call-time. The remainder of this paper will focus on the first kind of obfuscation, but the second kind will also be covered to some extent.

Decoding API Function Names using Hashing

Shellcode in document files deliberately crafted to exploit vulnerabilities usually encodes API functions by hashing API names. A typical algorithm is to add each ASCII character of an API function name to a 32-bit value, performing a bitwise rotation right by 13 places for each character. This produces a hash with no collisions in any major system DLLs, making it an easy and safe method of obfuscation. The parameters of the hashing algorithm may also be modified, for example by adding XOR operations or altering the number of bitwise rotations applied to each character. Modifications to hashing algorithms result in analysis taking longer to complete, as tools and scripts may need to be altered.

The following is an example of hashed API address resolution, taken from Trojan.Anicmoo:

    0000016F GetAPIaddress proc near
    0000016F arg_0 = dword ptr 14h       ; DWORD checksum value
    0000016F arg_4 = dword ptr 18h       ; virtual address of module (DLL)
    0000016F
    0000016F push ebx
    00000170 push ebp
    00000171 push esi
    00000172 push edi
    00000173 mov ebp, [esp+arg_4]        ; module handle (== VA of DLL image base)
    00000177 mov eax, [ebp+3ch]          ; position of PE header
    0000017A mov edx, [ebp+eax+78h]      ; Export Directory Table
    0000017E add edx, ebp                ; convert RVA to VA
    00000180 mov ecx, [edx+18h]          ; number of Name Pointers
    00000183 mov ebx, [edx+20h]          ; Name Pointer RVA
    00000186 add ebx, ebp                ; convert RVA to VA
    00000188
    00000188 LOOP_NEXT_API:
    00000188 jecxz short NOT_FOUND
    0000018A dec ecx
    0000018B mov esi, [ebx+ecx*4]        ; RVA of the export name
    0000018E add esi, ebp                ; convert RVA to VA
    00000190 xor edi, edi                ; clear the checksum
    00000192 cld
    00000193
    00000193 LOOP_NEXT_CHARACTER:
    00000193 xor eax, eax
    00000195 lodsb                       ; al <-- [esi], then esi++
    00000196 cmp al, ah                  ; is it zero (null-terminator)?
    00000198 jz short END_OF_API_NAME
    0000019A ror edi, 13
    0000019D add edi, eax
    0000019F jmp short LOOP_NEXT_CHARACTER
    000001A1
    000001A1 END_OF_API_NAME:
    000001A1 cmp edi, [esp+arg_0]        ; compare with the parameter checksum
    000001A5 jnz short LOOP_NEXT_API
    000001A7 mov ebx, [edx+24h]          ; Ordinal Table RVA
    000001AA add ebx, ebp                ; convert RVA to VA
    000001AC mov cx, [ebx+ecx*2]         ; get the ordinal number
    000001B0 mov ebx, [edx+1ch]          ; Export Address Table RVA
    000001B3 add ebx, ebp                ; convert RVA to VA
    000001B5 mov eax, [ebx+ecx*4]        ; get RVA of the API via the ordinal number
    000001B8 add eax, ebp                ; convert RVA to VA
    000001BA jmp RETURN
    000001BF
    000001BF NOT_FOUND:
    000001BF xor eax, eax
    000001C1
    000001C1 RETURN:
    000001C1 mov edx, ebp
    000001C3 pop edi
    000001C4 pop esi
    000001C5 pop ebp
    000001C6 pop ebx
    000001C7 retn
    000001C7 GetAPIaddress endp

The routine is called as follows:

    0000011E push eax                    ; HMODULE (== virtual address) of urlmon.dll
    0000011F push 702F1A36h              ; checksum of URLDownloadToFileA
    00000124 call GetAPIaddress

The Backdoor.Darkmoon Trojan horse uses a more complex algorithm to hash API function names. Malware that uses hashes to encode/decode API functions often includes a routine to store the addresses in allocated or stack memory. It is difficult to develop a tool to resolve these API addresses from memory dumps because the memory locations at which the resolved API function addresses are held will vary, and where the structures begin may not be clear (they are often referenced by a register plus an offset, such as [ESI + 24h]).
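
The rotate-and-add checksum computed by the inner loop of the listing above can be reproduced outside the sample, which lets an analyst match a pushed constant against the export names of a DLL. The Python sketch below is a minimal re-implementation under stated assumptions: the function names are invented for this example, and the short list of export names is a hand-picked stand-in for names that a real tool would read from the export table of urlmon.dll.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name: str) -> int:
    """Mirror the loop in the listing: ror edi, 13 then add edi, eax per character."""
    checksum = 0
    for ch in name.encode("ascii"):
        checksum = ror32(checksum, 13)
        checksum = (checksum + ch) & 0xFFFFFFFF
    return checksum

def resolve_by_hash(export_names, wanted: int):
    """Return the first export name whose checksum equals wanted, or None."""
    for name in export_names:
        if ror13_hash(name) == wanted:
            return name
    return None

# Hypothetical subset of export names, for illustration only.
exports = ["URLDownloadToCacheFileA", "URLDownloadToFileA", "URLOpenStreamA"]
print(hex(ror13_hash("URLDownloadToFileA")))
print(resolve_by_hash(exports, 0x702F1A36))
```

If a sample alters the rotation count or adds XOR steps, as described above, only ror13_hash() needs to change; the lookup stays the same.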

The Use of LoadLibrary() and GetProcAddress()

Strings are commonly encrypted in malicious code to make it more difficult to analyze. In addition, the presence of the strings "bind", "listen", "send", "recv", "RegSetValue", "CreateRemoteThread" and "SetWindowsHook" would immediately arouse the suspicions of a malware analyst and increase the risk of the program being detected by antivirus software. Because it is easy to discover which API functions a program calls by examining the import table, the LoadLibrary() and GetProcAddress() API functions are used by malware such as the Spybot family of worms to resolve the addresses of the API calls made by the body of the threat (as opposed to making the calls directly).

Although GetProcAddress() is called after the Spybot worm has started, the presence of this call in conjunction with a suspicious parameter such as GetProcAddress("send") would serve as an obvious indicator of a program's malicious intent; the string parameter must therefore also be encrypted if this technique is to be of any use.

W32.Stration, prevalent from 2006 through 2007, is an example of a worm that decrypts the Windows API function names to be passed to GetProcAddress() the first time each function is called. The addresses of resolved API functions are saved in global variables, but not all API calls will be resolved unless every code path is executed, making analysis using memory dumps difficult. When this technique is used, static decryption tools may be more effective.
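
As a sketch of what such a static decryption tool might do, the Python snippet below decrypts one API name without running the sample. The five DWORD constants and the XOR key 17h are taken from the W32.Stration.CX@mm listing that follows; the function name and the idea of feeding the constants in by hand are assumptions made for this example, and a real tool would have to locate the constants and the key automatically.

```python
import struct

# Constants loaded from dword_4010C0..dword_4010D0 in the listing.
ENCRYPTED_DWORDS = [0x637E7640, 0x44657851, 0x7B70797E, 0x7D755872, 0x17637472]
XOR_KEY = 0x17

def decrypt_name(dwords, key: int) -> str:
    """XOR every byte of the little-endian DWORDs with key; stop at the NUL."""
    raw = struct.pack(f"<{len(dwords)}I", *dwords)
    plain = bytes(b ^ key for b in raw)
    return plain.split(b"\x00", 1)[0].decode("ascii")

print(decrypt_name(ENCRYPTED_DWORDS, XOR_KEY))
```

The result matches the API name noted in the comment beside the push eax instruction in the listing.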

The following example is a routine for API address resolution taken from W32.Stration.CX@mm:

    00401EE0 sub_401EE0 proc near
    00401EE0
    00401EE0 var_18 = dword ptr -18h
    00401EE0 var_14 = dword ptr -14h
    00401EE0 var_10 = dword ptr -10h
    00401EE0 var_C = dword ptr -0Ch
    00401EE0 var_8 = dword ptr -8
    00401EE0 var_4 = byte ptr -4
    00401EE0 arg_0 = dword ptr 4
    00401EE0 arg_4 = dword ptr 8
    00401EE0
    00401EE0 mov eax, dword_404118              ; saved API address
    00401EE5 or byte ptr word_40401C, 3Dh
    00401EEC sub esp, 18h
    00401EEF test eax, eax
    00401EF1 jnz short loc_401F48
    00401EF3 mov eax, ds:dword_4010C0           ; 637E7640h
    00401EF8 mov ecx, ds:dword_4010C4           ; 44657851h
    00401EFE mov edx, ds:dword_4010C8           ; 7B70797Eh
    00401F04 mov [esp+18h+var_18], eax
    00401F07 mov eax, ds:dword_4010CC           ; 7D755872h
    00401F0C mov [esp+18h+var_14], ecx
    00401F10 mov ecx, ds:dword_4010D0           ; 17637472h
    00401F16 mov [esp+18h+var_10], edx
    00401F1A mov dl, ds:byte_4010D4             ; 0
    00401F20 mov [esp+18h+var_C], eax
    00401F24 mov [esp+18h+var_8], ecx
    00401F28 mov [esp+18h+var_4], dl
    00401F2C xor eax, eax
    00401F2E mov edi, edi
    00401F30
    00401F30 loc_401F30:
    00401F30 xor byte ptr [esp+eax+18h+var_18], 17h ; decrypting
    00401F34 inc eax
    00401F35 cmp eax, 14h
    00401F38 jl short loc_401F30
    00401F3A lea eax, [esp+18h+var_18]
    00401F3D push eax                           ; "WaitForSingleObject"
    00401F3E call sub_401E40                    ; get the API address
    00401F43 mov dword_404118, eax              ; save the API address for the next time
    00401F48
    00401F48 loc_401F48:
    00401F48 mov ecx, [esp+18h+arg_4]
    00401F4C mov edx, [esp+18h+arg_0]
    00401F50 push ecx
    00401F51 push edx
    00401F52 call eax                           ; call the API
    00401F54 add esp, 18h
    00401F57 retn 8
    00401F57 sub_401EE0 endp

Basic API Obfuscation

Unlike most MS-DOS viruses, which tended to be written in assembly language, malware for Windows is more often written in high-level languages. W32.Stration variant worms are written in C and use the method described above to obfuscate API calls (decrypting the API name string when the API function is first called). This requires the malware author to write code to encrypt and decrypt strings and to use these routines to call API functions. Although these tasks can be time-consuming, a skilled analyst can decrypt the strings with relative ease, meaning that home-grown implementations are not particularly prevalent.

In many cases API obfuscation can be achieved without writing custom code in C or Delphi by using software libraries or existing packers. To make use of the former, an application is linked not against the regular import library but instead against another library that adds a layer of misdirection or redundant code before calling regular API functions. Packers, in contrast, typically redirect API calls to custom code following the unpacking operation but before the execution of the program proper has begun. The commonly seen packer UPX operates in this way.
Usually the loader resolves the addresses of all the API functions called by a program, but in this case the loader only resolves those used during the unpacking operation; the original program's imported API information is also packed and is resolved by the unpacker instead of by the loader itself.

The remainder of this chapter provides some examples of methods used to obfuscate API calls.

Staged API Obfuscation

A regular API call consists of a single call to the target function. In C:

    call ds:GetSystemTime

Or in Delphi:

    call j_GetSystemTime

    j_GetSystemTime:
    jmp ds:GetSystemTime

Although not usually seen in regular code, a layer of misdirection can be introduced by constructing a call instruction to call another function, which in turn calls the target API:

    call call_GetSystemTime

    call_GetSystemTime:
    mov EAX, ds:GetSystemTime
    jmp EAX

This redirection procedure is termed a "stage"; the example above is a one-stage API obfuscation because it requires a single set of instructions to redirect to the target API function. When many redirections are used, the technique is termed multi-stage API obfuscation. Although multi-stage API obfuscation can easily be resolved by a (human) malware analyst, this is not the case for IDA. IDA may display the aforementioned code as:

    call sub_412458

    sub_412458:
    mov EAX, ds:GetSystemTime
    jmp EAX

A de-obfuscation tool must rename the label sub_412458 to GetSystemTime if it is to be of maximum possible use.

If the redirection stage exists in the same module as the one currently being analyzed, finding the address of the system call is no problem for the analyst; there are, however, some cases where the stage component resides in an allocated memory block and is therefore out of the scope of the module under analysis. This may look like:

    4010C0: call ds:[402780h]
    402780: dd 00370000h

It is impossible to know what the above code calls without reading the contents of memory at 370000h; this is why access to all memory blocks used by a program is essential during in-depth analysis. Performing a search for a memory block containing 370000h yields:

    370000: mov EAX, 77E7B476h
    370005: jmp EAX

77E7B476h appears to be the Virtual Address (VA) of an API function. Searching memory dumps for a DLL containing 77E7B476h and subsequently examining the export table reveals that the function in question is CreateFileA(). Since most DLLs are relocatable, it is important that the memory dumps of the same process are searched (as opposed to the files in the %System% directory) in order for the VAs to be correct.

When the redirection stage is located outside of the current module, as in the example above, we term it extra-modular one-stage API obfuscation.
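
The two lookups just described (decoding the stage, then mapping the resulting VA to an exported name) can be sketched in Python as follows. The stage bytes encode the mov EAX / jmp EAX pair shown at 370000h. Everything else is an assumption made for illustration: the function names are hypothetical, and the module base, size and export table for kernel32.dll are made-up values standing in for data that a real tool would parse from the dumped DLL image.

```python
import struct

def decode_stage(code: bytes):
    """Return imm32 if code is `mov eax, imm32` (B8) then `jmp eax` (FF E0)."""
    if len(code) >= 7 and code[0] == 0xB8 and code[5:7] == b"\xff\xe0":
        return struct.unpack_from("<I", code, 1)[0]
    return None

def name_for_address(modules: dict, address: int):
    """Map a VA to 'module!export' using per-module export tables keyed by RVA."""
    for name, (base, size, exports) in modules.items():
        if base <= address < base + size:
            export = exports.get(address - base)
            return f"{name}!{export}" if export else None
    return None

# Bytes dumped from 370000h: mov eax, 77E7B476h / jmp eax
stage_code = bytes.fromhex("b876b4e777ffe0")

# Hypothetical module layout; base, size and RVA are illustrative values only.
modules = {"kernel32.dll": (0x77E60000, 0x000E6000, {0x0001B476: "CreateFileA"})}

target = decode_stage(stage_code)
print(hex(target), name_for_address(modules, target))
```

Keying the export table by RVA rather than by VA keeps the lookup valid when a DLL has been relocated, since only the module base changes.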
If an API address de-obfuscation tool is able to add the label CreateFileA at VA 402780h, IDA correctly displays:

    4010C0: call ds:[CreateFileA]
    402780: CreateFileA: dd 00370000h

Extra-Modular Function Tables

As discussed in the previous section, an allocated memory block can be used to store a staged program that redirects the flow of execution. Memory blocks of this type can also be used to store API addresses. As an example:

    00404A13 call dword ptr ds:0B5A068h ; references a memory block outside the module
    00B5A068 dd 7743DE3Ah ; SHFileOperationA

In effect the IAT, which should reside in the same module, is located in a different memory block outside of the module. This presents a problem for a de-obfuscation tool; it cannot add a label SHFileOperationA to the address 00B5A068h because it is outside of the range of the module currently being displayed in IDA. The only thing that can be done is to add the comment SHFileOperationA to the address 00404A13h.
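The choice between a label and a comment can be expressed as a simple range check. The following Python sketch is only an illustration: the module bounds are assumed values chosen for the example, and `annotate` is a hypothetical helper rather than part of any existing tool.

```python
# Assumed bounds of the module currently loaded in the disassembler.
MODULE_START = 0x00400000
MODULE_END = 0x00420000  # exclusive

def annotate(call_va, slot_va, api_name):
    """Decide how one resolved API call could be annotated.

    call_va  -- address of the call instruction
    slot_va  -- address of the pointer the call goes through
    api_name -- name recovered for the final target
    """
    if MODULE_START <= slot_va < MODULE_END:
        # The pointer slot is inside the module, so it can carry a label.
        return ("label", slot_va, api_name)
    # The slot is in another memory block; only the call site can be commented.
    return ("comment", call_va, api_name)

print(annotate(0x004010C0, 0x00402780, "CreateFileA"))
print(annotate(0x00404A13, 0x00B5A068, "SHFileOperationA"))
```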
Immediate Jumps

As mentioned above, API calls made in Delphi compile down to a call to an address followed by a jmp to the target API function address. These two steps are known as a thunk and can appear in code other than that generated by Delphi build systems. For example, if a thunk is created in an allocated memory block it can be used for API call obfuscation, as in:

    004023A8 call ds:label_4130B4
    004130B4 label_4130B4 dd 972030h ; This is outside the current module.
    00972030 jmp near ptr 77E5B476h ; CreateFileA

Since this example has a label at 004130B4h, which is inside the current module, the de-obfuscation tool can rename label_4130B4 to CreateFileA, which results in the instruction at 004023A8h correctly being displayed as call ds:CreateFileA.

Jump-in

Regardless of the tricks used along the way, if a call eventually reaches the address of an API function the analyst can resolve the obfuscation. For example, if a call of a one-stage API obfuscation reaches 77E7B476h and 77E7B476h is the entry address of CreateFileA(), the obfuscation is resolved. This means that if a full list of all the API function addresses in all the DLLs loaded by a given process is generated, an automated de-obfuscation tool can trace all call and jmp instructions in the current process until one of the API addresses in the list is reached.
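A tracing loop of this kind might look like the following Python sketch. It is a simplified illustration under stated assumptions: `MEMORY` and `API_ENTRIES` are hypothetical stand-ins for process dumps and the generated API address list, and only three jmp encodings (E9 rel32, EB rel8, FF 25 absolute indirect) are followed. The sample data reproduces the immediate-jump thunk shown earlier.

```python
import struct

# Hypothetical list of API entry addresses gathered from all loaded DLLs.
API_ENTRIES = {0x77E5B476: "CreateFileA"}

def jmp_near(src, dst):
    # E9 rel32: the displacement is relative to the end of the 5-byte instruction.
    return b"\xE9" + struct.pack("<I", (dst - (src + 5)) & 0xFFFFFFFF)

# Hypothetical dumps: base address -> raw bytes.
MEMORY = {
    0x004130B4: struct.pack("<I", 0x00972030),      # dd 972030h
    0x00972030: jmp_near(0x00972030, 0x77E5B476),   # jmp near ptr 77E5B476h
}

def read(va, size):
    for base, data in MEMORY.items():
        if base <= va and va + size <= base + len(data):
            return data[va - base: va - base + size]
    raise LookupError("unmapped address %08X" % va)

def trace(va, max_steps=32):
    """Follow jmp instructions from va until an API entry is reached."""
    try:
        for _ in range(max_steps):
            if va in API_ENTRIES:
                return API_ENTRIES[va]
            opcode = read(va, 1)[0]
            if opcode == 0xE9:                          # jmp rel32
                rel = struct.unpack("<i", read(va + 1, 4))[0]
                va = (va + 5 + rel) & 0xFFFFFFFF
            elif opcode == 0xEB:                        # jmp rel8
                rel = struct.unpack("<b", read(va + 1, 1))[0]
                va = (va + 2 + rel) & 0xFFFFFFFF
            elif opcode == 0xFF and read(va + 1, 1)[0] == 0x25:
                slot = struct.unpack("<I", read(va + 2, 4))[0]
                va = struct.unpack("<I", read(slot, 4))[0]  # jmp dword ptr [slot]
            else:
                return None                             # not a pure redirection
    except LookupError:
        return None
    return None

# call ds:label_4130B4 transfers control to the address stored at 4130B4h.
start = struct.unpack("<I", read(0x004130B4, 4))[0]
print(trace(start))
```

The step limit is a design choice for the sketch: it keeps a loop of relays from hanging the tracer, at the cost of giving up on very long chains.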
In order to prevent the use of this method of de-obfuscation a technique termed jump-in API obfuscation is used; the target of a thunk operation is altered to be several instructions after the entry address of the target API function, as in:

    00401922 call sub_403C08
    00403C08 jmp ds:off_404090
    00404090 off_404090 dd offset unk_40C4A1
    0040C4A1 unk_40C4A1 jmp near ptr 0040C4A4h
    0040C4A3 db 0EAh ; dummy byte to distract
    0040C4A4 push 0
    0040C4A6 jmp near ptr 77E41BECh ; API entry (Sleep()) + 2

This example of one-stage API obfuscation does not hit any API function entry points; the Sleep() API function starts at 77E41BEAh, but this is not the address reached by the final jmp instruction. A push 0 instruction can be seen before the final jmp to 77E41BECh; subtracting 2 from this address (i.e. the length of the push 0 instruction) yields the entry address of the API function Sleep(). The first instruction in the Sleep() function is also push 0; the instruction has been copied to ensure that no functionality is lost or skipped when jumping into the middle of the Sleep() routine proper.

It may be possible to de-obfuscate API calls made in the manner above by simply subtracting the length of the instructions executed before the jump and adjusting the target address appropriately, but redundant or dummy instructions may have been inserted (such as jmp $+1 or db 0EAh, as seen in the example). The redundant code here is three bytes long, but it is non-trivial to determine whether an arbitrary sequence of instructions is redundant or not, and an over- or under-estimation will result in the calculated address of the target API function entry point being incorrect. One way around this problem is simply to select the nearest API address; if a tool can suggest an approximate adjustment this may be enough.
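Selecting the nearest API address can be done with a binary search over a sorted export list. The Python sketch below is illustrative only: it assumes that the nearest entry at or below the jump target is the one wanted (a jump-in lands after the entry point), and `max_skip` is a hypothetical threshold for rejecting implausibly distant matches. The three addresses are the ones quoted in this paper.

```python
import bisect

# Entry addresses quoted in this paper, sorted by address.
EXPORTS = sorted([
    (0x77E41BEA, "Sleep"),
    (0x77E5A325, "SleepEx"),
    (0x77E5B476, "CreateFileA"),
])
ADDRESSES = [address for address, _ in EXPORTS]

def nearest_entry(target, max_skip=16):
    """Return (name, bytes_skipped) for the closest entry at or below target."""
    index = bisect.bisect_right(ADDRESSES, target) - 1
    if index < 0:
        return None                      # target lies below every known entry
    address, name = EXPORTS[index]
    skipped = target - address
    if skipped > max_skip:
        return None                      # too far into the function to trust
    return name, skipped

print(nearest_entry(0x77E41BEC))         # the final jmp target in the example
```

The returned byte count is only a suggestion; the analyst still has to confirm that the skipped instructions were really copied into the stage.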
Jump-in obfuscation may originally have been developed to prevent API call monitoring tools from functioning; in the above example the first address of the Sleep() API function (as exported by kernel32.dll) is never executed and as such any monitoring tool will never trigger if only the first instruction is hooked.

Advanced API Obfuscation

API obfuscation has evolved to such an extent that a simple tool can no longer fully resolve all obfuscated API calls. Commercial packers such as ASProtect, Enigma, Themida and Obsidium are being armored with ever more sophisticated API obfuscation techniques, and in many cases make use of several obfuscation techniques simultaneously. To de-obfuscate methods such as these, many CPU instructions have to be emulated; it would take many months for an analyst to perform the task by hand. As such a timescale is not acceptable, automated emulation tools can be used, but this approach also has drawbacks. Some examples of advanced API obfuscation techniques appear in the following sections.

Logic Stage and Skipper Stage Obfuscation

As detailed above, staged calls can obfuscate API calls to some extent, but these can usually be resolved as there are clear patterns involved. A more complex technique is termed logic stage API obfuscation; although a pattern is still evident, there is a sequence of instructions that must be executed in order to reach the target address. A logic stage does not necessarily require emulation of code to de-obfuscate the call as the code is redundant and does not affect the flow of execution of the program proper.

Logic stages are sometimes armored with a return address skipper: while calculating the target address to jump to, the logic also rewrites the return address on the stack, usually adding one to it. This is termed skipper stage API obfuscation.
In many cases a logic component and a skipper component share a stage, as in:

    00D50000 sbb ecx,61h ; meaningless instruction
    00D50003 jmp short 00D50006h
    00D50005 db 0E9h ; placed to obfuscate in disassembler
    00D50006 mov ecx, 486366h ; meaningless instruction
    00D5000B pop eax ; return address
    00D5000C lea eax,[eax+1] ; return address += 1
    00D5000F push eax ; return address is now incremented (skipper)
    00D50010 push 0D40000h ; the address of the next stage
    00D50015 retn ; jump to the next stage

This type of obfuscation can be seen in Backdoor.Graybird, which contains code as below:

    00411000 push eax
    00411001 call ds:[420008h] ; points to logic and skipper stage
    00411007 db 0E9h ; a skipped byte
    00411008 or eax, eax ; instruction pointer returns here from the call

The listing from 411007h onwards is shown correctly because the skipper stage was noticed and the addresses adjusted accordingly. This adjustment could not have been made by a disassembler alone; the byte 0E9h at 411007h would likely have been interpreted as the start of a jmp XXXXXXh instruction (with XXXXXX being a meaningless address), meaning that the flow of the program would be displayed incorrectly. A de-obfuscation tool must detect the skipper stage and return address adjustment, recognize which API function is being called and add a label to 420008h; it must then undo its analysis from 411007h and reanalyze from 411008h onwards.

Copied and Substituted Obfuscation

ASProtect is a packer that is often used to obfuscate API calls. One of its obfuscation methods is to copy the whole body of the target API function into memory owned by the process. This is termed copied API obfuscation, for example:
    01230000 mov eax,fs:[18h]
    01230006 mov ecx,[eax+30h]
    01230009 mov eax,[ecx+0b0h]
    0123000F movzx edx, word ptr [ecx+0ach]
    01230016 xor eax,0fffffffeh
    01230019 shl eax,0eh
    0123001C or eax,edx
    0123001E shl eax,8
    01230021 or eax,[ecx+0a8h]
    01230027 shl eax,8
    0123002A or eax,[ecx+0a4h]
    01230030 ret

The above code is taken from an allocated memory block and is called from the main program. Since it does not reference any system DLL it is impossible to tell directly from the address what API function was called, or indeed whether it represents an API call at all. Searching kernel32.dll for this block of code shows that the entire block matches the GetVersion() API function. Although it can be time consuming to search existing DLLs for identical snippets of code, it is not difficult; the code can be found verbatim.

A more sophisticated approach to this method of API obfuscation is to copy the whole or partial API routine and insert some additional code to fix up the address displacement. For example, the jz conditional jump instruction takes two bytes of memory if the jump displacement is between -128 and +127 bytes. If a routine containing this instruction is copied, this distance is likely to increase and as such the jz instruction will require six bytes of memory; the obfuscation routine is able to adjust the code accordingly. This means that any tool capable of de-obfuscating techniques such as these must be able to compare blocks of code in terms of underlying logic as opposed to surface structure.
The following example illustrates this necessity:

(Code copied from kernel32.dll into the main program):

    00402000  6A 00           push 0
    00402002  FF 74 24 08     push [esp+8]
    00402006  E8 1A 83 A5 77  call SleepEx @ 77E5A325
    0040200B  C2 04 00        ret 4

(Code as in kernel32.dll):

    77E41BEA  6A 00           push 0
    77E41BEC  FF 74 24 08     push [esp+8]
    77E41BF0  E8 30 87 01 00  call SleepEx @ 77E5A325
    77E41BF5  C2 04 00        ret 4

The above examples are both from the Sleep() API function, but a simple binary comparison will fail to match because the machine code for call SleepEx differs: the operand of the call is a displacement relative to the address of the instruction, so the same target is encoded differently at each location. The semantic comparison of code can be computationally expensive and hence time-consuming; an optimization to improve performance is to enumerate system DLLs, ordering on frequency of use (for example, kernel32.dll > advapi32.dll > user32.dll > shell32.dll), and perform the search based on this ranking.

Another example of copied API code is as follows, termed substituted API obfuscation:

    00F40000 mov eax,fs:[18h]
    00F40006 mov eax,[eax+34h]
    00F40009 ret
The above code can be found in the function RtlGetLastWin32Error() in ntdll.dll; in fact, the often-used GetLastError() API function in kernel32.dll simply redirects to this function. Ideally, a de-obfuscation tool would be aware of this substitution and insert the more commonly used API name as a label in the appropriate place.

Push-ret and Push-calc-ret Obfuscation

The simple technique of pushing the target API address to the stack and executing a ret instruction is termed push-ret API obfuscation; however, some enhanced versions are as follows:

    003C80B0 call dword ptr [3E82B8h] ; calls 17B000Dh
    003E83B8 dd 17B000Dh ; in another memory block
    017B000D push 3E62B8CDh
    017B0012 sub dword ptr [esp], 0CCC079FFh ; = 71A23ECEh (bind())
    017B0019 ret

The code in the example above pushes an immediate value of 3E62B8CDh on to the stack, then subtracts 0CCC079FFh to yield 71A23ECEh; this is the address of the bind() function from the Windows implementation of the Berkeley sockets API. The de-obfuscation tool should be aware of this technique and calculate the value on the top of the stack just before the ret instruction is executed. Since this technique requires calculation between push and ret it is termed push-calc-ret API obfuscation.
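The calculation between push and ret can be reproduced with 32-bit wrap-around arithmetic. The Python sketch below is a minimal illustration, not a full emulator: the instruction stream is given as a hypothetical list of (mnemonic, immediate) pairs, and only operations applied to the top of the stack are modelled.

```python
MASK32 = 0xFFFFFFFF

def push_calc_ret(instructions):
    """Return the value on top of the stack when ret executes."""
    top = None
    for mnemonic, value in instructions:
        if mnemonic == "push":
            top = value & MASK32
        elif mnemonic == "sub":              # sub dword ptr [esp], imm32
            top = (top - value) & MASK32
        elif mnemonic == "add":              # add dword ptr [esp], imm32
            top = (top + value) & MASK32
        elif mnemonic == "xor":              # xor dword ptr [esp], imm32
            top = top ^ (value & MASK32)
        elif mnemonic == "ret":
            return top
        else:
            raise ValueError("unsupported instruction: " + mnemonic)
    raise ValueError("no ret found")

# The stage at 017B000Dh from the example above.
stage = [("push", 0x3E62B8CD), ("sub", 0xCCC079FF), ("ret", None)]
print("%08X" % push_calc_ret(stage))   # prints 71A23ECE
```

The subtraction underflows and wraps modulo 2^32, which is why the mask is applied after every operation.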
An example of further enhanced push-ret API obfuscation, seen in certain Trojan horse programs with back door functionality, is as follows:

    004014DA mov esi, offset unk_404907 ; stores DWORD-value list
    004014DF push dword ptr [esi+30h] ; pushes 8DC82618h
    004014E2 push loc_4014ED ; return address
    004014E7 push loc_4010A4 ; call destination
    004014EC ret ; calls 4010A4h
    004014ED <next instruction> ; returns here

    004010A4 mov edx, [esp+4] ; 8DC82618h <-- came from [esi+30h]
    004010A8 mov ecx, [esp+0] ; 004014EDh (return address)
    004010AB add esp, 8
    004010AE ror edx, 0FAh
    004010B1 sub edx, dword_404027 ; == 0FA23D1ADh
    004010B7 push ecx ; returning address
    004010B8 push edx ; API address of CreateFileA
    004010B9 ret ; jumps to CreateFileA

The block of code from 4010A4h onwards is common to all API calls made by the program. The code from 4014DAh has clearly not been compiled from a high-level language, and as it does not explicitly call anything it initially appears as though no API calls are made. The offset into the DWORD list addressed by ESI, however, is what selects the API function, with [esi+30h] being CreateFileA() and [esi+34h] referring to another API function. A de-obfuscation tool must be able to recognize the following sequence of instructions:

    mov esi, xxxx
    push dword ptr [esi+xx]
    push xxxx
    push xxxx
    ret
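The decoding performed by the common block at 4010A4h reduces to a rotate followed by a subtraction. The following Python sketch reproduces that arithmetic for the values shown in the listing; the function names are hypothetical, and the key is simply the constant that the listing reports for dword_404027.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, count):
    """Rotate a 32-bit value right; x86 uses only the low 5 bits of the count."""
    count &= 31
    return ((value >> count) | (value << (32 - count))) & MASK32

def decode_api_address(encoded, key=0xFA23D1AD):
    # Mirrors: ror edx, 0FAh / sub edx, dword_404027
    return (ror32(encoded, 0xFA) - key) & MASK32

# 8DC82618h is the DWORD pushed from [esi+30h] in the example.
print("%08X" % decode_api_address(0x8DC82618))   # prints 77E5B476
```

The result, 77E5B476h, is the same address that appears as CreateFileA in the immediate-jump example earlier in this chapter.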
Some simple address arithmetic must then be performed to calculate the address of the API function. The tool can then insert an IDA comment containing the function's name.

Padded and Copied API Obfuscation (Themida)

Certain obfuscating packers, such as Themida and Enigma, copy code from API functions and in addition interleave redundant instructions with the copied code. They may also replace blocks of code with equivalent but longer sequences of instructions. It is not known where this technique originated but its use was observed in the Themida packer in 2005 to 2006. The API obfuscation techniques used by Themida are explained in the presentation Analysis and Visualization of Common Packers [2]. The following is an example of these methods:

    00401B77 call 2930000h
    ...
    02930000 push edx ; making room for EBP
    02930001 push eax ; save EAX
    02930002 push edx ; save EDX
    02930003 jmp 293000Eh
    ...
    0293000E rdtsc ; destroys EDX:EAX
    02930010 jmp 2930029h
    ...
    02930029 pop edx ; restore EDX
    0293002A pop eax ; restore EAX
    0293002B mov [esp],ebp

The code block starting at 02930000h contains junk code from 02930001h to 0293002Ah. Removing this block yields the following:

    02930000 push edx ; making room for EBP
    0293002B mov [esp],ebp

Patterns of code replacement can be exploited during analysis. A de-obfuscation tool may have an internal replacement table which maps obfuscated sequences of instructions to their more concise counterparts, for example mapping push reg32(1), mov [esp], reg32(2) to push reg32(2). The example above can be replaced with the simpler push ebp, with the remaining task being to follow the steps necessary to resolve the jump-in obfuscation as previously detailed.

Padded and Copied API Obfuscation (Enigma)

The Enigma Protector obfuscates API calls in a similar way to Themida but introduces more complexity:

    00401753 call dword ptr ds:973245h ; it points to 974819h
    ...
    00974819 call 97481Fh
    0097481E push esi ; dummy instruction
    0097481F call 974827h
    00974824 jmp 97482Ah
    00974826 db 15h ; dummy code
    00974827 ret 4
    0097482A add esp, 5C9099Bh
    00974830 mov [esp-5C9099Fh],esi ; mov [esp-4], esi
    00974837 add esp, 0FA36F661h ; sub esp, 4 (5C9099Bh + 0FA36F661h == -4)
    0097483D call 974846h
    00974842 db 80h, 0DEh, 9Dh, 70h ; dummy code
    00974846 add esp,4
It is necessary to examine the code carefully to observe how the stack pointer is manipulated. Following the second call instruction, the stack pointer is decreased by 8 to store the two return addresses. The ret instruction at 00974827h not only returns, however, but also adds 4 to the stack pointer, resetting it to its original state. The block of code from 00974819h to 00974827h is redundant and can be deleted. The block from 0097483Dh to 00974846h can also be removed. Removing these blocks yields:

    0097482A add esp, 5C9099Bh
    00974830 mov [esp-5C9099Fh],esi ; mov [esp-4], esi
    00974837 add esp, 0FA36F661h ; sub esp, 4 (5C9099Bh + 0FA36F661h == -4)

Adding such a large number to the stack pointer may look alarming. This sequence of instructions is likely to cause problems in an MS-DOS environment as an interrupt may occur before the stack pointer is restored, resulting in stack corruption. This, however, is not a problem for user-space executables in the Windows environment, and as such these three instructions are the equivalent of the shorter push esi.

As with Themida, patterns of junk code and code replacement can be observed. If a list of possible patterns is available, an automated tool should be able to reconstruct the (possible) original code and compare it with routines in system DLLs, thus locating the called API function.

Splicing Intensive Instructions to Provide Obfuscation (Obsidium)

Sometimes emulation of code is required to obtain target API addresses. One such situation is the obfuscation technique in which all API calls are replaced by jumps to a common routine which dispatches the flow of control to the various target API functions. Disambiguation is performed using the EDX register, which is set prior to the jump to the dispatch routine and, following some calculations, is used to index into a table of API function addresses.
Another similar obfuscation technique is to determine the address of the target API function by way of the address from which the common dispatch routine is called. Emulation is often effective in dealing with these kinds of API obfuscation, but can be time-consuming and occasionally yields wrong answers. This means that it is best considered to be a last resort, employed only when the use of conventional analytical techniques proves impossible.

Obsidium is a packer that requires emulation to resolve its API obfuscation. It installs a custom Structured Exception Handler (SEH) and intentionally executes erroneous instructions in order to jump to this code, as shown in the following example:

    008B6037  55                 push ebp
    008B6038  8B EC              mov ebp, esp
    008B603A  81 EC 30 01 00 00  sub esp, 130h
    008B6040  EB 04              jmp short 008B6046
    008B6046  60                 pusha
    008B6047  EB 04              jmp short 008B604D
    008B604D  9C                 pushf
    008B604E  EB 03              jmp short 008B6053
    008B6053  EB 04              jmp short 008B6059
    008B6059  E8 00 00 00 00     call $+5 (008B605E)
    008B605E  EB 01              jmp short 008B6061
    008B6061  5E                 pop esi
    008B6062  EB 03              jmp short 008B6067
    008B6067  EB 01              jmp short 008B606A
    008B606A  8B 96 64 03 00 00  lea edx, [esi+364h]
    008B6070  EB 04              jmp short 008B6076
    008B6076  33 C0              xor eax, eax
    008B6078  EB 03              jmp short 008B607D
    008B607D  52                 push edx
    008B607E  EB 01              jmp short 008B6081
    008B6081  64 FF 30           push dword ptr fs:[eax]
    008B6084  EB 01              jmp short 008B6087
    008B6087  64 89 20           mov fs:[eax], esp
    008B608A  EB 01              jmp short 008B608D
    008B608D  EB 03              jmp short 008B6092
    008B6092  EB 02              jmp short 008B6096
    008B6096  EB 36              jmp short 008B60CE
    008B60CE  EB 01              jmp short 008B60D1
    008B60D1  8B 54 24 30        mov edx, [esp+30h]
    008B60D5  EB 01              jmp short 008B60D8
    008B60D8  EB C1              jmp short 008B609B
    008B609B  EB 02              jmp short 008B609F
    008B609F  F7 C2 01 00 00 00  test edx, 1
    008B60A5  EB 04              jmp short 008B60AB
    008B60AB  74 0C              jz 008B60B9
    008B60AD  EB 04              jmp short 008B60B3
    008B60B3  0F 0B              ud2 ; undefined opcode
    008B60B5  EB 02              jmp short 008B60B9
    008B60B9  EB 03              jmp short 008B60BE
    008B60BE  F7 F0              div eax ; division by zero

Removing short jumps, the above code is as follows:

    008B6037  55                 push ebp
    008B6038  8B EC              mov ebp, esp
    008B603A  81 EC 30 01 00 00  sub esp, 130h
    008B6046  60                 pusha ; push EAX,ECX,EDX,EBX,ESP,EBP,ESI,EDI
    008B604D  9C                 pushf ; push EFLAGS
    008B6059  E8 00 00 00 00     call $+5 (008B605E)
    008B6061  5E                 pop esi ; esi = 008B6061h
    008B606A  8B 96 64 03 00 00  lea edx, [esi+364h] ; edx = 008B63C5h
    008B6076  33 C0              xor eax, eax
    008B607D  52                 push edx ; exception handler address (008B63C5h)
    008B6081  64 FF 30           push dword ptr fs:[eax]
    008B6087  64 89 20           mov fs:[eax], esp
    008B60D1  8B 54 24 30        mov edx, [esp+30h] ; value from uninitialized stack variable
    008B609F  F7 C2 01 00 00 00  test edx, 1
    008B60AB  74 0C              jz 008B60BE
    008B60B3  0F 0B              ud2 ; undefined opcode
    008B60BE  F7 F0              div eax ; division by zero

A CPU exception occurs when execution reaches either 008B60B3h or 008B60BEh. The exception is handled by the SEH at 008B63C5h, shown here:

    008B63C5  EB 03              jmp short 008B63CA
    008B63CA  E8 00 00 00 00     call $+5 (008B63CF)
    008B63CF  EB 02              jmp short 008B63D3
    008B63D3  5A                 pop edx
    008B63D4  EB 01              jmp short 008B63D7
    008B63D7  8B 8A 95 FB FF FF  mov ecx, [edx-46bh]
    008B63DD  EB 04              jmp short 008B63E3
The exceptions that will be encountered when debugging the above code are likely to be extremely distracting. This, coupled with the jumps to random locations and the fact that some 100,000 instructions must be emulated before the target API function is reached, makes the analysis of the above code a troublesome and difficult task. A further complication exists in that the obfuscated code deliberately calls a ret instruction located inside an OS DLL, thus rendering useless the technique of checking all instructions for references to system libraries. A de-obfuscation tool must be able to recognize these kinds of dummy call/ret sequences and avoid flagging them as significant.

16-bit Addressing Obfuscation

A de-obfuscation tool that includes emulation of instructions must support 16-bit addressing. This initially seems counter-intuitive given the 32-bit Win32 environment; the 0 to 0FFFFh address range is not normally visible from user-space, and indeed access to address 0 will cause a page fault exception. In SEH code, though, offset 0 is accessed through the FS segment (for example as fs:[eax] with EAX set to 0), which maps the thread information block (TIB). The following code is an example of this operation:

    64 67 FF 36 00 00  push dword ptr fs:[0] ; 67 changes from 32-bit to 16-bit addressing
    64 67 89 26 00 00  mov fs:[0], esp

This sequence of instructions is not seen very often in user-space code given that 32-bit mode is central to the architecture and operation of the underlying OS. A de-obfuscation tool must support 16-bit addressing in order to avoid missing API calls made from this mode.

Conclusion

Uncovering obfuscated API calls is a difficult task given the wide range of obfuscation techniques that can be used and combined to hide a program's functionality.
Emulation centered on the call instruction may initially seem to be an effective method of de-obfuscation, but it is defeated by the copying of API code and may yield false positives when the emulated instruction pointer reaches an OS module. Emulation can also be time-consuming and as such may not be the best choice in situations in which results are required in a timely manner. It is therefore necessary to design a modular de-obfuscation tool able to deal with the myriad of techniques described in this paper.
References

1. Operating System Interface Obfuscation and the Revealing of Hidden Operations. Abhinav Srivastava, Andrea Lanzi, Jonathon Giffin. www.cc.gatech.edu/research/reports/gt-cs-08-09.pdf
2. Analysis and Visualization of Common Packers. Ero Carrera. http://nchovy.kr/uploads/3/301/d1t1%20-%20ero%20carrera%20-%20analysis%20and%20visualization%20of%20common%20packers.pdf
This paper was originally presented at the AVAR2009 Conference. For more information on AVAR, please visit https://www.aavar.org/.

About the author: Masaki Suenaga is a Senior Software Engineer with Symantec Security Response.

Copyright 2009 Symantec Corporation. All rights reserved. Symantec and the Symantec logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.