Introduction
Reverse engineering and tool development expert David Zimmer (dzzie, http://sandsprite.com) recently released remote_lookup, “a small tool which can scan a 32bit process and build an export name/address map which can be queried”. The tool is available on FireEye’s GitHub page at https://github.com/fireeye/remote_lookup.
remote_lookup can significantly speed up the analysis of malware that obfuscates its access to Windows API functions. David covers common malware techniques for such obfuscation in the blog post available at https://www.fireeye.com/blog/threat-research/2017/06/remote-symbol-resolution.html. The post also covers the motivation for remote_lookup, common analysis techniques (and their shortcomings), the tool’s functionality, and how it can accelerate reverse engineering of obfuscated malware. I highly recommend reading his post, especially if you are going to continue reading this tutorial.
This tutorial will show you step-by-step how you can use remote_lookup based on the analysis of a real-world malware sample (MD5 hash C6A2FB56239614924E2AB3341B1FBBA5).
If you want to follow along, these are the involved files:
- Dropper: https://virustotal.com/en/file/92ad1b7965d65bfef751cf6e4e8ad4837699165626e25131409d4134f031a497/analysis/
- Ransomware payload: https://virustotal.com/en/file/d174f0c6ded55eb315320750aaa3152fc241acbfaef662bf691ffd0080327ab9/analysis/
This post will include some scripts that you can use to integrate your debugger (I’m using OllyDbg as an example here) and remote_lookup. I hope this will make it easier for you to leverage this cool project by the FLARE team.
Setup
I performed my analysis on a 64-bit Windows 7 system running in VMware Workstation.
The project’s GitHub repository, https://github.com/fireeye/remote_lookup, comes with precompiled binaries in the bin directory. The directory contains the main binary remoteLookup.exe and the three dependencies MSWINSCK.OCX, procLib.dll, and sppe.dll. If you run remoteLookup.exe with Administrator privileges, the tool registers the dependencies on the first run.
Usage
remoteLookup.exe provides a straightforward user interface. The first step when using the tool is to select the process we want to analyze in the top-left corner.
The resulting dialog allows us to manually find the process we are interested in.
Additionally, we can search for processes in the search bar at the bottom left.
After selecting a process, remoteLookup.exe shows the selected process’ loaded modules, their base addresses, and the number of exports per module.
Analysis
The malware with MD5 hash C6A2FB56239614924E2AB3341B1FBBA5 is a dropper that contains a ransomware component encrypted in its resource section PIC\104. The dropper decrypts the resource, writes it to a file in the %APPDATA% directory, and executes the file. The ransomware component (MD5 hash A0A7022CAA8BD8761D6722FE3172C0AF) obfuscates its Windows API calls as David describes in his post. To recap his findings: as part of its initialization, the malware contains a function that resolves Windows API calls based on hash values. The graph overview of the function that performs the hash lookups makes for a beautiful stairway to (malware) heaven.
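To make the idea of hash-based resolution concrete, here is a minimal sketch. It rests on assumptions: this post does not say which hash algorithm the sample uses, so the ROR-13 hash below is only a commonly seen stand-in, and `ror13_hash`, `resolve_by_hash`, and the export list are hypothetical names chosen for illustration.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name):
    """Stand-in hash (assumption): rotate right by 13, then add each character."""
    result = 0
    for char in name:
        result = (ror32(result, 13) + ord(char)) & 0xFFFFFFFF
    return result


def resolve_by_hash(export_names, wanted_hash):
    """Return the first export whose hash matches, or None if nothing matches."""
    for name in export_names:
        if ror13_hash(name) == wanted_hash:
            return name
    return None


# Hypothetical export list; real malware walks a module's export table instead.
exports = ["CloseHandle", "HeapCreate", "WriteFile"]
wanted = ror13_hash("HeapCreate")  # the binary would embed only this constant
print(resolve_by_hash(exports, wanted))
```

Because only the constant ends up in the binary, the API name never appears as a string, which is why static analysis alone reveals so little about such a sample.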
The malware stores the resolved addresses XOR encoded in its .data section. Just before an API is about to be called, the malware restores the resolved address at run-time. Further details of this will be discussed below.
Via static analysis we can see that the ransomware only imports two functions from kernel32.dll: Sleep and CreateEventW. When we pause the malware at its entry point, remote_lookup shows the following output.
We can see kernel32.dll loaded from the SysWOW64 directory because we are running a 32-bit application on a 64-bit operating system. ntdll.dll and KERNELBASE.dll are dependencies of kernel32.dll. The number of exports indicates that remote_lookup resolves the exports of all loaded modules.
The malware’s first function at offset 0x0E8EA0 initializes many of the malware’s components. This includes the hash-based API lookups discussed earlier. After stepping over the initialization function we use remote_lookup again to select the malware process.
This time we get a different output, indicating that many more modules are now loaded into the process’ address space. Note that not every module is necessarily imported directly; some are dependencies of other loaded modules.
In the tool’s title bar, we see that 26 additional modules have been loaded providing potential access to a total of 11,560 functions.
The tool now allows us to query for function names or specific addresses. For example, if we want to know at what address WriteFile from kernel32.dll is available in the current process memory, we can type the function name in the query field.
More useful for our analysis will be to resolve addresses to their respective function names. Let’s look at how the malware restores the resolved API address just before calling it. The following disassembly stems from the second function call (0xEDE00) in the malware.
Offset 0x000F2390 stores the XOR-encoded function pointer 0x353FB1D8, which is the resolved Windows API address XORed with the hard-coded value 0x43C1FBF5 (stored at offset 0x000F10A4). The malware performs the XOR operation directly before performing the call at offset 0x000EDE14. The ransomware uses this obfuscation pattern for all its Windows API calls.
XORing the two values 0x43C1FBF5 and 0x353FB1D8 yields 0x76FE4A2D. Searching for this value in remoteLookup.exe tells us that the address belongs to the function HeapCreate exported by kernel32.dll.
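The arithmetic is easy to check by hand or in a Python shell. The snippet below is a small sketch that uses only the two constants from the text; the names `XOR_KEY`, `ENCODED_POINTER`, and `decode_pointer` are mine.

```python
XOR_KEY = 0x43C1FBF5          # hard-coded value stored at offset 0x000F10A4
ENCODED_POINTER = 0x353FB1D8  # encoded pointer stored at offset 0x000F2390


def decode_pointer(encoded, key=XOR_KEY):
    """XOR is its own inverse, so applying the key again restores the address."""
    return (encoded ^ key) & 0xFFFFFFFF


decoded = decode_pointer(ENCODED_POINTER)
print("0x%08X" % decoded)  # 0x76FE4A2D
```

Note that a table slot holding 0x00000000 decodes to the key itself (0x43C1FBF5), which is an easy way to spot unused entries in the decoded output.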
We can now annotate our IDB accordingly. I recommend the ApplyCalleeType plugin for IDA Pro, available at https://www.fireeye.com/blog/threat-research/2015/04/flare_ida_pro_script.html (see my recommended tools at http://moritzraabe.de/2017/02/28/malware-analysis-tools/).
Here it is easy to see that this function reserves 0x400000 bytes of heap space.
Repeating this procedure for every function call is obviously not practical, which is why remoteLookup.exe supports bulk analysis. David shows examples of this in his blog post using IDA Jscript. Here I am going to show similar steps in case you don’t have IDA Jscript set up. All you need is a debugger.
remoteLookup.exe supports four import formats for bulk analysis:
- hexadecimal memory address,
- case-insensitive API name,
- DLL name and ordinal export (e.g. ws2_32@13), and
- DLL name and function name (e.g. ntdll!atoi or msvcrt.atoi).
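If you build input files from mixed sources, it can help to sanity-check each line before handing the file to the tool. The following is a rough sketch of how the four formats could be told apart. It is my own heuristic and not remote_lookup's actual parsing logic; `classify_query` and the returned labels are hypothetical, and I assume an address is written either with a 0x prefix or as exactly eight hex digits.

```python
import re


def classify_query(line):
    """Guess which of the four bulk-import formats a line uses (heuristic)."""
    line = line.strip()
    # Assumption: addresses carry a 0x prefix or are exactly eight hex digits.
    if re.match(r"^(0x[0-9a-fA-F]{1,8}|[0-9a-fA-F]{8})$", line):
        return "address"
    if re.match(r"^\w+@\d+$", line):
        return "dll@ordinal"
    if re.match(r"^\w+[!.]\w+$", line):
        return "dll!function"
    return "api name"


for query in ["0x76ffce2e", "WriteFile", "ws2_32@13", "ntdll!atoi", "msvcrt.atoi"]:
    print("%-12s -> %s" % (query, classify_query(query)))
```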
Since we are interested in translating memory addresses to function names, I chose to use hexadecimal memory addresses. So, the first order of business is to create a file that we can feed to remoteLookup.exe.
In this example the table of XOR encoded function pointers starts at offset 0x000F21DC and ends at offset 0x000F248C.
I changed OllyDbg’s view to Long – Address and copied the values to a text file.
The resulting text file starts like this.
$ head xor_encoded_pointers_table.txt
000F21C0 00014000
000F21C4 000E0000 A0A7022C.000E0000
000F21C8 353FB2E6
000F21CC 00000000
000F21D0 00000000
000F21D4 00000000
000F21D8 00000000
000F21DC 353E35DB
000F21E0 353FEFE5
000F21E4 353FE977
I used the following script to parse the text file and calculate the actual resolved API addresses.
from __future__ import print_function

import sys


def main():
    """Prints the XOR-decoded addresses read from the input file."""
    if len(sys.argv) != 2:
        print("Usage: %s <xor_encoded_pointers_file>" % sys.argv[0])
        # Example file content (address, then XOR-encoded value):
        #   000F21DC 353E35DB
        print("Example: $ python %s xor_encoded_pointers_table.txt > decoded_pointers.txt" % sys.argv[0])
        return -1

    decoded_pointers = get_decoded_pointers(sys.argv[1])
    for decoded_pointer in decoded_pointers:
        print("0x%x" % decoded_pointer)


def get_decoded_pointers(filename):
    """Returns the decoded pointer for every "<address> <value>" line."""
    # Hard-coded XOR key the malware applies to every resolved API address.
    BASE = 0x43C1FBF5
    decoded_pointers = []
    # Text mode plus splitlines() copes with both \r\n and \n line endings.
    with open(filename, "r") as f:
        for line in f.read().splitlines():
            parts = line.split()
            if len(parts) < 2:
                continue
            value = parts[1]
            try:
                xor_encoded_pointer = int(value, 0x10)
            except ValueError:
                # Skip values that are not valid hexadecimal numbers.
                continue
            decoded_pointers.append(BASE ^ xor_encoded_pointer)
    return decoded_pointers


if __name__ == "__main__":
    sys.exit(main())
xor_decode_pointers.py
I stored the calculated addresses in the file decoded_pointers.txt.
$ python xor_decode_pointers.py xor_encoded_pointers_table.txt > decoded_pointers.txt
$ head decoded_pointers.txt
0x43c0bbf5
0x43cffbf5
0x76fe4913
0x43c1fbf5
0x43c1fbf5
0x43c1fbf5
0x43c1fbf5
0x76ffce2e
0x76fe1410
0x76fe1282
After loading decoded_pointers.txt in remoteLookup.exe, the tool takes a couple of seconds to look up all addresses. It then opens the output file it produces (<input filename>_results<input fileextension>). Voilà, all translated API calls.
$ head decoded_pointers_results.txt
ResolveExport(0x76ffce2e) = 76FFCE2E , SetEndOfFile , 1107 , kernel32.dll
ResolveExport(0x76fe1410) = 76FE1410 , CloseHandle , 84 , kernel32.dll
ResolveExport(0x76fe1282) = 76FE1282 , WriteFile , 1318 , kernel32.dll
ResolveExport(0x76fe3ed3) = 76FE3ED3 , ReadFile , 960 , kernel32.dll
ResolveExport(0x76fe3f5c) = 76FE3F5C , CreateFileW , 145 , kernel32.dll
ResolveExport(0x76fe1909) = 76FE1909 , CreateFileMappingW , 142 , kernel32.dll
ResolveExport(0x76fe5aa6) = 76FE5AA6 , GetLocalTime , 515 , kernel32.dll
ResolveExport(0x76ffc807) = 76FFC807 , SetFilePointerEx , 1126 , kernel32.dll
ResolveExport(0x75485708) = 75485708 , SHGetFolderPathW , 373 , shell32.dll
ResolveExport(0x76ffd4f7) = 76FFD4F7 , SetFileAttributesW , 1120 , kernel32.dll
The following Python script is an example of how you can rename the respective addresses to the looked-up API names. I’ll leave it as an exercise for you to turn this into an IDAPython script to annotate your IDB file.
from __future__ import print_function

import sys


def main():
    """Prints which table slot holds which looked-up API."""
    if len(sys.argv) != 3:
        print("Usage: %s <xor_encoded_pointers_file> <lookup_results_file>" % sys.argv[0])
        print("Example: $ python %s xor_encoded_pointers_table.txt decoded_pointers_results.txt" % sys.argv[0])
        return -1

    xor_encoded_pointers_filename = sys.argv[1]
    lookup_results_filename = sys.argv[2]
    addresses = get_addresses(xor_encoded_pointers_filename)
    results = get_lookup_results(lookup_results_filename)
    for addr in addresses:
        if addr in results:
            print("0x%x => %s from %s" % (addresses[addr], results[addr][0], results[addr][2]))


def get_addresses(xor_encoded_pointers_filename):
    """Maps each decoded API address to the table slot that stores it."""
    # Hard-coded XOR key the malware applies to every resolved API address.
    BASE = 0x43C1FBF5
    addresses = {}
    with open(xor_encoded_pointers_filename, "r") as f:
        for line in f.read().splitlines():
            parts = line.split()
            if len(parts) < 2:
                continue
            try:
                address = int(parts[0], 0x10)
                xor_encoded_pointer = int(parts[1], 0x10)
            except ValueError:
                # Skip lines that do not hold two hexadecimal numbers.
                continue
            addresses[BASE ^ xor_encoded_pointer] = address
    return addresses


def get_lookup_results(lookup_results_filename):
    """Maps each resolved address to its (name, ordinal, dll) tuple."""
    results = {}
    with open(lookup_results_filename, "r") as f:
        for line in f.read().splitlines():
            # Line format: ResolveExport(0x...) = ADDR , NAME , ORDINAL , DLL
            parts = line.split(" = ")
            if len(parts) < 2:
                continue
            lookup_results = [field.strip() for field in parts[1].split(",")]
            if len(lookup_results) != 4:
                continue
            addr, name, ordinal, dll = lookup_results
            try:
                addr = int(addr, 0x10)
            except ValueError:
                # Skip results whose address is not a hexadecimal number.
                continue
            results[addr] = (name, ordinal, dll)
    return results


if __name__ == "__main__":
    sys.exit(main())
translate_lookup_results.py
$ python translate_lookup_results.py xor_encoded_pointers_table.txt decoded_pointers_results.txt | head
0xf243c => TerminateProcess from kernel32.dll
0xf22e8 => InternetReadFile from wininet.dll
0xf21f8 => SetFilePointerEx from kernel32.dll
0xf2468 => WSAGetLastError from ws2_32.dll
0xf21e0 => CloseHandle from kernel32.dll
0xf246c => GetCurrentDirectoryW from kernel32.dll
0xf23c4 => EnumDependentServicesA from advapi32.dll
0xf2414 => WaitForMultipleObjects from kernel32.dll
0xf22c4 => GetProcAddress from kernel32.dll
0xf2408 => WNetOpenEnumW from mpr.dll
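As a possible starting point for the IDAPython exercise, the sketch below turns lines in the format printed by translate_lookup_results.py into `idc.set_name()` statements that you could paste into IDA's Python console. It rests on a few assumptions: an IDA version whose `idc` module provides `set_name` and `SN_NOWARN` (IDA 7 and later), and an IDB rebased so that its addresses match those seen in the debugger. The function `to_idapython` and the `p_` name prefix are hypothetical choices of mine.

```python
import re

# Matches lines such as: 0xf243c => TerminateProcess from kernel32.dll
LINE_PATTERN = re.compile(r"^(0x[0-9a-fA-F]+) => (\w+) from (\S+)$")


def to_idapython(lines, prefix="p_"):
    """Turn 'ADDR => NAME from DLL' lines into idc.set_name() statements."""
    statements = ["import idc"]
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a translation line
        address, name, dll = match.groups()
        statements.append('idc.set_name(%s, "%s%s", idc.SN_NOWARN)  # %s'
                          % (address, prefix, name, dll))
    return statements


sample = [
    "0xf243c => TerminateProcess from kernel32.dll",
    "0xf21e0 => CloseHandle from kernel32.dll",
]
print("\n".join(to_idapython(sample)))
```

Generating statements as text keeps the sketch runnable outside IDA; inside IDA you could just as well call `idc.set_name()` directly in the loop. The prefix marks each slot as a pointer to the API and avoids clashing with names IDA already knows.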
Conclusion
This tutorial showed how remote_lookup can speed up the reverse engineering process of malware that obfuscates its access to Windows API functions. I hope you find my explanations and the provided scripts helpful. Please let me know if you have any feedback or questions.