Tutorial: How to use remote_lookup to resolve remote symbols

Introduction

Reverse engineering and tool development expert David Zimmer (dzzie, http://sandsprite.com) recently released remote_lookup, “a small tool which can scan a 32bit process and build an export name/address map which can be queried”. The tool is available on FireEye’s GitHub page at https://github.com/fireeye/remote_lookup.

remote_lookup can significantly speed up the analysis of malware that obfuscates its access to Windows API functions. David covers common malware techniques for such obfuscation in his blog post at https://www.fireeye.com/blog/threat-research/2017/06/remote-symbol-resolution.html. The post also covers the motivation for remote_lookup, common analysis techniques (and their shortcomings), the tool’s features, and how it can accelerate reverse engineering of obfuscated malware. I highly recommend reading his post, especially if you are going to continue reading this tutorial.

This tutorial will show you step-by-step how you can use remote_lookup based on the analysis of a real-world malware sample (MD5 hash C6A2FB56239614924E2AB3341B1FBBA5).

If you want to follow along, these are the involved files:

This post includes some scripts that you can use to integrate remote_lookup with your debugger (I’m using OllyDbg as an example here). I hope this will make it easier for you to leverage this cool project by the FLARE team.

Setup

I performed my analysis on a 64-bit Windows 7 system running in VMware Workstation.

The project’s GitHub repository, https://github.com/fireeye/remote_lookup, comes with precompiled binaries in the bin directory. The directory contains the main binary remoteLookup.exe and its three dependencies: MSWINSCK.OCX, procLib.dll, and sppe.dll. If you run remoteLookup.exe with Administrator privileges, the tool registers the dependencies on the first run.

remoteLookup.exe first start
remoteLookup.exe auto setup of dependencies

Usage

remoteLookup.exe provides a straightforward user interface. The first step when using the tool is to select the process we want to analyze in the top-left corner.

Select PID

The resulting dialog allows us to manually find the process we are interested in.

Choose Process Dialog

Additionally, we can search for processes in the search bar at the bottom left.

Search

After selecting a process, remoteLookup.exe shows the selected process’ loaded modules, their base addresses, and the number of exports per module.

Analysis

The malware with MD5 hash C6A2FB56239614924E2AB3341B1FBBA5 is a dropper that contains a ransomware component encrypted in its resource section PIC\104. The dropper decrypts the resource, writes it to a file in the %APPDATA% directory, and executes the file. The ransomware component (MD5 hash A0A7022CAA8BD8761D6722FE3172C0AF) obfuscates its Windows API calls as David describes in his post. To recap his findings: as part of its initialization, the malware calls a function that resolves Windows API functions based on hash values. The graph overview of the function that performs the hash lookups makes for a beautiful stairway to (malware) heaven.

Resolve API functions based on hash-lookup

The malware stores the resolved addresses XOR encoded in its .data section. Just before an API is about to be called, the malware restores the resolved address at run-time. Further details of this will be discussed below.
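
The two ideas above (resolving an export by hash, then keeping only an XOR-encoded pointer) can be sketched in a few lines of Python. This is a minimal illustration, not the sample's code: the ROR-13 hash is a stand-in I picked because it is common in shellcode, and the small exports dictionary is a hypothetical substitute for walking a real export table. The XOR key and the addresses are the ones that appear later in this tutorial.

```python
from __future__ import print_function

XOR_KEY = 0x43C1FBF5  # hard-coded key used by the sample in this tutorial


def ror13_hash(name):
    """32-bit rotate-right-13 additive hash (stand-in, NOT the sample's algorithm)."""
    h = 0
    for byte in bytearray(name.encode("ascii")):
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF  # rotate right by 13 bits
        h = (h + byte) & 0xFFFFFFFF
    return h


def resolve_by_hash(exports, wanted_hash):
    """Return the address of the export whose name hashes to wanted_hash."""
    for name, address in exports.items():
        if ror13_hash(name) == wanted_hash:
            return address
    return None


def store_encoded(address):
    """What ends up in the .data table: the address XORed with the key."""
    return address ^ XOR_KEY


def restore(encoded):
    """What happens right before the call: XOR with the same key again."""
    return encoded ^ XOR_KEY


# Hypothetical stand-in for a module's export table (name -> address).
exports = {
    "HeapCreate": 0x76FE4A2D,
    "WriteFile": 0x76FE1282,
    "CloseHandle": 0x76FE1410,
}

slot = store_encoded(resolve_by_hash(exports, ror13_hash("WriteFile")))
print("stored 0x%08X, restored 0x%08X" % (slot, restore(slot)))
```

Because XOR is its own inverse, the same key both hides and recovers the pointer, which is why a static view of the table shows no recognizable API addresses.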

Via static analysis we can see that the ransomware only imports two functions from kernel32.dll: Sleep and CreateEventW. When we pause the malware at its entry point, remote_lookup shows the following output.

remote_lookup first output

We can see kernel32.dll loaded from the SysWoW64 directory because we are running a 32-bit application on a 64-bit operating system. ntdll.dll and KERNELBASE.dll are dependencies of kernel32.dll. The number of exports indicates that remote_lookup resolves all modules’ exports.

The malware’s first function at offset 0x0E8EA0 initializes many of the malware’s components. This includes the hash-based API lookups discussed earlier. After stepping over the initialization function we use remote_lookup again to select the malware process.

OllyDbg after stepping over the first function

This time we get a different output, indicating that many more modules are now loaded into the process’ address space. Note that not every module is necessarily loaded by the malware directly; some are dependencies of other loaded modules.

remote_lookup updated output

In the tool’s title bar, we see that 26 additional modules have been loaded providing potential access to a total of 11,560 functions.

The tool now allows us to query for function names or specific addresses. For example, if we want to know at what address WriteFile from kernel32.dll is available in the current process memory, we can type the function name in the query field.

Finding the address of WriteFile

More useful for our analysis is resolving addresses to their respective function names. Let’s look at how the malware restores the resolved API address just before calling it. The following disassembly stems from the second function call (0xEDE00) in the malware.

API call obfuscation scheme

Offset 0x000F2390 stores the resolved Windows API address XORed with the hard-coded value 0x43C1FBF5 (stored at offset 0x000F10A4), which gives the encoded value 0x353FB1D8. The malware performs the XOR operation directly before performing the call at offset 0x000EDE14. The ransomware uses this obfuscation pattern for all its Windows API calls.

Performing the XOR operation on the two values 0x43C1FBF5 and 0x353FB1D8 results in the value 0x76FE4A2D. Searching for this value in remoteLookup.exe tells us that this address translates to the function call HeapCreate exported by kernel32.dll.
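
If you want to double-check the arithmetic without a calculator, a quick Python snippet does the job. The variable names are mine; the values are the two constants quoted above.

```python
from __future__ import print_function

XOR_KEY = 0x43C1FBF5          # hard-coded value from the malware
ENCODED_POINTER = 0x353FB1D8  # encoded table entry quoted above

resolved = XOR_KEY ^ ENCODED_POINTER
print("0x%08X" % resolved)  # prints 0x76FE4A2D
```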

Finding the symbol for address 0x76FE4A2D

We can now annotate our IDB accordingly. I recommend the ApplyCalleeType plugin for IDA Pro, available at https://www.fireeye.com/blog/threat-research/2015/04/flare_ida_pro_script.html (see my recommended tools at http://moritzraabe.de/2017/02/28/malware-analysis-tools/).

Annotated function HeapCreate

Here it is easy to see that this function reserves 0x400000 bytes of heap space.

Performing this procedure by hand for every function call is obviously not practical, which is why remoteLookup.exe supports bulk analysis. David shows examples of this in his blog post using IDA Jscript. Here I am going to show similar steps in case you don’t have IDA Jscript set up. All you need is a debugger.

remoteLookup.exe supports four import formats for bulk analysis:

  • Hexadecimal memory address,
  • case insensitive API name,
  • DLL name and ordinal export (e.g. ws2_32@13), and
  • DLL name and function name (e.g. ntdll!atoi or msvcrt.atoi)
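
As a rough sketch, the four formats can be produced with a few string helpers. The helper names are hypothetical, and I am assuming that a bulk file holds one query per line (as decoded_pointers.txt does further below) and that different formats can be mixed in one file; I have only fed the tool address-only files in this tutorial.

```python
from __future__ import print_function


def address_query(address):
    """Hexadecimal memory address, e.g. 0x76fe4a2d."""
    return "0x%x" % address


def ordinal_query(dll, ordinal):
    """DLL name and ordinal export, e.g. ws2_32@13."""
    return "%s@%d" % (dll, ordinal)


def name_query(dll, function):
    """DLL name and function name, e.g. ntdll!atoi."""
    return "%s!%s" % (dll, function)


queries = [
    address_query(0x76FE4A2D),
    "WriteFile",                  # plain API name, case insensitive
    ordinal_query("ws2_32", 13),
    name_query("ntdll", "atoi"),
]

# One query per line (assumption); redirect the output to a file to load it.
print("\n".join(queries))
```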

Since we are interested in the translation of memory address to function name I chose to use hexadecimal memory addresses. So, the first order of business is to create a file that we can feed to remoteLookup.exe.

In this example the table of XOR encoded function pointers starts at offset 0x000F21DC and ends at offset 0x000F248C.

I changed OllyDbg’s view to Long – Address and copied the values to a text file.

OllyDbg Address view

The resulting text file starts like this.

$ head xor_encoded_pointers_table.txt
000F21C0  00014000
000F21C4  000E0000  A0A7022C.000E0000
000F21C8  353FB2E6
000F21CC  00000000
000F21D0  00000000
000F21D4  00000000
000F21D8  00000000
000F21DC  353E35DB
000F21E0  353FEFE5
000F21E4  353FE977
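
Note that the copied dump starts a few rows before the table itself (at 0x000F21C0 rather than 0x000F21DC), so the first rows decode to meaningless values. If you prefer to drop them up front, a small filter like the following works. It is only a sketch: the function name is mine, and I am assuming the end offset 0x000F248C is inclusive.

```python
from __future__ import print_function

TABLE_START = 0x000F21DC  # first XOR-encoded pointer
TABLE_END = 0x000F248C    # last XOR-encoded pointer (assumed inclusive)


def rows_in_table(lines, start=TABLE_START, end=TABLE_END):
    """Yield (address, value) for dump rows whose address lies in [start, end]."""
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            address = int(parts[0], 16)
            value = int(parts[1], 16)
        except ValueError:
            continue  # not an "address  value" row
        if start <= address <= end:
            yield address, value


sample = [
    "000F21D8  00000000",
    "000F21DC  353E35DB",
    "000F21E0  353FEFE5",
]
for address, value in rows_in_table(sample):
    print("%08X  %08X" % (address, value))
```

Leaving the extra rows in does no harm either, since remoteLookup.exe simply does not resolve addresses that are not exports.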

I used the following script to parse the text file and calculate the actual resolved API addresses.

from __future__ import print_function

import sys


def main():
    """ Prints the XOR decoded addresses read from the input file. """
    if len(sys.argv) != 2:
        print("Usage: %s <xor_encoded_pointers_file>" % sys.argv[0])
        # Example file content (address, two spaces, value):
        # 000F21DC  353E35DB
        print("Example: $ python %s xor_encoded_pointers_table.txt > decoded_pointers.txt" % sys.argv[0])
        return -1

    decoded_pointers = get_decoded_pointers(sys.argv[1])
    for decoded_pointer in decoded_pointers:
        print("0x%x" % decoded_pointer)


def get_decoded_pointers(filename):
    BASE = 0x43C1FBF5
    decoded_pointers = []
    # Text mode plus splitlines() handles CRLF and LF dumps on Python 2 and 3.
    with open(filename, "r") as f:
        for line in f.read().splitlines():
            parts = line.split()
            if len(parts) < 2:
                continue
            value = parts[1]
            try:
                xorEncodedPointer = int(value, 0x10)
            except ValueError:
                # "Could not parse %s" % value
                continue
            decoded_pointers.append(BASE ^ xorEncodedPointer)
    return decoded_pointers


if __name__ == "__main__":
    main()
xor_decode_pointers.py

I stored the calculated addresses in the file decoded_pointers.txt.

$ python xor_decode_pointers.py xor_encoded_pointers_table.txt > decoded_pointers.txt
$ head decoded_pointers.txt
0x43c0bbf5
0x43cffbf5
0x76fe4913
0x43c1fbf5
0x43c1fbf5
0x43c1fbf5
0x43c1fbf5
0x76ffce2e
0x76fe1410
0x76fe1282

After loading decoded_pointers.txt in remoteLookup.exe, the tool takes a couple of seconds to look up all addresses. It then opens the output file it produces (<input filename>_results<input fileextension>). Voilà, all translated API calls.

$ head decoded_pointers_results.txt
ResolveExport(0x76ffce2e) = 76FFCE2E , SetEndOfFile , 1107 , kernel32.dll
ResolveExport(0x76fe1410) = 76FE1410 , CloseHandle , 84 , kernel32.dll
ResolveExport(0x76fe1282) = 76FE1282 , WriteFile , 1318 , kernel32.dll
ResolveExport(0x76fe3ed3) = 76FE3ED3 , ReadFile , 960 , kernel32.dll
ResolveExport(0x76fe3f5c) = 76FE3F5C , CreateFileW , 145 , kernel32.dll
ResolveExport(0x76fe1909) = 76FE1909 , CreateFileMappingW , 142 , kernel32.dll
ResolveExport(0x76fe5aa6) = 76FE5AA6 , GetLocalTime , 515 , kernel32.dll
ResolveExport(0x76ffc807) = 76FFC807 , SetFilePointerEx , 1126 , kernel32.dll
ResolveExport(0x75485708) = 75485708 , SHGetFolderPathW , 373 , shell32.dll
ResolveExport(0x76ffd4f7) = 76FFD4F7 , SetFileAttributesW , 1120 , kernel32.dll

The following Python script is an example of how you can map the table offsets back to the looked-up API names. I’ll leave it as an exercise for you to turn this into an IDAPython script to annotate your IDB file.

from __future__ import print_function

import sys


def main():
    if len(sys.argv) != 3:
        print("Usage: %s <xor_encoded_pointers_file> <lookup_results_file>" % sys.argv[0])
        print("Example: $ python %s xor_encoded_pointers_table.txt decoded_pointers_results.txt" % sys.argv[0])
        return -1

    xorEncodedPointersFilename = sys.argv[1]
    lookupResultsFilename = sys.argv[2]

    addresses = get_addresses(xorEncodedPointersFilename)
    results = get_lookup_results(lookupResultsFilename)
            
    for addr in addresses:
        if addr in results:
            print("0x%x => %s from %s" % (addresses[addr], results[addr][0], results[addr][2]))


def get_addresses(xorEncodedPointersFilename):
    BASE = 0x43C1FBF5
    addresses = {}
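    # Text mode plus splitlines() handles CRLF and LF dumps on Python 2 and 3.
    with open(xorEncodedPointersFilename, "r") as f:
        for line in f.read().splitlines():
            parts = line.split()
            if len(parts) < 2:
                continue
            try:
                address = int(parts[0], 0x10)
                xorEncodedPointer = int(parts[1], 0x10)
            except ValueError:
                # Skip rows that are not "address  value" pairs
                continue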
            addresses[BASE ^ xorEncodedPointer] = address
    return addresses


def get_lookup_results(lookupResultsFilename):
    results = {}
    with open(lookupResultsFilename, "r") as f:
        for line in f.read().splitlines():
            parts = line.split(" = ")
            if len(parts) < 2:
                continue
            lookupResults = parts[1].split(" , ")
            if len(lookupResults) != 4:
                continue
            addr, name, ordinal, dll = lookupResults
            try:
                addr = int(addr, 0x10)
                results[addr] = (name, ordinal, dll)
            except ValueError:
                # "Could not parse %s" % addr
                continue
    return results


if __name__ == "__main__":
    main()
translate_lookup_results.py

$ python translate_lookup_results.py xor_encoded_pointers_table.txt decoded_pointers_results.txt | head
0xf243c => TerminateProcess from kernel32.dll
0xf22e8 => InternetReadFile from wininet.dll
0xf21f8 => SetFilePointerEx from kernel32.dll
0xf2468 => WSAGetLastError from ws2_32.dll
0xf21e0 => CloseHandle from kernel32.dll
0xf246c => GetCurrentDirectoryW from kernel32.dll
0xf23c4 => EnumDependentServicesA from advapi32.dll
0xf2414 => WaitForMultipleObjects from kernel32.dll
0xf22c4 => GetProcAddress from kernel32.dll
0xf2408 => WNetOpenEnumW from mpr.dll
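
As a starting point for the IDAPython exercise, here is a minimal sketch that names each table slot after the API it points to. Treat it as an outline under a few assumptions: I am using idc.set_name with idc.SN_NOWARN as found in IDA 7.x (older versions use idc.MakeName), the enc_ prefix and the apply_names helper are my own invention, and your IDB has to be rebased so that its addresses match the ones seen in the debugger. Outside of IDA the script falls back to a dry run that only prints what it would rename.

```python
from __future__ import print_function


def apply_names(slot_names, set_name, prefix="enc_"):
    """Name each table slot after the API whose encoded pointer it holds.

    slot_names: dict mapping slot address -> API name
    set_name:   callable(ea, name) returning a truthy value on success
    Returns the slot addresses that could not be renamed.
    """
    failed = []
    for ea, api in sorted(slot_names.items()):
        if not set_name(ea, prefix + api):
            failed.append(ea)
    return failed


# Two entries taken from the output above; fill in the rest from your results.
slot_names = {0xF243C: "TerminateProcess", 0xF21E0: "CloseHandle"}

try:
    import idc  # only available inside IDA Pro
except ImportError:
    idc = None

if idc is not None:
    # Assumed IDA 7.x API; adjust for your IDA version.
    failed = apply_names(
        slot_names, lambda ea, name: idc.set_name(ea, name, idc.SN_NOWARN))
else:
    def dry_run(ea, name):
        print("0x%x -> %s" % (ea, name))
        return True
    failed = apply_names(slot_names, dry_run)

print("%d slot(s) could not be renamed" % len(failed))
```

The prefix is there because the slot holds the XOR-encoded pointer, not the API address itself, and it avoids clashes with names IDA already knows from the import table.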

Conclusion

This tutorial showed how remote_lookup can speed up the reverse engineering process of malware that obfuscates its access to Windows API functions. I hope you find my explanations and the provided scripts helpful. Please let me know if you have any feedback or questions.
