In this blog post I am going to discuss how you can interact with basic blocks in IDAPython. Before we jump into the technical details, I want to provide some context and show why I became interested in exploring this feature of IDA Pro.

Background and Motivation

The other day I reverse engineered a backdoor that was heavily armored with two classic anti-disassembly techniques. The first technique substitutes jmp instructions with sequences of push and retn instructions. Figure 1 shows how this hinders the program’s control flow analysis. First, IDA Pro interprets the retn instruction to mark a function’s end. Second, IDA Pro is not able to identify the target addresses as code and hence does not disassemble them.

.text:00564A90 sub_564A90      proc near               ; CODE XREF: ___tmainCRTStartup+10Dp
.text:00564A90                 push    564C87h
.text:00564A95                 retn
.text:00564A95 sub_564A90      endp
Figure 1

Figure 2 shows the same disassembly after I changed the push operand type to offset.

.text:00564A90 sub_564A90      proc near               ; CODE XREF: ___tmainCRTStartup+10Dp
.text:00564A90                 push    (offset loc_564C83+4)
.text:00564A95                 retn
.text:00564A95 sub_564A90      endp
Figure 2

The region surrounding the push/retn target is shown in Figure 3. You can see that a different code cross reference led IDA Pro to identify code starting at address 0x564C6D. With this analysis the actual jump target identified above falls right within a disassembled instruction.

.text:00564C6D loc_564C6D:                             ; CODE XREF: .text:00564C45j
.text:00564C6D                 add     [eax+105EB780h], bh
.text:00564C73                 mov     eax, 100CDFF0h
.text:00564C78                 add     [ebx+2Ch], ah
.text:00564C7B                 xor     [edx+ebp*4], bl
.text:00564C7E                 cmp     eax, 0C1329E44h
.text:00564C83
.text:00564C83 loc_564C83:                             ; DATA XREF: sub_564A90o
.text:00564C83                 mov     ds:551EC102h, eax
.text:00564C88                 jz      loc_564BEC
.text:00564C8E                 jnz     loc_564BEC
.text:00564C94                 movzx   ebx, byte ptr [esi+0ACh]
.text:00564C9B                 call    dword ptr ds:4501F8h
Figure 3

Scanning the instructions and their operands quickly suggests that IDA Pro’s current interpretation of this code segment is different to what a CPU actually executes. After manually undefining the region and marking the area starting at offset 0x564C87 as code, we get the disassembly shown in Figure 4.

.text:00564C87 loc_564C87:                             ; DATA XREF: sub_564A90o
.text:00564C87                 push    ebp
.text:00564C88                 jz      loc_564BEC
.text:00564C8E                 jnz     loc_564BEC
.text:00564C94                 movzx   ebx, byte ptr [esi+0ACh]
.text:00564C9B                 call    dword ptr ds:4501F8h
.text:00564CA1                 mov     eax, 0F6F7D8h
.text:00564CA6                 push    4F8784h
Figure 4

Starting at address 0x564C88 we encounter the second anti-disassembly technique used in this malware. This method exploits IDA Pro’s flow-oriented disassembly process. The two instructions jz and jnz point to the same target address. Program execution will continue at this address whether the Zero Flag (ZF) is set or cleared. This combination turns the two conditional jumps into an unconditional jump. However, IDA Pro first disassembles the instruction following the jnz instruction at 0x564C8E. This junk code is never executed. Instead, the program proceeds execution at address 0x564BEC where already the next push/retn jump waits for the eager analyst.
Statically analyzing an entire backdoor interspersed with countless anti-disassembly instruction sequences is nothing any malware analyst is excited about. Instead, I wrote an IDAPython script that replaces both push/retn and jz/jnz sequences with jumps to the respective target addresses. Comparing the navigation band before and after applying the fixes visualizes how the script improved IDA Pro’s overall analysis of the program. The number of identified functions almost doubled from 741 to 1403 functions.

The resulting disassembly was in an almost analyzable state now. IDA Pro had no issues recognizing and analyzing nearly all of the functions. However, most of them still contained numerous blocks of junk instructions. Similar to the disassembly presented above, Figure 6 lists falsely identified instructions which are never executed starting at address 0x664BBC6 and ending at address 0x664BFA. This junk block implicitly transfers control over to a function’s legitimate basic block starting at address 0x664BFA and is therefore also considered part of the function.

.text:00664BC6                 mov     eax, 8D004265h
.text:00664BCB                 mov     dword ptr [esi], 11470D78h
.text:00664BD1                 push    114C4FECh
.text:00664BD6                 mov     dword ptr [esi], 11472CF8h
.text:00664BDC                 mov     ds:111F9D24h, eax
.text:00664BE1                 mov     ds:41944Ch, eax
.text:00664BE6                 mov     [ebp+var_84], ecx
.text:00664BEC                 mov     byte ptr [ebp+var_4], 33h ; '3'
.text:00664BF0                 mov     ebp, 9FB61879h
.text:00664BF5                 pop     es
.text:00664BF6                 adc     al, 0Eh
.text:00664BF8                 out     dx, al
.text:00664BF9                 pop     eax
.text:00664BFA
.text:00664BFA loc_664BFA:                             ; CODE XREF: sub_662AC0+939 j
.text:00664BFA                 mov     eax, [ebp+var_10]
.text:00664BFD                 mov     ecx, [eax+14h]
.text:00664C00                 cmp     ecx, dword ptr [ebp+var_C]
.text:00664C03                 jnb     loc_663945
Figure 6

Figure 7 shows this memory region displayed in graph view mode. There are three basic blocks with junk instructions and no cross-reference to them in this area alone. Throughout the binary’s analysis there are hundreds of these junk code blocks that complicate analysis.

Figure 8 displays the nastiness of the entire function’s graph representation.

So, before performing my final analysis of the backdoor, I removed all superfluous basic blocks with an IDAPython script.

IDA Pro basic blocks and IDAPython

Below, I am going to share some insights I gained while programmatically interacting with basic blocks in IDA Pro.

Obtaining and iterating basic blocks

To get access to a function’s basic blocks you need to get a function object first. The function object contains a FlowChart object which contains the basic blocks (Figure 9).

function = idaapi.get_func(fva)
flowchart = idaapi.FlowChart(function)
print("Function 0x%x consists of %d basic blocks" % (fva, flowchart.size))
Figure 9

Iterating the FlowChart object gives you all basic blocks as BasicBlock objects. Each basic block has the following attributes:

id
type
start address (startEA)
end address (endEA)

id is an internal index for the basic block.

The type provides information about the block type, see flow chart block types (enum fc_block_type_t). In standard x86 and AMD64 binaries you will predominantly see normal blocks (type 0) and return blocks (type 2).

Note that a basic block’s end address points to the address immediately following the block’s last byte. A more complicated way to say this is that the block’s end address is the first head following the block’s last instruction. In the example below (Figure 10) the top basic block’s start address is 0x663370 and the end address is 0x663F12. Use idc.PrevHead(basicblock.endEA) to get the address where a basic block’s last instruction starts (0x663373 in the example).

Getting references between basic blocks

To get the flow relation between basic blocks, use a BasicBlock object’s preds() and succs() methods. During my tests with IDA 6.95.160808 (32-bit and 64-bit) preds() did not return any values. However, getting a basic block’s successors is sufficient to recover a function’s code flow.

print_basic_blocks.py is a simple script that iterates all basic blocks, prints their attributes, and shows succeeding basic blocks.

from __future__ import print_function

import idaapi
import idautils
import idc


def main():
    for fva in idautils.Functions():
        print_all_bbs(fva)


def print_all_bbs(fva):
    function = idaapi.get_func(fva)
    flowchart = idaapi.FlowChart(function)
    print("Function starting at 0x%x consists of %d basic blocks" % (function.startEA, flowchart.size))
    for bb in flowchart:
        print(format_bb(bb))
        for succ in bb.succs():
            print("  -> %s" % format_bb(succ))


def format_bb(bb):
    bbtype = {0: "fcb_normal", 1: "fcb_indjump", 2: "fcb_ret", 3: "fcb_cndret",
              4: "fcb_noret", 5: "fcb_enoret", 6: "fcb_extern", 7: "fcb_error"}
    return("ID: %d, Start: 0x%x, End: 0x%x, Last instruction: 0x%x, Size: %d, "
           "Type: %s" % (bb.id, bb.startEA, bb.endEA, idc.PrevHead(bb.endEA),
                         (bb.endEA - bb.startEA), bbtype[bb.type]))


if __name__ == "__main__":
    main()
print_basic_blocks.py


Function starting at 0x5756c0 consists of 3 basic blocks
ID: 0, Start: 0x5756c0, End: 0x5756c2, Last instruction: 0x5756c0, Size: 2, Type: fcb_normal
  -> ID: 1, Start: 0x575714, End: 0x57571f, Last instruction: 0x57571d, Size: 11, Type: fcb_normal
ID: 1, Start: 0x575714, End: 0x57571f, Last instruction: 0x57571d, Size: 11, Type: fcb_normal
  -> ID: 2, Start: 0x575775, End: 0x575776, Last instruction: 0x575775, Size: 1, Type: fcb_ret
ID: 2, Start: 0x575775, End: 0x575776, Last instruction: 0x575775, Size: 1, Type: fcb_ret
print_basic_blocks.py example output

Removing superfluous basic blocks

At this point I was able to perform my final cleanup of the backdoor’s disassembly. remove_junk_basic_blocks.py shows how I identified and removed all junk blocks. A block is considered to be junk if there are no predecessor blocks and the block does not contain the function’s entry point. Note, in this case I did not care to specifically keep Structured Exception Handling (SEH) handlers which IDA Pro also considers to be basic blocks of a function.

from __future__ import print_function

import idaapi
import idautils
import idc


def main():
    for fva in idautils.Functions():
        f = idaapi.get_func(fva)
        junk_bbs = get_junk_bbs(f)
        while make_bbs_unkn(junk_bbs) != 0:
            junk_bbs = get_junk_bbs(f)


def get_junk_bbs(function):
    """ Return list of basic block address tuples that do not have any predecessor blocks for given
    function. """
    all_bbs = set([])
    referenced_bbs = set([])
    flowchart = idaapi.FlowChart(function)
    for bb in flowchart:
        # get start and end for all basic blocks in function
        all_bbs.add((bb.startEA, bb.endEA))
        # get start and end for function start basic block
        if bb.startEA == function.startEA:
            referenced_bbs.add((bb.startEA, bb.endEA))
        # get start and end for all basic blocks that have a predecessor
        for succ in bb.succs():
            referenced_bbs.add((succ.startEA, succ.endEA))
    return all_bbs - referenced_bbs


def make_bbs_unkn(bbs):
    """ Return number of removed basic blocks. """
    for bb in bbs:
        size = bb[1] - bb[0]
        print("removing basic block of size %d starting at 0x%x and ending at 0x%x" % (size, bb[0],
                                                                                       bb[1]))
        idc.MakeUnknown(bb[0], size, idc.DOUNK_SIMPLE)
    return len(bbs)


if __name__ == "__main__":
    main()

remove_junk_basic_blocks.py

Later on I implemented the two traditional depth- and breadth-first search algorithms for BasicBlocks. They can also be used to find all basic blocks reachable from a function’s entry point. With some modifications they can enable you to explore all possible code paths leading to an instruction.

from __future__ import print_function

import idaapi
import idautils
import idc

from collections import deque


def main():
    for fva in idautils.Functions():
        print_basic_blocks_dfs_bfs(fva)


def print_basic_blocks_dfs_bfs(fva):
    function = idaapi.get_func(fva)
    flowchart = idaapi.FlowChart(function)
    print("Function starting at 0x%x consists of %d basic blocks" % (function.startEA, flowchart.size))

    start_bb = get_start_bb(function, flowchart)
    print("Traversing flowchart using recursive depth-first search")
    found_bbs = []
    while len(found_bbs) != flowchart.size:
        found_bbs = dfs_bbs(start_bb, found_bbs)
    for bb in found_bbs:
        print("%s" % format_bb(flowchart[bb]))

    print("Traversing flowcahart using breadth-first search")
    for bb in bfs_bbs(start_bb):
        print("%s" % format_bb(flowchart[bb]))


def get_start_bb(function, flowchart):
    for bb in flowchart:
        if bb.startEA == function.startEA:
            return bb


def dfs_bbs(start_bb, found_bbs):
    found_bbs.append(start_bb.id)
    for w in start_bb.succs():
        if w.id not in found_bbs:
            dfs_bbs(w, found_bbs)
    return found_bbs


def bfs_bbs(start_bb):
    q = deque([start_bb])
    found_bbs = []
    while len(q) > 0:
        current_bb = q.popleft()
        if current_bb.id not in found_bbs:
            found_bbs.append(current_bb.id)
            for w in current_bb.succs():
                q.append(w)
    return found_bbs


def format_bb(bb):
    bbtype = {0: "fcb_normal", 1: "fcb_indjump", 2: "fcb_ret", 3: "fcb_cndret",
              4: "fcb_noret", 5: "fcb_enoret", 6: "fcb_extern", 7: "fcb_error"}
    return("ID: %d, Start: 0x%x, End: 0x%x, Last instruction: 0x%x, Size: %d, "
           "Type: %s" % (bb.id, bb.startEA, bb.endEA, idc.PrevHead(bb.endEA),
                         (bb.endEA - bb.startEA), bbtype[bb.type]))


if __name__ == "__main__":
    main()

remove_junk_basic_blocks.py

I hope you enjoyed this post. Please let me know if you have any questions or concerns. Either in the comment section below or via Contact.

IDA Pro Anti-Disassembly, Basic Blocks, and IDAPython

Background and Motivation

IDA Pro basic blocks and IDAPython

Obtaining and iterating basic blocks

Getting references between basic blocks

Removing superfluous basic blocks

Recommended links

2 Replies to “IDA Pro Anti-Disassembly, Basic Blocks, and IDAPython”

Leave a Reply Cancel reply