In this blog post I am going to discuss how you can interact with basic blocks in IDAPython. Before we jump into the technical details, I want to provide some context and show why I became interested in exploring this feature of IDA Pro.
Background and Motivation
The other day I reverse engineered a backdoor that was heavily armored with two classic anti-disassembly techniques. The first technique substitutes jmp
instructions with sequences of push
and retn
instructions. Figure 1 shows how this hinders the program’s control flow analysis. First, IDA Pro interprets the retn
instruction to mark a function’s end. Second, IDA Pro is not able to identify the target addresses as code and hence does not disassemble them.
.text:00564A90 sub_564A90 proc near ; CODE XREF: ___tmainCRTStartup+10Dp
.text:00564A90 push 564C87h
.text:00564A95 retn
.text:00564A95 sub_564A90 endp
Figure 1
Figure 2 shows the same disassembly after I changed the push operand type to offset.
.text:00564A90 sub_564A90 proc near ; CODE XREF: ___tmainCRTStartup+10Dp
.text:00564A90 push (offset loc_564C83+4)
.text:00564A95 retn
.text:00564A95 sub_564A90 endp
Figure 2
The region surrounding the push
/retn
target is shown in Figure 3. You can see that a different code cross reference led IDA Pro to identify code starting at address 0x564C6D
. With this analysis the actual jump target identified above falls right within a disassembled instruction.
.text:00564C6D loc_564C6D: ; CODE XREF: .text:00564C45j
.text:00564C6D add [eax+105EB780h], bh
.text:00564C73 mov eax, 100CDFF0h
.text:00564C78 add [ebx+2Ch], ah
.text:00564C7B xor [edx+ebp*4], bl
.text:00564C7E cmp eax, 0C1329E44h
.text:00564C83
.text:00564C83 loc_564C83: ; DATA XREF: sub_564A90o
.text:00564C83 mov ds:551EC102h, eax
.text:00564C88 jz loc_564BEC
.text:00564C8E jnz loc_564BEC
.text:00564C94 movzx ebx, byte ptr [esi+0ACh]
.text:00564C9B call dword ptr ds:4501F8h
Figure 3
Scanning the instructions and their operands quickly suggests that IDA Pro’s current interpretation of this code segment is different to what a CPU actually executes. After manually undefining the region and marking the area starting at offset 0x564C87
as code, we get the disassembly shown in Figure 4.
.text:00564C87 loc_564C87: ; DATA XREF: sub_564A90o
.text:00564C87 push ebp
.text:00564C88 jz loc_564BEC
.text:00564C8E jnz loc_564BEC
.text:00564C94 movzx ebx, byte ptr [esi+0ACh]
.text:00564C9B call dword ptr ds:4501F8h
.text:00564CA1 mov eax, 0F6F7D8h
.text:00564CA6 push 4F8784h
Figure 4
Starting at address 0x564C88 we encounter the second anti-disassembly technique used in this malware. This method exploits IDA Pro’s flow-oriented disassembly process. The two instructions jz
and jnz
point to the same target address. Program execution will continue at this address whether the Zero Flag (ZF
) is set or cleared. This combination turns the two conditional jumps into an unconditional jump. However, IDA Pro first disassembles the instruction following the jnz
instruction at 0x564C8E
. This junk code is never executed. Instead, the program proceeds execution at address 0x564BEC
where already the next push
/retn
jump waits for the eager analyst.
Statically analyzing an entire backdoor interspersed with countless anti-disassembly instruction sequences is nothing any malware analyst is excited about. Instead, I wrote an IDAPython script that replaces both push
/retn
and jz
/jnz
sequences with jumps to the respective target addresses. Comparing the navigation band before and after applying the fixes visualizes how the script improved IDA Pro’s overall analysis of the program. The number of identified functions almost doubled from 741 to 1403 functions.
The resulting disassembly was in an almost analyzable state now. IDA Pro had no issues recognizing and analyzing nearly all of the functions. However, most of them still contained numerous blocks of junk instructions. Similar to the disassembly presented above, Figure 6 lists falsely identified instructions which are never executed starting at address 0x664BBC6
and ending at address 0x664BFA
. This junk block implicitly transfers control over to a function’s legitimate basic block starting at address 0x664BFA
and is therefore also considered part of the function.
.text:00664BC6 mov eax, 8D004265h
.text:00664BCB mov dword ptr [esi], 11470D78h
.text:00664BD1 push 114C4FECh
.text:00664BD6 mov dword ptr [esi], 11472CF8h
.text:00664BDC mov ds:111F9D24h, eax
.text:00664BE1 mov ds:41944Ch, eax
.text:00664BE6 mov [ebp+var_84], ecx
.text:00664BEC mov byte ptr [ebp+var_4], 33h ; '3'
.text:00664BF0 mov ebp, 9FB61879h
.text:00664BF5 pop es
.text:00664BF6 adc al, 0Eh
.text:00664BF8 out dx, al
.text:00664BF9 pop eax
.text:00664BFA
.text:00664BFA loc_664BFA: ; CODE XREF: sub_662AC0+939 j
.text:00664BFA mov eax, [ebp+var_10]
.text:00664BFD mov ecx, [eax+14h]
.text:00664C00 cmp ecx, dword ptr [ebp+var_C]
.text:00664C03 jnb loc_663945
Figure 6
Figure 7 shows this memory region displayed in graph view mode. There are three basic blocks with junk instructions and no cross-reference to them in this area alone. Throughout the binary’s analysis there are hundreds of these junk code blocks that complicate analysis.
Figure 8 displays the nastiness of the entire function’s graph representation.
So, before performing my final analysis of the backdoor, I removed all superfluous basic blocks with an IDAPython script.
IDA Pro basic blocks and IDAPython
Below, I am going to share some insights I gained while programmatically interacting with basic blocks in IDA Pro.
Obtaining and iterating basic blocks
To get access to a function’s basic blocks you need to get a function object first. The function object contains a FlowChart object which contains the basic blocks (Figure 9).
function = idaapi.get_func(fva)
flowchart = idaapi.FlowChart(function)
print("Function 0x%x consists of %d basic blocks" % (fva, flowchart.size))
Figure 9
Iterating the FlowChart
object gives you all basic blocks as BasicBlock
objects. Each basic block has the following attributes:
id
type
- start address (
startEA
) - end address (
endEA
)
id
is an internal index for the basic block.
The type
provides information about the block type, see flow chart block types (enum fc_block_type_t). In standard x86 and AMD64 binaries you will predominantly see normal blocks (type 0) and return blocks (type 2).
Note that a basic block’s end address points to the address immediately following the block’s last byte. A more complicated way to say this is that the block’s end address is the first head following the block’s last instruction. In the example below (Figure 10) the top basic block’s start address is 0x663370
and the end address is 0x663F12
. Use idc.PrevHead(basicblock.endEA)
to get the address where a basic block’s last instruction starts (0x663373
in the example).
Getting references between basic blocks
To get the flow relation between basic blocks, use a BasicBlock object’s preds()
and succs()
methods. During my tests with IDA 6.95.160808 (32-bit and 64-bit) preds()
did not return any values. However, getting a basic block’s successors is sufficient to recover a function’s code flow.
print_basic_blocks.py
is a simple script that iterates all basic blocks, prints their attributes, and shows succeeding basic blocks.
from __future__ import print_function import idaapi import idautils import idc def main(): for fva in idautils.Functions(): print_all_bbs(fva) def print_all_bbs(fva): function = idaapi.get_func(fva) flowchart = idaapi.FlowChart(function) print("Function starting at 0x%x consists of %d basic blocks" % (function.startEA, flowchart.size)) for bb in flowchart: print(format_bb(bb)) for succ in bb.succs(): print(" -> %s" % format_bb(succ)) def format_bb(bb): bbtype = {0: "fcb_normal", 1: "fcb_indjump", 2: "fcb_ret", 3: "fcb_cndret", 4: "fcb_noret", 5: "fcb_enoret", 6: "fcb_extern", 7: "fcb_error"} return("ID: %d, Start: 0x%x, End: 0x%x, Last instruction: 0x%x, Size: %d, " "Type: %s" % (bb.id, bb.startEA, bb.endEA, idc.PrevHead(bb.endEA), (bb.endEA - bb.startEA), bbtype[bb.type])) if __name__ == "__main__": main()
print_basic_blocks.pyFunction starting at 0x5756c0 consists of 3 basic blocks ID: 0, Start: 0x5756c0, End: 0x5756c2, Last instruction: 0x5756c0, Size: 2, Type: fcb_normal -> ID: 1, Start: 0x575714, End: 0x57571f, Last instruction: 0x57571d, Size: 11, Type: fcb_normal ID: 1, Start: 0x575714, End: 0x57571f, Last instruction: 0x57571d, Size: 11, Type: fcb_normal -> ID: 2, Start: 0x575775, End: 0x575776, Last instruction: 0x575775, Size: 1, Type: fcb_ret ID: 2, Start: 0x575775, End: 0x575776, Last instruction: 0x575775, Size: 1, Type: fcb_ret
print_basic_blocks.py example output
Removing superfluous basic blocks
At this point I was able to perform my final cleanup of the backdoor’s disassembly. remove_junk_basic_blocks.py
shows how I identified and removed all junk blocks. A block is considered to be junk if there are no predecessor blocks and the block does not contain the function’s entry point. Note, in this case I did not care to specifically keep Structured Exception Handling (SEH) handlers which IDA Pro also considers to be basic blocks of a function.
from __future__ import print_function
import idaapi
import idautils
import idc
def main():
for fva in idautils.Functions():
f = idaapi.get_func(fva)
junk_bbs = get_junk_bbs(f)
while make_bbs_unkn(junk_bbs) != 0:
junk_bbs = get_junk_bbs(f)
def get_junk_bbs(function):
""" Return list of basic block address tuples that do not have any predecessor blocks for given
function. """
all_bbs = set([])
referenced_bbs = set([])
flowchart = idaapi.FlowChart(function)
for bb in flowchart:
# get start and end for all basic blocks in function
all_bbs.add((bb.startEA, bb.endEA))
# get start and end for function start basic block
if bb.startEA == function.startEA:
referenced_bbs.add((bb.startEA, bb.endEA))
# get start and end for all basic blocks that have a predecessor
for succ in bb.succs():
referenced_bbs.add((succ.startEA, succ.endEA))
return all_bbs - referenced_bbs
def make_bbs_unkn(bbs):
""" Return number of removed basic blocks. """
for bb in bbs:
size = bb[1] - bb[0]
print("removing basic block of size %d starting at 0x%x and ending at 0x%x" % (size, bb[0],
bb[1]))
idc.MakeUnknown(bb[0], size, idc.DOUNK_SIMPLE)
return len(bbs)
if __name__ == "__main__":
main()
remove_junk_basic_blocks.py
Later on I implemented the two traditional depth- and breadth-first search algorithms for BasicBlocks
. They can also be used to find all basic blocks reachable from a function’s entry point. With some modifications they can enable you to explore all possible code paths leading to an instruction.
from __future__ import print_function
import idaapi
import idautils
import idc
from collections import deque
def main():
for fva in idautils.Functions():
print_basic_blocks_dfs_bfs(fva)
def print_basic_blocks_dfs_bfs(fva):
function = idaapi.get_func(fva)
flowchart = idaapi.FlowChart(function)
print("Function starting at 0x%x consists of %d basic blocks" % (function.startEA, flowchart.size))
start_bb = get_start_bb(function, flowchart)
print("Traversing flowchart using recursive depth-first search")
found_bbs = []
while len(found_bbs) != flowchart.size:
found_bbs = dfs_bbs(start_bb, found_bbs)
for bb in found_bbs:
print("%s" % format_bb(flowchart[bb]))
print("Traversing flowcahart using breadth-first search")
for bb in bfs_bbs(start_bb):
print("%s" % format_bb(flowchart[bb]))
def get_start_bb(function, flowchart):
for bb in flowchart:
if bb.startEA == function.startEA:
return bb
def dfs_bbs(start_bb, found_bbs):
found_bbs.append(start_bb.id)
for w in start_bb.succs():
if w.id not in found_bbs:
dfs_bbs(w, found_bbs)
return found_bbs
def bfs_bbs(start_bb):
q = deque([start_bb])
found_bbs = []
while len(q) > 0:
current_bb = q.popleft()
if current_bb.id not in found_bbs:
found_bbs.append(current_bb.id)
for w in current_bb.succs():
q.append(w)
return found_bbs
def format_bb(bb):
bbtype = {0: "fcb_normal", 1: "fcb_indjump", 2: "fcb_ret", 3: "fcb_cndret",
4: "fcb_noret", 5: "fcb_enoret", 6: "fcb_extern", 7: "fcb_error"}
return("ID: %d, Start: 0x%x, End: 0x%x, Last instruction: 0x%x, Size: %d, "
"Type: %s" % (bb.id, bb.startEA, bb.endEA, idc.PrevHead(bb.endEA),
(bb.endEA - bb.startEA), bbtype[bb.type]))
if __name__ == "__main__":
main()
remove_junk_basic_blocks.py
I hope you enjoyed this post. Please let me know if you have any questions or concerns. Either in the comment section below or via Contact.
Recommended links
- http://reverseengineering.stackexchange.com/questions/1646/how-to-map-an-arbitrary-address-to-its-corresponding-basic-block-in-ida
- http://reverseengineering.stackexchange.com/questions/8570/ida-basic-block-type-fcb-cndret-what-does-it-mean
- https://gist.github.com/williballenthin/3abc9577bede0aeef25526b201732246 – creates a YARA rule to match the basic blocks of a function
“preds() did not return any values”
flowchart = idaapi.FlowChart(function)
replace with
flowchart = idaapi.FlowChart(function,flags=FC_PREDS)
PROFIT!
Thanks! Your comment helped me a lot!