Motivation and background
When triaging malicious executable files, I always try the FireEye Labs Obfuscated String Solver (FLOSS) to quickly decode obfuscated strings. In short, FLOSS uses heuristics to identify decoding routine candidates and emulates them using vivisect’s disassembly and emulation modules.
While vivisect is an awesome tool, it is sometimes not as robust as IDA Pro in parsing and disassembling binaries. In addition, IDA Pro provides the Fast Library Identification and Recognition Technology (FLIRT), which helps to distinguish standard library functions from functions written by the program’s author.
To help with automatically identifying string decoding routines in IDA Pro, I have ported some of the heuristics FLOSS uses to IDAPython. You can find the script on my GitHub page at https://github.com/mr-tz/idapython/blob/master/identify_string_decoders.py.
Usage and example output
You simply run the IDAPython script in IDA Pro: File – Script File… (ALT + F7 on Windows). Here is an example output of the script:
n   Score    Function VA
1   1.16667  0x0040166C
2   0.83333  0x0040261E
3   0.18667  0x00402647
4   0.17333  0x00403BC1
5   0.13000  0x0040229D
6   0.10667  0x0040172F
7   0.09000  0x00402ECB
8   0.06667  0x0040499B
9   0.04000  0x0040185B
10  0.04000  0x00404F63
11  0.03333  0x004031D6
12  0.03333  0x00404662
13  0.02667  0x0040430C
14  0.02000  0x00403393
15  0.02000  0x00403163
16  0.02000  0x004034EA
17  0.01333  0x004026D7
18  0.01333  0x00404491
19  0.01333  0x00402D15
20  0.01333  0x00401698
How it works
The script distinguishes between functions defined by the program’s author and library and thunk functions. To identify potential string decoding routines, heuristics are only run on “user” functions.
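The filtering step can be sketched in plain Python. This is a minimal illustration, not the code from identify_string_decoders.py. The two flag values are assumed to mirror the FUNC_LIB and FUNC_THUNK bits of the IDA SDK, and the sample function list is made up. Inside IDA you would read the flags from the database instead of hard-coding them.

```python
# Assumption: these values mirror FUNC_LIB and FUNC_THUNK from the IDA SDK.
# Inside IDA, take them from the SDK module rather than hard-coding them.
FUNC_LIB = 0x04
FUNC_THUNK = 0x80


def is_user_function(flags):
    """Return True when neither the library bit nor the thunk bit is set."""
    return not (flags & (FUNC_LIB | FUNC_THUNK))


def user_functions(functions):
    """Keep the start addresses of functions that look author-written."""
    return [va for va, flags in functions if is_user_function(flags)]


# Hypothetical input: (function start address, function flags)
sample = [
    (0x401000, 0x00),        # plain function
    (0x401100, FUNC_LIB),    # library code, e.g. matched by FLIRT
    (0x401200, FUNC_THUNK),  # jump stub
    (0x401300, 0x10),        # other flag bits set, still user code
]

print([hex(va) for va in user_functions(sample)])
```

Only the functions that survive this filter would be passed on to the heuristics.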
The identification of string decoding functions happens in two steps. First, different heuristics are used to identify function candidates. Second, weights are applied for each identified heuristic and function. The individual weights added together result in the final score.
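The two steps can be illustrated with a small sketch. The heuristic names, the weights, and the per-function scores below are hypothetical and chosen only to show the mechanics. The script's real weights may differ.

```python
from collections import defaultdict

# Hypothetical weights, one per heuristic (not the script's actual values).
WEIGHTS = {"xrefs": 0.25, "nzxor": 1.0, "shift": 0.5}


def rank(heuristic_hits):
    """Sum weight * score per function and sort by descending total.

    heuristic_hits maps a heuristic name to {function_va: score}.
    """
    totals = defaultdict(float)
    for name, hits in heuristic_hits.items():
        for va, score in hits.items():
            totals[va] += WEIGHTS[name] * score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Step 1 (assumed already done): each heuristic reported its candidates.
hits = {
    "xrefs": {0x401000: 1.0, 0x402000: 0.5},
    "nzxor": {0x401000: 1.0},
    "shift": {0x402000: 1.0},
}

# Step 2: apply the weights and print a ranking table.
print("n  Score    Function VA")
for n, (va, score) in enumerate(rank(hits), start=1):
    print(f"{n:<3}{score:<9.5f}0x{va:08X}")
```

Because the weighted scores are summed, a function that triggers several heuristics can end up with a total above 1, as the top entry in the example output does.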
The current heuristics identify functions based on:
- the number of cross-references to a function;
- non-zeroing XOR instructions;
- shift (SHL, SHR, SAL, SAR, ROL, ROR) instructions; and
- suspicious MOV instructions in tight loops.
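As a rough illustration of the instruction-level checks, the sketch below classifies already-disassembled instructions given as (mnemonic, operands) pairs. The input format and the sample instructions are assumptions made for this example. In IDA the mnemonics and operands would come from the disassembler. A XOR counts as "non-zeroing" here when its two operands differ, since xor eax, eax merely clears a register.

```python
SHIFT_MNEMONICS = {"shl", "shr", "sal", "sar", "rol", "ror"}


def is_nonzeroing_xor(mnemonic, operands):
    """True for XOR with two different operands (xor reg, reg only zeroes)."""
    if mnemonic.lower() != "xor" or len(operands) != 2:
        return False
    return operands[0].strip().lower() != operands[1].strip().lower()


def is_shift(mnemonic):
    """True for the shift and rotate mnemonics listed above."""
    return mnemonic.lower() in SHIFT_MNEMONICS


def count_features(instructions):
    """Count non-zeroing XORs and shifts in a list of instructions."""
    counts = {"nzxor": 0, "shift": 0}
    for mnemonic, operands in instructions:
        if is_nonzeroing_xor(mnemonic, operands):
            counts["nzxor"] += 1
        elif is_shift(mnemonic):
            counts["shift"] += 1
    return counts


# Hypothetical instruction stream of one function.
body = [
    ("xor", ("eax", "eax")),   # zeroing idiom, ignored
    ("xor", ("al", "bl")),     # non-zeroing XOR
    ("shl", ("eax", "4")),     # shift
    ("ROR", ("edx", "1")),     # rotate, mnemonic case is normalized
    ("mov", ("[edi]", "al")),  # neither XOR nor shift
]

print(count_features(body))
```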
Here is the disassembly of a string decoding function that identify_string_decoders.py correctly identified. Note the tight loop (0x401675 – 0x401694), the suspicious MOV instruction (0x401692), the non-zeroing XOR instruction (0x401681), and the shift instructions (0x401685 and 0x401688). Additionally, this function is called more than 20 times in this binary. This screams string decoder!
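One simple way to approximate the "tight loop with a suspicious MOV" check is to look for branches that jump a short distance backward and then search the loop body for a MOV that writes to memory. This sketch is an assumption about how such a check could look, not the script's implementation. The span threshold, the instruction records, and the addresses are all hypothetical.

```python
MAX_LOOP_SPAN = 0x40  # hypothetical threshold in bytes


def tight_loops(instructions):
    """Return (start, end) pairs for branches that jump a short way backward."""
    loops = []
    for insn in instructions:
        target = insn.get("target")
        if target is None:
            continue
        if target <= insn["va"] and insn["va"] - target <= MAX_LOOP_SPAN:
            loops.append((target, insn["va"]))
    return loops


def memory_writes_in(instructions, start, end):
    """Return addresses of MOVs with a memory destination inside [start, end]."""
    return [
        insn["va"]
        for insn in instructions
        if start <= insn["va"] <= end
        and insn["mnem"] == "mov"
        and insn.get("dst", "").startswith("[")
    ]


# Hypothetical function body; "target" is set only on branch instructions.
code = [
    {"va": 0x1000, "mnem": "mov", "dst": "ecx"},
    {"va": 0x1005, "mnem": "xor", "dst": "al"},
    {"va": 0x1008, "mnem": "mov", "dst": "[edi]"},
    {"va": 0x100A, "mnem": "inc", "dst": "edi"},
    {"va": 0x100B, "mnem": "jnz", "target": 0x1005},
    {"va": 0x100D, "mnem": "jmp", "target": 0x2000},  # forward jump, no loop
]

for start, end in tight_loops(code):
    writes = memory_writes_in(code, start, end)
    print(hex(start), hex(end), [hex(va) for va in writes])
```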
Conclusion
Note that the exact function rankings and scores will likely differ from FLOSS’s results. Nonetheless, this plugin has been very useful to me when debugging and tweaking FLOSS. I hope the script will assist you as well. This IDA Pro implementation is also a great fallback option if vivisect fails to generate a workspace or does not analyze a binary correctly.
Who knows, maybe I will further integrate IDA Pro and vivisect to leverage the advantages of both tools. Obviously, FLOSS will continue to be a stand-alone tool, but the combination could provide enhanced analysis results for reverse engineers using IDA Pro.