Recursive Disassembling


Published:
Tags:rev python

The two widespread ways to disassemble binaries are called linear sweep and recursive disassembling. While the first one is particularly easy to use and implement it also has severe drawbacks. Let me demystify the recursive algorithm, show its merits and provide you with a small sample script so you don’t need to skimp on disassembling when whipping up your next static analysis toolchain.

The situation is the same time and time agaion: You are analyzing yet another virtual machine in some CTF competition and you arrive at the point where you want to start writing the disassembler. As time is of the essence corners are cut and you end up with yet another linear sweep disassembler. But who can fault you? Once you finish the instruction decoder the full disassembler is just one list comprehension away.

from struct import unpack_from as upf

def decode(base_addr, code, addr):
    """
    base_addr: The base address of the code
    code:      raw instructionstream as bytearray
    addr:      The address of the instruction to be decoded

    decode returns a triple:
    - True if this instruction terminates the bb, False if not
    - List of successors to this instruction (will contain exactly one element if it is not a terminator)
    - Decoded Instruction
    """
    ins = upf("<HBQBQBQ", code, addr - base_addr)
    if ins[0] == 0x14:                     # JMP
        return (True, [ins[2]], ins)
    if ins[0] in [0x15, 0x16, 0x17, 0x1b]: # JEQ, JNE, JA, JG
        return (True, [addr + 0x1d, ins[2]], ins)
    return (False, [addr + 0x1d], ins)

def disass_linear(vaddr, code):
    num_ins = len(code) // 0x1d
    return [decode(vaddr, code, vaddr + 0x1d * i) for i in range(num_ins)]

This usually does suffice, and works well as long as you don’t need to recover the control flow graph. Once you need it however you usually regret going the quick and easy way and think about how you can shoehorn CFG recovery into your tooling. To save you from regrets the next time around let’s throw together a small recursive disassembler. Not only is CFG recovery part of this algorithm but as added benefit disassembling will not fail due to embedded constant data.

from sortedcontainer import SortedDict

def disass_rec(vaddr, code, entry):
    todo = [entry]      # working set of basic block entry points
    insn = {}
    bbs = SortedDict()  # the recovered basic blocks
    while len(todo):
        cur = todo[0]   # take the first element from the working set
        todo = todo[1:]
        
        bbaddr = cur
        bb = []         # instructions in this basic block
        suc = []        # list of basic block successors

        # if we have not seen this virtual address yet a basic block starts here
        if cur not in insn:
            bbs[cur] = (suc, bb)
        
        # as long as we have unseen virtual addresses
        while cur not in insn:
            if cur < vaddr:
                bb.append(f"SEGFAULT")
                break
            try:
                # decode the instruction
                term, nxt, ins = decode(vaddr, code, cur)
            except:
                term, nxt, ins = True, [], f"INVALID / SEGFAULT"

            # place the instruction into the current basic block
            bb.append((cur, ins))
            # register the virtual address
            insn[cur] = bbaddr

            if term:
                # the instruction terminates the basic block
                suc.extend(nxt)
                todo.extend(nxt)
                break

            # go to the next virtual address
            cur = nxt[0]
            continue
        
        else:
            # we hit a virtual address we have already seen
            # fetch the coresponding basic block address
            bbaddr = insn[cur]
            if cur == bbaddr:
                # we hit the first instruction of the block, so just mark it as successor
                suc.append(cur)
                continue

            # it is not the first instruction, so we want to split the basic block
            # fetch the basic block
            oldbb_suc, oldbb_ins = bbs[bbaddr]
            # gather the instructions with virtual address >= cur
            overlap = [*dropwhile(lambda x: x[0] < cur, oldbb_ins)]

            # get the virtual address of the basic block that we split off
            ovs = overlap[0][0]
            # update the instruction registry such that the instructions in the block
            # that is split off point to its basic block again
            for insaddr, _ in overlap:
                insn[insaddr] = ovs
            # register the split off basic block, steal the old blocks successor
            bbs[ovs] = (oldbb_suc, overlap)

            # remove the instructions we transfered from the old block
            del oldbb_ins[-len(overlap):]
            # update the old basic block
            bbs[bbaddr] = ([ovs], oldbb_ins)

            # make the split off block the successor to this basic block
            suc.append(ovs)

    return bbs

An added benefit of recursive disassembling is that instrucitons don’t need to be aligned to a multiple of their size. There can be holes in the instruction stream faultering linear sweep algorithms that would be off after the hole.