Manipulating Shellcode for Emulation

When performing malware analysis, it is common to come across many different types of shellcode, usually encrypted and obfuscated. Sooner or later you will have to manipulate such shellcode and transform it into a format that is easier to analyze.

Suppose that during your analysis you found the following shellcode in a highly obfuscated embedded macro, and that it is encrypted byte by byte using the key 0xde 0xad 0xbe 0xef, with every byte XORed against each key byte in turn. What I would normally do is extract the shellcode and decrypt it in Python. In this case that is quick and straightforward using the following snippet:

shellcode = [...]
key = [0xde, 0xad, 0xbe, 0xef]

# Work on a copy so the encrypted bytes stay untouched
decoded = list(shellcode)

# XOR every byte with each key byte in turn
for k in key:
    for s in range(len(decoded)):
        decoded[s] ^= k
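
Because XOR is associative, XORing every byte with each key byte in turn gives the same result as XORing it once with the XOR of all four key bytes. The sketch below illustrates this. The names `combined_key` and `xor_decode` are introduced here for illustration only and are not part of the original snippet.

```python
from functools import reduce
from operator import xor

KEY = [0xde, 0xad, 0xbe, 0xef]


def combined_key(key):
    """Fold all key bytes into the single byte they amount to."""
    return reduce(xor, key, 0)


def xor_decode(data, key):
    """Equivalent to XORing every byte of data with each key byte in turn."""
    single = combined_key(key)
    return bytes(b ^ single for b in data)


if __name__ == "__main__":
    print(hex(combined_key(KEY)))  # 0x22
    # First four bytes of the encrypted shellcode used in the full script
    print(xor_decode([159, 132, 188, 241], KEY).hex())  # bda69ed3
```

Collapsing the key this way is only valid for this scheme, where the whole key is applied to every byte. It would not hold for a rolling XOR in which each byte is paired with a single key byte.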

This leaves the decrypted shellcode in the decoded list. We can take this one step further and save the list to a file so that it can be used and manipulated in other programs, by extending the snippet above with the following lines:

# Write the decoded bytes to disk as a raw binary blob
with open("output.bin", "wb") as file:
    file.write(bytes(decoded))
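
Before loading the file into another tool, it can be worth confirming that what landed on disk is what was decoded. The following is a minimal sketch of such a check. The helper name `save_shellcode` and the round-trip comparison are illustrative additions, not part of the original workflow.

```python
from pathlib import Path


def save_shellcode(decoded, path="output.bin"):
    """Write a list of byte values to path and return the number of bytes written."""
    blob = bytes(decoded)  # raises ValueError if any value falls outside 0-255
    Path(path).write_bytes(blob)
    # Read the file back to confirm the round trip
    if Path(path).read_bytes() != blob:
        raise IOError("{} does not match the decoded buffer".format(path))
    return len(blob)


if __name__ == "__main__":
    # Placeholder bytes, not real shellcode
    written = save_shellcode([0x90, 0x90, 0xc3], "example.bin")
    print(written)  # 3
```

The conversion to `bytes` also doubles as a cheap validity check, since it fails loudly if the decoding step produced a value that does not fit in a single byte.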

We can now open the saved shellcode in rizin, IDA or your favourite tool and begin our analysis. However, the quickest and easiest way of extracting information from the shellcode is to emulate it. Two projects can be used for this: Qiling and Speakeasy. Qiling is a topic for another blog post, while Speakeasy is simple and straightforward. It can be used both as a standalone executable and as a Python library. When using the latter, we can write our own parser for the report, which lets us extract some fairly rich and interesting information.
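
A parser of that kind can be quite small. The sketch below counts how often each API was called. It assumes the report is a dictionary with the same "entry_points", "apis" and "api_name" keys that the full script in this post reads. The function name `count_api_calls` and the sample report are made up for illustration and are not real Speakeasy output.

```python
from collections import Counter


def count_api_calls(report):
    """Count how often each API name appears across all entry points."""
    counts = Counter()
    for entry_point in report.get("entry_points", []):
        for api in entry_point.get("apis", []):
            counts[api["api_name"]] += 1
    return counts


if __name__ == "__main__":
    # Made-up report used only to exercise the function
    sample_report = {
        "entry_points": [
            {"apis": [
                {"api_name": "kernel32.LoadLibraryA", "args": ["ws2_32"]},
                {"api_name": "ws2_32.WSAStartup", "args": ["0x190", "0x0"]},
                {"api_name": "kernel32.LoadLibraryA", "args": ["wininet"]},
            ]}
        ]
    }
    for name, hits in count_api_calls(sample_report).most_common():
        print("{}: {}".format(name, hits))
```

A frequency summary like this is a quick way to spot which APIs dominate a run before reading the individual calls and their arguments.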

The following example script decodes the shellcode, emulates it and prints a brief report:

import speakeasy
shellcode = [159, 132, 188, 241, 218, 255, 224, 251, 86, 6, 214, 121, 17, 235, 147, 123, 19, 73, 54, 33, 73, 54, 161, 201, 222, 102, 73, 13, 50, 37, 182, 242, 195, 85, 62, 23, 242, 135, 88, 31, 99, 91, 42, 49, 72, 208, 126, 162, 97, 217, 200, 248, 169, 46, 120, 114, 200, 1, 70, 235, 236, 0, 58, 50, 33, 166, 3, 249, 116, 231, 68, 143, 63, 8, 24, 91, 119, 196, 137, 44, 9, 24, 239, 226, 5, 32, 151, 71, 213, 212, 43, 69, 10, 95, 235, 101, 186, 40, 128, 189, 59, 253, 148, 75, 79, 193, 171, 180, 229, 178, 252, 193, 251, 82, 13, 22, 87, 159, 189, 155, 165, 216, 58, 0, 208, 210, 120, 253, 39, 225, 3, 25, 161, 241, 160, 234, 17, 21, 16, 62, 135, 158, 26, 203, 131, 184, 126, 206, 68, 179, 123, 71, 171, 87, 202, 31, 140, 115, 146, 196, 237, 226, 62, 106, 205, 48, 218, 23, 119, 123, 201, 2, 203, 128, 209, 110, 149, 22, 29, 163, 106, 230, 117, 176, 25, 212, 218, 42, 246, 152, 83, 181, 1, 233, 180, 10, 217, 81, 212, 244, 222, 161, 252, 62, 138, 241, 106, 150, 243, 154, 170, 27, 38, 118, 161, 143, 143, 128, 187, 5, 248, 146, 191, 4, 105, 31, 89, 58, 25, 79, 246, 251, 201, 239, 166, 147, 195, 224, 217, 128, 43, 43, 182, 107, 196, 197, 238, 199, 189, 128, 165, 182, 66, 91, 192, 181, 201, 169, 48, 123, 62, 220, 34, 172, 89, 34, 251, 109, 204, 34, 145, 105, 154, 117, 9, 116, 191, 189, 214, 139, 234, 190, 209, 116, 175, 182, 170, 67, 57, 186, 196, 175, 233, 58, 213, 249, 163, 58, 189, 153, 211, 105, 152, 225, 13, 218, 53, 116, 242, 138, 230, 211, 154, 116, 16, 23, 69, 139, 51, 103, 66, 119, 197, 64, 235, 31, 53, 17, 203, 159, 95, 145, 155, 247, 168, 190, 20, 55, 80, 21, 61, 31, 219, 244, 207, 254, 220, 208, 146, 98, 220, 211, 74, 81, 167, 88, 172, 86, 88, 177, 201, 87, 88, 185, 47, 104, 142, 128, 89, 175, 78, 179, 86, 154, 243, 146, 60, 224, 100, 224, 40]

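key = [0xde, 0xad, 0xbe, 0xef]

# Work on a copy so the encrypted bytes stay untouched
decoded = list(shellcode)

# XOR every byte with each key byte in turn
for k in key:
    for s in range(len(decoded)):
        decoded[s] ^= k

# Save the decoded shellcode so that speakeasy can load it from disk
with open("output.bin", "wb") as file:
    file.write(bytes(decoded))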

def formatter(report):
    """Print the API calls and network activity recorded for each entry point."""
    for entry_point in report["entry_points"]:
        print("Function Calls:")
        # Tolerate entry points without recorded API calls
        for api in entry_point.get("apis", []):
            print("\t{} \n\t\t|_ {}".format(api["api_name"], api["args"]))

        # Tolerate entry points without recorded network events
        network_events = entry_point.get("network_events")
        if network_events:
            print("Network Traffic:")
            for d in network_events.get("dns", []):
                print("Dns Query: {}".format(d))
            for t in network_events.get("traffic", []):
                # Fall back to False when a traffic entry has no "data" field
                data = t.get("data", False)
                print("\tAddress: {}:{} Type: {} Data: {}".format(t["server"], t["port"], t["type"], data))

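se = speakeasy.Speakeasy()

# Load the decoded blob as 32-bit x86 shellcode
sc_addr = se.load_shellcode("output.bin", "x86")

# Emulate the loaded shellcode
se.run_shellcode(sc_addr)

# Collect the emulation report and print a summary of it
report = se.get_report()

formatter(report)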

The output is the following
