MindshaRE: Statically Extracting Malware C2s Using Capstone Engine
It’s been far too long since the last MindshaRE post, so I decided to share a technique I’ve been playing around with to pull C2 and other configuration information out of malware that does not store all of its configuration information in a set structure or in the resource section (for a nice set of publicly available decoders, check out KevTheHermit’s RATDecoders repository on GitHub). Being able to statically extract this information becomes important when the malware does not run properly in your sandbox, the C2s are down, or you don’t have the time or sandbox bandwidth to manually run the sample and extract the information from network indicators.
To find C2 info, one could always just extract all hostname-, IP-, URI-, and URL-like elements via string regex matching, but it’s entirely possible to end up with false positives or, in some cases, with multiple hostname and URI combinations that get paired incorrectly. In addition, there are known families of malware that include benign or junk hostnames in their binaries that are never referenced, or are referenced only to make false phone-homes. Locating the references and then disassembling around them with a disassembler (in my case, Capstone Engine) helps verify that you have found the correct information and avoids the junk inserted to throw your analysis off.
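As a rough illustration of that regex-based first pass, the sketch below pulls printable ASCII runs out of a raw file buffer and classifies them. The pattern names mirror the ones used in the PoC code later in this post, but the patterns themselves and the `find_candidates` helper are assumptions made for illustration, not the ones from the original scripts; expect to tune them per family.

```python
import re

# Hypothetical patterns: deliberately simple, tune them per malware family.
IP_REGEX = re.compile(
    r'^(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)$')
DOMAIN_REGEX = re.compile(
    r'^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$', re.I)
URI_REGEX = re.compile(r'^/[\w./%-]*\.(?:php|asp|aspx|cgi)$', re.I)

# Runs of four or more printable ASCII characters.
ASCII_RUN = re.compile(rb'[\x20-\x7e]{4,}')


def find_candidates(data):
    """Return (file_offset, kind, string) tuples for C2-like strings."""
    hits = []
    for m in ASCII_RUN.finditer(data):
        s = m.group().decode('ascii')
        for kind, rx in (('ip', IP_REGEX), ('uri', URI_REGEX),
                         ('host', DOMAIN_REGEX)):
            if rx.match(s):
                hits.append((m.start(), kind, s))
                break
    return hits
```

Every hit is only a candidate at this stage. The file offsets are kept so that each string can later be tied to the instructions that reference it.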
For those not familiar, Capstone Engine is a disassembler written by Nguyen Anh Quynh that was first released in 2013. The engine has seen a significant amount of development in that short amount of time and has a good track record of handling some tricky disassembly. Most importantly, it has bindings for most popular programming languages, including Python, my current programming language of choice. One complaint I have with using an on-the-fly disassembler is the lack of symbols, but that can be worked around by taking the list of imports and their addresses from pefile and then checking any memory references against it. All of the PoCs presented assume an image base of 0x400000; for any production use, the actual image base should be parsed out of the PE header and used instead.
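For the image base, a minimal sketch using only the standard library might look like the following. The function name `get_image_base` is hypothetical; if pefile is already loaded, `pe.OPTIONAL_HEADER.ImageBase` gives the same value with far less effort.

```python
import struct


def get_image_base(data):
    """Read ImageBase from the PE optional header of a raw file buffer."""
    if data[:2] != b'MZ':
        raise ValueError('not an MZ executable')
    # e_lfanew (file offset of the PE header) lives at 0x3C in the DOS header
    (pe_off,) = struct.unpack_from('<I', data, 0x3C)
    if data[pe_off:pe_off + 4] != b'PE\x00\x00':
        raise ValueError('PE signature not found')
    # skip the 4-byte signature and the 20-byte COFF file header
    opt_off = pe_off + 4 + 20
    (magic,) = struct.unpack_from('<H', data, opt_off)
    if magic == 0x10B:   # PE32: 4-byte ImageBase at offset 28
        return struct.unpack_from('<I', data, opt_off + 28)[0]
    if magic == 0x20B:   # PE32+: 8-byte ImageBase at offset 24
        return struct.unpack_from('<Q', data, opt_off + 24)[0]
    raise ValueError('unknown optional header magic')
```

The returned value would then replace the hard-coded 0x400000 wherever the PoCs convert between file offsets and addresses.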
Example: Backoff PoS Malware
Backoff is a recently discovered PoS malware family. I noticed that in many sandbox runs the malware did not communicate with a C2, or the C2 was down, even though the C2 info was visible in plain text in the binary.
In an attempt to “correctly” locate the C2 information and utilize some Capstone-fu, I crafted a function that first locates hostname- or IP-like strings in the binary, looks for a “mov [register+offset]/<addr>, <string addr>” pattern that references them, and then uses Capstone to disassemble from there to obtain the other configuration elements.
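One way to sketch the reference search: compute the address the string would have once loaded, pack it as a little-endian 32-bit value, and look for those four bytes in the file. The helper name `find_imm_refs` is hypothetical and, like the PoCs in this post, it assumes that file offset plus image base equals the address used in the code.

```python
import struct


def find_imm_refs(data, target_off, image_base=0x400000):
    """Return file offsets where the 32-bit address of target_off is embedded."""
    needle = struct.pack('<I', image_base + target_off)
    refs = []
    pos = data.find(needle)
    while pos != -1:
        refs.append(pos)
        pos = data.find(needle, pos + 1)
    return refs
```

Each returned offset points at the immediate, not at the start of the instruction, so the opcode and any ModRM/displacement bytes sit a few bytes earlier. Disassembling from the suspected instruction start and checking that the result is the expected mov is what weeds out coincidental byte matches.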
Walking the disassembly ends up being useful, since the argument order is not necessarily the same from sample to sample. This doesn’t work for all versions, but it does work for most; I have encountered a number of samples that use a Visual Basic injector or store the config in an array structure, and the code below will not work on those. The approach can be coupled with another piece of code that searches for version-like strings and then disassembles to find the campaign name attached to the binary.
After the loop, the code should check that a) the host, port, and URI are all defined and b) the number of mov instructions encountered before the call was 3. The number of movs matters because my code starts at the hostname and the arguments are not always encountered in the same order. If there are fewer than 3 movs, I jump back the appropriate number of movs via regex search and then walk the disassembly again to see if I encounter the expected configuration data. This also helps find the backup domains and URLs embedded in the malware, which may not be seen during a sandbox run even when there is successful communication with the C2. The code is quick and dirty and can easily be improved by validating some common instructions seen in between, but is presented as-is for this example:
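from capstone import *
from capstone.x86 import *

md = Cs(CS_ARCH_X86, CS_MODE_32)
md.detail = True   # operand details are only populated when this is set
movs = 0
host = None
uri = None
port = None
for insn in md.disasm(code, 0x1000):
    if insn.mnemonic == 'mov':
        movs += 1
        # the source operand of the mov is operands[1]
        if insn.operands[1].type == X86_OP_IMM:
            v = insn.operands[1].value.imm.real
            if v < 65536:
                # small immediates are treated as the port
                port = v
            else:
                # anything larger is treated as the address of a string
                x = self.get_string(file, v-0x400000)
                if URI_REGEX.match(x):
                    uri = x
                elif DOMAIN_REGEX.match(x):
                    host = x
                elif IP_REGEX.match(x):
                    host = x
    elif insn.mnemonic == 'call':
        break
    if movs == 3:
        break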
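The snippet above relies on a `get_string` helper that is not shown in the post. A minimal sketch of what such a helper could look like, assuming the strings are NUL-terminated ASCII and addressed by file offset (the `max_len` cap is an assumption added here to avoid runaway reads):

```python
def get_string(data, offset, max_len=256):
    """Return the NUL-terminated ASCII string at a file offset, or ''."""
    if offset < 0 or offset >= len(data):
        return ''
    end = data.find(b'\x00', offset, offset + max_len)
    if end == -1:
        # no terminator within range: cap at max_len or end of buffer
        end = min(offset + max_len, len(data))
    return data[offset:end].decode('ascii', errors='replace')
```

Returning an empty string for out-of-range offsets keeps the regex checks in the loop from raising when an immediate turns out not to be a pointer into the file.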
Example: Alina PoS Malware
Alina is a PoS malware family that has been around for a while. Similar to Backoff, I noticed that many of the sandbox runs did not successfully communicate with the C2 even though the configuration was viewable in the binary.
I used a process similar to the one for Backoff: first locate potential C2 candidates, then search for XREFs and disassemble with Capstone. Many times the C2 is pushed onto the stack, followed by instructions setting local variables and then a subroutine call. Prior to the push of the C2 and the push of the URI, there is another push that represents the length of the string, which can also be used to validate the sequence. Once again, this is a great place to utilize Capstone to make sure that anything that is extracted matches up with what is desired.
This sequence of pushes and calls always seems to be preceded by a call to InitializeCriticalSection, so I first look for that, using a dict built from loading the binary into pefile to get at the import table. The order in which the hostname and the URI occur in the binary can be flip-flopped, so I allow for that. I do make sure that the next push after the strlen is a string. The code can be extended further to validate that the strlen matches the string I extract from the binary, but this is just a PoC 🙂
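The import dict can be sketched roughly as follows. The helper name `build_import_map` is hypothetical; it expects an object shaped like a parsed `pefile.PE` instance, whose `DIRECTORY_ENTRY_IMPORT` entries carry an `imports` list with `name` and `address` attributes.

```python
def build_import_map(pe):
    """Map IAT entry address -> import name for a parsed pefile.PE object."""
    imports = {}
    # DIRECTORY_ENTRY_IMPORT is absent when the binary has no import table
    for entry in getattr(pe, 'DIRECTORY_ENTRY_IMPORT', []):
        for imp in entry.imports:
            if imp.name is None:      # imported by ordinal only
                continue
            name = imp.name
            if isinstance(name, bytes):
                name = name.decode('ascii', errors='replace')
            imports[imp.address] = name
    return imports

# Typical use (requires the third-party pefile package):
#   import pefile
#   impts = build_import_map(pefile.PE('sample.exe'))
#   impts.get(disp, '')   # disp: displacement of a `call dword ptr [disp]`
```

Because the addresses pefile reports already include the image base, the displacement of an indirect call can be looked up directly, which is what gives the disassembly its missing symbols.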
# Assumed to be set up before the loop (not shown in the original PoC):
#   instr_cnt = 0; strlen = None; str_instr = None
for i in md.disasm(CODE, push_len_addr):
    if instr_cnt == 0:
        # check for InitializeCriticalSection
        if i.mnemonic == 'call' and \
           impts.get(i.operands[0].mem.disp, '') == 'InitializeCriticalSection':
            print("On the right track...")
        else:
            break
    # a small immediate push is treated as the string length
    elif i.mnemonic == 'push' and i.operands[0].type == X86_OP_IMM \
            and i.operands[0].imm < 0x100:
        strlen = i.operands[0].imm
        str_instr = instr_cnt + 1
        print("Found the strlen push %s %s" % (i.mnemonic, i.op_str))
    # the push right after the strlen should be the string itself
    elif strlen and str_instr == instr_cnt and i.mnemonic == 'push':
        addr = i.operands[0].imm
        if addr == 0x400000 + file.find(s):
            print('found hostname push')
            hostname = get_string(file, addr-0x400000)
            print(hostname)
        else:
            uri = get_string(file, addr-0x400000)
            if URI_REGEX.match(uri):
                print(uri)
    instr_cnt += 1
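The strlen validation mentioned above could be sketched like this. The helper name `validate_len_push` is hypothetical, and it assumes that the pushed length excludes the NUL terminator; if a given sample counts the terminator, the comparison needs adjusting.

```python
def validate_len_push(data, pushed_len, str_addr, image_base=0x400000):
    """Return the string at str_addr if its length equals pushed_len, else None."""
    off = str_addr - image_base
    if off < 0 or off >= len(data):
        return None
    end = data.find(b'\x00', off)
    if end == -1:
        return None
    s = data[off:end]
    # assumption: the pushed length does not count the NUL terminator
    if len(s) != pushed_len:
        return None
    return s.decode('ascii', errors='replace')
```

A mismatch between the pushed length and the actual string is a good sign that the push sequence belongs to something other than the C2 setup.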
Example: DirtJumper Drive
My last example is more complex. Drive stores its most interesting strings in an encrypted format and does not decrypt all of those strings in the same function, instead scattering the calls throughout the binary. In this example, I use the encrypted install name, which always starts with the same characters, to help locate the decryption function. The decryption function is the function called right after the call that XREFs the encrypted install name.
With the address of the decryption function known, I use the “k=” string used in the phone-home to help locate the network communication function. This function is where the C2 information is first decrypted and the C2 and the URI are the first two things decrypted in this function. The code can then be walked further down to locate the C2 port, but that code is not shown here.
Here’s the first piece of code used to locate the decryption function:
import struct
from capstone import *
from capstone.x86 import *

# look for "mov eax, <addr of encrypted install name>" (opcode 0xB8)
mov_addr = b'\xb8' + struct.pack("<I", 0x400000+file.find(s))
instr_addr = 0x400000 + file.find(mov_addr)
if instr_addr <= 0x400000:
    # not found; fall back to "mov edx, <addr>" (opcode 0xBA)
    mov_addr = b'\xba' + struct.pack("<I", 0x400000+file.find(s))
    instr_addr = 0x400000 + file.find(mov_addr)
# looks for PUSH EBP; MOV EBP, ESP
func_start = file[:instr_addr-0x400000].rfind(b'\x55\x8b\xec')
code = file[func_start:func_start+0x200]
md = Cs(CS_ARCH_X86, CS_MODE_32)
md.detail = True
decrypt_func_next = False
calls = 0
for i in md.disasm(code, func_start+0x400000):
    # looking for mov eax, <addr>
    if i.mnemonic == 'mov' and len(i.operands) == 2 \
       and i.operands[0].type == X86_OP_REG and i.operands[0].reg == X86_REG_EAX \
       and i.operands[1].type == X86_OP_IMM and i.operands[1].imm >= 0x400000 \
       and i.operands[1].imm <= 0x500000:
        d = decrypt_drive(get_string(file, i.operands[1].imm-0x400000))
        # validate that this is indeed the install name
        if d.endswith('.exe'):
            config['install_name'] = d
            decrypt_func_next = True
    # check for the next call after the install name call
    elif decrypt_func_next and 'install_name' in config \
         and i.mnemonic == 'call' and calls == 1:
        config['decrypt_func'] = i.operands[0].imm
        break
    elif 'install_name' in config and i.mnemonic == 'call':
        calls += 1
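The two byte-pattern searches that open the snippet above (find a `mov reg, imm32` that loads a known address, then back up to the enclosing function) could be factored into small helpers along these lines. The names `find_mov_imm` and `find_func_start` are hypothetical. The prologue bytes are the ones used in this post, and binaries that encode `mov ebp, esp` differently (for example as `89 E5`) would need a different pattern.

```python
import struct

PROLOGUE = b'\x55\x8b\xec'   # push ebp; mov ebp, esp


def find_mov_imm(data, va, opcodes=(0xB8, 0xBA)):
    """Return the file offset of the first `mov eax/edx, va`, or -1.

    0xB8 is the opcode for mov eax, imm32 and 0xBA for mov edx, imm32.
    """
    imm = struct.pack('<I', va)
    for op in opcodes:
        pos = data.find(bytes([op]) + imm)
        if pos != -1:
            return pos
    return -1


def find_func_start(data, instr_off):
    """Return the offset of the nearest prologue preceding instr_off, or -1."""
    return data.rfind(PROLOGUE, 0, instr_off)
```

Both helpers return -1 on failure, so a caller can bail out before slicing the file with a bogus offset.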
With the decryption function located, the desired C2 information can now be extracted:
# look for "mov edx, <addr of 'k='>" (opcode 0xBA)
mov_inst = b'\xba' + struct.pack("<I", 0x400000+file.find(b'k='))
mov_k_addr = 0x400000 + file.find(mov_inst)
# look for PUSH EBP; MOV EBP, ESP
# (search back from the 'k=' reference, not from the install name reference)
func_start = file[:mov_k_addr-0x400000].rfind(b'\x55\x8b\xec')
code = file[func_start:func_start+0x200]
md = Cs(CS_ARCH_X86, CS_MODE_32)
md.detail = True
calls = 0
d = None
for i in md.disasm(code, func_start + 0x400000):
    # look for mov edx, <addr>
    if i.mnemonic == 'mov' and len(i.operands) == 2 \
       and i.operands[0].type == X86_OP_REG and i.operands[0].reg == X86_REG_EDX \
       and i.operands[1].type == X86_OP_IMM and i.operands[1].imm >= 0x400000 \
       and i.operands[1].imm <= 0x500000:
        d = get_string(file, i.operands[1].imm-0x400000)
    # if call decrypt_func, then decrypt(d)
    elif i.mnemonic == 'call' and i.operands[0].imm == config['decrypt_func'] and d:
        # first call is the c2 host/ip
        if calls == 0:
            config['host'] = decrypt_drive(d)
            d = None
            calls += 1
        # 2nd call is the URI
        elif calls == 1:
            config['uri'] = decrypt_drive(d)
            d = None
            break
Capstone is a useful tool to have in your toolbox and hopefully the PoC code presented in this post will aid others in the future. For my own future work, I plan to tighten up the code presented and work on getting code for other interesting malware families into something that will be suitable to push out for public release.