Posted on Tuesday, July 18th, 2006

Googling for Malware, Bobbing for Mass Mailers

by Jose Nazario

HD Moore recently released a malware search engine. Dan Hubbard and the team at Websense had earlier announced that they had been able to use Google to find malware specifically. HD was evidently frustrated that he didn’t get a copy of their code (apparently all he had to do was ask …), so he wrote his own. I’ve looked at both, and I actually prefer HD’s later implementation, as it uses a couple of different ideas. So I rewrote HD’s tool in Python, using my DuckyLib to wrap the Google queries in a simple API and Ero Carrera’s pefile for the Win32 binary analysis. In about 20 minutes I went from sitting down to having a working set of tools. Running the queries for a few hundred signatures took another 10 minutes.

HD’s tool (and hence mine) works like this:

  1. Read in an EXE file
  2. Unpack the PE header and gather up four different values
    1. TimeDateStamp
    2. SizeOfImage
    3. AddressOfEntryPoint
    4. SizeOfCode

    Together, these four values form an effectively unique signature for the malware sample.

  3. Google for these keys and values using the Google API, and watch the EXEs roll in!
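The signature steps above can be sketched with Python’s standard `struct` module. This swaps in hand-rolled header parsing for pefile only to keep the example dependency-free. The field offsets follow the published PE/COFF layout. The `signature_query` helper and its quoted `Name: 0xVALUE` phrasing are hypothetical, because the post doesn’t show how HD’s tool (or mine) actually phrased the values for Google.

```python
import struct


def pe_signature(data: bytes) -> dict:
    """Extract the four PE header fields used as a search signature."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew (offset 0x3C in the DOS header) points at the "PE\0\0" signature
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    coff = pe_off + 4  # the 20-byte COFF file header follows the signature
    opt = coff + 20    # the optional header follows the COFF file header
    (timestamp,) = struct.unpack_from("<I", data, coff + 4)
    (size_of_code,) = struct.unpack_from("<I", data, opt + 4)
    (entry_point,) = struct.unpack_from("<I", data, opt + 16)
    # SizeOfImage sits at offset 56 in both PE32 and PE32+ optional headers
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    return {
        "TimeDateStamp": timestamp,
        "SizeOfImage": size_of_image,
        "AddressOfEntryPoint": entry_point,
        "SizeOfCode": size_of_code,
    }


def signature_query(sig: dict) -> str:
    # Hypothetical query format: each field as a quoted "Name: 0xVALUE" phrase
    return " ".join(f'"{name}: 0x{value:08X}"' for name, value in sig.items())
```

Usage would look like `signature_query(pe_signature(open("sample.exe", "rb").read()))`, with the resulting string handed to whatever search wrapper you use.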

Surprisingly effective. After querying a few hundred signatures, several dozen malware samples turned up, with nary a 404 in the bunch. Another note: Google’s cached view of an EXE exposes some of the APIs the program uses, so you can restrict the search to EXEs that reference a function such as InternetOpenUrlA (i.e. downloaders). Dan also points out that you can search for specific sections. Pick a packer and start Googling for its common section names…
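The refinements above (an imported API name, or a packer’s section names such as UPX’s `UPX0`/`UPX1`) can also be checked locally on a sample you already have. A minimal stdlib sketch, under these assumptions: `section_names` walks the PE section table; `mentions_api` is a crude string match, not a real import-table parser; and `refine_query` is a hypothetical helper whose query syntax is illustrative only, not what Google’s cache view or HD’s tool actually used.

```python
import struct


def section_names(data: bytes) -> list:
    """List PE section names, e.g. packer-specific ones like UPX0/UPX1."""
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    coff = pe_off + 4
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)  # SizeOfOptionalHeader
    table = coff + 20 + opt_size  # section table follows the optional header
    names = []
    for i in range(num_sections):
        entry = table + 40 * i  # each section header is 40 bytes; Name is the first 8
        names.append(data[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace"))
    return names


def mentions_api(data: bytes, api: str) -> bool:
    # Crude heuristic: import names are stored as NUL-terminated ASCII strings,
    # so look for that byte pattern anywhere in the file (no import parsing).
    return api.encode("ascii") + b"\0" in data


def refine_query(base: str, apis=(), sections=()) -> str:
    # Hypothetical: append each API or section name as an extra quoted term
    terms = [base] + [f'"{t}"' for t in (*apis, *sections)]
    return " ".join(t for t in terms if t)
```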

So far, HD’s live Google hits on his malware search usually return nothing. Most of the malware has been found and removed, so that’s good. I wonder how Google will react to this. On the one hand, it’s just data, and it’s best to avoid drawing lines about what you will and will not index. On the other hand, this is drawing some heat their way. My guess is that they’ll watch for abuse, work with site operators to notify them when they spot malware, and quietly tuck this one away. But I don’t really know what goes on there… Blocking HD’s site as a referrer might be a simple start; who knows. Currently, when we find malware we share the links with the parties responsible for taking it down. We’re doing our part, and people seem to appreciate it.

Most of the malware this method has found is not in the secret stash locations you’d expect: it’s in mailing list archives. It looks like some mailing lists got spammed by the malware, their archives kept the attachments in a folder, and those folders got indexed. Nothing unexpected there, but some sites got hit worse than others.

Many thanks to Dan @ Websense for kicking this whole thing off, and to HD for pushing it forward.

3 Responses



Comment Post by: Kirby Kuehl — July 19th, 2006 @ 7:01 am EST

From what I understand (after speaking with HD), the Websense announcement was first. HD then wrote his version because Websense’s wasn’t openly available to everyone.

Comment Post by: Jose Nazario — July 19th, 2006 @ 9:46 am EST

yeah, someone edited the post and screwed that up. fix coming up shortly.

Comment Post by: HD Moore — July 23rd, 2006 @ 12:53 pm EST

The MWSearch backend has been updated to include only those malware signatures with a confirmed hit in the Google index. This is cheating in a sense, but for demonstration purposes, it does a much better job than the original:

http://metasploit.com/research/misc/mwsearch/
