Automating Intelligence: Discovering Recent PlugX Campaigns Programmatically
One of the hardest things to do when you receive malware samples with “anonymized” names (e.g. the filename is the hash), or samples that lack any indication of the infection vector, is to determine where the file came from and whom it was meant to target. It is harder still when you receive no product telemetry containing information about infected machines. To that end, I have been working on ways to automate how ASERT builds context around samples, so we can answer questions about what may have been targeted, why it was targeted and when it was targeted. This post uses PlugX as an example, in part because of its ongoing activity (PlugX is well known, and its various iterations have been analyzed many times), and focuses on metadata from VirusTotal because it is publicly accessible.
Automation is king when processing malware, and extracting the configuration from samples without analyst intervention is always ideal. We prefer to treat our sandbox platforms as black boxes: we extract what we can from them, then run our own normalization and post-processing to collate everything into our internal malware analysis system and issue alerts based on a sample's properties or behaviors. To do this in Cuckoo Sandbox, we wrote a custom API call that runs arbitrary Volatility Framework plugins against a sample's execution. Typically this means requesting process memory dumps of every PID touched during execution, then requesting every page that the malfind plugin flags and linking those pages to the execution task. Once these are back in our processing system, we use a modified version of the PlugX configuration scanner originally written by Fabien Perigaud to find and store the configuration. Earlier this year I updated the Volatility plugin to support more configuration sizes, including the P2P variant that JPCERT/CC disclosed in January 2015, began migrating it to ctypes structures to represent the various configuration sizes, and published my changes to my personal repository on GitHub.

Thanks in part to this plugin, the ASERT malware processing system reliably extracts and stores PlugX configurations, which we can search to see how many samples may be involved in a campaign, which C2s are being reused, or which installation name is the most popular. After processing samples, we query sources such as VirusTotal, TotalHash and TitaniumCloud for additional metadata (first seen, first seen in-the-wild, any known URLs the file came from, and filenames seen for the sample) to build proper context around the sample in question.
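The updated plugin itself is not reproduced here, so the following is only a minimal sketch of the ctypes approach. It assumes a C2 record made of a 16-bit type, a 16-bit port and a 64-byte host field (the layout commonly described for earlier PlugX configurations; verify it per variant). `CONFIG_LAYOUTS` and its size and offset values are hypothetical placeholders, not real PlugX offsets.

```python
import ctypes


class C2Entry(ctypes.LittleEndianStructure):
    """One C2 record: WORD type, WORD port, 64-byte host field.

    Assumed layout; no padding is needed because 2 + 2 + 64 = 68 bytes.
    """
    _fields_ = [
        ("type", ctypes.c_uint16),
        ("port", ctypes.c_uint16),
        ("host", ctypes.c_char * 64),
    ]


# Hypothetical registry: configuration size -> (C2 table offset, entry count).
# These numbers are placeholders for illustration, NOT real PlugX values.
CONFIG_LAYOUTS = {
    0x200: (0x10, 4),
}


def parse_c2_table(config: bytes):
    """Return (type, port, host) tuples for every non-empty C2 entry."""
    layout = CONFIG_LAYOUTS.get(len(config))
    if layout is None:
        raise ValueError(f"unsupported configuration size: {len(config):#x}")
    offset, count = layout
    entries = []
    for i in range(count):
        entry = C2Entry.from_buffer_copy(config, offset + i * ctypes.sizeof(C2Entry))
        if entry.host:  # c_char arrays are returned up to the first NUL byte
            entries.append((entry.type, entry.port, entry.host.decode("ascii", "replace")))
    return entries
```

Keying the registry on configuration size mirrors the idea of supporting new sizes by adding one structure definition rather than new parsing code.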
The “when” can be very important: something that was targeted 5 years ago is more likely to have been investigated and resolved, whereas something that was targeted only 5 days ago may still be active. This is not always the case, as some recent targeted threat reports have discussed long-running threats and C2s. One of the simplest (and usually least reliable) things to check when determining the age of a sample is the compile timestamp, which may be hardcoded or easily modified. Delphi executables, for example, usually end up with a hardcoded compile timestamp in 1992. However, some malware families have reasonably reliable compile timestamps when used in conjunction with other indicators, and PlugX is one such family.
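Reading the compile timestamp only requires the `TimeDateStamp` field of the PE file's COFF header. The sketch below uses the standard library rather than a PE parsing library and flags the well-known Delphi default value (0x2A425E19, i.e. 1992-06-19 22:22:17 UTC) so that it can be discounted later; the function and constant names are illustrative.

```python
import struct
from datetime import datetime, timezone

# TimeDateStamp commonly emitted by Delphi linkers: 1992-06-19 22:22:17 UTC.
DELPHI_TIMESTAMP = 0x2A425E19


def pe_compile_time(data: bytes):
    """Return (UTC datetime, is_delphi_default) read from a PE's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header: Machine (2 bytes), NumberOfSections (2), TimeDateStamp (4).
    (stamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc), stamp == DELPHI_TIMESTAMP
```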
Another way to determine the “when” is to use multiple antivirus and file tracking services, such as those listed above, to see when each first and last saw a file pass through its system. Using the PlugX sample 1c6a50e51203fda640b8535268bee657591d0ac5 as an example, we see a compile timestamp of 2015-06-23 05:05:33 and a VirusTotal first seen value of 2015-06-24 04:45:30, so it is reasonable to assume that the file was created and sent out around those dates.
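As an illustration of pulling these dates automatically, here is a minimal sketch against the VirusTotal v3 `files` endpoint. It assumes the v3 attribute names `first_submission_date`, `last_submission_date`, `first_seen_itw_date` (which may be absent depending on API access) and `names`, all Unix timestamps or lists; check them against the current API reference. The helper names are illustrative.

```python
import json
import urllib.request
from datetime import datetime, timezone

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def vt_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a VirusTotal v3 file report request."""
    return urllib.request.Request(VT_FILE_URL.format(file_hash), headers={"x-apikey": api_key})


def _utc(ts):
    """Convert an optional Unix timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc) if ts is not None else None


def vt_timeline(report: dict) -> dict:
    """Extract first/last seen dates and submitted filenames from a v3 report."""
    attrs = report.get("data", {}).get("attributes", {})
    return {
        "first_seen": _utc(attrs.get("first_submission_date")),
        "last_seen": _utc(attrs.get("last_submission_date")),
        "first_seen_itw": _utc(attrs.get("first_seen_itw_date")),
        "names": attrs.get("names", []),
    }


def vt_fetch(file_hash: str, api_key: str) -> dict:
    """Fetch and summarise a report (requires network access and an API key)."""
    with urllib.request.urlopen(vt_request(file_hash, api_key)) as resp:
        return vt_timeline(json.load(resp))
```

The same `names` list feeds the filename-based “who and why” analysis, so one lookup serves both purposes.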
In the case of PlugX, the configuration sometimes contains other breadcrumbs that point to a “when”. The PlugX P2P sample with SHA1 79b073433082abfb6096b98c0780c5c0b5cce08b has the classic Delphi compile timestamp of 1992-06-19 22:22:17, but the C2 authentication string in its configuration is 2015-5.18, which lines up much more closely with its VirusTotal first seen of 2015-05-19 19:55:29. The sample with SHA1 721e92d9bcec1baa687b6a244f24fc26e09da04e shows the same pattern: a compile timestamp of 2015-05-22 07:05:47, a VirusTotal first seen of 2015-05-26 03:04:19 and an auth string of 201505221504 together lead to a reasonable assumption that this sample was very recent.
We use some basic regexes to detect potential timestamps in these fields, compare the compile timestamp, first seen timestamps and any timestamp-like values in the configuration to see how well they match up, and alert analysts when a new sample matches the characteristics we are looking for.
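The actual regexes are not published, so this is a minimal sketch of the idea: two illustrative patterns that catch the formats seen above (`201505221504` and `2015-5.18`), validation through `datetime`, and a correlation check that ignores the Delphi default compile time. The 7-day window is an arbitrary example threshold, not ASERT's value.

```python
import re
from datetime import datetime, timedelta

# Compact stamps such as 201505221504 (YYYYMMDD with optional HHMM).
COMPACT = re.compile(
    r"(?<!\d)(20\d\d)(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])"
    r"(?:([01]\d|2[0-3])([0-5]\d))?(?!\d)"
)
# Separated stamps such as 2015-5.18 (any mix of '-', '.', '/').
SEPARATED = re.compile(r"(?<!\d)(20\d\d)[-./](\d{1,2})[-./](\d{1,2})(?!\d)")
DELPHI_COMPILE_TIME = datetime(1992, 6, 19, 22, 22, 17)


def find_timestamps(value: str):
    """Return every valid date-like value found in a configuration string."""
    found = []
    for pattern in (COMPACT, SEPARATED):
        for m in pattern.finditer(value):
            parts = [int(p) for p in m.groups() if p is not None]
            try:
                found.append(datetime(*parts))
            except ValueError:  # e.g. 2015-2.30 is not a real date
                pass
    return found


def correlate(compile_time, first_seen, config_strings, window=timedelta(days=7)):
    """Return (alert, spread): alert when config dates cluster with the others."""
    config_times = [t for s in config_strings for t in find_timestamps(s)]
    times = [first_seen] + config_times
    if compile_time != DELPHI_COMPILE_TIME:  # the Delphi default carries no signal
        times.append(compile_time)
    spread = max(times) - min(times)
    return bool(config_times) and spread <= window, spread
```

For the 721e92d9… values above, the compile time, first seen and auth string all fall within about four days of each other, so the check would fire.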
This method is far from perfect and is not reliable by itself, but it does help give us a sense of the timeframe for a particular campaign when starting from scratch.
Another technique for determining when a threat has been active is to use passive DNS sources such as OpenDNS or Farsight Security to obtain resolutions, first and last query times, the number of unique sources that queried and the number of queries seen. This helps reinforce or disprove earlier assumptions, but it will not be comprehensive: no existing service has full DNS visibility on the internet, many services overlap, and not all of them offer the same level of detail about the source(s) that generated the queries (e.g. ASN, country code). Targeted campaigns are likely aimed at very specific subsets of users in specific countries, so we can't reliably expect a passive DNS source to see DNS queries from infected machines. The recent sample 9edecb01897b2984daa29c979701e6df7c75160a, for instance, was compiled AND first seen in-the-wild on July 13, yet none of our passive DNS sources saw queries to its only C2 domain until July 21. ASERT received the sample shortly after July 21, which indicates that this was likely when the sample started being shared within the security community and run through dynamic analysis. Because we know the sample was in-the-wild on July 13, about a week before the passive DNS services ASERT uses observed queries against its C2 domain, researcher activity has to be taken into account when measuring DNS queries after a sample becomes known to the security community. PlugX also commonly overrides the DNS servers of infected machines, which limits how many “legitimate” queries from infected machines are seen by passive DNS services that collect at the DNS server level. Passive DNS can still be useful for tracking the overall lifetime of a campaign and potentially infected organizations, but less so when the campaign is short-lived or no extra metadata is available about the computers generating the queries.
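Because no single passive DNS source sees everything, results from several sources have to be merged before comparing them with a sample's in-the-wild date. The sketch below assumes a hypothetical per-source record format (`source`, `first_seen`, `last_seen` as dates; real providers each use their own schema) and computes the lag that, as with the July 13 and July 21 dates above, hints that early lookups may come from sandboxes and researchers.

```python
from datetime import date


def merge_pdns(records):
    """Merge per-source passive DNS summaries for one domain.

    Records are hypothetical dicts with 'source', 'first_seen' and 'last_seen'.
    Query counts are deliberately not summed: sources overlap, so a sum would
    double count.
    """
    return {
        "first_seen": min(r["first_seen"] for r in records),
        "last_seen": max(r["last_seen"] for r in records),
        "sources": sorted({r["source"] for r in records}),
    }


def pdns_lag_days(itw_date: date, merged: dict) -> int:
    """Days between the in-the-wild date and the earliest observed query."""
    return (merged["first_seen"] - itw_date).days
```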
The Who and The Why
For the purposes of this post, I group who was targeted and why they were targeted together, because in many cases one helps lead to the other, especially when we can narrow down the timeframe in which the targeting took place. Without any direct information about compromised sites, phishing emails, fake sites designed to mimic legitimate services, or compromised machines, I usually turn to the submitted filenames that VirusTotal (and other services) provide. Often the original filenames are in the language of the targeted group and hint at which industry or group was targeted, and possibly why.
When the target is not a native English speaker, the filenames will typically not be in English either. To work out the who and the why while keeping everything automated, we have to rely on translation services that offer an API. There are two primary choices for translating a filename: the Microsoft Translator API, which offers 2 million characters free per month, and the Google Translate API, which does not currently have a free tier. Modules exist for various languages that scrape the Google Translate web form instead, but that will likely be detected when translating a large number of strings in a short amount of time. Language detection and automatic translation are imperfect, and there is no guarantee that a submitted filename is the original name, but the results can still be used for correlation.
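As one way to wire this up, the sketch below normalizes a submitted filename (NFKC folds full-width digits and spaces, and the extension is dropped) and builds a request for Microsoft Translator with source-language auto-detection. The endpoint, `api-version=3.0`, header names and response shape reflect the current v3 Translator API as I understand it, not necessarily what this pipeline used; confirm against the official documentation. Function names are illustrative.

```python
import json
import os
import unicodedata
import urllib.parse
import urllib.request

TRANSLATE_URL = "https://api.cognitive.microsofttranslator.com/translate"


def prepare_filename(name: str) -> str:
    """Fold full-width characters (NFKC) and drop the extension before translating."""
    stem, _ = os.path.splitext(unicodedata.normalize("NFKC", name))
    return stem.replace("_", " ").strip()


def translate_request(text, key, region=None, to="en"):
    """Build (but do not send) a v3 request; omitting 'from' enables auto-detection."""
    query = urllib.parse.urlencode({"api-version": "3.0", "to": to})
    headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}
    if region:
        headers["Ocp-Apim-Subscription-Region"] = region
    body = json.dumps([{"Text": text}]).encode("utf-8")
    return urllib.request.Request(f"{TRANSLATE_URL}?{query}", data=body, headers=headers, method="POST")


def parse_translation(response):
    """Return (detected language, translated text) from a v3 JSON response."""
    item = response[0]
    return item.get("detectedLanguage", {}).get("language"), item["translations"][0]["text"]


def translate(name, key, region=None):
    """Normalize, send and parse (requires network access and a subscription key)."""
    req = translate_request(prepare_filename(name), key, region)
    with urllib.request.urlopen(req) as resp:
        return parse_translation(json.load(resp))
```

Storing the detected language alongside the translation makes it easy to spot cases where detection itself is questionable.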
The previously mentioned 1c6a50e51203fda640b8535268bee657591d0ac5 had a submitted Japanese filename, “豪外相 集団的自衛権の行使容認を支持.exe”, which was machine-translated as “Support to accept the exercise of the Japan-Australia Foreign Ministers’ right of collective self-defense.” (A more literal reading is “Australian Foreign Minister supports approval of exercising the right of collective self-defense.”) Manual searching turned up a statement of cooperation on boosting defense between Australia and Japan in late May / early June 2015. That was over 3 weeks before the compile and first seen timestamps, but a public event like that would likely make a useful lure in its aftermath. The file was submitted to VirusTotal from a Japanese IP address, which allows for a loose assertion that a Japanese defense company or government organization was the likely target of this sample.
Another interesting sample is b5ea24faa3f9fe37cd30f8494fb828d9e993b2ca. This is a zip file containing a PlugX sample with a compile timestamp of April 21, 2015 that was first seen in-the-wild on April 23. The zip filename on VirusTotal, “２７年５月国防問題講演会.zip”, caught my eye when I first saw it because it is non-English. Google Translate identified it as Chinese and translated it to “May 27 speech on defense issues will .zip”, which hints at a defense-related target. Searching for defense-related events around May 27 yields a conference named “NW Aerospace Defense Symposium” in Oregon, which doesn't fit with the sample being submitted from Japan. Searching for “May 27” and “Japan” instead turns up a news event that fits the compile and in-the-wild timeframe, and the potentially targeted country of Japan, better: “On April 27, the United States and Japan released the new Guidelines for U.S.-Japan Defense Cooperation.” The filename may also be Japanese rather than Chinese: in Japanese, “27年” commonly denotes year 27 of the Heisei era (2015), in which case the name would read roughly “May 2015 lecture on national defense issues”, which is also consistent with the April 2015 timeframe. Even so, without the original lure or more information on the infected machines, which news event is “correct” remains guesswork.
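Japanese filenames like this one often mix full-width digits with an era-based year, which literal translation handles poorly. The following is a small heuristic sketch, assuming that a bare one- or two-digit “N年” in a filename from this period is a Heisei year (Heisei 1 = 1989, so the Gregorian year is 1988 + N). It is an illustrative helper, not part of any translation API.

```python
import re
import unicodedata

# Assumption: a bare one- or two-digit year before 年 is a Heisei era year.
HEISEI_OFFSET = 1988  # Heisei 1 = 1989
ERA_MONTH = re.compile(r"(?<!\d)(\d{1,2})年(\d{1,2})月")


def heisei_year_month(filename: str):
    """Return (gregorian_year, month) for the first 'N年M月' in a filename, or None."""
    text = unicodedata.normalize("NFKC", filename)  # full-width digits -> ASCII
    m = ERA_MONTH.search(text)
    if m is None:
        return None
    year, month = int(m.group(1)), int(m.group(2))
    if not (1 <= year <= 31 and 1 <= month <= 12):  # Heisei ended in its 31st year
        return None
    return HEISEI_OFFSET + year, month
```

The lookbehind keeps four-digit Gregorian years such as “2015年” from being misread as era years.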
Tying it All Together
ASERT recently saw a sample with SHA1 2d99e88c30cd805f5e346388d312f7a3e3386798 containing a 2788-byte configuration structure, which we found interesting for a few reasons. First, the configuration size was new to us, and C2s appeared to be stored differently than in previous samples: the C2 port was stored in plain text after the C2 host, rather than as a packed word value before the C2 host, as in the other configurations we have observed since we started monitoring and collecting PlugX configurations earlier this year. Second, VirusTotal had an in-the-wild URL associated with the sample. The URL was first seen on July 23, and its domain, publicvoicepress[.]org, was only registered on July 20. It may be a lure for people wanting to visit thepublicvoice[.]org, a very loose assertion based on no other sites turning up in searches for “public voice press” and on the background of The Public Voice organization. The HTTP headers that VirusTotal logged for the page that drops the file include a “Last-Modified” header with the value of July 22, 2015. Another interesting piece of metadata from VirusTotal is the submission country, Hong Kong, which stands out given ongoing political disagreements with the Chinese government. Combining all these artifacts, we can still only guess who exactly was targeted, but we do have a timeframe (sometime between July 20 and 22) and a possible why (political unrest in Hong Kong).
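The staging timeframe above follows from simple date bounds, which are easy to compute automatically once the registration date and HTTP headers are collected. This sketch parses the `Last-Modified` header with the standard library and treats the domain registration date as the lower bound; the time of day in the example header is a made-up placeholder, since only the date is known.

```python
from datetime import date
from email.utils import parsedate_to_datetime


def staging_window(registered: date, last_modified_header: str, url_first_seen: date):
    """Bound when a lure page was staged: no earlier than domain registration,
    and in place by its Last-Modified date. url_first_seen is a sanity check.
    """
    modified = parsedate_to_datetime(last_modified_header).date()
    if not registered <= modified <= url_first_seen:
        raise ValueError("dates are inconsistent; re-check the sources")
    return registered, modified
```

Usage: `staging_window(date(2015, 7, 20), "Wed, 22 Jul 2015 10:00:00 GMT", date(2015, 7, 23))` yields the July 20 to July 22 window.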
None of the techniques discussed in this blog post are novel, but when combined together in an automated fashion they can greatly speed up the analysis around determining which campaigns may be more active or which samples merit immediate investigation.
2d99e88c30cd805f5e346388d312f7a3e3386798
b5ea24faa3f9fe37cd30f8494fb828d9e993b2ca
1c6a50e51203fda640b8535268bee657591d0ac5
9edecb01897b2984daa29c979701e6df7c75160a
721e92d9bcec1baa687b6a244f24fc26e09da04e
79b073433082abfb6096b98c0780c5c0b5cce08b