Measuring Botnet Populations
The following is excerpted from a talk I gave at the APCERT meeting in Bali, Indonesia, in March 2012. The topic was botnet population measurement, something that we’ve been doing for many years and that has grown in importance.
What do we mean when we talk about measuring botnet populations? We are trying to measure the number of infected devices to figure out how many people are affected, the number of accounts or customers, and the like. Because of the way the Internet is structured, we can only measure the number of infected PCs or IP addresses observed in a time period. We then have to use this information to estimate how large the botnet-infected population is.
We count botnet populations for several reasons. First, we want prevalence measurements in order to understand which threats to focus our limited efforts on. We want to understand the prevalence of a botnet by geographic region, for example, to understand to whom we need to reach out. We also want to understand how we should prioritize our efforts, focusing on botnets that will yield a significant impact if they are addressed. We want to understand the scale of the resources we need to gather as we tackle the botnet. Continuous measurement is vital in order to understand which mechanisms are effective at reducing the botnet’s population, and if the numbers ever drop to zero, we can call it a victory. Finally, we want to understand the size of the possible attacks and any expected financial impact, in order to prepare defenses.
Counting methodologies fall into several categories. Measurements using sinkholes are the most popular way to count right now. In this method, we take the botnet’s command and control server and redirect it, either by DNS or by IP redirection, to servers that the good guys operate, so it is now out of the botnet operator’s hands. We are then able to count the number of unique IP addresses connecting to this server every day, and we know that these belong to a particular botnet. We can also fingerprint the incoming traffic to distinguish one botnet from another, giving us a prevalence count.
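As a rough illustration of the daily count described above, here is a minimal sketch in Python. It assumes the sinkhole already produces log records as (timestamp, source IP, botnet label) tuples; that record layout, the function name and the sample addresses are all hypothetical and are not taken from any particular sinkhole.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_unique_ips(events):
    """Count distinct source IPs per (UTC day, botnet label).

    events: iterable of (unix_timestamp, source_ip, botnet_label) tuples.
    The botnet label is assumed to come from an earlier fingerprinting step.
    """
    seen = defaultdict(set)
    for ts, ip, botnet in events:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        seen[(day, botnet)].add(ip)
    return {key: len(ips) for key, ips in seen.items()}


# Hypothetical sinkhole log entries, using documentation-range addresses.
sample_events = [
    (1331251200, "192.0.2.10", "botnet-a"),    # 2012-03-09 00:00 UTC
    (1331254800, "192.0.2.10", "botnet-a"),    # same IP, same day: counted once
    (1331258400, "192.0.2.11", "botnet-a"),
    (1331337600, "192.0.2.10", "botnet-a"),    # 2012-03-10
    (1331258400, "198.51.100.7", "botnet-b"),
]
counts = daily_unique_ips(sample_events)
```

Keeping a set per day and per botnet means repeated check-ins from one address are counted once, which matches the "unique IP addresses per day" figure; it says nothing yet about how many machines sit behind each address.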
Sinkholing is widely done by many groups. Usually all we have is the number of IP addresses that connect to us, but sometimes there is a piece of information in the communications from the bot to the server that we can use to uniquely identify the client, and to identify when there is more than one PC behind a source IP address. This might include, for example, the MAC address reported by the bot, the hostname of the PC, or in the case of the recent Flashback malware, the UDID from the device itself. This can help give us better numbers about the population size.
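To make the difference between counting addresses and counting clients concrete, here is a small sketch. It assumes each sinkhole check-in has already been parsed into a (source IP, client identifier) pair, where the identifier is whatever stable value the bot happens to send; the function names and sample values are hypothetical.

```python
from collections import defaultdict


def ids_by_source_ip(checkins):
    """Group client identifiers by the source IP they were seen from.

    checkins: iterable of (source_ip, client_id) pairs. client_id stands in
    for a hostname, MAC address or similar value carried in the bot traffic.
    """
    grouped = defaultdict(set)
    for ip, client_id in checkins:
        grouped[ip].add(client_id)
    return grouped


def summarize(checkins):
    """Compare the IP-based count with the identifier-based count."""
    grouped = ids_by_source_ip(checkins)
    all_ids = set()
    for ids in grouped.values():
        all_ids |= ids
    return {
        "unique_ips": len(grouped),
        "unique_clients": len(all_ids),
        # Addresses with several identifiers behind them, for example a NAT.
        "shared_ips": sorted(ip for ip, ids in grouped.items() if len(ids) > 1),
    }


sample_checkins = [
    ("192.0.2.10", "host-a"),
    ("192.0.2.10", "host-b"),
    ("192.0.2.10", "host-c"),    # three clients behind one address
    ("198.51.100.7", "host-d"),
    ("203.0.113.5", "host-d"),   # one client seen from two addresses
]
summary = summarize(sample_checkins)
```

In this toy data the IP count is 3 while the identifier count is 4, and the same identifier showing up from two addresses is the opposite effect, where one machine inflates the IP count.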
Shown above is the number of Conficker-infected systems that we counted over a two-week period. This was gathered using the “q” value from each individual communication and then summed per source IP every day, yielding a decent estimate of the size of the botnet. In this period we estimated the botnet grew from 200,000 zombies to well over 700,000 zombies.
Another method for counting botnets and estimating their size is what we call dark IP monitoring. This method takes large unused IP address blocks and listens for traffic sent to them. The collection system is able to fingerprint bots based on specific signs, which could include the exploit traffic or traffic to a specific TCP/IP service. This gives you a passive mechanism to watch the botnet as it tries to spread. Arbor used this method to measure the size of the 2003 Blaster worm, watching a /8 network and counting worm sources.
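A minimal sketch of the dark IP idea follows, assuming packet records have already been reduced to (timestamp, source IP, destination IP, destination port) tuples. The monitored block, the single-port fingerprint and all names here are placeholders chosen for illustration; a real collector would use a much richer fingerprint than a port number, and this is not a description of Arbor's actual system.

```python
import ipaddress
from collections import defaultdict

# Placeholder for the unused block being monitored (not a real darknet).
DARK_BLOCK = ipaddress.ip_network("198.18.0.0/15")
# Illustrative fingerprint only: traffic to one TCP service.
FINGERPRINT_PORT = 135


def hourly_sources(packets, dark_block=DARK_BLOCK, port=FINGERPRINT_PORT):
    """Count unique sources per hour that hit the dark block on the given port.

    packets: iterable of (unix_timestamp, src_ip, dst_ip, dst_port) tuples,
    with integer timestamps.
    """
    sources = defaultdict(set)
    for ts, src, dst, dport in packets:
        if dport != port:
            continue  # does not match the fingerprint
        if ipaddress.ip_address(dst) not in dark_block:
            continue  # not addressed to unused space
        hour_start = ts - ts % 3600
        sources[hour_start].add(src)
    return {hour: len(srcs) for hour, srcs in sorted(sources.items())}


sample_packets = [
    (1331251200, "192.0.2.10", "198.18.4.1", 135),
    (1331251260, "192.0.2.10", "198.18.9.9", 135),   # same source, same hour
    (1331251300, "192.0.2.11", "198.19.0.5", 135),
    (1331251400, "192.0.2.12", "198.18.0.1", 80),    # wrong port, ignored
    (1331251500, "192.0.2.13", "203.0.113.1", 135),  # not dark space, ignored
    (1331254800, "192.0.2.10", "198.18.4.2", 135),   # next hour
]
per_hour = hourly_sources(sample_packets)
```

Because nothing legitimate should be sending traffic to unused space, every matching source is a candidate infected host, and bucketing by hour gives the kind of unique-sources-per-hour series used to follow a worm over time.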
This graphic is from a paper that we wrote called The Blaster Worm: Then and Now, covering the Blaster worm’s propagation over time. Shown here are the various stages of the worm’s traffic as seen by our dark IP monitors: the worm’s initial burst onto the Internet, followed by the decay phase as networks shut down infected hosts and the TCP/IP services the worm used to propagate. The final phase in the graphic shows the diurnal rise and fall of the worm’s population as PCs are turned on and off each day. The counts are the number of unique source IP addresses every hour.
A direct method for measuring botnets is to count on the infected hosts themselves. Microsoft has the best option here because they’re able to count reports from their Windows antivirus software, the MSRT executable pushed down during Windows Update, and other host-based antivirus solutions. Distributing this tool globally has enabled them to measure how many infected PCs hit each individual signature. While this is the most direct measurement possible, it is not accessible to many people outside of Microsoft.
Another direct methodology is to crawl a peer-to-peer botnet, gathering the peer list from every node and recursively walking the botnet. This enumeration of the botnet is possible if you know the P2P protocol, but is easily thwarted by strong cryptography. Kaspersky Labs has used this to track the Storm worm, the Miner.h botnet and others. Shown below is a graphic from a Kaspersky Labs blog post on the Miner.h P2P botnet, showing how the nodes are connected.
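The recursive walk over peer lists can be sketched as a graph traversal. The example below is an illustration only: get_peers is a hypothetical callback standing in for the botnet-specific P2P request that returns a node's peer list, and the toy dictionary replaces real network queries.

```python
from collections import deque


def crawl(seed, get_peers):
    """Enumerate every node reachable from seed by following peer lists.

    get_peers(node) is a hypothetical callback that returns the peers a node
    advertises. An explicit queue is used instead of recursion so that a very
    large botnet cannot exhaust the call stack.
    """
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for peer in get_peers(node):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


# Toy peer lists standing in for responses from real bots.
toy_network = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
    "x": ["a"],  # knows "a", but nobody lists "x", so the crawl never finds it
}
found = crawl("a", lambda node: toy_network.get(node, []))
```

The node "x" in the toy data shows one limit of this approach: a crawl only finds nodes that some already-discovered peer advertises, so the enumeration is a lower bound on the population.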
Clearly, with botnet measurements you have possible visibility issues. If, for example, ISPs are blocking ports or blocking collection addresses, and instead directing clients to their own sinkholes on their own servers in order to identify customers, this will lead to undercounting. Similarly, if the domain names for the botnet, which now point to sinkholes, are used in DNS blacklists, clients will never be recorded at the sinkhole, again leading to undercounting. Also, if hosts are offline (not connected or just powered off), they won’t be counted. Finally, if you trust the bot’s self-reporting mechanism to count the botnet population, you may be the victim of inaccurate reporting by the bot, either through active deception or through errors in the bot’s counters. All of these can lead to inaccurate values.
There are also problems in estimating populations from the source IP counts we gather. DHCP, for example, can lead to overcounting. We know that one IP address does not equal one device, as DHCP churn can lead to the same device getting multiple IP addresses in a given day. NAT is another issue, and it pushes the numbers the other way, because many devices can sit behind a single address. We see ratios of about 10 to 1 and even 100 to 1 in the wild, meaning we believe that as many as 100 PCs exist for every IP address in some parts of the network. The Blaster worm example from 2003 that I showed earlier is a striking example. The estimate we presented in the IEEE paper was about 800,000 hosts infected with the worm, while Microsoft’s direct measurements showed about 8 million hosts in the same timeframe.
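To show how these two effects pull in opposite directions, here is a deliberately simple back-of-the-envelope model. It is an assumption made for illustration, not a method from the talk or the paper: it treats the population as unique IPs multiplied by an average number of hosts per IP (the NAT effect) and divided by an average number of IPs per host (the DHCP churn effect). The function and parameter names are hypothetical.

```python
def estimate_hosts(unique_ips, hosts_per_ip=1.0, ips_per_host=1.0):
    """Rough population estimate from an IP count (illustrative model only).

    hosts_per_ip: assumed average number of infected hosts behind one address.
    ips_per_host: assumed average number of addresses one host uses in the
                  counting window because of DHCP churn.
    """
    if hosts_per_ip <= 0 or ips_per_host <= 0:
        raise ValueError("ratios must be positive")
    return unique_ips * hosts_per_ip / ips_per_host


def implied_ratio(direct_host_count, network_estimate):
    """How far a network-based estimate is from a direct host count."""
    return direct_host_count / network_estimate


# Figures quoted in the text for Blaster: about 800,000 from the network
# view versus about 8 million from direct host measurements.
blaster_ratio = implied_ratio(8_000_000, 800_000)
scaled_up = estimate_hosts(800_000, hosts_per_ip=blaster_ratio)
churn_only = estimate_hosts(1_000, ips_per_host=2.0)
```

With the Blaster figures the implied ratio is 10, which falls at the low end of the 10 to 100 range mentioned above. In practice neither ratio is known exactly and both vary by network, so any such estimate should be reported as a range.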
Botnet infection data is widely available now from groups such as Arbor, Shadowserver, Team Cymru, and others. Data feeds from sinkholes and other measurements can be used by network administrators to identify infected hosts and remediate their problems. A number of these are covered in a recent report from ENISA entitled Proactive detection of network security incidents.
Obviously, robust measurements are a crucial element in addressing the botnet problem. In the measurement community, we have identified gaps and inconsistencies in our available methods. We are now trying to standardize methodologies so we can measure consistently. Furthermore, we’re trying to identify the causes of the gaps between methodologies (e.g. network vs host measurements) and provide stronger data by closing those gaps. Based on this data, we also work globally to identify strategies that effectively shut down botnets and drop infection rates. We then want to coordinate these efforts globally to lower infections in each region.