Posted on Friday, May 5th, 2006

And the winner is… (about Internet Traffic Distribution)

by Danny McPherson

Much of this post is from an e-mail I sent to a private list ~six weeks back. Trudging through some archives I stumbled upon it and thought it might be of interest to folks here.

-----

Given the topics at hand, I compiled some data (with permission, though now thoroughly anonymized) from 15 discrete networks employing Arbor Peakflow systems for *flow (e.g., NetFlow, sFlow, IPFIX) backbone traffic and routing data analytics. The 15 networks in question represent a wide spectrum of organization types, from tier-1/backbone to large enterprise, to tier-3, to academic, to broadband and MSO-types, from North America, EU & Asia.

The data is representative of average traffic in bits per second (only bps, sorry no corresponding pps this time) on the monitored portions of the networks in question for the last 28-30 days. The aggregate average bps data rate across the 15 networks was 703.3 Gbps (which is a pretty good sample size of live Internet traffic :-).

The information represented here is primarily derived from [sampled] NetFlow (though some derivatives as well) and is therefore only representative of Network and Transport Layer attributes of observed network traffic. As such, it is certain that some applications are employing these well-known (and perhaps lesser known) ports/protocols in order to obfuscate alternative payloads.

The methodology (if there were such a thing…)

Basically, I took the top 25 port/protocol pairs from each network, summed each pair across the networks, and listed the top 50 here (there were about 85 unique protocols represented in the top 25 from all 15 data sets). The first and most obvious caveat is my aggregation-without-normalization (lazy, non-scientific, blah, de da) methodology. I.e., the 15 networks I pulled data from ranged from a couple Gbps to 156 Gbps, and as such, the non-top-25 stats from the larger networks likely skewed my data pool considerably. Fortunately, those larger data sets were from what I consider backbone/transport-type networks and, as such, had large representative user distributions that sorta make up for my laziness .. or not, but anyways. I also booted any entry that had only a single appearance (3 of them) in order to avoid skew from some significant localized use - seemed to make sense at the time.
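For anyone who wants to repeat the tally on their own data, here is a minimal Python sketch of the steps described above. It is not the script used for this post: the function name, the input layout (a dict of per-network averages keyed by strings like "tcp/80"), and the sample numbers are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def aggregate_top_pairs(networks, per_network_top=25, overall_top=50,
                        min_appearances=2):
    """Sum each network's top port/protocol pairs without normalization.

    networks: hypothetical layout, {network_name: {"proto/port": avg_gbps}}.
    Returns a list of (pair, total_gbps) sorted from largest to smallest.
    """
    totals = defaultdict(float)
    appearances = Counter()
    for pairs in networks.values():
        # Keep only this network's largest entries (top 25 in the post).
        top = sorted(pairs.items(), key=lambda kv: kv[1], reverse=True)
        for pair, gbps in top[:per_network_top]:
            totals[pair] += gbps      # raw sum, so big networks dominate
            appearances[pair] += 1
    # Drop pairs seen in only one network to limit localized skew.
    kept = {p: t for p, t in totals.items()
            if appearances[p] >= min_appearances}
    ranked = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:overall_top]


# Made-up sample data (Gbps), NOT taken from the 15 networks in the post.
sample = {
    "net_a": {"tcp/80": 40.0, "tcp/443": 10.0, "udp/53": 1.0, "tcp/6881": 8.0},
    "net_b": {"tcp/80": 5.0, "tcp/25": 2.0, "udp/53": 0.5},
    "net_c": {"tcp/80": 1.0, "tcp/443": 0.5, "tcp/9999": 0.4},
}
ranked = aggregate_top_pairs(sample)
for pair, gbps in ranked:
    print(f"{pair:10s} {gbps:6.1f} Gbps")
```

Note how net_a contributes most of every total in the sample. That is the same skew from skipping normalization that is called out above.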

I could share lots of different inferences, but figured I’d only confuse things more, and with that apply a great big DISCLAIMER:

This data, at any given instance, very likely DOES NOT precisely represent what’s occurring on the global Internet. Use at your own risk!

Perhaps at some point in the near future we’ll do something along these lines that is a bit more (considerably more) scientific and takes more than an hour to compile. Nonetheless, perhaps some of you will find it of value.

And the winner is…

[Image: inet-traffic]

If you’re wondering why 41% of the traffic isn’t accounted for in the top 50, this is largely an artifact of two things, both of which I’ve sorta mentioned already. First, the 15 networks I pulled data from ranged from a couple Gbps to 156 Gbps, and as such, the non-top-25 stats from the larger networks skewed things considerably (wrt the 703.3 Gbps tally). Second, I only summed the top 25 from each, regardless of the bps values. Although on aggregate there were 85 unique protocols in the top 25 from the 15 data sets, had I recorded the aggregate value of all 85 protocols from each, I suspect the results would have varied somewhat.
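To put the 41% in perspective, here is a small Python sketch of the arithmetic. Only the 703.3 Gbps total and the 41% figure come from the post; the helper name and the five-entry sample network are hypothetical and only illustrate how a top-N cut leaves a tail uncounted.

```python
def top_n_coverage(pairs, n):
    """Fraction of one network's traffic captured by its n largest pairs.

    pairs: hypothetical layout, {"proto/port": avg_gbps}.
    """
    total = sum(pairs.values())
    top = sorted(pairs.values(), reverse=True)[:n]
    return sum(top) / total


# Figures from the post: 703.3 Gbps aggregate, 41% outside the top 50.
total_gbps = 703.3
missing_gbps = total_gbps * 0.41
covered_gbps = total_gbps - missing_gbps
print(f"outside top 50: {missing_gbps:.1f} Gbps")
print(f"inside top 50:  {covered_gbps:.1f} Gbps")

# Made-up single network: its top 2 pairs capture only 70% of its traffic.
example = {"tcp/80": 50, "tcp/443": 20, "udp/53": 10, "tcp/25": 10, "tcp/22": 10}
coverage = top_n_coverage(example, 2)
print(f"top-2 coverage of example network: {coverage:.0%}")
```

On a 156 Gbps network even a modest uncounted tail is tens of Gbps, which is why the larger networks account for so much of the gap.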

I’ll say it again - this data is based solely upon Network and Transport Layer transactional data (IPv4 only); payloads were not analyzed and therefore applications that are port scrambling are not discernible.

Well, I suspect that, on aggregate, peer-to-peer (P2P) traffic holds the #1 spot, which is not surprising, though tcp/80 (s/http/*/) is the winner from a “single port/proto employment” perspective at the moment (perhaps in part thanks to all of you tunneling P2P and other traffic within it).

2 Responses



Comment Post by: mike smith - May 6th, 2006 @ 2:51 pm EST

And the importance of this data is what? It's funny, there are tons of posts which are really stupid and don't mean much.

Comment Post by: Danny McPherson - May 6th, 2006 @ 4:26 pm EST

Hrmm.. Let’s see.. Prevalence of encryption (e.g., pop/imap v. pops/imaps)? IPsec ESP & AH, L2F? HTTP and HTTP-alt v. HTTPS? Prevalence of peer-to-peer applications? TCP and congestion-friendly applications as opposed to UDP or others? ICMP prevalence? % of fragmented traffic (e.g., if you were building something that performed flow reassembly)? RTP and other real-time protocols, and the implications of these and others for network engineering, architecture and security (e.g., if you were going to deploy WRED and some associated queuing algorithm to deal with congestion, oversubscription or redundancy on a given link, you’d have a much better idea about what effect it would have).

I guess I could go on — but if you had to ask the question [as such] it’s probably a moot point…

Thanks for reading!
