Ops: Deep Darknet Inspection - Part 3 of 3

Editor’s note: Imported from my old personal blog @ TC with minor edits to improve readability where necessary.

In Deep Darknet Inspection - Part 1 and Deep Darknet Inspection - Part 2 we were able to determine not only the origin and nature of packets we received, but also to postulate, with a high degree of confidence, characteristics of the systems that sent them. Before completing the series, I need to correct something I said in part one. I had suggested that we would end the series with a mystery, but it turns out I solved it, or at least solved it well enough. The delay in posting this last segment is partly due to following a trail of clues I thought would lead to a dead end. I still don’t believe deep darknet inspection will always uncover the insight we are after, but in this case I proved myself wrong. We begin by examining a set of TCP SYN scan packets that share some interesting characteristics:

TCP SYN packets towards a single destination IPv4 address in a darknet session spanning 36 hours:

Top 5 IP identification field values:

% of total   IP id value
12           0x0100
<1           0xcdb4
<1           0xccc2
<1           0xaa03
<1           0x4bff

Top 5 TCP source ports:

% of total   Source port
7.5          12200
5.6          6000
<1           4369
<1           3348
<1           1558

An anomaly clearly stands out in both the IP identification and TCP source port fields. Based on the percentages, you might guess that the source port 6000 and 12200 SYN segments have their IP identification value set to 0x0100, and that is true about 88% of the time. The exceptions are plausible: some unrelated TCP SYN segments arriving at the darknet may simply have used one of those two source ports, or a middlebox may have rewritten the IP identification field. What is true 100% of the time is that the advertised TCP window size is always 16384 for source port 6000 segments and 8192 for source port 12200 segments. If we examine just the TCP segments that have one of those anomalous properties (i.e. IP id = 0x0100, TCP source port = 6000 with an advertised window of 16384, or TCP source port = 12200 with an advertised window of 8192), do they have anything else in common? You know the answer is going to be yes, or I wouldn’t be asking.

All of those selected TCP SYN segments carry a TCP header of exactly 20 bytes. A TCP header without options is 20 bytes, so these SYNs carry no options at all. You might think there isn’t anything special about that, but there is. What modern operating system’s TCP/IP stack omits TCP options from the initial SYN segment by default? None that I know of. In my experience, relatively few modern systems generate legitimate traffic matching that signature; older systems (e.g. SunOS 4.1) and some VPN software are amongst the few that do. However, tools that assemble their own TCP packets (e.g. port scanners) commonly omit TCP options from the SYN segments they create. These packets are normally built through the raw sockets interface. Raw socket support is severely limited on modern Microsoft Windows systems, but is readily available on other systems.

As an aside, the signature we stumbled upon, a 20-byte TCP header with no options and only the SYN flag set, makes a pretty good anomalous network traffic detector. Here is a tcpdump filter expression, which you can tailor to suit your needs, that captures TCP segments with no options and only the SYN flag set:

'tcp[12] == 0x50 and tcp[13] == 0x02'

If you only have NetFlow data, all is not lost. Watch for TCP flows made up of 40-byte packets that have only the SYN flag set. What do you see? Can you easily filter out the false positives?

Let's get back to analyzing our anomalous SYN packets. The TTL field in the IP header might normally give us a clue as to the operating system, since operating systems differ in their default initial TTL values. Many of the TTL values we see here fall in the 100 to 110 range, and judging from the path back to the sources, this suggests they all started out at or around 120. That would be an uncommon default for any OS. We might speculate that they all began at 128, since many Microsoft systems default to that, but is it likely we’re consistently 8 hops off in the reverse direction? It is possible, but unlikely with so many varied sources.

Determining the OS is going to be tricky with what we know so far, because we can infer that not only the TCP portion of the packet but also the IP header is being assembled with raw sockets. Why do we think this? Even discounting the discrepancy in the TTL values, a common IP id value of 0x0100 across so many sources would be highly improbable if the senders’ own IP stacks were filling in the header. All is not lost, however.

Jose Nazario’s 2005 blog post Dasher.C Now In the Wild details a remarkably similar set of packet characteristics. Notice the constant source port of 6000, the constant IP id value of 256 (0x0100 in hex) and the initial TTL of 120. Aha! This looks like our culprit, or at least a relative. A comment in a thread entitled “win_dasher” at Offensive Computing shows some additional disassembled output of Dasher’s TCP Port Scanner module, the piece of the malware apparently written originally by someone who goes by the moniker WinEggDrop. We can easily find what looks like the original version of WinEggDrop’s TCP SYN scanner source code on a Chinese web forum. Examining that source code, we see a hard-coded IP TTL of 120 and a hard-coded TCP window size of 16384, just like we’ve seen in our source port 6000 scans:

ip_header.ttl = 120; 
...
tcp_header.th_win = htons(16384);

Have you noticed we haven’t even bothered to characterize destination ports or source IP addresses yet? That might be what casual investigators look at first, and it is often what generic statistical analysis and monitoring systems care most about. TCP packets with a source port of 6000 have been mentioned in an Internet Storm Center Diary entry. Likewise, TCP packets with a source port of 12200 have come up in a thread at DSLReports.com. Yet both discussions end without a satisfactory explanation of what is going on. Maybe by doing “deep” darknet inspection we can get a bit closer to the ground truth. However, before we pass judgment on what the data means, we first want to better understand what the data actually is and what it is not.

There is something else peculiar about many of the source port 6000 packets. About 20 different sources are using either 622723072 or 1344798720 as the TCP initial sequence number (ISN). Looking back at WinEggDrop’s original code we see:

dwSeq = 0x19831018 + nPort;   // Set A Sequence 
...
tcp_header.th_seq = htonl(dwSeq); // Syn Sequence 

The constant 0x19831018 (428019736) is nowhere near either of the ISNs we are seeing, but a fixed, non-random starting point for the ISN helps support the hypothesis that we are seeing a derivative of WinEggDrop’s code. The source port 12200 scans appear to be another step removed, because they don’t share common sequence numbers, yet there is much less randomness in them than there ought to be. Perhaps those scans come from a later generation of the original code or from a minor branch of an ancestor?

What you may want to know is what all these Dasher / WinEggDrop TCP scanner derivatives are doing. Judging by the distribution of source addresses (overwhelmingly Chinese) and destination ports (largely ports such as 1080, 7212, 8000 and 8080), they are most likely looking for open proxies. We probably didn’t need deep darknet inspection to figure that out, but I for one have a much clearer picture of the nature and origin of these packets than I did when we started. Particularly in this case, when I thought I was going to end with more questions than answers, there is some personal gratification in knowing more now than when I started.