Code: Parsing the DNS-based IP address to route mapping service

Editor’s note: Imported from my old personal blog @ TC with minor edits to improve readability where necessary.

For me, writing code usually means writing combat Perl code. It’s my standard joke, but also my standard disclaimer. When I recently used C for another project I added an addendum to that disclaimer. In that case it went like this: “I wouldn’t advise running this as root.” Yet being able to construct tools with your own code is a tremendously useful skill, and I frequently urge my networking undergrad students to attain some competency in tool building.

Being able to parse and summarize logs is a great first tool to attempt to build. In Perl, mine was an ISC BIND named log parsing and summarization tool called named-report.pl. It’s an abomination, but it served its purpose and still seems to work well enough when I need it that I haven’t bothered to redo it. I’ve since gone on to build a number of tools with Perl, many of them of the parsing and summarization variety, each successive incarnation hopefully a little better than the last. The one Perl book that has helped me take some of the combat out of my more recent code, and that I love to recommend to other Perl coders, is Damian Conway’s Perl Best Practices. Perl is what I tend to reach for first, mainly out of habit. Regardless of your preferred weaponry for combat coding, let’s just agree that being able to build useful tools is a great thing and instead discuss what it takes to build a tool around a service a number of us have used for many years.

If you’re like me, you make regular use of the Team Cymru IP address to BGP route mapping service. I tend to use the whois-based service interface for queries that I do by hand and the DNS-based service interface in code. The Route Views Project offers a similar DNS-based service, but it is not as widely known nor does it have the registry-associated data that can be handy when trying to uncover some quick insight about an address. However, parsing the Team Cymru DNS-based service can be a bit tricky, something I hope this post provides some insight into if not a few good laughs at my code along the way.

Our task here is a seemingly simple one: pass an IP address to a subroutine and get back an autonomous system number (ASN). Here is how the start of such a Perl subroutine might look:

1. sub get_asn {
2.     my $address = shift || return;
3.     my $res     = Net::DNS::Resolver->new;
4.     my $qname   = get_ptr_name($address);
5.     my $query   = $res->send( $qname, 'TXT', 'IN' );
6.     my $asn;
7.
8.     return if !$query;
9.     return if $query->header->ancount < 1;

The routine above expects a scalar parameter, an IPv4 or IPv6 address, and assigns it to the $address variable in line 2. We set up a DNS query by using the Net::DNS module (which the script must load with use Net::DNS;) to create a new resolver object in line 3. In line 4 we need the reverse or PTR name that will be used in the query, so we pass the address to a utility function called get_ptr_name() and assign the query name it returns to the $qname variable. We are then ready to send the query, which we attempt at line 5. If the query fails or no answer data is returned, we abruptly leave the subroutine at line 8 or 9 respectively. Whenever we leave the subroutine early, including at line 2 when no address is passed, we return an undefined value (or an empty list if called in list context), so it is up to the caller to handle such a condition gracefully.

As an aside, let us take a quick look at the get_ptr_name() utility routine and see what it might do:

a.    sub get_ptr_name {
b.        my $addr = shift || return;
c.
d.        if ( $addr =~ /:/ ) {
e.            $addr  = substr( Net::IP->new($addr)->reverse_ip, 0, -10 );
f.            $addr .= '.origin6.asn.cymru.com';
g.        }
h.        else {
i.            $addr  = join( '.', reverse split( /\./, $addr ) );
j.            $addr .=  '.origin.asn.cymru.com';
k.        }
l.
m.        return $addr;
n.    }

This subroutine uses a simple regular expression to test for an IPv6 address. If a colon (':') character is found in the address string, the address is presumed to be an IPv6 address; otherwise it must be an IPv4 address. In line e. we use the power of CPAN and the Net::IP module (loaded with use Net::IP;) to get the reverse nibbles for an IPv6 address, because doing so by hand is a PITA. However, in that case we must also strip off the trailing .ip6.arpa. zone that the module includes by default (the last ten characters, which is what the -10 passed to substr removes) and append the Team Cymru IPv6 route origin zone in its place. Note that Net::IP returns undef for a string it cannot parse, so a malformed IPv6 address will make the chained reverse_ip call die rather than return quietly. We perform a similar, but simpler, transformation on an IPv4 address by reversing its dotted octets, and return the final result.
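If you would rather avoid the Net::IP dependency, the same query names can be built with the core Socket module. The sketch below is a hypothetical alternative, named cymru_qname() here purely for illustration; it assumes a Perl recent enough (5.14 or later) for Socket to provide inet_pton. As a side benefit, it rejects strings that do not parse as addresses instead of dying.

```perl
use strict;
use warnings;
use Socket qw(inet_pton AF_INET AF_INET6);

# Hypothetical core-only variant of get_ptr_name(). Returns the Team Cymru
# origin query name for an IPv4 or IPv6 address, or undef (an empty list
# in list context) if the address does not parse.
sub cymru_qname {
    my $addr = shift || return;

    if ( $addr =~ /:/ ) {
        # Pack to 16 bytes, expand to 32 hex nibbles, then reverse them.
        my $packed = inet_pton( AF_INET6, $addr ) or return;
        my @nibbles = split //, unpack( 'H32', $packed );
        return join( '.', reverse @nibbles ) . '.origin6.asn.cymru.com';
    }

    # Make sure this really is a dotted quad before reversing the octets.
    inet_pton( AF_INET, $addr ) or return;
    return join( '.', reverse split( /\./, $addr ) ) . '.origin.asn.cymru.com';
}
```

For example, cymru_qname('192.0.2.1') returns 1.2.0.192.origin.asn.cymru.com. Unpacking with 'H32' always yields all 32 nibbles, so compressed IPv6 forms such as 2001:db8::1 are expanded correctly without any manual zero padding.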

Presuming everything has gone well up to this point, we want to process the DNS answer we get back. What does an answer look like? This is where things can get a little hairy. We might get multiple answers, and there may be multiple ASNs listed in each answer. In DNS-speak, here is what the general format of the RDATA in an RRset will look like:

"49152 [...] | 192.0.2.0/24 | AA | registry | 1970-01-01"
[...]

There are five fields per answer, separated by a pipe ('|') symbol. The first field is an ASN list. Often it will be a single ASN, but due to multiple origin autonomous system (MOAS) routes there may be more, separated by whitespace. The second field is the covering route prefix. The third field is a two-letter country code based on IP address registry allocation information. The fourth field is the registry responsible for the address allocation. The fifth and final field is the date the registry allocated the covering prefix. If there is a route, you should get an answer with at least one ASN and prefix. Beyond that, you should code defensively. Most of the time you get a single answer and a single ASN, but don’t count on it. In our case, we won’t care about more specific prefixes or multiple ASNs. Continuing on then…
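Coding defensively against that format might look something like the following sketch. parse_origin_txt() is a hypothetical helper, not part of Net::DNS or any Team Cymru tooling; it assumes it is handed the string form of a single TXT record, with or without surrounding double quotes. It keeps every ASN in the first field so MOAS routes are not silently dropped, and it treats anything without exactly five fields and at least one purely numeric ASN as malformed.

```perl
use strict;
use warnings;

# Hypothetical helper: split one origin TXT string into its five fields.
# Returns a hash reference, or undef (empty list in list context) when
# the string does not look like a Team Cymru origin answer.
sub parse_origin_txt {
    my $txt = shift;
    return if !defined $txt;

    $txt =~ s{ \A \s* " | " \s* \z }{}gxms;    # strip surrounding quotes, if any
    my @fields = split /[|]/, $txt, -1;         # -1 keeps empty trailing fields
    return if @fields != 5;
    s{ \A \s+ | \s+ \z }{}gxms for @fields;     # trim whitespace around each field

    my ( $asns, $prefix, $cc, $registry, $allocated ) = @fields;

    # MOAS routes list several ASNs separated by whitespace; keep them all.
    my @asn_list = split ' ', $asns;
    return if !@asn_list || grep { !/\A\d+\z/ } @asn_list;

    return {
        asns      => \@asn_list,
        prefix    => $prefix,
        cc        => $cc,
        registry  => $registry,
        allocated => $allocated,
    };
}
```

Taking $result->{asns}[0] reproduces the first-ASN behavior used in this post, while the full list remains available if MOAS ever matters to you.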

10. ANSWER:
11.     for my $answer ( $query->answer ) {
12.         next ANSWER if $answer->type ne 'TXT';
13.         ($asn) = $answer->rdatastr =~ m{ \A ["] (\d+) }xms;
14.         last ANSWER if $asn;
15.     }
16.
17.     return $asn;
18. }

We conclude our simple ASN mapping routine by finding the first TXT RR in the set whose RDATA begins with an ASN and capturing that first ASN before returning it to the caller. Keep in mind that this routine is very simple and likely not suitable for any truly robust project where you care about multi-homing, MOAS, different covering prefix announcements, upstream routes or a descriptive name for an ASN. Code that deals with those situations is probably more appropriate for a library than a blog post (not a bad idea, eh?). The Net::Abuse::Utils module contains a subroutine called get_asn_info() which uses our mapping service and goes a little further than I show here. I wrapped the routines above into a small script called sample-tcbgp-mapping.pl, which you may freely use and expand on for your projects. It takes a list of IPv4 or IPv6 addresses, one per line via STDIN, and gives back a pipe-delimited list of the first associated ASN it finds for each, or ‘NA’ if none. Go forth and do battle.
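The script itself is not reproduced here, but its STDIN loop could look something like this sketch. The map_addresses() name and the address|ASN output layout are assumptions made for illustration, not necessarily what sample-tcbgp-mapping.pl does. The lookup is passed in as a code reference so the loop can be exercised with a stub instead of a live get_asn() and network access.

```perl
use strict;
use warnings;

# Hypothetical driver loop: read one address per line from $in and write
# "address|ASN" to $out, printing 'NA' whenever the lookup returns undef.
# $lookup is a code reference standing in for get_asn().
sub map_addresses {
    my ( $in, $out, $lookup ) = @_;

    LINE:
    while ( my $line = <$in> ) {
        $line =~ s{ \A \s+ | \s+ \z }{}gxms;    # also removes the newline
        next LINE if $line eq q{};

        my $asn = $lookup->($line);
        print {$out} join( '|', $line, defined $asn ? $asn : 'NA' ), "\n";
    }

    return;
}

# Typical use with the routine from this post:
# map_addresses( \*STDIN, \*STDOUT, \&get_asn );
```

Handling the undefined return in the caller like this is exactly the graceful handling the early returns in get_asn() leave to you; blank input lines are simply skipped.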