For years I have relied upon tools such as Team Cymru’s IP address to AS mapping service. When I worked there I tried to encourage others to use the DNS interface in a blog post and with sample Perl code. Between my research work and the growing size and number of DataPlane.org feeds, I have found myself needing to get the origin ASN and AS name of an increasing number of IP addresses quickly. Performing a large number of mappings against remote third-party services poses several problems. Most notably, performance can be severely limited by the network API interface (e.g., repeated DNS queries, even with a local resolver and caching, are still relatively slow), and providers may impose limits on remote client usage. For optimal performance, having a current, local AS data set to interface with is an appealing option. This article describes how to do just that with some example Python code.
There are two main components to the approach outlined here. The first, and most important, is pyasn, a Python-based package that performs the bulk of the work. pyasn can fetch a current RIB from the Route-Views MRT archive, construct a local radix trie, and provide an IP address to origin ASN lookup function. Since AS names are not carried in BGP, we need a source that provides ASN to AS name mappings. The second component comes from RIPE, which publishes such a list in a relatively easily parsed asn.txt text file. Now it is just a matter of putting these two components together in some code and feeding it some IP addresses.
See prr-asn.py for the example script on which this article is based.
Included with pyasn are two standalone utilities: pyasn_util_download.py and pyasn_util_convert.py. The former can be used to fetch a current IPv4 and IPv6 MRT-formatted RIB data file (e.g. pyasn_util_download.py --latestv46) and the latter will convert the downloaded file into pyasn’s native text-based lookup format. This example will assume the converted file is named pyasn.dat.
The example script is designed for Python 3. The pyasn module is probably the only one the script uses that you may not have by default. As previously indicated, this approach relies on the native pyasn data file produced with the conversion tool as well as the ASN to AS name mapping text file supplied by RIPE. The -a and -n arguments can be used to specify the full file path to those respective files if the default file names are not located in the current directory. The example script expects one IP address, either IPv4 or IPv6, per line from STDIN. The unusual looking files argument will enable the script to accept a list of IP addresses from STDIN through a pipe and from files on the command line:
# to combine argparse and fileinput from stdin:
# https://gist.github.com/martinth/ed991fb8cdcac3dfadf7
#
parser.add_argument('files', metavar='FILE', nargs='*', help='files to read, if empty, stdin is used')
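To show how that positional argument ties into reading input, here is a minimal sketch of the argument handling, not the script's actual code. The -a and -n options and the default file names pyasn.dat and asn.txt come from the description above; the pairing of -a with the pyasn file and -n with the names file, the long option names --asndb and --asnames, and both function names are assumptions made for illustration.

```python
import argparse
import fileinput


def build_parser():
    """Build a parser with the two data-file options and the FILE list."""
    parser = argparse.ArgumentParser(
        description="map IP addresses to origin ASN and AS name")
    # --asndb and --asnames are hypothetical long names, not from the article
    parser.add_argument("-a", "--asndb", default="pyasn.dat",
                        help="path to the converted pyasn data file")
    parser.add_argument("-n", "--asnames", default="asn.txt",
                        help="path to the RIPE ASN to AS name file")
    parser.add_argument("files", metavar="FILE", nargs="*",
                        help="files to read, if empty, stdin is used")
    return parser


def read_addresses(files):
    """Yield each non-empty, stripped line from the files, or from stdin."""
    # "-" tells fileinput to read sys.stdin, so an empty FILE list means a pipe
    with fileinput.input(files=files if files else ("-",)) as stream:
        for line in stream:
            line = line.strip()
            if line:
                yield line
```

A caller would run build_parser().parse_args() and then loop over read_addresses(args.files), which behaves the same whether the addresses arrive through a pipe or from named files.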
Loading and using the IP address to ASN lookup file is left to pyasn and is relatively straightforward. The RIPE ASN to AS name mapping file must be read and parsed with a little help from our script. This data file has a relatively simple structure, but we take some care to make sure we get what we expect. The file may contain some surprises such as quoted text within a name. If present, a two-letter ISO country code follows the AS name. The example code does not use it, so we simply trim it from the end of the string with a regular expression:
line = re.sub( r",\s+[A-Z]{2}\Z", "", line.strip() )
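To make the effect of that substitution concrete, the sketch below wraps the same pattern in a small helper. The helper name strip_country and the constant COUNTRY_SUFFIX are hypothetical, and the sample lines in the usage note are invented for illustration (the AS numbers come from the range reserved for documentation), not taken from the RIPE file.

```python
import re

# Optional trailing ", CC" where CC is two upper-case letters at the very end
COUNTRY_SUFFIX = re.compile(r",\s+[A-Z]{2}\Z")


def strip_country(line):
    """Return the line stripped of outer whitespace and a trailing country code."""
    return COUNTRY_SUFFIX.sub("", line.strip())
```

With this helper, "64496 EXAMPLE-NET Example Networks, Inc., US" becomes "64496 EXAMPLE-NET Example Networks, Inc." because only the final comma-separated token is anchored to the end of the string. A line with no country code passes through unchanged. One caveat worth keeping in mind: a name that legitimately ends in a comma followed by two capital letters would be trimmed as well.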
The script performs some sanity checks on the remaining line input. It makes sure we end up with two fields, one for the ASN and one for the AS name. Then it verifies the first field is indeed an integer:
# skip if string is empty
if not line:
    continue
try:
    # split on the first run of whitespace
    asn, name = line.split(None, 1)
except ValueError:
    # skip if the line does not have exactly two fields
    continue
# make sure first field is only digits
if not asn.isdigit():
    continue
Some AS names can be quite lengthy. Most of the time only the leftmost portion of the full string is enough for humans to get a sense of which network the ASN belongs to. A length-limited AS name also makes text-based reports easier to read on a typical screen. Therefore we store only the first 30 characters of the name in our ASN to AS name mapping dictionary:
asnames[int(asn)] = name[:30]
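Putting the parsing steps together, a self-contained version of the loop might look like the following. This is a sketch assembled from the fragments above rather than the script's exact code; the function name parse_asnames and the width parameter are assumptions, and it checks the field count directly instead of catching the unpacking error.

```python
import re

NAME_WIDTH = 30  # keep only the first 30 characters of each AS name


def parse_asnames(lines, width=NAME_WIDTH):
    """Build a dict mapping integer ASN to a length-limited AS name."""
    asnames = {}
    for line in lines:
        # drop outer whitespace and a trailing ", CC" country code
        line = re.sub(r",\s+[A-Z]{2}\Z", "", line.strip())
        if not line:
            continue
        # split on the first run of whitespace: ASN, then the rest as the name
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        asn, name = fields
        if not asn.isdigit():
            continue
        asnames[int(asn)] = name[:width]
    return asnames
```

Any iterable of lines works, so an open file object for asn.txt can be passed in directly. Keying the dictionary on an integer matters because the origin ASN returned by the lookup later is also an integer.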
Having both the IP address to origin ASN mapping from pyasn and the ASN to AS name mapping from RIPE loaded into memory, we can loop through the IP addresses presented to this example code via STDIN. The origin ASN lookup is relatively straightforward, but we perform some tests to ensure we have the IP address to ASN and ASN to AS name mappings. If we do not, we set the missing mappings to the string “NA” for “not available” by default. Some IP addresses may not have a covering origin route, so an “NA” should be set for both the ASN and AS name. In some rare cases an origin ASN may appear in the RIB, but there is no corresponding AS name associated with the ASN. The example code attempts to handle all these cases:
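asn = asndb.lookup(ipaddr)
if asn[0] is None:
    # no covering origin route for this address
    asn = "NA"
    asname = "NA"
else:
    asn = asn[0]
    try:
        asname = asnames[asn]
    except KeyError:
        # origin ASN is in the RIB, but has no name in the mapping file
        asname = "NA"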
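The three cases can also be folded into one small function, which makes them easy to exercise without a RIB file on hand. In this sketch the function name resolve and the stand-in fake_lookup are hypothetical; the only thing assumed about the real lookup is what the code above already relies on, namely that it returns a sequence whose first element is the origin ASN or None.

```python
def resolve(ipaddr, lookup, asnames, missing="NA"):
    """Return (asn, asname) for ipaddr, substituting `missing` where needed."""
    origin = lookup(ipaddr)[0]
    if origin is None:
        # no covering origin route: neither value is available
        return missing, missing
    # an ASN seen in the RIB may still lack a name in the mapping file
    return origin, asnames.get(origin, missing)


def fake_lookup(ipaddr):
    """Hypothetical stand-in for the real lookup, using documentation prefixes."""
    table = {
        "192.0.2.1": (64496, "192.0.2.0/24"),
        "198.51.100.7": (64497, "198.51.100.0/24"),
    }
    return table.get(ipaddr, (None, None))
```

In the real script the second argument would be asndb.lookup. Using dict.get in place of the try block is a stylistic choice and should give the same result.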
At this point the only thing remaining to do is output the ASN and AS name mappings for the IP address in the current iteration of the STDIN loop. The formatting is slightly different depending on whether the IP address is IPv4 or IPv6. Additional field space is reserved for IPv6 addresses. Also notice that the asn variable is wrapped in the str() function. Unless the value was “NA”, the asn value would be of type integer.
if ':' in ipaddr:
    sys.stdout.write("%-37s | %10s | %s\n" % (ipaddr, str(asn), asname))
else:
    sys.stdout.write("%-15s | %10s | %s\n" % (ipaddr, str(asn), asname))
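The two branches differ only in the width of the first column, so the same output can be produced by a single helper. This is a minimal sketch with a hypothetical function name, format_row, using the same field widths as the code above.

```python
def format_row(ipaddr, asn, asname):
    """Return one pipe-delimited report line, wider for IPv6 addresses."""
    # a colon can only appear in an IPv6 address
    width = 37 if ":" in ipaddr else 15
    # %-*s takes its width from the argument tuple; %s converts an int via str()
    return "%-*s | %10s | %s\n" % (width, ipaddr, asn, asname)
```

Since %s already applies str() to its argument, the explicit conversion is left out here; keeping it, as the example script does, is harmless. The result can be passed straight to sys.stdout.write().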
Voila! Now compare how quickly you can map IP addresses to ASNs against remote service interfaces. This should prove a reasonable alternative, especially if you have a lot of addresses to map on a recurring basis.
Now a few words of caution. There may be some inherent limitations and even potential problems with the example outlined here. For instance, pyasn does not support multiple origin AS (MOAS). In other words, if multiple networks originate the same covering prefix for an address, pyasn will show only one. MOAS is relatively uncommon, but if seeing all origin ASes matters to you, this could be a serious problem. Additionally, there are at least three dependencies this code relies upon. First is the ability to fetch the latest IPv4 and IPv6 MRT RIB from Route-Views. Second, pyasn sees very little development these days, so any changes to Route-Views data formats or file locations may cause pyasn to fail. Third, RIPE may alter their ASN to AS name mapping file process without warning. The good news is that Route-Views and RIPE tend to be very reliable, stable, and conservative in making changes to their services. While past performance is no guarantee of future success, they both have a track record of avoiding changes that would break all the automated processes people have dreamt up.
I know of at least one colleague who produces much the same result by utilizing Frank Denis' iptoasn data set. This could be used to create an elegant alternative, at the possible expense of trading a set of dependencies for a single centralized one. There are surely many ways to do what has been described here.
Lastly, I have little doubt the Python experts finding this page will quickly spot ways of improving upon my example code or have their own perfected solution they prefer. I welcome input, suggestions, enhancements, and bug reports. You see, over the past few years I’ve been making a slow migration away from Perl to Python as my language of choice for the majority of tooling needs. Just as I was starting to get pretty good with Perl, I am finding the pull towards Python increasingly necessary as Perl has fallen out of favor. Consequently, this has led me to adapt the quip I used to describe my ability to code in Perl. Now I say, “I don’t know Python, I know combat Python.”