Exim, DNS and rbldnsd

I am very fond of the exim MTA. Its extremely flexible configuration file format may just be Turing-complete, and it allows me to play with my email in just about any way I like.

In one way, though, exim is strangely limited. Given the many intersections between email and DNS, I expect an MTA to have many varied ways to access DNS information. Exim uses DNS widely, of course, as it must, but it can only access it the “bog standard” way, through the stub resolver in the C runtime (ie. library functions such as res_search and gethostbyname). It doesn’t use any of the modern alternative DNS libraries such as ares. As a consequence, it is not possible to point exim at an alternative nameserver, nor at an alternative port number for DNS queries; it always uses whatever /etc/resolv.conf specifies.

Another place where I often feel exim slightly deficient is in the choices it offers for storing data sets of IP addresses (usually called lookups in the exim world). Such data sets are needed, to give just the most basic example, for blocking SMTP connections from known active spam bots. There are really only two ways of storing such data built directly into exim: flat files accessed via linear search, and hashed key-value stores such as the legendary Berkeley DB, now owned and maintained by Oracle.

But hashing is not the most natural data structure for IP address data. For most purposes one doesn’t want to store individual addresses but rather CIDR ranges, ie. parts of IP space carved out by fixing some number of most significant bits and letting the least significant bits vary. This shape of data is a great fit for a trie, or better still a radix tree, which is basically a trie cleverly optimized by path compression.
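To make the idea concrete, here is a minimal Python sketch of longest-prefix matching on a plain binary trie, with one node per address bit. It deliberately omits path compression, and it assumes all ranges in one trie belong to the same address family (IPv4 in the example). The class and method names are my own illustration and have nothing to do with corkipset’s API.

```python
import ipaddress
from typing import Optional


class CidrTrie:
    """Plain binary trie keyed on address bits (no path compression)."""

    def __init__(self) -> None:
        # Each node is a dict with optional children under the int keys 0 and 1,
        # plus an optional "value" entry where a stored CIDR prefix ends.
        self.root: dict = {}

    def insert(self, cidr: str, value: str) -> None:
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address)
        width = net.max_prefixlen  # 32 for IPv4, 128 for IPv6
        node = self.root
        # Walk only the fixed (most significant) prefix bits, creating nodes as needed.
        for i in range(net.prefixlen):
            bit = (bits >> (width - 1 - i)) & 1
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, addr: str) -> Optional[str]:
        ip = ipaddress.ip_address(addr)
        bits = int(ip)
        width = ip.max_prefixlen
        node = self.root
        best = node.get("value")  # a /0 range would be stored at the root
        # Follow the address bits from the top, remembering the value of the
        # longest matching prefix seen so far (longest-prefix match).
        for i in range(width):
            bit = (bits >> (width - 1 - i)) & 1
            if bit not in node:
                break
            node = node[bit]
            if "value" in node:
                best = node["value"]
        return best
```

For example, after inserting 12.34.0.0/16 and 12.34.56.0/24, a lookup of 12.34.56.78 returns the value stored for the more specific /24, while 12.34.1.1 falls back to the /16. A lookup costs at most one step per address bit regardless of how many ranges are stored, which is what makes this family of structures attractive for large block lists.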

corkipset

One library that offers this kind of storage and functions to query it is corkipset. But there are still two problems with getting exim to use it:

  • Unless one is ready to hack on the exim code locally,¹ it’s necessary to use “dlopening”, ie. runtime loading of shared libraries. This has uncomfortable security connotations, as in most installations exim runs setuid root, for rather complex reasons.
  • Because the exim daemon spawns a worker process to handle each email message, and each worker process calls the exec syscall to restart the exim program image from scratch, there is no way to keep data permanently loaded in memory for the worker processes. Instead each worker process must open the necessary data files anew. This is not a problem for small files and pieces of data but is clearly inefficient for huge data such as spammer block lists, especially when it must be parsed anew every time from a human-readable format.

rbldnsd

rbldnsd is a nice solution to the second problem above. It is an extremely lightweight daemon which stores sets of CIDR ranges in its working memory and answers queries about them on a socket. But, conveniently or not depending on the situation and one’s point of view, the socket is a UDP one and the protocol spoken is DNS. For example, to ask whether the address 12.34.56.78 is in the dataset, or to get the value associated with it, we must make a DNS A record query for

78.56.34.12.ds.rbl.example.net

where ds.rbl.example.net is the DNS zone for which rbldnsd is made authoritative. As a result, to query rbldnsd directly from exim using exim’s native DNS access, we’d have to make rbldnsd listen on port 53 at the systemwide resolver address, ie. the one on the nameserver line in /etc/resolv.conf. This is clearly impossible for any number of reasons, not least that rbldnsd only answers for its own zones, so every other DNS lookup exim makes would fail.
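For poking at rbldnsd by hand, here is a minimal Python sketch of what such a query looks like on the wire: it reverses the octets, appends the zone, encodes a bare RFC 1035 A query, and sends it over UDP to whatever address and port rbldnsd listens on, which is exactly the flexibility exim’s stub resolver lacks. It handles IPv4 only, and the function names, the timeout, and the reading of NXDOMAIN as “not listed” (the usual DNSBL convention) are my assumptions; reply parsing stops at the header.

```python
import ipaddress
import secrets
import socket
import struct


def dnsbl_query_name(addr: str, zone: str) -> str:
    """Reverse the octets of an IPv4 address and append the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(addr)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def build_a_query(qname: str, query_id: int) -> bytes:
    """Encode a minimal DNS query packet (RFC 1035) for one A record."""
    # Header: ID, flags=0 (standard query, RD clear since rbldnsd is
    # authoritative), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by a zero byte.
    qname_wire = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname_wire + struct.pack("!HH", 1, 1)


def reply_summary(reply: bytes, query_id: int) -> tuple:
    """Return (RCODE, ANCOUNT) from the reply header after checking the ID."""
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    if rid != query_id:
        raise ValueError("reply ID does not match query ID")
    return flags & 0x000F, ancount


def query_rbldnsd(addr: str, zone: str, server: str, port: int,
                  timeout: float = 2.0) -> tuple:
    """Ask rbldnsd directly at server:port; RCODE 3 (NXDOMAIN) means not listed."""
    query_id = secrets.randbits(16)
    packet = build_a_query(dnsbl_query_name(addr, zone), query_id)
    # Crude heuristic: a colon in the server address means an IPv6 literal.
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        reply, _ = sock.recvfrom(512)
    return reply_summary(reply, query_id)
```

A call such as `query_rbldnsd("12.34.56.78", "ds.rbl.example.net", "2001:db8::53", 5353)` would target a hypothetical rbldnsd instance on that documentation address and an arbitrary port; an RCODE of 0 with a nonzero answer count means the address is listed.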

Stub zones

Fortunately there is a workaround. Recursive DNS resolver daemons such as unbound have a concept of stub zones. For example, we can configure ds.rbl.example.net as a stub zone in unbound, which means that for any query in that zone it cannot answer from its own cache, unbound asks a fixed nameserver of our choosing instead of performing normal recursion. The only remaining problem is to find an address and a port for rbldnsd to listen on that don’t clash with the system resolver. This can be done in a couple of different ways; I have found it easiest to use an IPv6 address, because I have a huge supply of them, while I only have a single IPv4 address on the only non-loopback interface of my VPS.
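The stub zone behaviour described above can be modelled in a few lines of Python. This is a toy model under stated assumptions, not how unbound is implemented: the cache ignores TTLs and negative caching, and the two callables standing in for “ask this server directly” and “do ordinary recursion”, as well as the example address 2001:db8::53 and port 5353, are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Server = Tuple[str, int]  # (address, port) of a fixed upstream nameserver


def pick_stub(qname: str, stubs: Dict[str, Server]) -> Optional[Server]:
    """Return the server for the most specific stub zone containing qname."""
    labels = qname.rstrip(".").lower().split(".")
    # Start with the full name and drop leading labels one at a time,
    # so the longest (most specific) matching zone wins.
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if zone in stubs:
            return stubs[zone]
    return None


class ToyStubResolver:
    """Answer from cache, else send stub-zone queries to their fixed server."""

    def __init__(self, stubs: Dict[str, Server],
                 ask_server: Callable[[Server, str], str],
                 recurse: Callable[[str], str]) -> None:
        self.stubs = stubs
        self.ask_server = ask_server  # hypothetical: query one server directly
        self.recurse = recurse        # hypothetical: ordinary recursive lookup
        self.cache: Dict[str, str] = {}  # no TTLs: a toy simplification

    def resolve(self, qname: str) -> str:
        key = qname.rstrip(".").lower()
        if key in self.cache:
            return self.cache[key]
        server = pick_stub(key, self.stubs)
        if server is not None:
            answer = self.ask_server(server, key)
        else:
            answer = self.recurse(key)
        self.cache[key] = answer
        return answer
```

The point of the design is that exim never needs to know about rbldnsd’s unusual address or port: it keeps talking to the system resolver on port 53, and only the resolver’s stub zone table knows where the ds.rbl.example.net answers really come from.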


  1. Hacking exim is not for everyone: it is written in a peculiar style of C.
