guinix international

Computing Without Borders

 

guinix TechNote: DSL in Portland, Part 1

Fixup for DNS service behind flawed NAT on Cisco 678 router

Introduction

We recently relocated to Portland, Oregon after several years of various international migrations, most recently in Uganda, East Africa.

The first order of business, of course, was setting up a broadband connection and moving all of our services and content off the pathetic 32K link we had in Kampala. A quick call to Qwest got us a DSL-provisioned line in a matter of days. We dug our Cisco 678 DSL router out of mothballs and configured it exactly as we had in Butte. In no time at all, we were good to go.

We set up our main server on a little Soekris box running OpenBSD and the djb way, and watched the multilogs hum as the world quickly caught up to us at our new home.

Aarrgh: Cisco 678 NAT breaks DNS!

As with our setup in Butte, we had the Cisco perform network address translation (NAT) for our single static public IP address to/from the internal network. The relevant CBOS (version 2.4.6) commands for this configuration are:

set nat enable
set nat entry add 192.168.0.254 25 0.0.0.0 25 tcp
set nat entry add 192.168.0.254 53 0.0.0.0 53 udp
set nat entry add 192.168.0.254 80 0.0.0.0 80 tcp
write
reboot

Now all outbound traffic from the local network is made to appear as if coming from the single public IP address, and all incoming traffic for SMTP, DNS, and HTTP services is redirected to an internal server. Since we were quickly online and getting all of our mail and web traffic as usual, we had no reason to suspect anything was amiss.
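To make the field order of those entries explicit, here is a small Python sketch that reads the "set nat entry add" lines into a forwarding table. The field interpretation (inside address, inside port, outside address, outside port, protocol) is inferred from the commands in this note, not taken from the CBOS reference, and the names NatEntry and parse_nat_entries are our own.

```python
from typing import NamedTuple


class NatEntry(NamedTuple):
    inside_ip: str
    inside_port: int
    outside_ip: str      # 0.0.0.0 is assumed to mean "the router's public address"
    outside_port: int
    proto: str


def parse_nat_entries(config):
    """Collect static NAT entries from CBOS-style configuration text."""
    entries = []
    for line in config.splitlines():
        words = line.split()
        if words[:4] != ["set", "nat", "entry", "add"]:
            continue  # ignore "set nat enable", "write", "reboot", ...
        inside_ip, inside_port, outside_ip, outside_port, proto = words[4:9]
        entries.append(
            NatEntry(inside_ip, int(inside_port), outside_ip, int(outside_port), proto)
        )
    return entries


CONFIG = """\
set nat enable
set nat entry add 192.168.0.254 25 0.0.0.0 25 tcp
set nat entry add 192.168.0.254 53 0.0.0.0 53 udp
set nat entry add 192.168.0.254 80 0.0.0.0 80 tcp
"""

for entry in parse_nat_entries(CONFIG):
    print(f"{entry.proto} port {entry.outside_port} -> "
          f"{entry.inside_ip}:{entry.inside_port}")
```

Each entry forwards one protocol and port pair arriving at the public address to the internal server at 192.168.0.254.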

Then we added some DNS records pointing to the server running back in Kampala, to continue support for some of the services running under the "guinix.com" domain there. The tinydns database for the "guinix.com" domain included these entries:

# tinydns data
# public dns records
# ===
##
## guinix.com domain:
.guinix.com:209.180.174.155:dns1.guinix.com:259200
.guinix.com:209.180.174.155:dns2.guinix.com:259200
@guinix.com:209.180.174.155:mailhub.guinix.com:0:86400
=nimba.guinix.com:209.180.174.155:86400
+guinix.com:209.180.174.155:86400
+www.guinix.com:209.180.174.155:86400
## arc-nile support:
+arc-nile.guinix.com:216.104.202.70:86400
+sudan.guinix.com:216.104.202.70:86400
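For readers unfamiliar with the tinydns-data format, the leading character of each line selects the record type. The Python sketch below expands the lines above into ordinary DNS records. It is a simplification under stated assumptions: it covers only the four line types used here, it leaves out the SOA record that "." lines also create and the PTR record that "=" lines also create, and it assumes every line carries an explicit TTL and a fully qualified server name. The function name expand is ours.

```python
def expand(line):
    """Expand one tinydns-data line into (name, ttl, type, value) tuples."""
    line = line.strip()
    if not line or line[0] == "#":
        return []                      # blank line or comment
    kind, fields = line[0], line[1:].split(":")
    if kind == ".":                    # name server (SOA record omitted here)
        fqdn, ip, ns, ttl = fields[:4]
        return [(fqdn, int(ttl), "NS", ns), (ns, int(ttl), "A", ip)]
    if kind == "@":                    # mail exchanger with its distance
        fqdn, ip, mx, dist, ttl = fields[:5]
        return [(fqdn, int(ttl), "MX", dist + " " + mx), (mx, int(ttl), "A", ip)]
    if kind in ("=", "+"):             # address ("=" also implies a PTR, omitted)
        fqdn, ip, ttl = fields[:3]
        return [(fqdn, int(ttl), "A", ip)]
    return []                          # line types not covered by this sketch


DATA = """\
.guinix.com:209.180.174.155:dns1.guinix.com:259200
.guinix.com:209.180.174.155:dns2.guinix.com:259200
@guinix.com:209.180.174.155:mailhub.guinix.com:0:86400
=nimba.guinix.com:209.180.174.155:86400
+guinix.com:209.180.174.155:86400
+www.guinix.com:209.180.174.155:86400
+arc-nile.guinix.com:216.104.202.70:86400
+sudan.guinix.com:216.104.202.70:86400
"""

RECORDS = [record for line in DATA.splitlines() for record in expand(line)]
for name, ttl, rtype, value in RECORDS:
    print(name, ttl, rtype, value)
```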

The "arc-nile" and "sudan" records point to the server in Kampala, while the other entries point to our IP address in Portland. Using the dnsq utility, a quick test of the tinydns server from within the local network gives the expected results:

$ dnsq any sudan.guinix.com 192.168.0.254
255 sudan.guinix.com:
120 bytes, 1+1+2+2 records, response, authoritative, noerror
query: 255 sudan.guinix.com
answer: sudan.guinix.com 86400 A 216.104.202.70    <--ok
authority: guinix.com 259200 NS dns1.guinix.com
authority: guinix.com 259200 NS dns2.guinix.com
additional: dns1.guinix.com 259200 A 209.180.174.155
additional: dns2.guinix.com 259200 A 209.180.174.155

But when we performed the same test query, directly to the same tinydns server, from a host outside our network, we were surprised to see this:

$ dnsq any sudan.guinix.com 209.180.174.155
255 sudan.guinix.com:
120 bytes, 1+1+2+2 records, response, authoritative, noerror
query: 255 sudan.guinix.com
answer: sudan.guinix.com 0 A 209.180.174.155    <--huh?
authority: guinix.com 0 NS dns1.guinix.com
authority: guinix.com 0 NS dns2.guinix.com
additional: dns1.guinix.com 0 A 209.180.174.155
additional: dns2.guinix.com 0 A 209.180.174.155

That is, the responses to queries coming from outside the network have been altered:

  • all the TTL fields have been set to zero
  • the IP address for "sudan" has mysteriously changed!

In both cases, inside and outside the network, dnsq queries the same tinydns service directly and reports no errors. Yet the answers differ. Others who tested for us from their own networks saw the same anomaly, and double-checking with the dig utility from the two locations showed the same differences.

What's happening?

At first we never even considered the NAT on the Cisco. After all, Cisco should get NAT right, right? And here the actual payload inside the packets was being altered, and in some systematic, non-random way, not just the packet headers. We figured something on the Qwest network was messing with us instead.

So we googled and googled. Gradually our suspicions were directed toward the Cisco after all.

Workaround

It turns out the NAT implementation on the Cisco 678 is broken. It actually gets in and rewrites data within the packets, not just the headers. In fact, it even makes a special effort to do this on DNS packets, that is, packets routed on port 53.

Strangely, if a DNS query is NAT'ed to a port other than 53, the data returned inside the packet is undisturbed!

Fortunately, this inexplicable behavior leads us to a workaround. Set up a static NAT entry to redirect incoming port 53 traffic to some other port, say 5300. Then configure the DNS server on the internal network to listen on this other port.

An additional complication with tinydns, though, is that tinydns is hardwired to listen on port 53. But rather than patch and compile a special version of tinydns to listen on another port, we set up a redirect rule in the packet filter running on the internal gateway, pointing incoming port 5300 traffic back to port 53.

Yes, that's a few twists and turns for a DNS packet:

query: 209.180.174.155:53
  |
  v
  cisco nat: 192.168.0.254:5300
    |
    v
    gateway rdr: 192.168.0.254:53
      |
      v
      tinydns

It might be better to describe the configuration exactly. First, on the Cisco router, the CBOS commands for the NAT configuration. The change from the earlier setup is the DNS entry, which now maps external port 53 to internal port 5300:

set nat entry delete all
set nat entry add 192.168.0.254 25 0.0.0.0 25 tcp
set nat entry add 192.168.0.254 5300 0.0.0.0 53 udp
set nat entry add 192.168.0.254 80 0.0.0.0 80 tcp
write
reboot

Next, on the internal gateway running the PF packet filter on OpenBSD 3.5, the redirect ("rdr") rule for the second piece of our DNS/NAT fixup:

rdr on $ext_if proto udp from any to 192.168.0.254 port 5300 \
  -> 192.168.0.254 port 53

This gets DNS packets pointed back at the tinydns service listening on port 53.
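As a sanity check on the two hops, here is a toy Python model of the destination rewriting. It only illustrates the path a query takes; it does not emulate CBOS or PF. The table and function names are ours, and the model assumes that the 0.0.0.0 in the Cisco entry stands for the router's public address.

```python
PUBLIC = "209.180.174.155"
SERVER = "192.168.0.254"

# (protocol, destination address, destination port) -> new (address, port)
CISCO_NAT = {("udp", PUBLIC, 53): (SERVER, 5300)}
PF_RDR = {("udp", SERVER, 5300): (SERVER, 53)}


def trace(proto, addr, port):
    """Return the destination of a packet before and after each hop."""
    hops = [(addr, port)]
    for table in (CISCO_NAT, PF_RDR):
        # A packet that matches no rule in a table keeps its destination.
        addr, port = table.get((proto, addr, port), (addr, port))
        hops.append((addr, port))
    return hops


for addr, port in trace("udp", PUBLIC, 53):
    print(f"{addr}:{port}")
```

The query leaves the Cisco addressed to port 5300, so it never crosses the router's NAT as port 53 traffic on the inside, and the PF rule restores port 53 before tinydns sees it.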

To be sure, an ugly workaround. But it works. Responses returned to queries from remote hosts are correct, with all DNS fields intact, including the TTL data and IP addresses as specified in the tinydns record set.

Conclusions

The issue described here is not limited to tinydns. Any DNS server behind a Cisco 67x router with NAT enabled will be similarly affected, including BIND.

The problem is in the NAT implementation in the Cisco 67x series devices. Since the 67x series has long been discontinued, we don't expect the problem will ever be corrected with an updated CBOS release. And so we are left with the kind of workaround described here.

As a matter of record, we first became suspicious of the NAT implementation on the Cisco when we encountered this passage in the CBOS documentation itself:

Network Address Translation is predominantly application-independent. Applications that include IP addresses within the packet payload will fail without special NAT-wise consideration...

We don't know what they mean by "special consideration" here (perhaps RFC 2694?). But, evidently, Cisco considers corruption of packet data a feature.

As for us, we want a NAT that doesn't dick with our data. So we have since bypassed the use of Cisco's NAT entirely. By configuring the Cisco as a bridge and using PPPoE, we now do all packet filtering and network address translation on an OpenBSD host with the excellent PF packet filter, for a system we know and can trust. See our TechNote DSL in Portland, Part 2 for more information.

Unsuspectingly, we ran a tinydns service behind a Cisco 678 on our DSL connection in Butte for months, without the slightest clue there was ever any problem.

That's because, as long as we were only serving responses with our single public IP address, incoming queries did in fact receive correct name service resolution. Never mind that the zero TTLs may have created inefficiencies in client caching: DNS lookups did resolve to our IP address.

Now that we are aware of the problem, we suspect that many others may be similarly affected. Yet, like us, they probably have no reason to be aware of it. A lot of Google searching provided us with only one explicit reference to this issue.

This TechNote is another. If you have any DNS service behind a Cisco 67x router, be sure to test it from a host outside your own network. It may not be serving the answers you are expecting!
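If dnsq or dig is not at hand on the outside host, a test along these lines can be sketched with nothing but the Python standard library. The code below builds a plain A query, sends it over UDP, and reports the TTL and address of every A record in the response. It is a minimal sketch: it does no retries, does not check the response ID, and ignores record names. All function names are ours.

```python
import socket
import struct


def build_query(name, qid=0x1234):
    """Build a non-recursive DNS query for the A record of name."""
    header = struct.pack("!HHHHHH", qid, 0, 1, 0, 0, 0)  # one question, no flags
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)      # QTYPE=A, QCLASS=IN


def skip_name(packet, pos):
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = packet[pos]
        if length == 0:
            return pos + 1
        if (length & 0xC0) == 0xC0:   # two-byte compression pointer ends the name
            return pos + 2
        pos += 1 + length


def a_records(packet):
    """Return (ttl, address) for every A record in any section of a response."""
    qdcount, ancount, nscount, arcount = struct.unpack("!HHHH", packet[4:12])
    pos = 12
    for _ in range(qdcount):
        pos = skip_name(packet, pos) + 4              # skip QTYPE and QCLASS
    found = []
    for _ in range(ancount + nscount + arcount):
        pos = skip_name(packet, pos)
        rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", packet[pos:pos + 10])
        pos += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            found.append((ttl, socket.inet_ntoa(packet[pos:pos + 4])))
        pos += rdlength
    return found


def query(server, name, timeout=3.0):
    """Send one UDP query to server on port 53 and parse the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        packet, _ = sock.recvfrom(4096)
    return a_records(packet)
```

Run from a host outside the network, a call such as query("209.180.174.155", "sudan.guinix.com") should list the TTLs and addresses exactly as they appear in the tinydns data. Zero TTLs, or an address that matches the router's public IP where it should not, point to the problem described in this note.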


Copyright © 2002 - 2005, Wayne Marshall. All rights reserved.
Last edit 2005.03.07, wcm.