Solving lameness in the reverse zones

Shane Kerr

22 Oct 2009

7:58 p.m.

All,

DNS Lameness Philosophy
-----------------------

There are two fundamentally different philosophies about lameness in reverse DNS.

The first (mine): DNS misconfiguration - lameness in this case - is bad when it causes operational problems for a human being somewhere. This may be a network administrator, or a system administrator, or an end user.

The second: Lameness is bad because the maintainers of the zone data (the RIPE NCC and the administrator of the reverse domain) have a responsibility to keep this information correct.

Proposals Based on Reducing User Pain
-------------------------------------

Because I wrongly assumed that there was only one way to think about lameness, I made a set of proposals for how we deal with lameness in the reverse delegation at the RIPE meeting in Lisbon:

1. Stop mass-mailing to lame delegations
2. Include lameness information in an annual report to LIRs (possibly a good thing to have along with the annual bill anyway)
3. Perhaps check for the lame delegations causing the worst problems and send targeted mails based on those

These suggestions were based on my philosophy that lameness is bad only inasmuch as it causes problems for actual human beings.

So, that is one set of proposals.

Proposal Based on DNS as a Platonic Form
----------------------------------------

I thought about it, and if we as a community decide that the 2nd philosophy is more appropriate, then our current strategy of asking people nicely to please fix the problems is ridiculous.

A lame delegation is at best worthless, and often harmful. We can find lame delegations with very high certainty. The delegation is the responsibility of the parent as well as the child.

So, I propose we modify the current process to work something like this:

1. Tell users that their delegations are lame.
2. Wait, then tell them again if not fixed.
3. Wait, then PULL THE DELEGATION if not fixed.

If having clean data is important, let's stop with these half-measures, and do it right.

Speaking only if we decide that we believe in clean data as a goal, of course. :)

--
Shane
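For readers who want the three-step process above in concrete form, here is a minimal sketch of it as a decision function. It is only an illustration: the 30-day wait, the `LameState` record and the `next_action` name are assumptions made for this example, since the proposal itself only says "wait".

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"                    # healthy, or still inside a wait period
    FIRST_NOTICE = "first notice"    # step 1: tell the maintainer
    SECOND_NOTICE = "second notice"  # step 2: tell them again
    PULL = "pull delegation"         # step 3: remove the delegation


# Hypothetical wait between steps; the proposal does not give a number.
WAIT_DAYS = 30


@dataclass
class LameState:
    notices_sent: int = 0       # notices already sent for this delegation
    days_since_notice: int = 0  # days since the most recent notice


def next_action(is_lame: bool, state: LameState) -> Action:
    """Pick the next step for one delegation in the warn, warn, pull process."""
    if not is_lame:
        return Action.NONE
    if state.notices_sent == 0:
        return Action.FIRST_NOTICE
    if state.days_since_notice < WAIT_DAYS:
        return Action.NONE  # still waiting for the maintainer to react
    if state.notices_sent == 1:
        return Action.SECOND_NOTICE
    return Action.PULL
```

A delegation that is fixed at any point simply drops back to `Action.NONE`, so the drastic step is only reached after two notices and two full wait periods.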


Havard Eidnes

22 Oct 2009

11:59 p.m.

...

So, I propose we modify the current process to work something like this:

1. Tell users that their delegations are lame.
2. Wait, then tell them again if not fixed.
3. Wait, then PULL THE DELEGATION if not fixed.

One interpretation could be "pull the delegation to the lame name server, but leave the working ones in place".

Do note, though, that if the zone itself still lists the lame server in its NS RRset, that RRset will override the NS RRset received from the delegating zone, since the latter is non-authoritative information, and recursive name servers may think it's a good idea to validate the NS RRset from one of the authoritative name servers.

So... It's not a given that removing the delegation record for the lame name server will actually make much of a difference.

Or perhaps you meant "remove the entire delegation"? It sounds kind of drastic...

Regards,

- Håvard
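The override described above can be illustrated with a small sketch. It assumes a resolver that prefers the zone's own authoritative NS RRset over the referral from the parent once it has seen it; real resolvers differ in how and when they do this. The function name `ns_set_used_by_resolver` and the example host names are made up for the illustration.

```python
def _normalize(names):
    """Lower-case names and drop the trailing dot so they compare equal."""
    return frozenset(n.lower().rstrip(".") for n in names)


def ns_set_used_by_resolver(parent_ns, child_ns):
    """NS names such a resolver ends up using for a zone.

    parent_ns: NS names from the referral (non-authoritative).
    child_ns:  NS names from the zone's own NS RRset (authoritative),
               or None if the resolver has not fetched it.
    """
    if child_ns:
        # Authoritative data replaces the referral data.
        return _normalize(child_ns)
    return _normalize(parent_ns)


# The parent has dropped the lame ns2, but the child zone still lists it.
parent = ["ns1.example.net."]
child = ["ns1.example.net.", "ns2.example.net."]
print(sorted(ns_set_used_by_resolver(parent, child)))
```

Under this assumption the lame server comes back as soon as the child's NS RRset is learned, which is why pulling a single NS record at the parent may change little.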

Michael Graff

23 Oct 2009

12:22 a.m.

Havard Eidnes wrote:

...

...
So, I propose we modify the current process to work something like this:

1. Tell users that their delegations are lame.
2. Wait, then tell them again if not fixed.
3. Wait, then PULL THE DELEGATION if not fixed.

One interpretation could be "pull the delegation to the lame name server, but leave the working ones in place".

Or, notify and stop.

Are lame delegations really such a problem that we need to take drastic action? How often would this be checked, and how much liability would be here in modifying what is published? I'd hate to see the papers when a large content provider lost due to a temporary outage on one name server...

--Michael

Shane Kerr

12:11 p.m.

Michael,

On Thu, 2009-10-22 at 17:22 -0500, Michael Graff wrote:

...

Are lame delegations really such a problem that we need to take drastic action? How often would this be checked, and how much liability would be here in modifying what is published? I'd hate to see the papers when a large content provider lost due to a temporary outage on one name server...

This approach is based on the assumption that lameness is in itself a problem, and must be fixed.

Also note that we are talking about the reverse DNS in this case, so nobody is going to lose revenue if a delegation is removed.

--
Shane

Matus UHLAR - fantomas

1:57 p.m.

...

On Thu, 2009-10-22 at 17:22 -0500, Michael Graff wrote:

...
Are lame delegations really such a problem that we need to take drastic action? How often would this be checked, and how much liability would be here in modifying what is published? I'd hate to see the papers when a large content provider lost due to a temporary outage on one name server...

On 23.10.09 12:11, Shane Kerr wrote:

...

This approach is based on the assumption that lameness is in itself a problem, and must be fixed.

Also note that we are talking about the reverse DNS in this case, so nobody is going to lose revenue if a delegation is removed.

well, there are still servers and services requiring reverse resolution...

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
REALITY.SYS corrupted. Press any key to reboot Universe.

Shane Kerr

12:09 p.m.

Håvard,

Thanks for your reply. Clearly some additional thinking is necessary...

On Thu, 2009-10-22 at 23:59 +0200, Havard Eidnes wrote:

...

...
So, I propose we modify the current process to work something like this:

1. Tell users that their delegations are lame.
2. Wait, then tell them again if not fixed.
3. Wait, then PULL THE DELEGATION if not fixed.

One interpretation could be "pull the delegation to the lame name server, but leave the working ones in place".

Yes, that is the idea I was going for. Apologies for being unclear.

...

Do note, though, that if the zone itself still lists the lame server in its NS RRset, that RRset will override the NS RRset received from the delegating zone, since the latter is non-authoritative information, and recursive name servers may think it's a good idea to validate the NS RRset from one of the authoritative name servers.

So... It's not a given that removing the delegation record for the lame name server will actually make much of a difference.

You bring up a very good point. AFAIK the lameness checking at the RIPE NCC only looks at things from the parent point of view. There is a different class of error, which you touch on here, which is mismatch between parent NS RRSET and child (authoritative) NS RRSET. This has not been discussed.

NS RRSET Mismatches
-------------------

A mismatch can be one of three types:

1. NS in parent not in child
2. NS in child not in parent - server is not lame
3. NS in child not in parent - server is lame

The first case is a sort of lameness, and actually quite easily detected. I think that it can be covered exactly as any other sort of lameness. (It is possible for a name server listed in the parent to answer correctly even though it is not listed in the NS set of the child; this may happen during a migration for example. I don't think that affects this discussion, but I thought I would mention it.)

The second case is not lameness, but is an incorrectness at the parent. Again, if data accuracy is our goal (and for this proposal we assume that it is), then we must fix it, somehow. I propose the same algorithm as for removing lame delegations: warn, warn, update. Except in this case "update" means adding the appropriate NS.

The third case is the tricky one. We have no good solution here. If we cared about user experience, then we would eliminate the NS from the parent RRSET, because that will result in a slightly better average query pattern. However, we do not care about users, we care about data, so it is difficult to say what the best way forward is. (See more below.)
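The three mismatch types above amount to a comparison of two sets of names plus a lameness flag. A rough sketch, assuming the parent and child NS RRSETs and the list of lame servers have already been collected by some other means; `classify_mismatches` and the host names are hypothetical and not part of any RIPE NCC tooling.

```python
def _normalize(names):
    """Lower-case names and drop the trailing dot so they compare equal."""
    return {n.lower().rstrip(".") for n in names}


def classify_mismatches(parent_ns, child_ns, lame_servers):
    """Map each mismatched NS name to mismatch type 1, 2 or 3.

    1: NS in parent, not in child
    2: NS in child, not in parent, server is not lame
    3: NS in child, not in parent, server is lame
    """
    parent = _normalize(parent_ns)
    child = _normalize(child_ns)
    lame = _normalize(lame_servers)

    result = {}
    for name in parent - child:
        result[name] = 1
    for name in child - parent:
        result[name] = 3 if name in lame else 2
    return result


mismatches = classify_mismatches(
    parent_ns=["ns1.example.net.", "old.example.net."],
    child_ns=["ns1.example.net.", "ns2.example.net.", "ns3.example.net."],
    lame_servers=["ns3.example.net."],
)
for name, kind in sorted(mismatches.items()):
    print(name, "-> type", kind)
```

Names present in both sets never appear in the result, since those are ordinary delegations handled by the normal lameness check.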

...

Or perhaps you meant "remove the entire delegation"? It sounds kind of drastic...

It is drastic, but in the 3rd case we have no good options. Since we care about data accuracy, we may need drastic measures. We have two possible approaches:

* We follow the normal lameness process for the lame server: warn, warn, delete. Then we must spam the administrator every time we re-run our check until the zone is fixed. Yes, it is annoying and not likely to get things fixed, but for the sake of the data, it is necessary that we try.

* Otherwise, yes, we simply remove the entire delegation. One could argue that "we have killed the patient to cure the disease", but please keep in mind that data consistency is the goal.

If I was the administrator for the child zone, I would actually prefer the second option (spam is annoying). It is also better because it results in a correct parent zone. But I leave it up to the working group to decide.

Thankfully there is no glue in the reverse tree, so we can ignore that class of mismatch. :)

But I am reminded of another missing point in our quest for correctness: NS with partial lameness.

NS with Partial Lameness
------------------------

In this case, we have something like this:

    2.0.192.in-addr.arpa   NS   ns1.example.net.
                           NS   ns2.example.net.

    ns1.example.net        A    192.0.2.0   ; working server
                           A    192.0.2.1   ; broken server

What we have here is lameness caused by a NS record with multiple addresses, only some of which are answering properly. Since we have no control over this NS to A/AAAA mapping, we have the same options as case #3 above: we can pester continuously or we can pull the entire delegation.

Reading my proposals here, one might get the idea that I don't support the idea of data correctness as the correct philosophy for DNS lameness checking. You are correct.

In a sense, this is a sort of reductio ad absurdum discussion:

http://en.wikipedia.org/wiki/Reductio_ad_absurdum

If you begin with the premise that data quality is important as an end goal, rather than starting with the premise that data quality is important only when it helps people, you have no way to measure when a technique for improving data quality is simply not worth the bother.

HOWEVER, I do accept the possibility that the community may say "damn the torpedoes, full speed ahead!"(*) If we're going to go for data quality, let's not be half-assed(**), let's get it right this time. :)

--
Shane

(*) Excuse the Americanism, but it seems somehow appropriate:
    http://tinyurl.com/damn-the-torpedoes

(**) Another Americanism, also equally appropriate IMHO:
    http://www.urbandictionary.com/define.php?term=half-assed
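The partial-lameness case described in this message can be expressed as a per-address tally for one name server. This is a sketch only: it assumes some separate probe has already recorded, for each A/AAAA address of the server, whether that address answered authoritatively for the zone. The name `ns_health` and the result strings are invented for the example.

```python
from typing import Mapping


def ns_health(address_answers: Mapping[str, bool]) -> str:
    """Summarise one name server from per-address probe results.

    address_answers maps each address of the server to True if that
    address answered properly for the zone, False otherwise.
    """
    if not address_answers:
        return "no addresses"  # the NS name did not resolve at all
    working = sum(1 for ok in address_answers.values() if ok)
    if working == len(address_answers):
        return "working"
    if working == 0:
        return "lame"
    return "partially lame"


# The ns1.example.net case from the message: one good, one broken address.
print(ns_health({"192.0.2.0": True, "192.0.2.1": False}))
```

A check that stops at the first address that answers would report this server as working, which is how partial lameness can go unnoticed.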

Wilfried Woeber, UniVie/ACOnet

10:44 a.m.

Shane Kerr wrote:

...

All,

DNS Lameness Philosophy
-----------------------

There are two fundamentally different philosophies about lameness in reverse DNS.

Well, my perception is that reverse DNS is not well-understood by a big portion of all the players involved.

On top of that, the built-in redundancy and resilience of DNS tend to hide structural problems from the "consumers" till the very last moment.

Having many organisations out-source management of network services certainly does not help.

My feeling is that we should consider an information campaign first, and then think about technical measures how we can improve the service.

Btw, the same problem is there on the lower levels of the delegation chains. Sigh....

Wilfried.

Shane Kerr

12:14 p.m.

Wilfried,

On Fri, 2009-10-23 at 08:44 +0000, Wilfried Woeber, UniVie/ACOnet wrote:

...

Shane Kerr wrote:

...
All,

DNS Lameness Philosophy
-----------------------

There are two fundamentally different philosophies about lameness in reverse DNS.

Well, my perception is that reverse DNS is not well-understood by a big portion of all the players involved.

On top of that, the built-in redundancy and resilience of DNS tend to hide structural problems from the "consumers" till the very last moment.

In my mind, this means that there is no problem, and that most attempts to solve the non-existent problem are doomed to be more expensive than the benefit they bring.

I think my lightweight proposal (include lameness in an annual report about the state of delegations per LIR, notify the top offenders) is the right way forward. It should provide the most benefit for the least annoyance.

--
Shane
