Lower TTLs for NS and DS records in reverse DNS delegations
Dear colleagues,

Users may request reverse DNS delegation by creating "domain" objects in the RIPE Database. Such domain objects must contain "nserver" attributes to specify the name servers for a reverse DNS zone, and may contain "ds-rdata" attributes, to specify delegation signer (DS) records.

When the RIPE NCC publishes these records in the appropriate parent zones, the Time to Live (TTL) of all these records is set at 172800 (two days).

The TTL of delegation NS records may be overridden by the TTL of NS records from a zone's apex. Alternatively, many large resolvers ignore the TTL values of NS records and cap them at much lower values such as 21600. Finally, there is no way for a zone operator to change the TTL of a DS record, which is only present in a parent zone.

Long TTLs can cause problems for users when they want to change their name servers or perform DNSSEC key roll-overs. A long TTL on a DS record is especially harmful when a user needs to do a key roll-over in an emergency.

We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.

We welcome feedback or discussion about this, ideally via the DNS Working Group mailing list. If you prefer to send your feedback directly to us, you can email dns@ripe.net.

Regards,
Anand Buddhdev
RIPE NCC
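To put the numbers above side by side, here is a minimal Python sketch. It assumes a resolver simply takes the smaller of the published TTL and its own cap; the function names `effective_ttl` and `human` are made up for illustration and do not describe any particular resolver's behaviour.

```python
from datetime import timedelta
from typing import Optional

# TTL values (in seconds) quoted in the proposal above.
CURRENT_TTL = 172800      # present TTL for NS and DS records
PROPOSED_NS_TTL = 86400   # proposed TTL for NS records
PROPOSED_DS_TTL = 3600    # proposed TTL for DS records
RESOLVER_CAP = 21600      # example cap attributed to "many large resolvers"


def effective_ttl(published_ttl: int, resolver_cap: Optional[int] = None) -> int:
    """Seconds a record may stay cached (assumption: the cap is a plain minimum)."""
    if resolver_cap is None:
        return published_ttl
    return min(published_ttl, resolver_cap)


def human(seconds: int) -> str:
    """Render a TTL in seconds as a readable duration."""
    return str(timedelta(seconds=seconds))


if __name__ == "__main__":
    for label, ttl in [("current", CURRENT_TTL),
                       ("proposed NS", PROPOSED_NS_TTL),
                       ("proposed DS", PROPOSED_DS_TTL)]:
        capped = effective_ttl(ttl, RESOLVER_CAP)
        print(f"{label}: published {human(ttl)}, with cap {human(capped)}")
```

Under this assumption a capping resolver already treats both the current and the proposed NS TTL as six hours, while the proposed DS TTL of one hour falls below the cap and is honoured as published.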
Moin! On 29 Nov 2021, at 12:59, Anand Buddhdev wrote:
We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
I very much support that and would go even lower for NS records. Maybe consider 21600 there.
So long
-Ralf
--
Ralf Weber
On Mon, 2021-11-29 at 17:07 +0100, Ralf Weber wrote:
Moin!
On 29 Nov 2021, at 12:59, Anand Buddhdev wrote:
We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
I very much support that and would go even lower for NS records. Maybe consider 21600 there.
Same. I support this, and I also support lowering NS even further, even to 3600. Kind regards, -- Peter van Dijk PowerDNS.COM BV - https://www.powerdns.com/
I support lowering the TTL on the DS records to 3600.
I support lowering the TTL on the NS records - I was going to put my hat in for 21600, but Mr van Dijk's suggestion of 3600 is very enticing.
But I liked Mr. Lawrence's suggestion on gathering data on lowering the NS records TTL. Perhaps the TTL can be lowered from 172800 to 86400 to 21600, then to 3600, collecting data along the way.
tim
On Thu, Dec 2, 2021 at 6:35 AM Peter van Dijk <peter.van.dijk@powerdns.com> wrote:
On Mon, 2021-11-29 at 17:07 +0100, Ralf Weber wrote:
Moin!
On 29 Nov 2021, at 12:59, Anand Buddhdev wrote:
We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
I very much support that and would go even lower for NS records. Maybe consider 21600 there.
Same. I support this, and I also support lowering NS even further, even to 3600.
Kind regards, -- Peter van Dijk PowerDNS.COM BV - https://www.powerdns.com/
--
To unsubscribe from this mailing list, get a password reminder, or change your subscription options, please visit: https://lists.ripe.net/mailman/listinfo/dns-wg
Maybe allow users to set the TTL on the "domain" object in the RIPE Database? Allowed values could be constrained to a few common lengths of time. The default would be the one decided here. This way users would have a choice.
Gregory
On 02/12/2021 12:49, Tim Wicinski wrote:
I support lowering the TTL on the DS records to 3600.
I support lowering the TTL on the NS records - I was going to put my hat in for 21600, but Mr van Dijk's suggestion of 3600 is very enticing.
But I liked Mr. Lawrence's suggestion on gathering data on lowering the NS records TTL. Perhaps the TTL can be lowered from 172800 to 86400 to 21600, then to 3600, collecting data along the way.
tim
On Thu, Dec 2, 2021 at 6:35 AM Peter van Dijk <peter.van.dijk@powerdns.com> wrote:
On Mon, 2021-11-29 at 17:07 +0100, Ralf Weber wrote:
> Moin!
>
> On 29 Nov 2021, at 12:59, Anand Buddhdev wrote:
>
> > We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
> I very much support that and would go even lower for NS records. Maybe consider 21600 there.
Same. I support this, and I also support lowering NS even further, even to 3600.
Kind regards, -- Peter van Dijk PowerDNS.COM BV - https://www.powerdns.com/
gregory> Maybe allow users to set TTL on "domain" object in RIPE
gregory> database?

Be nice if we could expand to a more sane/standard set of TTLs for NS/DS in TLD/SLDs, which makes this non-functional for a slew of such zones.

gregory> Allowed values can be constrained to few common lengths of
gregory> time? The default would be one decided here.

I made this same comment during the recent DNS-OARC. There is value in a somewhat longer TTL for steady state, to weather temporary glitches in routing (though current values are way too long for even that). When doing provider changes and/or KSK rollovers, a shorter value makes more sense.

I think with some testing and actual data, we can come up with a short list of useful values without having to allow unconstrained, random user-picked values that will just create support issues and not really improve the situation.
On Thu, Dec 2, 2021 at 6:35 AM Peter van Dijk <peter.van.dijk@powerdns.com> wrote:
On Mon, 2021-11-29 at 17:07 +0100, Ralf Weber wrote:
Moin!
On 29 Nov 2021, at 12:59, Anand Buddhdev wrote:
We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
I very much support that and would go even lower for NS records. Maybe consider 21600 there.
Same. I support this, and I also support lowering NS even further, even to 3600.
Another Aye from me on DS & NS to TTL 3600.

I think this will definitely help DNSSEC deployment, as a mistake is then much more easily corrected, which means more people might deploy DNSSEC.

For reverse zones the risk is low of course, until one realizes that most SMTP servers verify reverse DNS strictly, and a missing reverse is typically considered a misconfiguration. But especially for the SMTP case, an hour's outage is tolerable: mail will be delayed but will be retried.

Greets,
Jeroen
On Thu, Dec 02, 2021 at 01:11:17PM +0100, Jeroen Massar via dns-wg wrote:
Same. I support this, and I also support lowering NS even further, even to 3600.
Another Aye from me on DS & NS to TTL 3600.
I'm slightly reminded of the solar activity cycle by another instance of a race to low TTLs, to be followed by another train of thought recommending high (infrastructure RRSet) TTLs in favour of resilience. No objection to Anand's proposal at all, but maybe there are limits to committees finding "optimum" numbers, especially under the impression of a prominent incident. -Peter
Tim Wicinski <tjw.ietf@gmail.com> wrote:
> I support lowering the TTL on the DS records to 3600.
> I support lowering the TTL on the NS records - I was going to put my
> hat in for 21600, but Mr van Dijk's suggestion of 3600 is very
> enticing.
> But I liked Mr. Lawrence's suggestion on gathering data on lowering the
> NS records TTL. Perhaps the TTL can be lowered from 172800 to 86400 to
> 21600, then to 3600, collecting data along the way.

Aside from performance/load on the distributing name servers, what else would you collect? Or perhaps I am asking: What is your hypothesis, and what kind of data do you need to prove/disprove it?

--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
-= IPv6 IoT consulting =-
Anand Buddhdev writes:
We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
I am in favor of this change. I'd also like it if the change were accompanied by measurements of the effect on the relevant authoritative nameservers to determine whether it would be reasonable to reduce the NS TTL even further.
On 11/29/21 10:55 PM, Dave Lawrence wrote:
I am in favor of this change. I'd also like it if the change were accompanied by measurements of the effect on the relevant authoritative nameservers to determine whether it would be reasonable to reduce the NS TTL even further.
For folks interested in measuring the impact of TTL changes, we did two studies in the past:

In [0] we look into the trade-offs between long and short TTLs. You can skip the measurement details and go to Section 6 for the discussion of the pros and cons of longer/shorter TTLs (closely related to what folks are posting to this thread).

In [1] we look into the impact of TTLs and caching while auth servers suffer DDoS.

--
/giovane
SIDN Labs

[0] https://www.isi.edu/~johnh/PAPERS/Moura19b.pdf
[1] https://www.isi.edu/~johnh/PAPERS/Moura18b.pdf
On 29. 11. 21 12:59, Anand Buddhdev wrote:
Dear colleagues,
Users may request reverse DNS delegation by creating "domain" objects in the RIPE Database. Such domain objects must contain "nserver" attributes to specify the name servers for a reverse DNS zone, and may contain "ds-rdata" attributes, to specify delegation signer (DS) records.
When the RIPE NCC publishes these records in the appropriate parent zones, the Time to Live (TTL) of all these records is set at 172800 (two days).
The TTL of delegation NS records may be overridden by the TTL of NS records from a zone's apex. Alternatively, many large resolvers ignore the TTL values of NS records and cap them at much lower values such as 21600. Finally, there is no way for a zone operator to change the TTL of a DS record, which is only present in a parent zone.
Long TTLs can cause problems for users when they want to change their name servers or perform DNSSEC key roll-overs. A long TTL on a DS record is especially harmful when a user needs to do a key roll-over in an emergency.
We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
We welcome feedback or discussion about this, ideally via the DNS Working Group mailing list. If you prefer to send your feedback directly to us, you can email dns@ripe.net.
I think lowering both TTLs is a step in the right direction, but let me ask a provocative question:

Why not make the TTL _dynamic_, based on the time of the last change in the RIPE database?

Here is a wild example of how it could work - all constants are made up, feel free to substitute your own:

Step 1: Define an upper bound for NS & DS TTLs which are "stable". Say 1 day for both NS and DS.
Step 2: At the moment when someone updates NS or DS, lower the respective TTL to 1 minute.
Step 3: Cycle:
Step 3a: If there was no update to the record in the last 1 hour, double the respective TTL. Repeat until the defined upper bound is reached. -> Go to Step 3
Step 3b: If there _was_ another update, reset the TTL to 1 minute and reset the timer. -> Go to Step 3

If the upper bound was 1 hour then the maximum would be reached in ~ 6 steps (6 hours since the change was introduced). 1 day TTL would be reached in 11 steps, i.e. 11 hours.

I think something like this would provide the best of both worlds:
- Quick turnaround around changes and potential problems. Most problems happen right after a change, in which case even 1 hour is PITA.
- Automatic TTL adjustment of "stable" records lowers load on servers and improves reliability when outages in the DNS infrastructure happen.
- Even if the delegation was hijacked (unlikely for a reverse zone, so here just to illustrate) the lower TTL would help with fixing it/pointing it back to the rightful owner.

What do you think? It seems so simple that I now have to wonder why registries are not doing it?

--
Petr Špaček @ Internet Systems Consortium
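The back-off described in Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration using the made-up constants from the message (start at 1 minute, double after every quiet hour); the function names are hypothetical, and it is not a description of any registry's implementation.

```python
from typing import List

START_TTL = 60  # Step 2: TTL right after an update (1 minute), in seconds


def ttl_after_quiet_hours(quiet_hours: int, upper_bound: int,
                          start: int = START_TTL) -> int:
    """TTL in effect after `quiet_hours` full hours without an update.

    An update (Step 3b) resets quiet_hours to 0, so the TTL drops back to `start`.
    """
    return min(start * 2 ** quiet_hours, upper_bound)


def ttl_schedule(upper_bound: int, start: int = START_TTL) -> List[int]:
    """Successive TTLs from `start` up to `upper_bound`, doubling each step (Step 3a)."""
    ttls = [start]
    while ttls[-1] < upper_bound:
        ttls.append(min(ttls[-1] * 2, upper_bound))
    return ttls


def steps_to_reach(upper_bound: int, start: int = START_TTL) -> int:
    """Number of doublings (one per quiet hour) needed to reach the upper bound."""
    return len(ttl_schedule(upper_bound, start)) - 1


if __name__ == "__main__":
    print(steps_to_reach(3600))    # upper bound of 1 hour
    print(steps_to_reach(86400))   # upper bound of 1 day
    print(ttl_schedule(3600))
```

With these constants the sketch agrees with the figures in the message: six doublings to reach a 1 hour bound and eleven to reach a 1 day bound.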
On 12/2/21 2:46 PM, Petr Špaček wrote:
Why not make the TTL _dynamic_, based on time of last change in the RIPE database?
Here is a wild example how it could work - all constants are made up, feel free to substitute your own:
Step 1: Define upper bound for NS & DS TTLs which are "stable". Say 1 day for both NS and DS.
Step 2: At the moment when someone updates NS or DS, lower respective TTL to 1 minute.
Step 3: Cycle: [...] What do you think? It seems so simple that I now have to wonder why registries are not doing it?
One problem I see is that if you change or add NS/DS records, and the TTL is set to a low value without your active participation, you can no longer figure out for how long old values (pre-change) are cached somewhere, so you don't know when stale stuff will globally expire. But knowing this may be relevant in some recovery scenarios.

For example, if you remove a DS record and throw away the corresponding key, and later realize that this was an error, you will see a DS TTL on the order of minutes. That may make you think that it would not be worth recovering the old key from the backup, and that it would be better to create a new key pair and deploy it (including the DS). Unfortunately, that won't work, because resolvers may have cached the old values for a time period that you can't determine in hindsight.

Only if modifying the TTL were an explicit step could you know this (by first looking, then changing).*

So it seems to me that explicit is better than implicit (as usual). If communication channels for that are missing (e.g. in EPP), perhaps that's what the actual problem is?

* One could keep a history of TTL values somewhere, but that seems overengineered.

Thanks,
Peter
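The recovery concern can be made concrete with a small Python sketch. It assumes the usual caching rule that a resolver may keep a record for at most the TTL it was served with; the timestamps and the function name `latest_stale_expiry` are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def latest_stale_expiry(changed_at: datetime, ttl_before_change: int) -> datetime:
    """Latest moment at which a resolver may still hold the pre-change record.

    What matters is the TTL that was being served just *before* the change,
    not the (possibly much lower) TTL visible after it.
    """
    return changed_at + timedelta(seconds=ttl_before_change)


if __name__ == "__main__":
    # Hypothetical change time, used only to illustrate the arithmetic.
    changed_at = datetime(2021, 12, 2, 12, 0, tzinfo=timezone.utc)

    ttl_before = 86400  # "stable" TTL that was served before the change (1 day)
    ttl_after = 60      # TTL an operator sees right after the change (1 minute)

    # Judging by the TTL visible now would suggest stale data is gone in a minute...
    print(latest_stale_expiry(changed_at, ttl_after))
    # ...but caches filled just before the change may hold it for a full day.
    print(latest_stale_expiry(changed_at, ttl_before))
```

The second value cannot be derived from the zone as published after the change, which is why the message argues for an explicit TTL step (or, as the footnote notes, a history of TTL values).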
On 02/12/2021 14:46, Petr Špaček wrote:
Why not make the TTL _dynamic_, based on time of last change in the RIPE database?
I believe this is an interesting approach; however, it requires that the same logic be applied to all users on the server side. The question arises whether the same logic fits all users. When the TTL can be set by a user on the "domain" object in the RIPE Database, this logic is controlled by the user, which allows different users to have different strategies and makes for a flexible approach.
Gregory Brzeski
- Quick turnaround around changes and potential problems. Most problems happen right after change, in which case even 1 hour is PITA.
One hour should then be your upper (stable) limit. From experience I know DNS problems can occur anytime anywhere unplanned, not just after a change in the RIPE DB.
On 2 Dec 2021, at 13:46, Petr Špaček <pspacek@isc.org> wrote:
Why not make the TTL _dynamic_, based on time of last change in the RIPE database?
Because it’s a very bad idea?

1) The RIPE database and its reverse zone DNS data are orthogonal things (modulo the nameserver objects for bits of the reverse tree). These two different things shouldn’t get linked in this way. There needs to be a clean and clear separation between the two. If they get entangled, the outcome will be painful for everyone.

2) It imposes (IMO unwanted) operational requirements on the database -- uptime, availability, extra tooling, new processes, opportunities for adding cruft, etc -- unrelated to the database's prime function.

3) Changes to the RIPE database for some reverse zone do not necessarily mean changes to that zone’s DS TTLs or the LIR’s DNSSEC policies.

Anyways, to get back on topic I think it would be better to discuss TTL values for NS and DS records based on solid engineering. At present, we seem to be plucking numbers out of the air based on gut feel. Simply saying “I think the TTL should be X” is not helpful without some justification for choosing X - or why X is better than Y - or an explanation of the operational impacts.

Anand and his colleagues have identified an issue. But I’m not convinced his proposal is the right one. LIRs may well have good reasons for choosing TTLs for NS and DNSKEY RRs that are higher or lower than the defaults that are being proposed. I think this needs careful WG consideration: unintended consequences and all that.
Disclaimer: I agree with everyone in this thread that explicit is better than implicit, and that auto-magic is much worse than operators lowering their TTL in time and then setting it back when they are done.

Of course, RIPE NCC can be a pioneer among registries and expose TTL to domain admins. In that case I will sit silently and watch how it goes. The rest of this e-mail applies only to the situation when explicit TTL configuration is not possible or practical.

---- Further musing about dynamic TTL below: ----
---- Ignore if explicit TTL control is introduced ----

This ideal IMHO has several practical problems:
- In my (admittedly limited, anecdotal) experience most operators do not lower their TTLs before making changes, and then when problems happen they are in a trap. Maybe RIPE NCC's audience would be significantly better in that respect, who knows. We cannot have data about that without exposing the explicit TTL knob.
- It does not work at all with CDS/CDNSKEY automation because AFAIK there is no way for the child to signal the desired TTL to the parent. One could argue that CDS/CDNSKEY should have lower risks so it might not be necessary.
- In my (again admittedly limited, anecdotal) experience registries do not _want_ to expose an interface to change TTLs (for various reasons).

Another way to look at this is that explicit manual configuration, while theoretically the best, very much resembles the way DNS was done in the 1980s and not the operational reality of the 2020s. Manual and error-prone processes are being replaced with automatic ones everywhere, and DNS should not be an exception.

In other words, I agree with purists on the theoretical level: Static and explicit TTLs are perfect for a world full of cooperating DNS experts and registries, but I don't believe we are in this ideal world. And if the "explicit" option is not practical for any reason, we are left either with static or dynamic "defaults" imposed by the registry. Pick your poison then.

On 02. 12. 21 15:37, Jim Reid wrote:
On 2 Dec 2021, at 13:46, Petr Špaček <pspacek@isc.org> wrote: Why not make the TTL _dynamic_, based on time of last change in the RIPE database?
Because it’s a very bad idea?
1) The RIPE database and its reverse zone DNS data are orthogonal things (modulo the nameserver objects for bits of the reverse tree). These two different things shouldn’t get linked in this way. There needs to be a clean and clear separation between the two. If they get entangled, the outcome will be painful for everyone.
Except that they already are entangled. You cannot plausibly claim they are orthogonal if DS & NS records are read from the database and used to generate zone data. (I'm not a database expert of course, but that's my understanding.)
2) It imposes (IMO unwanted) operational requirements on the database -- uptime, availability, extra tooling, new processes, opportunities for adding cruft, etc -- unrelated to the database's prime function.
I don't think so. The database already has "changelog", and there already has to be a component which generates zone data from the relevant fields in the database. Whatever theoretical logic for dynamic TTLs would belong to this "database->zone translation layer".
3) Changes to the RIPE database for some reverse zone do not necessarily mean changes to that zone’s DS TTLs or the LIR’s DNSSEC policies.
Agreed. I'm theorizing about the case where "registry" does not want to expose TTL configuration directly.
Anyways, to get back on topic I think it would be better to discuss TTL values for NS and DS records based on solid engineering. At present, we seem to be plucking numbers out of the air based on gut feel. Simply saying “I think the TTL should be X” is not helpful without some justification for choosing X - or why X is better than Y - or an explanation of the operational impacts.
Anand and his colleagues have identified an issue. But I’m not convinced his proposal is the right one. LIRs may well have good reasons for choosing TTLs for NS and DNSKEY RRs that are higher or lower than the defaults that are being proposed. I think this needs careful WG consideration: unintended consequences and all that.
Let's be honest here. TTLs are _always_ wrong: Either too long when you need to do a change, or too short when there is an outage and long TTLs would have helped to paper over it :-) -- Petr Špaček
Hi Petr,
I think lowering both TTLs is a step in right direction, but let me ask provocative question:
Why not make the TTL _dynamic_, based on time of last change in the RIPE database?
Because explicit is better than implicit. Magically calculated dynamic values rarely match operational expectations :) Cheers, Sander
Anand, On 29/11/2021 12.59, Anand Buddhdev wrote:
We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
I support this change.

I would also support:
- Setting the TTL on NS records even lower. As a data point, at least one ccTLD (.CL) already has a 3600 TTL on NS records.
- Allowing LIR to set their own TTL on NS/DS records explicitly.
- Allowing LIR to request that the TTL on NS/DS records get set from the child (either from the TTL on the NS in the child's servers or from the CDS/CDNSKEY TTL).
- Allowing LIR to choose from a set of pre-defined TTL, as suggested by Gregory Brzeski.
- Adopting a back-off algorithm as suggested by Petr Špaček.

Cheers,
--
Shane
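As an illustration of the "set of pre-defined TTL" option, here is a minimal Python sketch of how a requested value might be checked against a short allowed list with a default. Everything in it is an assumption for discussion: the thread does not define such an attribute for "domain" objects, and the allowed values are simply the TTLs mentioned in this thread.

```python
from typing import Optional, Tuple

# Hypothetical allowed values (seconds), taken from numbers raised in the thread.
ALLOWED_TTLS: Tuple[int, ...] = (3600, 21600, 86400)
# Hypothetical default: the proposed NS TTL.
DEFAULT_TTL = 86400


def resolve_ttl(requested: Optional[int] = None,
                allowed: Tuple[int, ...] = ALLOWED_TTLS,
                default: int = DEFAULT_TTL) -> int:
    """Return the TTL to publish: the default if none is requested,
    the requested value if it is on the allowed list, else raise ValueError."""
    if requested is None:
        return default
    if requested not in allowed:
        raise ValueError(
            f"TTL {requested} not allowed; choose one of {sorted(allowed)}"
        )
    return requested


if __name__ == "__main__":
    print(resolve_ttl())        # no preference: default applies
    print(resolve_ttl(3600))    # allowed value is accepted
    try:
        resolve_ttl(1234)       # arbitrary value is rejected
    except ValueError as err:
        print(err)
```

Rejecting values outside the list, rather than rounding them, keeps the behaviour explicit, in line with the preference for explicit over implicit voiced elsewhere in the thread.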
participants (17)
- Anand Buddhdev
- Dave Lawrence
- Giovane C. M. Moura
- Gregory Brzeski
- Jeroen Massar
- Jim Reid
- Michael Richardson
- Michiel Klaver
- Paul Ebersman
- Peter Koch
- Peter Thomassen
- Peter van Dijk
- Petr Špaček
- Ralf Weber
- Sander Steffann
- Shane Kerr
- Tim Wicinski