Dear colleagues,

RIPE document 203 "Recommendations for DNS SOA Values" gives recommendations for the values of the SOA record used in simple and stable zones. The document has been aimed at beginner-level DNS administrators, to give guidance when creating a zone file.

<https://www.ripe.net/publications/docs/ripe-203>

The document has been successful, and despite its age, it is still being used and referred to today.

Over time the document has aged, and the recommendations given in the text are no longer a good match for today's Internet DNS.

Some while ago Peter Koch and I volunteered in the RIPE DNS WG to update the document.

In my work (DNS training, consulting and support), I get occasional requests for recommendations for SOA values to use. This shows that such a document is still useful today.

Before starting the process of wordsmithing the entire text, we would like to discuss the new values here on the mailing list.

The aim for the document is not to provide the one and only set of "correct" values (there are many), but a set of values that are "not wrong" in most DNS use cases. Also keep in mind that the document should be of help for novice DNS administrators with relatively simple zones (less than 1000 RRs, no more than 1 change/week). Large zones, very agile zones with dynamic changes, and specialty zones like Active Directory zones are out of scope of the document.

Like the original document, we present one set of values, not ranges. The document should be an "as simple as possible" starting point for new DNS admins. Copy and pasting is encouraged.

As Roland v. Rijswijk mentioned in his talk at SHA 2017 last weekend (recommendation --> "OpenINTEL: digging in the DNS with an industrial size digger" https://media.ccc.de/v/SHA2017-130-openintel_digging_in_the_dns_with_an_indu...), the Internet is built on the premise of de-centralization. To support DNS operators today who like to run their own DNS infrastructure (instead of going central in "the cloud"), the entry barrier for setting up a simple but working DNS zone should be as low as possible.

The original SOA values for RIPE 203:

example.com.  3600  SOA  dns.example.com. hostmaster.example.com. (
                         1999022301      ; serial YYYYMMDDnn
                         86400           ; refresh (  24 hours)
                         7200            ; retry   (   2 hours)
                         3600000         ; expire  (1000 hours)
                         172800 )        ; minimum (   2 days)

The new proposed and updated values:

$TTL 3600
example.com.  3600  SOA  dns.example.com. hostmaster.example.com. (
                         2017080101      ; serial YYYYMMDDnn
                         7200            ; refresh (   2 hours)
                         1800            ; retry   (  30 minutes)
                         3600000         ; expire  (1000 hours)
                         3600 )          ; minimum/negative TTL (1 hour)

One observation from the past years is that in situations where the time required for a DNS change is longer than the average working day (8 hours), the change is more likely to fail. Operators get distracted and abandon the change, or pick it up on a later day having lost the context of the original intent of the change.

The new values are chosen in a way that most changes to the DNS zone and infrastructure can be done in one work day.

Lower values will cause more DNS queries and more load on the DNS infrastructure (resolver and authoritative server), but recent experiments ("How the TTL reducing impacted the .cz zone" https://en.blog.nic.cz/2017/05/18/how-the-ttl-reducing-impacted-the-cz-zone/) have shown that the effects are manageable.

Some changes to the DNS infrastructure (like a change of delegation information) depend on the TTL values used in the parent domain, but we also see a trend to use lower TTL values in the parent zones as well ("Keeping DNS Parents and Children in Sync at Internet Speed!, Olafur Gudmundsson, Cloudflare" https://ripe70.ripe.net/presentations/51-RIPE-20150309-cf-DNS.pdf).

Given these constraints, we would like to get feedback from the mailing list on the new values:

* do you see potential issues with the proposed values for zones that are in scope of the document?

Best regards

Carsten Strotmann
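As a quick sanity check of the comments next to the proposed timers, the following Python sketch converts the raw second counts into readable durations. The names `PROPOSED` and `humanize` are made up for this illustration; only the numbers come from the message above.

```python
# Proposed SOA timer values from the message above, in seconds.
PROPOSED = {
    "refresh": 7200,
    "retry": 1800,
    "expire": 3600000,
    "minimum": 3600,
}


def humanize(seconds: int) -> str:
    """Render a duration in the largest unit (hours, minutes) that divides it evenly."""
    for unit, size in (("hour", 3600), ("minute", 60)):
        if seconds % size == 0:
            count = seconds // size
            return f"{count} {unit}" + ("" if count == 1 else "s")
    return f"{seconds} seconds"


if __name__ == "__main__":
    for name, value in PROPOSED.items():
        print(f"{name:8} {value:>8}  ; {humanize(value)}")
```

Running it prints one line per timer, which can be compared against the comments in a hand-written zone file before it is loaded.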
On 11 Aug 2017, at 5:40, Carsten Strotmann wrote:
The original SOA values for RIPE 203:

example.com.  3600  SOA  dns.example.com. hostmaster.example.com. (
                         1999022301      ; serial YYYYMMDDnn
                         86400           ; refresh (  24 hours)
                         7200            ; retry   (   2 hours)
                         3600000         ; expire  (1000 hours)
                         172800 )        ; minimum (   2 days)

the new proposed and updated values

$TTL 3600
example.com.  3600  SOA  dns.example.com. hostmaster.example.com. (
                         2017080101      ; serial YYYYMMDDnn
                         7200            ; refresh (   2 hours)
                         1800            ; retry   (  30 minutes)
                         3600000         ; expire  (1000 hours)
                         3600 )          ; minimum/negative TTL (1 hour)
The new values seem fine, and should not cause strain to an authoritative server unless the zone's number of NXDOMAIN queries is massively mis-matched with the capabilities of the server.

Dropping the retry value down further seems reasonable, maybe to 5 minutes. You always want your secondaries to have fresh data. If you have secondaries that are having problems contacting you, you have an operational problem. Maybe add some text to the new version explaining why this number is lower and suggesting that they watch the logs on their secondaries for failures to refresh.

The idea of matching the negative TTL to the SOA TTL makes good sense, and certainly is better than having a huge negative TTL.

Adding the "$TTL 3600" is a great addition. If you can add text about the semantic differences between the three 3600 values, that would be very useful.

--Paul Hoffman
Dropping the retry value down further seems reasonable, maybe to 5 minutes. You always want your secondaries to have fresh data.
While I agree with the latter, I don't agree that's the preferred way to do this. DNS Notify usually accomplishes the goal of keeping your slaves supplied with up-to-date data.

Regards,

- Håvard
On 11 Aug 2017, at 9:41, Havard Eidnes wrote:
Dropping the retry value down further seems reasonable, maybe to 5 minutes. You always want your secondaries to have fresh data.
While I agree with the latter, I don't agree that's the preferred way to do this. DNS Notify usually accomplishes the goal of keeping your slaves supplied with up-to-date data.
From RFC 1035:

RETRY    A 32 bit time interval that should elapse before a failed refresh should be retried.

My reading of this is that if a secondary is doing a refresh (either based on a timer or on a Notify) and it fails, it should try again in that many seconds. If so, you would still want a short retry value. Is that how others see the value?

--Paul Hoffman
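The reading above can be turned into a small Python sketch that shows how long a secondary stays stale after a failed refresh. It is a simplified model, not the behaviour of any particular server: it assumes the failed refresh happens at t=0, the primary becomes reachable again after `outage` seconds, and the secondary retries exactly every RETRY seconds. Real implementations may add jitter or back off. The function name is hypothetical.

```python
def first_successful_retry(outage: int, retry: int) -> int:
    """Seconds after the failed refresh at which a retry first succeeds.

    Simplified model: the refresh fails at t=0, the primary is reachable
    again at t=outage, and retries happen at retry, 2*retry, 3*retry, ...
    """
    if retry <= 0:
        raise ValueError("retry must be positive")
    # Ceiling division; at least one retry is always needed.
    attempts = max(1, -(-outage // retry))
    return attempts * retry


if __name__ == "__main__":
    outage = 600  # assumed example: primary unreachable for 10 minutes
    for retry in (7200, 1800, 300):
        done = first_successful_retry(outage, retry)
        print(f"retry={retry:5}s -> secondary catches up after {done}s")
```

Under these assumptions, a 10-minute outage leaves the secondary stale for 2 hours with the old RETRY of 7200, for 30 minutes with 1800, and for 10 minutes with 300.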
Hello Paul,

Paul Hoffman writes:
On 11 Aug 2017, at 5:40, Carsten Strotmann wrote:
The original SOA values for RIPE 203:

example.com.  3600  SOA  dns.example.com. hostmaster.example.com. (
                         1999022301      ; serial YYYYMMDDnn
                         86400           ; refresh (  24 hours)
                         7200            ; retry   (   2 hours)
                         3600000         ; expire  (1000 hours)
                         172800 )        ; minimum (   2 days)

the new proposed and updated values

$TTL 3600
example.com.  3600  SOA  dns.example.com. hostmaster.example.com. (
                         2017080101      ; serial YYYYMMDDnn
                         7200            ; refresh (   2 hours)
                         1800            ; retry   (  30 minutes)
                         3600000         ; expire  (1000 hours)
                         3600 )          ; minimum/negative TTL (1 hour)
The new values seem fine, and should not cause strain to an authoritative server unless the zone's number of NXDOMAIN queries is massively mis-matched with the capabilities of the server.
Dropping the retry value down further seems reasonable, maybe to 5 minutes. You always want your secondaries to have fresh data. If you have secondaries that are having problems contacting you, you have an operational problem. Maybe add some text to the new version explaining why this number is lower and suggesting that they watch the logs on their secondaries for failures to refresh.
We'll consider this. Care must be taken here: once a server is unreachable because of too much traffic, too low a RETRY value might make things worse. But I agree it is preferable to have fast recovery.
The idea of matching the negative TTL to the SOA TTL makes good sense, and certainly is better than having a huge negative TTL.
Adding the "$TTL 3600" is a great addition. If you can add text about the semantic differences between the three 3600 values, that would be very useful.
Yes, good point, I will write some info about the different TTL values in the document.

Best regards

Carsten Strotmann
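Until that text exists, one possible reading of the three 3600 values can be sketched in Python. This assumes RFC 2308 semantics: `$TTL` is the default for records without an explicit TTL, the SOA record carries its own TTL, and negative answers are cached for the smaller of the SOA TTL and the SOA minimum field. The class and function names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ZoneTtls:
    default_ttl: int  # "$TTL 3600": applies to records with no explicit TTL
    soa_ttl: int      # "example.com. 3600 SOA": TTL of the SOA record itself
    soa_minimum: int  # last SOA field: upper bound for negative caching


def record_ttl(zone: ZoneTtls, explicit_ttl: Optional[int] = None) -> int:
    """TTL of an ordinary record: its own TTL if given, else the $TTL default."""
    return zone.default_ttl if explicit_ttl is None else explicit_ttl


def negative_ttl(zone: ZoneTtls) -> int:
    """Caching time for NXDOMAIN/NODATA answers (RFC 2308 reading)."""
    return min(zone.soa_ttl, zone.soa_minimum)


if __name__ == "__main__":
    proposed = ZoneTtls(default_ttl=3600, soa_ttl=3600, soa_minimum=3600)
    print("record without TTL:", record_ttl(proposed))
    print("record with TTL 300:", record_ttl(proposed, 300))
    print("negative answers:", negative_ttl(proposed))
```

With the proposed values all three roles happen to resolve to one hour, which is why the values look redundant in the zone file even though each governs a different kind of answer.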
On 14 Aug 2017, at 0:17, Carsten Strotmann wrote:
Dropping the retry value down further seems reasonable, maybe to 5 minutes. You always want your secondaries to have fresh data. If you have secondaries that are having problems contacting you, you have an operational problem. Maybe add some text to the new version explaining why this number is lower and suggesting that they watch the logs on their secondaries for failures to refresh.
We'll consider this. Care must be taken here: once a server is unreachable because of too much traffic, too low a RETRY value might make things worse. But I agree it is preferable to have fast recovery.
The "retry" value only applies to secondary servers. If a master is overloaded by thousands of customers, adding in its one or two secondary servers will barely be noticed. --Paul Hoffman
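A back-of-the-envelope Python sketch of that argument: it estimates the extra query rate a primary sees when all of its secondaries are in retry mode at the same time. It assumes each retry attempt costs one SOA query and ignores jitter and backoff; the function name is hypothetical.

```python
def retry_queries_per_hour(secondaries: int, retry: int) -> float:
    """Extra SOA queries per hour while every secondary is retrying.

    Assumption: one SOA query per retry attempt per secondary.
    """
    if retry <= 0:
        raise ValueError("retry must be positive")
    return secondaries * 3600 / retry


if __name__ == "__main__":
    for retry in (1800, 900, 300):
        rate = retry_queries_per_hour(secondaries=2, retry=retry)
        print(f"retry={retry:4}s, 2 secondaries -> {rate:.0f} queries/hour")
```

Even with a 5-minute RETRY, two secondaries add only a couple of dozen queries per hour under these assumptions, which is small next to ordinary resolver traffic.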
Serial numbers should ideally be managed automatically, so that you don't have to care, but ISO 8601 style is definitely the most friendly of the common options.

The minimum/negative TTL should match the default TTL, and I agree 1 hour is a good starting point.

Regarding the refresh timer, NOTIFY should make it irrelevant, but there are cases like stealth secondaries where it still matters.

I think that batch rebuild jobs are easiest to communicate to colleagues if they happen hourly, so the refresh time should probably be 1 hour, to match. (The result is that routine updates propagate within an hour if things are working properly, or two hours in awkward cases.)

I agree with Paul that a short retry timer also makes sense, so recovery from failure is short. I use 15 minutes, but it is happily not something I have had to worry about :-)

Novices are not expected to be responsible for DNSSEC but they might be looking after a zone signed by someone else. In a signed zone, the expiry time needs to be less than the RRSIG lifetime. A broken secondary should return an error (making resolvers try other, hopefully working, secondaries) before it returns bogus data. The default RRSIG lifetime (in BIND and I think other signers) is 30 days and records are re-signed weekly, so the default expiry time should be about 3 weeks (500 hours).

Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at
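For the point about managing serial numbers automatically, here is a minimal Python sketch of a YYYYMMDDnn bump. The function name `next_serial` is hypothetical, and starting the counter at 01 follows the examples in this thread rather than any standard.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next YYYYMMDDnn serial, always larger than `current`.

    The first change of a day yields YYYYMMDD01; later changes on the same
    day increment the counter. After 99 changes in one day the result runs
    into the next day's range, which is still a valid, increasing serial.
    """
    first_of_day = int(today.strftime("%Y%m%d")) * 100 + 1
    return max(current + 1, first_of_day)


if __name__ == "__main__":
    today = date(2017, 8, 1)
    serial = next_serial(1999022301, today)
    print(serial)
    print(next_serial(serial, today))
```

A date-based serial of this form stays below the 32-bit limit of the SOA serial field for any four-digit year up to 4294.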
Hello Tony,

Tony Finch writes:
Serial numbers should ideally be managed automatically, so that you don't have to care, but ISO 8601 style is definitely the most friendly of the common options.
The minimum/negative TTL should match the default TTL, and I agree 1 hour is a good starting point.
Regarding the refresh timer, NOTIFY should make it irrelevant, but there are cases like stealth secondaries where it still matters.
In about 50% of the customer DNS setups that I have seen over the last years, NOTIFY was not working and zone refresh relied only on the SOA values. Of course that was because of other misconfigurations that are out of scope of this document. For those of us who have worked with DNS for some time, NOTIFY is "just working", but surprisingly there are many ways to get DNS wrong enough to break NOTIFY.
I think that batch rebuild jobs are easiest to communicate to colleagues if they happen hourly, so the refresh time should probably be 1 hour, to match. (The result is that routine updates propagate within an hour if things are working properly, or two hours in awkward cases.)
Good point.
I agree with Paul that a short retry timer also makes sense, so recovery from failure is short. I use 15 minutes, but it is happily not something I have had to worry about :-)
15 minutes sounds like a good value to me as well.
Novices are not expected to be responsible for DNSSEC but they might be looking after a zone signed by someone else. In a signed zone, the expiry time needs to be less than the RRSIG lifetime. A broken secondary should return an error (making resolvers try other, hopefully working, secondaries) before it returns bogus data. The default RRSIG lifetime (in BIND and I think other signers) is 30 days and records are re-signed weekly, so the default expiry time should be about 3 weeks (500 hours).
Good point, I missed that. We'll adjust the EXPIRE value to take the RRSIG validity into account, and I will also add text explaining the dependency between RRSIG validity and EXPIRE.

Best regards

Carsten Strotmann
Tony Finch:
The default RRSIG lifetime (in BIND and I think other signers) is 30 days and records are re-signed weekly, so the default expiry time should be about 3 weeks (500 hours).
Tony, could you explain in more detail how "30 days" and "re-signed once a week" relate to "an expire time of 3 weeks"? I would like to understand that :-)

Andreas
On 14 Aug 2017, at 13:20, A. Schulze <sca@andreasschulze.de> wrote:
could you explain in more detail how "30 days" and "re-signed once a week" relate to "an expire time of 3 weeks"?
I would like to understand that :-)
No problem - it is tricky, and I probably didn't help by being very imprecise with the numbers :-)

I have quoted the relevant part of the BIND ARM below. To unpack it a bit, by default signatures last 30 days from the time they are generated - they have a fixed expiry time (not a relative time like TTLs). So as the signatures get older, there is less of the 30 days left. This remaining time must be longer than the zone expiry time plus maximum TTL, to ensure that no legitimate queries get an invalid signature in response.

The oldest a signature can get depends on how frequently it is regenerated. By default this is every 7.5 days, leaving 22.5 days remaining before the old signatures become invalid, i.e. a little more than three weeks.

The "several multiples" advice at the end of the quote below is interesting. If you have a deep zone transfer graph, with multiple hops from the primary master to the furthest secondary, the zone can take a long time to fully expire. But there is an awkward tension between the desire for short signature lifetimes (to reduce the scope for replay attacks) and long expiry times (to make operational response to transfer failures less of an emergency). I tend to favour shorter signatures and better monitoring :-)
sig-validity-interval

Specifies the number of days into the future when DNSSEC signatures automatically generated as a result of dynamic updates will expire. There is an optional second field which specifies how long before expiry that the signatures will be regenerated. If not specified, the signatures will be regenerated at 1/4 of base interval. The second field is specified in days if the base interval is greater than 7 days otherwise it is specified in hours. The default base interval is 30 days giving a re-signing interval of 7 1/2 days. The maximum values are 10 years (3660 days).

The signature inception time is unconditionally set to one hour before the current time to allow for a limited amount of clock skew. The sig-validity-interval should be, at least, several multiples of the SOA expire interval to allow for reasonable interaction between the various timer and expiry dates.

Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at
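The arithmetic in the explanation above can be checked with a short Python sketch. It follows the numbers as given in this message (30-day signatures regenerated every 7.5 days, so 22.5 days remain on the oldest signature) and the stated rule that the remaining validity must exceed EXPIRE plus the maximum TTL. How a particular signer interprets its re-sign setting may differ, so treat the remaining-validity figure as an input. The function names are hypothetical.

```python
DAY = 86400
HOUR = 3600


def remaining_validity(validity: int, resign_every: int) -> int:
    """Validity left on the oldest signature, per the reasoning in this message."""
    return validity - resign_every


def expire_fits(expire: int, max_ttl: int, remaining: int) -> bool:
    """True if a secondary expires the zone before its signatures can go stale."""
    return expire + max_ttl < remaining


if __name__ == "__main__":
    remaining = remaining_validity(30 * DAY, 15 * DAY // 2)  # 30 days, 7.5 days
    print("remaining validity:", remaining / DAY, "days")
    for label, expire in (("1000 hours", 1000 * HOUR), ("500 hours", 500 * HOUR)):
        ok = expire_fits(expire, max_ttl=3600, remaining=remaining)
        print(f"expire {label}: {'ok' if ok else 'too long'}")
```

With these inputs the originally proposed EXPIRE of 1000 hours (about 41.7 days) does not fit, while 500 hours (about 20.8 days) plus a one-hour TTL stays inside the 22.5-day window.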
participants (5): A. Schulze, Carsten Strotmann, Havard Eidnes, Paul Hoffman, Tony Finch