I just watched Ondřej Surý's excellent talk, "How many DNS queries does it take to resolve a single domain name?" (I declined to get up to watch live, and now I regret missing the interactive part...)

Jim Reid asked about turning this into a RIPE document, and Joe Abley noted how the empty cache is not really the static state.

My completely unsurprising comment is that the right "goldilocks" number is going to depend on circumstances, and I think that what we probably need is a way to do reproducible tests. The observation was that BIND 9.11 through 9.21 have radically different numbers. I imagine Unbound, Windows Server's resolver, dnsmasq, and whatever Cloudflare, Google, and the other public resolvers use will each have different results.

Is there also some dependency on distance to the authoritative servers? Certainly, the latency test is done for the root (.), and then the queries tend to stick to the best server. And LocalRoot changes this. I've never been quite clear what the behaviour is for other levels.

It is "easy" for most of the people reading this list to pop off and start a container, a VM, or ./named-I-just-compiled and do some measurements. I'm not sure how I'd test an empty public resolver instance! Whether people like public resolvers or hate them, some portion of people use them, and most zone owners probably want to make sure they aren't triggering some pathology in one place by optimizing for another.

So one thing that I think we need is something that explains how to do the tests: command lines in appendices or, better, on a forkable-on-coding-site wiki, with the concepts in the document.

The next thing I think we need is a way (a procedure, not a tool), given the above, to simulate some kind of failure. Ondřej's reply was that if I'm running potatocoding.org and the .org servers are all down, then I'm toast, even if I've decided to put an NS in .net/.com and .it. That's relevant, but it's not the whole story. Such broad, high-level outages are now rare, I think.
What isn't rare are 2016-style Mirai attacks on parts of the infrastructure. I wonder how long teams.office.com takes to resolve if ns3-39.azure-dns.org and ns3-39.azure-dns.info are both down or under multi-TB/s attack, along with ns3-05.azure.dns.org (where office.com is).

The resolver that did more queries and cached more answers might be ahead... unless caching more answers meant that it was more likely to have LRU'ed out some other useful answers. How do I simulate/test that?

How do I simulate my resolver being on some other continent? Can I still resolve canada.ca when all our fiber to the US is turned off as part of another trade dispute? (Maybe more relevant to smaller *island* nations!!!)

It seems to me that knowing how things degrade (not if, but how) could become an important part of due diligence. I think someone will need to fund this, even if it's "only" in the form of research grants to graduate students.

--
Michael Richardson <mcr+IETF@sandelman.ca>   . o O ( IPv6 IøT consulting )
           Sandelman Software Works Inc, Ottawa and Worldwide

** My working hours and your working hours may be different. **
** Please do not feel obligated to reply outside your normal working hours **