"Stephane" == Stephane Bortzmeyer <bortzmeyer@nic.fr> writes:
>> Well, for NSD, using large zones will cause it to eat memory
>> exponentially.

Stephane> Here is the NSD maintainer's analysis:
Stephane> http://open.nlnetlabs.nl/pipermail/nsd-users/2004-October/000276.html

While there is no reason to dispute the analysis from NSD's authors, I do think it would be nice if there was an independent analysis of NSD, and of other DNS implementations for that matter. The work Brad did a couple of years ago was a step in that direction. I would also encourage operators to share their experiences of different name server implementations here: performance, protocol compliance, features, management and support issues, SLAs with hosting providers, etc, etc. Capturing and documenting this anecdotal information would be very useful.
----- Original Message -----
From: "Jim Reid" <jim@rfc1035.com>
I would also encourage operators to share their experiences of different name server implementations here: performance, protocol compliance, features, management & support issues, SLAs with hosting providers, etc, etc. Capturing and documenting this anecdotal information would be very useful.
Hello,

We are running our own implementation of a domain name server for our hosting company. It is unfortunately not publicly available and I have no additional information, but I thought the implementation might be interesting anyway. We have been running this software for two years now.

The name server uses SQL as a direct back-end, but all servers load the data into memory at startup for optimisation reasons. They sort the data into a tree-like structure but do not precompute all possible queries or answers. This reduces the amount of RAM needed. Besides, not all records return static data. We have a few special features, so it could easily eat a million GB of RAM if we did precompute everything. Some of these features are autogeneration of forward/reverse DNS based on wildcard/regexp records (so that 100 billion IPv6 addresses result in only 2 records in DNS rather than 200 billion) and different replies depending on the country of origin of the source IP of the querying name server/client. In other words, the answer depends on another external database. We also dropped all RR types that no customer has ever asked for (we add them on demand, though).

When it comes to zone updates, we use a TCP/IP connection to the master server instead of altering the database directly; we don't need to probe for DB changes this way. The user issues a command and the name server replies with OK or an error, and so on. You can only add/delete/change data, not list/view it. A change results in the master updating only its local in-memory copy, then the SQL server, and finally sending the update to any connected slaves. Slaves register with the master through a permanent TCP connection and receive updates whenever somebody alters data, in a way that is not compatible with zone transfers/AXFR. We actually don't support normal zone transfers at all. The master sends the actual change, not the entire zone or a zone reload request, so both master and slaves update their local copies instantly.
If a slave gets disconnected from the master, it will reload the whole database from the SQL server when it reconnects, in order to ensure no loss of data, since the master doesn't keep track of the status of any of the slaves. This reload takes seconds for a small/medium-sized database (10k+ domains). When the data has been loaded and sorted, the name server swaps it in for the existing local copy and deletes the old copy from memory. This ensures no downtime. Serial numbers are simply autocalculated from the current time, and it is not possible for a user to do anything about it.

The response latency is the same as with BIND, but it uses significantly less memory; I haven't really done the research to know why. Since we use a TCP connection to the name server as the way of managing zones, there are very few wrong things you can do. We do have a different policy with regard to valid characters (ISO-xx, UTF-8, IDN): the name server accepts anything, and it's up to the customer to decide what they want. Browsers like MSIE and Internet daemons using different encodings will be able to use the same host name, because the customer has the ability to add all possible encoded versions of it (hope I'm not starting an argument here).

Cheers,
Joergen Hovland ENK
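The on-demand wildcard/regexp synthesis described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual (non-public) software: the prefix, the `host-{hexaddr}` name template, and the `synthesize_ptr` helper are all invented for the example.

```python
import ipaddress

# Hypothetical wildcard rule: one stored record covers an entire prefix,
# and reverse-DNS answers are synthesized on demand instead of
# materializing billions of individual records.
PREFIX = ipaddress.ip_network("2001:db8::/32")
TEMPLATE = "host-{hexaddr}.example.net."

def synthesize_ptr(addr_text):
    """Return a synthesized PTR target for any address in the prefix,
    or None if the address falls outside the wildcard rule."""
    addr = ipaddress.ip_address(addr_text)
    if addr not in PREFIX:
        return None
    # Encode the full 128-bit address compactly in the generated name.
    hexaddr = format(int(addr), "032x")
    return TEMPLATE.format(hexaddr=hexaddr)

print(synthesize_ptr("2001:db8::1"))
```

The point is that a single stored rule answers for every address in the prefix, so two records (one forward, one reverse) stand in for billions.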
Hi Jørgen, On Oct 28, 2004, at 04:47, Jørgen Hovland wrote:
and different replies depending on the country origin of the source ip of the querying nameserver/client.
Oops. What's the justification for that? And what about maintaining DNS coherency? Any ideas on how to make this work with DNSSEC (I realize you're not doing DNSSEC today, but this would seem to be fundamentally incompatible with any DNSSEC use whatsoever).
When it comes to zone updates we use a tcp/ip connection to the master server instead of altering the database directly. We don't need to probe for db changes this way.
Well, that makes sense, but no one in their right mind is probing for changes since NOTIFY was defined many years ago.
The user issues a command and the nameserver replies ok or error and so on. You can only add/del/change data, not list/view it. A change results in that the master updates the local memory of the change only and then the sql server, finally it sends the update to any connected slaves. Slaves are registered with the master through a permanent tcp connection and receive updates whenever somebody alters data in a non zone-transfer/axfr compatible way. We actually don't have support for normal zone transfers at all. The master sends the actual change, not the entire zone or a zone reload request. So both master and slaves updates their local copy instantly.
If a slave gets disconnected from the master, it will reload the whole database from the sql server when reconnected to the master in order to ensure no loss of data since the master doesn't keep track on the status of any of the slaves. This reload takes seconds for a small/medium sized database (10k+). When the data has been loaded and sorted the nameserver replaces it with the existing local copy and deletes it from memory. This ensures no downtime.
What happens to updates (to the master) that occur *during* the reload? My guess is that they get added to "the tail" of the reload to ensure that no change is left out during the reload. But in that case it seems to me that you already have the zone sorted in "transaction order" and the only thing needed to steer around the complete reloads would be some sort of version stamp that is shared between slave and master. Doesn't really have to be the SOA serial, you can use whatever you want. If you need to serve zones of non-trivial size that would seem to be a good idea.
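The version-stamp idea above can be sketched like this (a hypothetical illustration, not the implementation under discussion): the master keeps a bounded transaction log keyed by a monotonically increasing stamp, and a reconnecting slave gets either the log tail or an instruction to do a full reload.

```python
class Master:
    def __init__(self):
        self.version = 0
        self.log = {}          # version stamp -> change for that version
        self.log_limit = 1000  # jettison entries older than this many versions

    def apply(self, change):
        self.version += 1
        self.log[self.version] = change
        # Drop log entries too old to be worth keeping.
        for v in list(self.log):
            if v <= self.version - self.log_limit:
                del self.log[v]

    def catch_up(self, slave_version):
        """Return ('incremental', changes) if the slave's stamp is still
        covered by the log, else ('full', None): a complete reload."""
        if slave_version == self.version:
            return ("incremental", [])
        needed = range(slave_version + 1, self.version + 1)
        if all(v in self.log for v in needed):
            return ("incremental", [self.log[v] for v in needed])
        return ("full", None)
```

As noted, the stamp need not be the SOA serial; anything shared between master and slave that orders the transactions will do.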
Serial numbers are simply autocalculated from the current time and its not possible for a user to do anything about it.
I.e. the SOA serial is *changed* by the slave. This too won't fly too well with DNSSEC.
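The time-derived serial scheme quoted above, and the conventional date-based alternative, can both be sketched in a few lines. The exact format the implementation uses is not stated, so both functions are illustrative guesses.

```python
import time

def unix_serial(now=None):
    # Serial derived directly from the clock: always increasing and needs
    # no stored state, but is not operator-controllable (Joergen's scheme
    # as I understand it; the exact derivation is a guess).
    return int(now if now is not None else time.time()) % 2**32

def date_serial(year, month, day, revision):
    # Conventional YYYYMMDDnn form: human-readable, and allows up to
    # 100 explicit revisions per day.
    assert 0 <= revision < 100
    return year * 1000000 + month * 10000 + day * 100 + revision

print(unix_serial(1098921600))       # fixed timestamp for reproducibility
print(date_serial(2004, 10, 28, 1))
```

Either way, a serial that every server computes independently only stays consistent across master and slaves if their clocks and update times agree, which is part of why it sits badly with DNSSEC signing.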
The response latency is the same as with bind but it uses significantly less memory. I havent really done the research to know why. Since we use a tcp connection to the nameserver as way of managing zones, the wrong things you can do is very few.
Regards, Johan Ihrén Autonomica
Johan Johan wrote on 28/10/2004 09:02:10:
When it comes to zone updates we use a tcp/ip connection to the master server instead of altering the database directly. We don't need
to probe for db changes this way.
Well, that makes sense, but no one in their right mind is probing for changes since NOTIFY was defined many years ago.
As it happens, we use probing rather than NOTIFY, since it gives a greater level of control over the number of secondaries being updated at once (and the bandwidth) and the order in which they are updated. This does rely on strict calculation of the timing of changes and probes, but that is fairly easy to do.

Jay Daley
Nominet UK
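The staggered-probing idea can be sketched as a simple scheduler (a hypothetical illustration; the secondary names and interval are invented): at most `max_concurrent` secondaries begin a refresh in each probing window, in a fixed, deterministic order.

```python
from collections import deque

def schedule_probes(secondaries, max_concurrent, interval_s):
    """Yield (start_offset_seconds, secondary) pairs so that at most
    max_concurrent secondaries begin a transfer in each interval,
    in a deterministic order."""
    queue = deque(secondaries)
    offset = 0
    while queue:
        batch = [queue.popleft() for _ in range(min(max_concurrent, len(queue)))]
        for sec in batch:
            yield (offset, sec)
        offset += interval_s
```

With NOTIFY, all secondaries may come asking at roughly the same time; a schedule like this trades propagation speed for a predictable cap on concurrent transfers and bandwidth.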
Jay Daley <td@nominet.org.uk>:
As it happens we use probing not NOTIFY since it gives a greater level of control over the number of secondaries being updated at once (and the
NOTIFY already specifies precautions against a 'rush', although implementations may differ and depending on the size of the zone 'at once' may have different meanings.
bandwidth) and the order in which they are updated. This does rely on strict calculations of the timings of changes and probes but that is fairly easy to do.
How about sending the NOTIFY messages out of band instead of relying upon retry (and not refresh?)? There's a little 'DoS' potential, though. -Peter
Peter Peter wrote on 28/10/2004 09:45:49:
Jay Daley <td@nominet.org.uk>:
As it happens we use probing not NOTIFY since it gives a greater level of control over the number of secondaries being updated at once (and the
NOTIFY already specifies precautions against a 'rush', although implementations may differ and depending on the size of the zone 'at once' may have different meanings.
Just to give you an idea, the total size of the zone files is around 250 MB, and as we do a full transfer every day, these take about 2 hours per secondary. This will change soon as we move to incremental updates, at which point we may well use NOTIFY. From looking at our logs, the largest number of zone file changes in one day is generally about 250,000 domain names being added or removed.

Jay Daley
Nominet UK
Hi Jay, On Oct 28, 2004, at 10:40, Jay Daley wrote:
Johan
Johan wrote on 28/10/2004 09:02:10:
When it comes to zone updates we use a tcp/ip connection to the master server instead of altering the database directly. We don't need
to probe for db changes this way.
Well, that makes sense, but no one in their right mind is probing for changes since NOTIFY was defined many years ago.
As it happens we use probing not NOTIFY since it gives a greater level of control over the number of secondaries being updated at once (and the bandwidth) and the order in which they are updated. This does rely on strict calculations of the timings of changes and probes but that is fairly easy to do.
Certainly. But that only works if you relax the goal of more or less instant propagation of changes to the slaves. I.e. that way you compromise synchronicity (is there such a word?) between slaves for the goal that is more important to you: managing your data streams. In your case the size of the data is non-trivial, so I understand the reasoning.

Johan

PS. I chose the word synchronicity rather than coherence because, as long as each slave publishes exactly the same data for the same SOA serial, they are technically coherent (with respect to the version of the data they *have*), although they may be temporarily out of sync (with some slaves having acquired more recent data).
----- Original Message -----
From: "Johan Ihrén" <johani@autonomica.se>
Hi Jørgen,
On Oct 28, 2004, at 04:47, Jørgen Hovland wrote:
and different replies depending on the country origin of the source ip of the querying nameserver/client.
Oops.
What's the justification for that? And what about maintaining DNS coherency? Any ideas on how to make this work with DNSSEC (I realize you're not doing DNSSEC today, but this would seem to be fundamentally incompatible with any DNSSEC use whatsoever).
Hi Johan,

Load balancing. A bit similar to what Akamai is doing. Regarding DNSSEC, we aren't quite there yet and don't have a solution for it.
What happens to updates (to the master) that occur *during* the reload? My guess is that they get added to "the tail" of the reload to ensure that no change is left out during the reload. But in that case it seems to me that you already have the zone sorted in "transaction order" and the only thing needed to steer around the complete reloads would be some sort of version stamp that is shared between slave and master. Doesn't really have to be the SOA serial, you can use whatever you want.
The slave connects and registers with the master first, locks SQL in read-only mode, clears unprocessed zone change messages from the master (there is about a 0.001% chance of any zone change messages at this stage anyway) and then reloads from SQL. Changes sent by the master during this stage are held in a queue and not processed until the reload is finished. This should guarantee that the local zone data is equal to the master's. If the slave should die during the lock, a timeout would unlock it.

We do a complete reload since it only takes 3 seconds. This is where it becomes interesting. I am quite confident that comparing SOA/zone changes would actually take longer. At least for us, using SQL, a SQL query takes approximately 20 ms before a reply is given. Let's say 10% of the domains were altered (a pretty high number, though). You have to get the new SOAs from SQL; let's just say that this has already been done. Now, 10% altered zones out of 30 million equals 1000 minutes of latency just to deal with the SQL zone retrieval calls, not the processing of the data. I am quite sure a raw dump would require less time and fewer CPU resources on the SQL server, and perhaps even on the slave, depending on the size of each zone. If you only have a few large zones then of course the result would not be the same. However, if you have frequent updates on those few large zones, you would probably have to reload everything anyway. You could always try reducing the number of SQL calls by grouping them together, but that might look "ugly". Name servers don't usually lose connectivity anyway, but of course they do from time to time.

There is also the option of doing transaction logging while a slave is disconnected. We skipped this because it is easier to add and delete slaves without having to reconfigure the master. A transaction log could also grow very large if the slave was down for a long time, and the general implications of knowing whether a slave actually performed a change or not made us skip this.

Joergen Hovland ENK
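The reconnect sequence described above (register with the master first, reload from SQL, then drain the changes that queued up during the reload) can be sketched as follows. This is an illustrative reconstruction, not the actual code; `sql_dump` and `master_feed` are invented stand-ins for the SQL connection and the master's change feed.

```python
import queue

class Slave:
    def __init__(self, sql_dump, master_feed):
        # sql_dump: callable returning a full snapshot of the zone data
        # master_feed: callable that registers us with the master and
        #              returns a queue of subsequent changes
        self.sql_dump = sql_dump
        self.master_feed = master_feed
        self.zones = {}

    def reconnect(self):
        # 1. Register with the master first, so every change made from
        #    this moment on lands in our queue rather than being lost.
        changes = self.master_feed()
        # 2. Full reload from SQL (a read-only snapshot).
        fresh = self.sql_dump()
        # 3. Swap the fresh copy in atomically; the old copy is discarded.
        self.zones = fresh
        # 4. Drain changes queued during the reload, so nothing is missed.
        while True:
            try:
                name, rrset = changes.get_nowait()
            except queue.Empty:
                break
            self.zones[name] = rrset
```

The ordering in steps 1-2 is what guarantees no gap: any update racing with the snapshot is either already in the snapshot or still waiting in the queue.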
Hi Jørgen, On Oct 28, 2004, at 14:09, Jørgen Hovland wrote:
and different replies depending on the country origin of the source ip of the querying nameserver/client.
Oops.
What's the justification for that? And what about maintaining DNS coherency? Any ideas on how to make this work with DNSSEC (I realize you're not doing DNSSEC today, but this would seem to be fundamentally incompatible with any DNSSEC use whatsoever).
Load balancing. A bit similar to what Akamai is doing. Regarding DNSSEC, we aren't quite there yet and don't have a solution for it.
Hmm. Urgl. But I see your point (much as I don't like it). I think such things are rather evil (i.e. I'd rather have intelligent clients making informed decisions than have just barely sentient servers making decisions on my behalf based on assumptions about my environment). But I can understand that there's demand for that type of service, given that the average client is not intelligent at all. I want to see DNSSEC happen, so I'm naturally concerned when I see designs that I believe to be incompatible with DNSSEC, but I can understand that DNSSEC support has not been your most requested feature so far, and I sympathize with the lack of a solution.
What happens to updates (to the master) that occur *during* the reload? My guess is that they get added to "the tail" of the reload to ensure that no change is left out during the reload. But in that case it seems to me that you already have the zone sorted in "transaction order" and the only thing needed to steer around the complete reloads would be some sort of version stamp that is shared between slave and master. Doesn't really have to be the SOA serial, you can use whatever you want.
The slave connects and registers with the master first, locks SQL in read-only mode, clears unprocessed zone change messages from the master (there is about a 0.001% chance of any zone change messages at this stage anyway) and then reloads from SQL. Changes sent by the master during this stage are held in a queue and not processed until the reload is finished. This should guarantee that the local zone data is equal to the master's. If the slave should die during the lock, a timeout would unlock it.
We do a complete reload since it only takes 3 seconds. This is where it becomes interesting.
Wait. Are you saying that a complete reload of the zone (where all the data moves from the master to the slave) takes 3 seconds? For how large a zone? It cannot be very large. Or are you saying that the nameserver basically closes the connection to the DB backend and then reopens it to read the data fresh (i.e. there is massive data movement between nameserver and DB backend within the slave, but only DB synchronization magic goes over the wire from the master)? Or are you saying that the slave does the SQL reads over the wire from the DB (i.e. the SQL DB is not locally replicated on the slave)? I know next to nothing about DB machinery when it comes to stuff like replication, so please excuse my ignorance.
I am quite confident that comparing SOA/zone changes would actually take longer. At least for us, using SQL, a SQL query takes approximately 20 ms before a reply is given. Let's say 10% of the domains were altered (a pretty high number, though). You have to get the new SOAs from SQL; let's just say that this has already been done. Now, 10% altered zones out of 30 million equals 1000 minutes of latency just to deal with the SQL zone retrieval calls, not the processing of the data. I am quite sure a raw dump would require less time and fewer CPU resources on the SQL server, and perhaps even on the slave, depending
I agree about the efficiency of a raw dump. However, I'm really dense today, so I don't really understand why you're using a number as high as 30M *zones*. No offense, but no one ought to put that much infrastructure into any single system, regardless of the underlying technology. I think a more realistic example would be to look at 30K zones and 1 minute. And furthermore, I don't understand why it is not possible to parallelize those calls, especially since they don't all go to the same master. Or perhaps they do in your case? Is it possible to have a slave slave multiple zones from different masters, with multiple TCP sessions in different directions?
on the size of each zone. If you only have a few large zones then of course the result would not be the same. However if you have frequent updates on these few large zones you would probably have to reload everything anyway. You could always try reducing the amount of sql
I mostly agree. If you have a few large zones (typically TLDs) my guess would be that even with a rather high volume of changes most of the changes would concern a smaller part of the data and hence IXFRs still make sense as long as you're able to keep the transaction logs. I.e. I have no idea whatsoever about the change frequency of .co.uk for example, but I'd be really surprised if not more than 60% stayed unchanged for a year.
calls by grouping them together, but that might look "ugly". Name servers don't usually lose connectivity anyway, but of course they do from time to time.
Exactly. And that's the situation that interests me. That "ordinary operation" works just fine doesn't surprise me at all.
There is also the option of doing transaction logging when a slave gets disconnected. We skipped this because it is easier to add and delete new slaves without having to reconfigure the master. A transaction log could also grow very large if the slave was down for a long time, and the implications of knowing whether a slave actually performed the change or not made us skip this.
This is exactly the reasoning behind how IXFR works. I.e. a slave can request an IXFR with all the transactions from version N until now, but the master always has the right to respond with an AXFR. This way the master may "jettison" the transaction log if it grows too much to be convenient to keep. As to knowing whether a slave performed a change or not, that is also taken care of by the SOA serial. So, since you don't have that (or something similar out of band with respect to the DNS data, as I suggested), I understand your reasoning.

Johan
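The IXFR-with-AXFR-fallback rule described above can be sketched as a single decision on the master side. The journal layout here is a hypothetical simplification: each entry maps an old serial to the diff that produces the next serial.

```python
def answer_transfer(journal, current_serial, full_zone, slave_serial):
    """journal: dict mapping old_serial -> (diff, new_serial).
    Return ('IXFR', diffs) when the journal still covers the slave's
    serial; otherwise ('AXFR', full_zone), since the master may always
    jettison old log entries and fall back to a full transfer."""
    if slave_serial == current_serial:
        return ("IXFR", [])  # already up to date: empty diff
    diffs, serial = [], slave_serial
    while serial != current_serial:
        if serial not in journal:
            return ("AXFR", full_zone)  # journal gap: full transfer
        diff, serial = journal[serial]
        diffs.append(diff)
    return ("IXFR", diffs)
```

The slave never needs to know whether the master kept the log: it asks from its own serial and accepts whichever form the answer takes.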
----- Original Message -----
From: "Johan Ihrén" <johani@autonomica.se>
To: "Jørgen Hovland" <jorgen@hovland.cx>
Cc: <dns-wg@ripe.net>
Sent: Thursday, October 28, 2004 2:38 PM
Subject: Re: [dns-wg] Analysis of NSD
Hi Jørgen,
On Oct 28, 2004, at 14:09, Jørgen Hovland wrote:
We do a complete reload since it only takes 3 seconds. This is where it becomes interesting.
Wait. Are you saying that a complete reload of the zone (where all the data moves from the master to the slave) takes 3 seconds? For how large a zone? Cannot be very large.
Yes, it's a rather small DB with only 10k domains. This number is of course growing, so the implementation had to support our future demands.
Or are you saying that the nameserver basically closes the connection to the DB backend and then reopens it to read the data fresh (i.e. there is massive data movement between nameserver and DB backend within the slave, but only DB syncronization magic goes over the wire from the master)?
Or are you saying that the slave does the sql reads over the wire from the DB (i.e. the SQL DB is not locally replicated on the slave)?
All name servers connect to the same SQL server (only the master has read-write access). The master is always connected, due to updates issued by web/ASP connections etc. Slaves are only connected during startup and during a reconnect to the master. We could replicate the DB, but then we would have more synchronisation issues, and we really don't see the point of doing this for now; we use replication only for backup purposes. If the SQL server is down, the slaves won't reload during master reconnect right away; they will retry a few times before giving up.
I agree to the efficiency of a raw dump. However, I'm really dense today so I don't really understand why you're using a number as high as 30M *zones*. No offense, but no one ought to put that much infrastructure into any single system regardless of the underlying technology. I think a more realistic example would be to look at 30K zones and 1 minute. And furthermore I don't understand why it is not possible to parallellize those calls.
The 30M was just to show some large numbers. There won't be fewer domains in the future than now, that's for sure. A 30K name server with the same 10% would only result in 1 minute of latency, but that is still a lot more than a full dump would take, including handling the data and everything needed to have a fresh copy.
Especially since they not all go to the same master. Or perhaps they do in your case? Is it possible to have a slave slave multiple zones from different masters with multiple TCP sessions in different directions?
There can be only one to rule them all, but a slave can also be a master for other slaves, although we don't use this feature.
on the size of each zone. If you only have a few large zones then of course the result would not be the same. However if you have frequent updates on these few large zones you would probably have to reload everything anyway. You could always try reducing the amount of sql
I mostly agree. If you have a few large zones (typically TLDs) my guess would be that even with a rather high volume of changes most of the changes would concern a smaller part of the data and hence IXFRs still make sense as long as you're able to keep the transaction logs. I.e. I have no idea whatsoever about the change frequency of .co.uk for example, but I'd be really surprised if not more than 60% stayed unchanged for a year.
This is exactly the reasoning behind how IXFR works. I.e. a slave can request an IXFR with all the transactions from version N until now, but the master alway has the right to respond with an AXFR. This way the master may "jettison" the transaction log if it grows too much to be convenient to keep.
Sorry, I should have said transaction log slash IXFR, not just transaction log. Anyway, you are absolutely correct, but we chose to skip this feature. IXFR is definitely a good way to deal with zone synchronization (for existing zones, as far as I am aware). The dump implementation we use synchronizes everything, including new and deleted zones.

Joergen Hovland ENK
participants (5)

- Jay Daley
- Jim Reid
- Johan Ihrén
- Jørgen Hovland
- Peter Koch