rsync://rpki.ripe.net rsyncd limits set too low?
Hi all,

I noticed the RIPE NCC RRDP service (https://rrdp.ripe.net/) became unreachable at 2022-02-16 13:34:10 UTC (and still is down). This RRDP outage should not pose an issue for most RPKI validators, because most RPKI cache implementations (following best practices) will attempt to synchronize via rsync when RRDP is unavailable.

However, it seems RIPE NCC adjusted the default rsyncd settings and lowered the concurrent connection limit from 200 (which already is too low for RPKI repository servers) to 150?

    $ rsync --no-motd -rt rsync://rpki.ripe.net/repository/
    @ERROR: max connections (150) reached -- try again later
    rsync error: error starting client-server protocol (code 5) at main.c(1666) [Receiver=3.1.2]

I'm not familiar with the RIPE RPKI rsync service architecture, so the above error could be misleading: perhaps there is a load balancer distributing TCP sessions across multiple backends, each backend configured to serve up to 150 clients? Or perhaps there is a single rsyncd instance (in which case 150 definitely is too low).

Is the RIPE NCC RPKI rsync service underprovisioned? If so, why?

Kind regards,

Job
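To illustrate the fallback behaviour described above: a rough sketch in Python of RRDP-first, rsync-second synchronisation. This is not any particular validator's implementation; the function name, parameters, and URLs are made up for illustration.

    # Rough sketch of the RRDP-first, rsync-fallback behaviour described
    # above; not any particular validator's code.
    import subprocess
    import urllib.request

    def refresh_publication_point(rrdp_notify_url, rsync_url, local_dir):
        try:
            # Prefer RRDP (RFC 8182): fetch the notification file over HTTPS.
            with urllib.request.urlopen(rrdp_notify_url, timeout=30) as resp:
                notification = resp.read()
            # ... parse the notification, then apply snapshot/deltas ...
        except OSError:
            # RRDP unreachable (as in the outage above): fall back to a
            # plain rsync of the repository.
            subprocess.run(
                ["rsync", "--no-motd", "-rt", rsync_url, local_dir],
                check=True,
            )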
On Wed, Feb 16, 2022 at 03:05:30PM +0100, Job Snijders wrote:
However, it seems RIPE NCC adjusted the default rsyncd settings and lowered the concurrent connection limit from 200 (which already is too low for RPKI repository servers) to 150?
Small correction: I appear to have been confused about 200 being the default; according to the documentation, the default is 'unlimited'.

Kind regards,

Job
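For reference, the limit in question is the 'max connections' parameter in rsyncd.conf. A minimal sketch (values illustrative, not RIPE NCC's actual configuration):

    # /etc/rsyncd.conf (illustrative values)
    [repository]
        path = /var/rpki/repository
        read only = yes
        # The default of 0 means unlimited; any positive value caps
        # simultaneous sessions and produces the
        # "@ERROR: max connections (N) reached" message seen above.
        max connections = 150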
Hi Job.
On 16 Feb 2022, at 15:05, Job Snijders via routing-wg <routing-wg@ripe.net> wrote:
Hi all,
I noticed the RIPE NCC RRDP service (https://rrdp.ripe.net/) became unreachable at 2022-02-16 13:34:10 UTC (and still is down).
Ouch. The RRDP outage was caused by a DNS misconfiguration (which should have recovered by now), forcing the fallback to rsync.
This RRDP outage should not pose an issue for most RPKI validators, because most RPKI cache implementations (following best practices) will attempt to synchronize via rsync when RRDP is unavailable.
However, it seems RIPE NCC adjusted the default rsyncd settings and lowered the concurrent connection limit from 200 (which already is too low for RPKI repository servers) to 150?
$ rsync --no-motd -rt rsync://rpki.ripe.net/repository/
@ERROR: max connections (150) reached -- try again later
rsync error: error starting client-server protocol (code 5) at main.c(1666) [Receiver=3.1.2]
I'm not familiar with the RIPE RPKI rsync service architecture, so the above error could be misleading: perhaps there is a load balancer distributing TCP sessions across multiple backends, each backend configured to serve up to 150 clients? Or perhaps there is a single rsyncd instance (in which case 150 definitely is too low).
We have described our rsync infrastructure extensively in earlier messages (e.g. [0]). There are multiple instances behind a load balancer. The current storage is on NFS, which has a performance limitation: it peaked at about 80K operations/second (2-minute average).

We will follow up with a more detailed post-mortem.

Kind regards,

Ties

[0]: https://www.ripe.net/ripe/mail/archives/routing-wg/2021-June/004351.html
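Not knowing more than the above about RIPE NCC's setup, a generic sketch of multiple rsyncd backends behind a TCP load balancer (haproxy chosen purely for illustration; the software and addresses are assumptions, with documentation-example IPs):

    frontend rsync_in
        bind :873
        mode tcp
        default_backend rsync_pool

    backend rsync_pool
        mode tcp
        balance leastconn
        # If each backend rsyncd enforces 'max connections = 150', the
        # effective cluster-wide limit is 150 x the number of backends.
        server rsyncd-1 192.0.2.10:873 check
        server rsyncd-2 192.0.2.11:873 check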
Hi Ties,

Thank you for the quick reply.

On Wed, Feb 16, 2022 at 03:32:06PM +0100, Ties de Kock wrote:
Ouch. The RRDP outage was caused by a DNS misconfiguration (which should have recovered by now), forcing the fallback to rsync.
Thanks for the confirmation. Indeed, my monitors seem to have returned to 'all clear'.
There are multiple instances behind a load balancer. The current storage is on NFS, which has a performance limitation: it peaked at about 80K operations/second (2-minute average).
Welp! That's a lot of I/O.

Sharing from my own experience with a tiny publication point: I estimate there are about 4,000 RPs deployed on the Internet. Assuming their synchronisation attempts are evenly distributed across the hour, a naive calculation suggests that roughly one new client will attempt to connect every single second.
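The back-of-the-envelope arithmetic behind that estimate, with the assumed figures spelled out:

    # Python: rough arrival-rate estimate for new rsync client connections.
    relying_parties = 4_000  # assumed number of RPs deployed on the Internet
    window_seconds = 3_600   # one hour, with sync attempts evenly spread
    print(relying_parties / window_seconds)  # ~1.1 new connections per second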
We will follow up with a more detailed post-mortem.
Much appreciated!

Kind regards,

Job