Hi Job,
On 13 Jul 2021, at 12:57, Job Snijders via routing-wg <routing-wg@ripe.net> wrote:
Hi,
On Mon, Jul 12, 2021 at 10:23:20AM +0200, Daniel Karrenberg wrote:
Nathalie pointed us to https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/rpki-plann... a while ago. Among other things it says:
“In preparation for the improved RPKI repository architecture, the distributed nature of the RRDP repository is going to be implemented using containers and krill-sync that pulls data from the centralised on-premise repository. This greatly simplifies smooth transitioning between publication servers without any downtime.
NOTE: We are not referring to cloud technologies here, just to our internal deployment technologies.”
The silence here worries me.
What silence?!
Over the last few months there have been quite a few mail threads in this working group about RPKI and RPKI outage incidents, and NCC staff have provided updates during the virtual RIPE meetings in the Routing WG slot.
To me the roadmap seems to reflect the sentiment that reliability is the key objective at this moment in time.
I would like to see some feedback from this group on whether this is what you want to see happening. The RIPE Routing WG is the forum for giving guidance to the RIPE NCC about RPKI. I know other channels exist too, and that is fine. I also know that individuals here seem to be happy with what is happening. However, private channels and conversations are not the way RIPE does this. This group is where the RIPE NCC looks for guidance, and where that guidance gets properly archived and responded to.
To be honest I am not sure what the purpose of krill-sync is.
In May 2021 [1] extensive testing was conducted with the help of the NLNOG RING to see whether krill-sync could be used to power the rsync service, but it turned out there were multiple issues with krill-sync, making it a suboptimal choice. I believe the RIPE NCC ended up deploying a different solution to serve rsync, and my hope is that the recently achieved stability is here to stay, because the current setup seems to work quite nicely.
We are [1] evaluating krill-sync as a tool to build rsync servers that are independent of NFS and can use cached IO. The reason for this is rsync fallback. We see ~139 RPs using the rsync repository (as well as the majority of the NLNOG RING nodes) and >1600 RPs using the RRDP repository [2]. When rsync fallback happens for many RPs, the current infrastructure will likely not scale, even when each RP starts from the last RRDP state. We are evaluating krill-sync because it allows us to build an rsync repository from RRDP and is available as an open-source project.

I recall that while evaluating that krill-sync based environment we found three issues:

* Repository versions need to remain available for two hours _after they last were the current version_ to give slow clients the chance to retrieve them [3] (a sketch of this retention rule follows after the footnotes).
* The modification time of objects needs to be the same (between nodes and between copies for a serial) to prevent additional IOs for RPs.
* There are very slow outliers reading repositories, but keeping versions available for two hours is long enough in practice.

Finding these issues was good: it ensured that they were accounted for in our implementation that writes to NFS. After we reported the relevant issues upstream, they were fixed in krill-sync. The use of the NLNOG RING helped verify the current NFS-based setup, which I agree is working nicely.

Kind regards,

Ties

[1]: https://www.ripe.net/ripe/mail/archives/routing-wg/2021-June/004351.html
[2]: rsync: number of unique IPs reading from /repository yesterday in one hour; hour-to-hour variance is minimal. RRDP: number of unique IPs retrieving notification.xml more than 24 times per day in early July.
[3]: Example: revision 0 is published at 0h00m, revision 1 at 1h59m, and revision 2 at 2h01m (at which point revision 0 is deleted). The files that a client which connected at 1h58m is still reading get deleted out from under it.
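To make the retention rule from the first bullet concrete, here is a minimal sketch in Python. It is purely illustrative rather than code from krill-sync or the RIPE NCC setup; the Serial record, its field names, and the removable() helper are all hypothetical.

from dataclasses import dataclass
from typing import Optional

RETENTION_SECONDS = 2 * 60 * 60  # keep a serial for 2h after it was last current

@dataclass
class Serial:
    number: int
    published_at: float                    # epoch seconds
    superseded_at: Optional[float] = None  # None while still the current version

def removable(serial: Serial, now: float) -> bool:
    # Never delete the current version; otherwise delete only once two
    # hours have passed since it stopped being the current version.
    if serial.superseded_at is None:
        return False
    return now - serial.superseded_at >= RETENTION_SECONDS

# Timeline from footnote [3], in seconds from 0h00m:
rev0 = Serial(number=0, published_at=0, superseded_at=1 * 3600 + 59 * 60)
now = 2 * 3600 + 1 * 60  # 2h01m, when revision 2 is published

print(removable(rev0, now))  # False: revision 0 is kept until 3h59m

Measuring the two hours from publication time instead would allow revision 0 to be removed around 2h00m while a client that connected at 1h58m is still mid-transfer; measuring from the moment a version stops being current avoids exactly the failure described in [3].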