Fwd: Deprecating failed prefixes in the host
FYI

-------- Forwarded Message --------
Subject: Re: Deprecating failed prefixes in the host
Date: Thu, 2 Feb 2023 17:14:22 +1300
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
To: IPv6 Operations <v6ops@ietf.org>

Answering some of the comments made, the new version now at https://github.com/becarpenter/misc/blob/main/deprecator.py picks a random RIPE Atlas anchor probe as its ping target (unless the user supplies their own ping target). Thanks to the RIPE Atlas community for not shooting me down in flames. (An even better approach would be to choose a new random ping target every hour or so; it's easy enough to code that too.)

On my Windows host, I have noticed some odd behaviour of the source selection logic if a ULA is active. No such problem on Linux. For personal reasons I have no time to track that down right now.

Again, there's no pretence that this is an operational solution; it's just a proof of half-baked concept.

Regards
Brian

On 24-Jan-23 15:20, Brian E Carpenter wrote:
Hi,
We keep falling over the same issue with multi-provider multi-homing: routing magic can keep traffic away from the link to/from a failing provider, but hosts don't know about this and keep sending traffic *from* addresses within the failing provider's prefix.
As Eduard Vasilenko has pointed out, source address selection at the host is the root cause of this, and RFC6724 doesn't specify a way to change source address selection automatically. Also, we don't currently have a mechanism for rapidly withdrawing a failing prefix.
To be clear, a typical user program acting as a client uses getaddrinfo() to choose a destination address, but does not use bind() to set a source address - instead it uses connect() and the system chooses a source address. So the need is to avoid choosing a source address that lies within the prefix of a failing provider.
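To make the connect()-without-bind() behaviour concrete, here is a minimal Python sketch (not taken from deprecator.py) that asks the system which source address it would choose for a given destination. The function name system_chosen_source and the default port are assumptions made for illustration. It uses a UDP socket, so connect() only triggers route and source address selection and sends no packet.

```python
import socket

def system_chosen_source(dest_host, port=443):
    """Return the source address the OS selects for dest_host.

    Hypothetical helper for illustration only.
    """
    # getaddrinfo() chooses the destination, as a typical client does.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        dest_host, port, type=socket.SOCK_DGRAM)[0]
    with socket.socket(family, socktype, proto) as s:
        # No bind() here: the system picks the source address.
        s.connect(sockaddr)
        # The first element of the local socket address is the source IP.
        return s.getsockname()[0]
```

Calling system_chosen_source() with the name of a dual-homed destination before and after an address is deprecated should show the selected source change, assuming another usable address exists.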
This message suggests a host-only mechanism to achieve this. If you want a name for it, consider "unhappy eyeballs". I'm not claiming originality, because it seems obvious to me.
The mechanism is brutal: have a program running continuously on the host that regularly tests the liveness of each assigned source address, e.g., once a minute. The liveness test is simply a ping to some target address somewhere in the Internet, sourced from the address being tested (i.e., it *does* use bind()). If the ping fails several times in a row, deprecate the source address in question, i.e. set its preferred lifetime to zero. RFC6724 source address selection will then avoid it (Rule 3 avoids deprecated addresses), so new sessions will not use it as long as another suitable address is available. Keep the deprecated address in the liveness test, and if it starts to work again, un-deprecate it (i.e. restore its preferred lifetime).
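The bookkeeping described above can be sketched as a small state machine. This is a hedged illustration, not the code in deprecator.py: the names AddressState, run_round and FAIL_LIMIT are hypothetical, the value 3 for "several times in a row" is an assumption, and the probe, deprecate and restore callables stand in for the platform-specific ping and lifetime-setting steps.

```python
from dataclasses import dataclass

FAIL_LIMIT = 3  # assumed meaning of "several times in a row"

@dataclass
class AddressState:
    failures: int = 0         # consecutive failed probes
    deprecated: bool = False  # True once we have deprecated the address

def run_round(states, probe, deprecate, restore, fail_limit=FAIL_LIMIT):
    """Run one liveness pass over every monitored source address.

    states:    dict mapping address string -> AddressState
    probe:     callable(addr) -> bool, True if a ping sourced from addr worked
    deprecate: callable(addr), sets the preferred lifetime to zero
    restore:   callable(addr), restores the preferred lifetime
    """
    for addr, st in states.items():
        if probe(addr):
            st.failures = 0
            if st.deprecated:
                # The address works again: un-deprecate it.
                restore(addr)
                st.deprecated = False
        else:
            st.failures += 1
            if st.failures >= fail_limit and not st.deprecated:
                deprecate(addr)
                st.deprecated = True
        # Deprecated addresses stay in `states`, so they keep being probed.
```

A driver would call run_round() about once a minute. Passing the probe and the lifetime changes in as callables keeps the logic testable with simulated ping failures, without privileges or a real multi-homed setup.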
This is brutal because it does nothing for sessions that are already in progress when a source address fails. They will just time out. I'm sure that solutions such as those TAPS is proposing will be much more elegant. It's also brutal because it doesn't know anything about prefixes; it just detects failing /128 addresses. (I don't know if there is a userland mechanism for deprecating a whole prefix.) But this works for legacy client code that simply uses connect().
Sounds horrible? Yes. It's also very hard to test; I don't have a setup that allows a proper test. But simulating a ping failure isn't so hard.
You can find a quick and nasty Python version for Windows 10 and Linux at https://github.com/becarpenter/misc/blob/main/deprecator.py This is experimental software that might disturb network access; if you run it, it is entirely at your own risk. It needs sudo or Administrator privilege.
There's an associated and harmless test program at https://github.com/becarpenter/misc/blob/main/deptest.py
(By the way, there seems to be an unexpected interaction between the RFC6724 source-selection algorithm and the presence of a ULA on Windows 10: under investigation.)
Regards Brian Carpenter