Detecting, mitigating, and preventing distributed large-scale prefix de-aggregation attacks
Dear Routing WG,

Our apologies to those who received this message via multiple channels.

My colleagues and I recently revisited the topic of prefix de-aggregation attacks. We believe that the current IPv6 allocation policies, combined with the ever-growing number of interconnection opportunities, may facilitate those attacks to the point where they circumvent traditional prevention mechanisms. Hence, we'd like to raise awareness of how to detect, mitigate, and prevent these kinds of attacks.

# Prefix De-aggregation Attacks

While allocation policies in IPv4 are very tight, even a new LIR can obtain, e.g., a /29 IPv6 address block from RIPE without justification [1]. This /29 may source more than a million unique IPv6 prefixes when using all CIDR sizes between /29 and /48 (the longest prefix length that is not commonly filtered). To prevent this many prefixes from flooding the DFZ, many ASes set a maximum prefix limit on their eBGP sessions. When initially introduced, these max-prefix limits prevented prefix de-aggregation attacks. In today's hyper-connected world, prefix limits merely transform these attacks into session-hunting challenges.

To better illustrate this relationship, consider the following example: If an adversary combines the two remote-peering offerings of BSO's IXReach [2] and Epsilon's Infinity Platform [3], they can establish ports at more than a hundred peering LANs. If this adversary uses Hurricane Electric as their IPv6 transit provider and establishes a BGP session at every in-common peering LAN [4], this will lead to 100+ sessions. With a per-session limit of, e.g., 500 prefixes, the adversary could redistribute 50K unique prefixes via this setup alone. If an adversary further increases the number of remote peering providers, adds announcements from BGP-enabled VPS services (e.g., Vultr [5] among many others [6]), and contracts additional IPv6 transit providers, they may globally increase the current IPv6 routing table size manifold.
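The two figures above (more than a million prefixes from a /29, and 50K prefixes over 100 sessions) can be reproduced with a few lines of arithmetic. The following is only a minimal sketch: the function names are hypothetical, `2001:db8::/29` is a placeholder block rather than a real allocation, and the session estimate assumes that every session carries a disjoint set of prefixes.

```python
import ipaddress


def count_deaggregated_prefixes(block: str, longest: int = 48) -> int:
    """Count all distinct prefixes from the block's own length down to `longest`."""
    net = ipaddress.ip_network(block)
    # A block of length p contains 2**(l - p) distinct prefixes of length l.
    return sum(
        2 ** (length - net.prefixlen)
        for length in range(net.prefixlen, longest + 1)
    )


def prefixes_via_sessions(sessions: int, per_session_limit: int) -> int:
    """Upper bound on unique prefixes, assuming disjoint sets per session."""
    return sessions * per_session_limit


if __name__ == "__main__":
    # Placeholder /29; any /29 yields the same count.
    print(count_deaggregated_prefixes("2001:db8::/29"))
    # 100 sessions with a per-session limit of 500 prefixes.
    print(prefixes_via_sessions(100, 500))
```

For a /29 the sum is 2^20 - 1 = 1,048,575 prefixes, which matches the "more than a million" estimate.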
Notably, each of these new routes would have a valid ROV status once the adversary adds a single ROA entry for a /29 with a max CIDR size of /48; hence, they would pass the redistribution requirements of various transit providers.

While many current router models support multiple million IPv6 routes, older models in particular may crash, drop sessions, or behave in other unintended ways when either their FIB or RIB runs out of memory. When the adversary also withdraws all routes simultaneously, the number of updates generated by BGP's path hunting may further lead to very high load for extended periods of time.

To put this into perspective: Some of you might have noticed increased CPU load alongside other effects when Vultr was de-aggregating 12k IPv6 prefixes on October 5th [7]. Using the different methods described above, a highly motivated adversary might be able to produce 1-2 orders of magnitude more updates.

Please note that we performed various smaller (<600 prefixes) de-aggregation tests as part of our research; see sections 6 and 8 in the document referenced at the end of this notification for detailed explanations. While our experimental setup was very similar to the October 5th incident (we also announced address space obtained from SecureBit via VMs within Vultr), we are in no way related to that incident, nor did we share any information about our research or findings with individuals outside our research group prior to the start of our private disclosure phase on October 11th.

# Detection, Mitigation, and Prevention Mechanisms

Luckily, prefix de-aggregation attacks are easily detectable (e.g., based on prefix-limit notification thresholds or direct routing table size monitoring) and can be mitigated quickly by filtering either the more specifics of the covering prefix or all prefixes announced by the adversary's ASN(s). Effectively, damage can only be done within the human reaction time, which we hope to shorten with this notification.
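The two mitigation options just described (dropping the more specifics of a covering prefix, or dropping everything from given origin ASNs) could be sketched roughly as follows. This is an illustration on plain Python data, not router configuration; `filter_routes`, the `(prefix, origin ASN)` route format, and the example prefixes and ASNs are all assumptions made for the sketch.

```python
import ipaddress
from typing import FrozenSet, Iterable, List, Tuple

Route = Tuple[str, int]  # (prefix, origin ASN); hypothetical format


def filter_routes(
    routes: Iterable[Route],
    covering_prefix: str,
    blocked_asns: FrozenSet[int] = frozenset(),
) -> List[Route]:
    """Drop more specifics of `covering_prefix` and routes from blocked ASNs."""
    cover = ipaddress.ip_network(covering_prefix)
    kept = []
    for prefix, origin in routes:
        net = ipaddress.ip_network(prefix)
        # Strictly more specific: same address family, longer prefix length,
        # and contained in the covering prefix. The covering prefix itself
        # is kept so that reachability is preserved.
        is_more_specific = (
            net.version == cover.version
            and net.prefixlen > cover.prefixlen
            and net.subnet_of(cover)
        )
        if is_more_specific or origin in blocked_asns:
            continue
        kept.append((prefix, origin))
    return kept


if __name__ == "__main__":
    table = [
        ("2001:db8::/29", 64500),    # covering prefix (placeholder)
        ("2001:db8:1::/48", 64500),  # more specific, dropped
        ("2001:dc0::/32", 64501),    # unrelated prefix, kept
    ]
    print(filter_routes(table, "2001:db8::/29"))
```

Keeping the covering prefix while dropping its more specifics is a design choice of this sketch: it removes the table growth without blackholing the address block.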
To protect yourself from prefix de-aggregation attacks, you may establish dynamic yet tight per-session limits on all eBGP sessions. As an adversary could enter unreasonably large values into databases such as PeeringDB, we'd recommend not relying solely on them but also accepting at most 1.5-2x the number of yesterday's prefixes for peers and customers and 1.2x yesterday's routing table size for transit providers (which would currently reflect a headroom of ~32k prefixes with a yearly growth rate of <50k prefixes [8]). We'd also recommend ensuring that the summed prefix limits across all sessions do not drastically exceed the router's maximal FIB size.

To protect others, you may:

(i) ensure that you only redistribute a certain number of routes per origin; currently, AS 9808 announces the most (~4K) IPv6 prefixes.

(ii) ensure that you only redistribute a certain number of more-specific routes for each assigned address block; currently, 2409:8000::/20 is the prefix with the most (~10K) more-specifics.

If you want to know more about the research that initiated this notification (including our concluded private disclosure process), you may find a write-up at: https://arxiv.org/pdf/2210.10676.pdf

Best regards,
Lars

[1] https://www.ripe.net/publications/docs/ripe-738#initial_size
[2] https://www.ixreach.com/services/remote-peering/
[3] https://epsilontel.com/global-network-footprint/internet-exchanges/
[4] https://he.net/peering.html
[5] https://www.vultr.com/features/advanced-networking/
[6] https://docs.google.com/spreadsheets/d/1abmV_mXWWCsVxHLfouSivyS7ch-PcUww8S6ksY66c5o/edit#gid=0
[7] https://twitter.com/Qrator_Radar/status/1577748939805278209
[8] https://bgp.potaroo.net/v6/as2.0/index.html
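As a rough illustration of the prevention advice in the notification above, the dynamic per-session limits, the FIB sanity check, and the per-origin cap might be computed as sketched below. All names are hypothetical. The multipliers use the upper end (2x) of the suggested 1.5-2x range for peers and customers and 1.2x for transit. The 160,000-route table size is an assumption back-derived from the stated ~32k headroom, not a measured value.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Multipliers in percent, so the arithmetic stays in exact integers.
LIMIT_PERCENT: Dict[str, int] = {"peer": 200, "customer": 200, "transit": 120}


def dynamic_prefix_limit(yesterday_count: int, role: str) -> int:
    """Max-prefix limit derived from yesterday's prefix count (rounded up)."""
    return (yesterday_count * LIMIT_PERCENT[role] + 99) // 100


def limits_fit_fib(limits: Iterable[int], fib_capacity: int) -> bool:
    """True if the summed per-session limits do not exceed the FIB size."""
    return sum(limits) <= fib_capacity


def origins_over_cap(routes: Iterable[Tuple[str, int]], cap: int) -> List[int]:
    """Origin ASNs that announce more than `cap` routes, in sorted order."""
    per_origin = Counter(origin for _prefix, origin in routes)
    return sorted(asn for asn, count in per_origin.items() if count > cap)


if __name__ == "__main__":
    table_size = 160_000  # assumed table size, see the note above
    transit_limit = dynamic_prefix_limit(table_size, "transit")
    print(transit_limit, transit_limit - table_size)  # limit and headroom
    print(limits_fit_fib([transit_limit, 1_000], fib_capacity=250_000))
```

Recomputing the limits once a day from observed counts keeps them tight without relying on self-reported values such as those in PeeringDB.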
Lars Prehn wrote on 20/10/2022 20:23:
If you want to know more about the research that initiated this notification (including our concluded private disclosure process), you may find a write-up at:
Lars,

thanks for sending your findings to routing-wg. The impact of deaggregation of ipv6 prefixes on router resources has been understood since well before ipv6 was standardised.

Your paper describes an attack which can either use a provider's customer cone or else use a selection of different transit providers to inject smaller numbers of prefixes from different injection points into either a provider's routing table, or the ipv6 dfz. Your argument is roughly equivalent to stating that if you send 20,000 cars from different starting points to a single destination, you will end up with gridlock, as the taxi industry in Moscow discovered a couple of weeks ago - this isn't news for either cars or routing table prefixes. There are many other easily-staged attacks which are also efficient at causing disruption, e.g. sending 1 Tbit/s of data at a destination, gluing oneself to the road on a commuter trunk during rush hour, cutting fibre cables in chambers, etc. All of these are low cost, regularly tested, and are known to work well.

Your list of takeaways in section 6.1 is correct, but it stopped at the point of detection and mitigation. Routing tables are monitored, and some people / organisations have alerts triggered, particularly transit providers. Another point is that routing operations people tend to notice things like routers and routing tables blowing up. In other words, you will cause damage if you try this in anger, but fairly quickly the source(s) will be noticed and you'll find that mitigation action will be taken. As quickly as providers might increase prefix limits on a bgp session, they will drop them too, or shut down the session entirely, or terminate your free ipv6 transit, or cut off your ixp peering.

This is important because one of the more important aspects of network reliability is not closing off all angles of potential attack / failure, but ensuring that detection and time-to-recovery are optimised.

The Vultr incident on October 5 this year was noticed fairly quickly on operator forums, because of alerting on both router FIB resource usage and control plane CPU usage. Incidentally, production de-peering happened as a result of the incident, although hopefully that has been undone at this point.

Nick
Hi Nick,

As we wrote in the mail and the write-up (§7): mitigation is quick and fairly easy (to the point where damage is limited to the human response time/time-to-action). Our notification aimed at (i) cutting down the time-to-action by re-raising awareness of the problem and (ii) providing operators (especially of smaller networks) with prevention mechanisms to lower the impact on their networks until transit providers and IXPs have acted.

Best regards,
Lars

On 20.10.22 23:15, Nick Hilliard wrote: