Progressive BGP route flap dampening
Dear Routing WG members,

Sorry for not having had the opportunity to participate in the last few meetings. I'm trying hard to make it to the January meeting! In the meantime, let me start an e-mail discussion on route flap dampening (I'm appending the relevant part of the minutes of the September meeting, where further discussion was requested).

Recently, following scheduled maintenance on an Ebone backbone router, I had problems communicating with NORDUnet (haven't explicitly tried PIPEX ;-) from /24 networks for about 2 hours! This reminded me of the "progressive route flap dampening" discussion and increased my motivation to throw my 0.02 EUROs into the routing-wg list:

- I can't believe that it's really necessary and reasonable to kill a network (regardless of prefix length) for two hours if it flaps twice, maybe within half an hour, but no more often than that within a month!

- Imagine you're upgrading the software on a router and (very unlikely, as we all know ;-} discover that you have to roll back ... BINGO, it's perfectly and innocently electrocuted :-(

- I'd suggest that dampening (regardless of prefix length) shouldn't start before AT LEAST three flaps happen in a row (let's say within half an hour).

- Dampening should lock out real network instabilities, not make even scheduled maintenance worse!

- Besides, I know applications where it might be perfectly reasonable to announce a single *provider-independent* /24 and where it's counterproductive and politically incorrect to include it in an ISP aggregate! A solution could be to ask InterNIC/RIPE to define "PI" address ranges which can and should be excluded from the /24 hostility acts.

Kind regards
Christian

===== Following quoted from
=
= RIPE 25, Amsterdam
= Routing Working Group
= Report of Meeting, 23rd September 1996
=
= [...]
=
= 6. Progressive BGP route flap dampening
=
= ftp://ftp.ripe.net/ripe/presentations/ripe-m25-tbarber-bgp-damp.html
=
= Tony Barber gave a presentation on the strategies used by
= UUnet-Pipex to reduce the effects of route flapping and to
= try to prevent router table overflow. These were:
=
= - route dampening
= - prefix filtering
= - more router memory
=
= They had encountered many instabilities from peers and found
= that many ISPs had not deployed CIDR; this gave rise to more
= flapping as more routes, and particularly more specific ones,
= were advertised.
=
= Tony explained the parameters used for route dampening on a
= Cisco router. He had arrived at the following re-use times
= for various route sizes:
=
=   /24 and greater   ~160 minutes
=   /23 and /22        ~60 minutes
=   /21 and less       ~30 minutes
=
= He recommended filtering out all prefixes more specific than /24.
=
= While route dampening consumed router memory, this was more or
= less balanced by a reduction in routing CPU cycles.
=
= He recommended that if route dampening was to be widely
= deployed in Europe, consistency was important. In this sense,
= the Routing WG should agree on guidelines for parameters to
= be used.
=
= In discussion, the following points were made:
=
= - aggregation works in reducing router load and route
=   flapping.
=
= - route flapping is often a feature of certain autonomous
=   systems rather than a function of prefix length.
=
= - much instability was due to configuration changes and
=   errors as distinct from link failures.
=
= - making dampening dependent on prefix length could
=   penalise many stable /24s.
=
= - it might be useful to discriminate against /24s in the
=   192.0.0.0/8 block (the swamp).
=
= - the focus should be on keeping noise out of the system
=   rather than trying to mitigate against it once in the
=   system.
=
= In summary, it was agreed that route dampening was an
= important topic and that more discussion was needed.
=
===== End quote
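The prefix-length-dependent re-use times in the minutes amount to a simple lookup. The sketch below is a minimal illustration, not UUnet-Pipex's actual router configuration: the function name `approx_reuse_minutes` is hypothetical, the boundaries are one reading of "/24 and greater", "/23 and /22" and "/21 and less", and returning None for prefixes longer than /24 reflects the recommendation to filter those out rather than dampen them.

```python
import ipaddress
from typing import Optional


def approx_reuse_minutes(prefix: str) -> Optional[int]:
    """Approximate re-use time in minutes for an IPv4 prefix, following the
    table in the RIPE 25 minutes. Returns None for prefixes longer than /24,
    which the minutes recommend filtering out rather than dampening."""
    length = ipaddress.ip_network(prefix).prefixlen
    if length > 24:
        return None  # filtered, never accepted in the first place
    if length == 24:
        return 160   # "/24 and greater": ~160 minutes
    if length >= 22:
        return 60    # "/23 and /22": ~60 minutes
    return 30        # "/21 and less": ~30 minutes


for p in ("192.0.2.0/24", "198.51.100.0/23", "10.0.0.0/19", "192.0.2.1/32"):
    print(p, approx_reuse_minutes(p))
```

The lookup keys only on prefix length, which is exactly the property the discussion objected to: a stable /24 is treated the same as an unstable one.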
Christian Panigl, ACOnet/UniVie +43 1 4065822-383 wrote:
Recently, following scheduled maintenance on an Ebone backbone router, I had problems communicating with NORDUnet (haven't explicitly tried PIPEX ;-) from /24 networks for about 2 hours!
- I can't believe that it's really necessary and reasonable to kill a network (regardless of prefix length) for two hours if it flaps twice, maybe within half an hour, but no more often than that within a month!
Hello Christian,

a reply from my perspective :-)

Progressive dampening is an incentive to increase aggregation within the community without resorting to the draconian measures used by some ISPs. (You know who ;-) The smallest blocks now allocated by the registries are /19s, which we would penalise for 20 minutes at most (IF they reached the flap threshold). That leaves /24s such as those in 192/8, maybe multi-homed customers, and possibly some Last Resort numbers as well. (I'm sure there are a few more valid uses too.)

Let me assure you that we would certainly not kill any prefixes within the parameters you define above. Progressive dampening is aimed at routes which oscillate *very* regularly, i.e. a few times in a matter of minutes. The default half-life for penalty decay is 15 minutes. The *minimum* number of withdrawals required to trigger dampening is 2, with 3 the most probable figure, given that recovery starts straight away (the penalty half-life is 15 minutes, remember).
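Tony's "minimum 2, most probably 3" follows from exponential penalty decay. The sketch below assumes parameters modelled on Cisco IOS `bgp dampening` defaults (15-minute half-life, a penalty of 1000 per withdrawal, suppress limit 2000, reuse limit 750); the function names are hypothetical, and treating a penalty that reaches (rather than strictly exceeds) the suppress limit as suppressed is an assumption chosen so that two simultaneous withdrawals just trigger dampening, as described above.

```python
import math
from typing import Sequence

# Assumed parameters, modelled on Cisco IOS "bgp dampening" defaults.
HALF_LIFE_MIN = 15.0     # penalty halves every 15 minutes
PENALTY_PER_FLAP = 1000  # added on each withdrawal
SUPPRESS_LIMIT = 2000    # route is suppressed once the penalty reaches this
REUSE_LIMIT = 750        # a suppressed route is re-used below this


def decay(penalty: float, minutes: float) -> float:
    """Exponentially decay a penalty over the given number of minutes."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def is_suppressed(flap_times: Sequence[float]) -> bool:
    """Return True if withdrawals at the given times (minutes, ascending)
    drive the accumulated penalty up to the suppress limit."""
    penalty = 0.0
    last = None
    for t in flap_times:
        if last is not None:
            penalty = decay(penalty, t - last)  # recovery starts straight away
        penalty += PENALTY_PER_FLAP
        last = t
        if penalty >= SUPPRESS_LIMIT:  # assumption: >= rather than >
            return True
    return False


def minutes_until_reuse(penalty: float) -> float:
    """Minutes for a penalty to decay down to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)


print(is_suppressed([0, 1]))       # two withdrawals a minute apart -> False
print(is_suppressed([0, 10, 20]))  # three, ten minutes apart -> True
print(is_suppressed([0, 15, 30]))  # three spread over half an hour -> False
print(minutes_until_reuse(3000))   # two half-lives -> 30.0
```

With these assumed values, two withdrawals a minute apart stay just below the limit because the first penalty has already begun to decay, while three spread evenly over half an hour never reach it.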
- Imagine you're upgrading the software on a router and (very unlikely, as we all know ;-} discover that you have to roll back ... BINGO, it's perfectly and innocently electrocuted :-(
Relatively unlikely unless it goes very wrong, and then it does have a real effect on peers, doesn't it?
- I'd suggest that dampening (regardless of prefix length) shouldn't start before AT LEAST three flaps happen in a row (let's say within half an hour).
This is roughly what I recommended; see above.
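Christian's suggestion can also be stated as a counting rule: suppress a route only once at least three flaps fall within a 30-minute window. The following is a minimal sketch of that rule, with the window length and flap count taken from his message and the function name `window_rule_triggers` purely hypothetical. Unlike exponential-decay dampening, it forgets a flap completely once the flap is older than the window, and it does trigger on three flaps evenly spaced across half an hour.

```python
from collections import deque
from typing import Iterable

WINDOW_MIN = 30.0  # "let's say within half an hour"
MIN_FLAPS = 3      # "AT LEAST three flaps"


def window_rule_triggers(flap_times: Iterable[float]) -> bool:
    """Return True once MIN_FLAPS flaps (times in minutes, ascending)
    fall within any span of WINDOW_MIN minutes."""
    recent = deque()
    for t in flap_times:
        recent.append(t)
        # Drop flaps that are older than the window relative to the newest one.
        while t - recent[0] > WINDOW_MIN:
            recent.popleft()
        if len(recent) >= MIN_FLAPS:
            return True
    return False


print(window_rule_triggers([0, 15, 30]))  # three flaps within half an hour
print(window_rule_triggers([0, 20, 45]))  # never three inside one window
```

A rule like this is easy to reason about for scheduled maintenance (two flaps never trigger it), at the cost of having no memory of a route that flaps steadily just outside the window.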
- Dampening should lock out real network instabilities, not make even scheduled maintenance worse!
How does one tell the difference? I would say that scheduled maintenance would not normally cause potentially serious problems, and if it does, then it becomes a *real* network problem. Any thoughts on this from anyone?
- Besides, I know applications where it might be perfectly reasonable to announce a single *provider-independent* /24 and where it's counterproductive and politically incorrect to include it in an ISP aggregate! A solution could be to ask InterNIC/RIPE to define "PI" address ranges which can and should be excluded from the /24 hostility acts.
Tony Li's paper on ISPAC would go some way towards this and any other such schemes. Otherwise, yes: the root name server networks, for instance, will soon be using /32s, and we would exclude those.

Regards,
Tony
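Excluding agreed ranges, as both Christian and Tony suggest, amounts to a containment check made before any penalty is applied. In this minimal sketch the exempt list is hypothetical: documentation prefixes stand in for real provider-independent /24s or root name server /32s, and `is_exempt` is an illustrative name, not a router feature.

```python
import ipaddress
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical exempt ranges. Documentation prefixes stand in for real
# provider-independent /24s or root name server /32s.
EXEMPT = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.53/32"),
]


def is_exempt(prefix: str, exempt: Iterable[Network] = EXEMPT) -> bool:
    """True if the announced prefix lies inside an exempt range, i.e. it
    should never be dampened."""
    net = ipaddress.ip_network(prefix)
    # subnet_of() raises TypeError across address families, so compare
    # versions first.
    return any(net.version == e.version and net.subnet_of(e) for e in exempt)


print(is_exempt("192.0.2.0/24"))     # listed PI /24
print(is_exempt("198.51.100.0/24"))  # covering /24 of an exempt /32
```

Note that only the listed range and anything inside it are exempt; a less specific announcement covering an exempt /32 would still be dampened normally.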
participants (2)
Christian Panigl, ACOnet/UniVie +43 1 4065822-383
Tony Barber