Dear Routing WG,

thanks to Joachim, who took notes and provided me with very elaborate draft minutes of our "Route Flap Dampening BOF"!

I made some additions and I'm now asking all participants to come back with comments.

I'd also like to remind you to send me your pointers to similar discussions and related recommendations (IETF, NANOG, ...), since I haven't yet been able to attend forums other than RIPE.

Regards
Christian

--- Christian Panigl : Vienna University Computer Center - ACOnet ---
--- VUCC - ACOnet : -------------------------------------------- ---
--- Universitaetsstrasse 7 : Mail: Panigl@CC.UniVie.ac.at (CP8-RIPE) ---
--- A-1010 Vienna / Austria : Tel: +43 1 4065822-383 (Fax: -170) ---

===================================================

Route Flap Dampening BOF, RIPE 26, 22.1.97, 14:00

Chairman: Christian Panigl (CP)
Scribe: Joachim Schmitz (JS)
Attendees: approx 30

In the Routing WG session Christian Panigl asked whether people were interested in participating in a BOF on route flap dampening. The BOF session was held after the plenary session of the RIPE meeting on Wednesday.

CP experienced quite severe reachability problems for customer networks because route flap dampening became active at various AS borders following scheduled maintenance on a core router. Had the default dampening parameters been used everywhere, it would not have hurt that much, since dampening would have lasted only ~20-30 minutes for all prefixes. Some backbone ISPs, however, have started to implement "progressive route flap dampening", typically using different parameters. The common effect is that longer prefixes are dampened more aggressively than shorter prefixes.

In the observed case all /24 customer networks were cut off from parts of the Internet for more than 2 hours and were no longer able to reach for instance the root nameservers.
By the way, many, even top- and second-level nameservers are sitting in /24 (192/TWD) prefixes themselves and could easily be "victims" of such a progressive dampening policy!

CP wasn't branding route flap dampening itself, but the aggressiveness of some of the implemented "progressive" parameters and was questioning the real usefulness of progressive dampening at all.

Following CP's introduction a lively discussion on route flap dampening ensued:

* Does flapping really depend on the prefix length?

  - To the knowledge of people attending the BOF session no measurements exist. Although several items were already measured by Merit on the stability of routes (as seen in the presentation by G.Winters in the Routing WG) they did not include a stability analysis with regard to the prefix length. If flapping does not necessarily depend on the prefix length longer prefixes should not be punished by more aggressive dampening.

  - However, the number of longer prefixes in the routing tables is much bigger than the number of shorter ones. As a consequence, if the percentage of flapping routes is the same for all prefix lengths, the absolute number of flaps will definitely be higher for longer prefixes. Since each flap costs the router the same processing effort (regardless of the prefix length), the best CPU saving is obtained by dampening longer prefixes more aggressively.

  - Further justification for the latter was primarily based on the assumption that longer prefixes serve fewer users, which of course did not go unchallenged (think of important servers sitting in a /24).

* Which networks or prefixes are "important"?

  - Stating that shorter prefixes are more important because they cover more users doesn't hold in general.
    On the one hand this may be valuable and motivate ISPs to CIDRize and customers to renumber; on the other hand it may lead to the situation that organisations try to get (or keep, think of Class A/B recycling) as short a prefix as possible, wasting address space without having to care for stability. In this case instability would be moved to shorter prefixes, which is far from desirable.

  - Long prefixes need not be unstable. There are discussions to use long prefix routes ("golden networks") for root nameservers or for other Internet structure servers (even for application servers as news, etc). It can be well assumed that these routes are more stable than others and they must not be dampened too aggressively in order not to tackle the functionality of the Internet itself.

Throughout the discussion the general consensus was clear: for routers with large BGP tables (notably with full routing) the CPU load caused by route flaps would kill any existing router. To survive instabilities, route flap dampening should be applied by *everybody*.

However, it was obvious that dampening parameters need to be coordinated throughout the Internet in order to

  - allow efficient dampening and easy clearing after repair
  - dampen flaps at their source by keeping them from spreading in the network

This will significantly increase the overall stability and the manageability. The more broadly (soft and default) dampening is deployed all over the Internet, the less need there will be for aggressive parameters.

The group split into two major camps with regard to how dampening should be done:

  - progressive dampening: needs to be accompanied by means to explicitly exclude "golden networks" from "hostility acts"

  - flat (default) dampening: because it's very hard to make a distinction between less and more "important", not to say "golden" networks, all prefixes should be treated equally. Efforts should be focussed on the propagation of dampening throughout the Internet.
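
To illustrate why flat default parameters and "progressive" ones can lead to such different outage times, here is a minimal Python sketch of the exponential-decay penalty model that dampening implementations commonly use. All numbers are assumptions for illustration and are not taken from the BOF: a reuse limit of 750, a 15-minute half-life for the flat case, a hypothetical 60-minute half-life for /24s in the progressive case, and an accumulated penalty of 3000 points. The function and variable names are made up as well.

```python
import math


def minutes_until_reuse(penalty, reuse_limit, half_life_min, max_suppress_min):
    """Minutes a suppressed route stays dampened if it does not flap again.

    The penalty is assumed to decay exponentially:
        p(t) = penalty * 0.5 ** (t / half_life_min)
    The route is released once p(t) drops to reuse_limit, or after
    max_suppress_min minutes, whichever comes first.
    """
    if penalty <= reuse_limit:
        return 0.0
    t = half_life_min * math.log2(penalty / reuse_limit)
    return min(t, max_suppress_min)


# Assumed "flat" parameter set applied to every prefix (illustrative values).
FLAT = {"reuse_limit": 750, "half_life_min": 15, "max_suppress_min": 60}

# Hypothetical "progressive" parameter set applied only to /24 prefixes.
PROGRESSIVE_24 = {"reuse_limit": 750, "half_life_min": 60, "max_suppress_min": 180}

if __name__ == "__main__":
    accumulated = 3000  # assumed penalty after a burst of flaps
    print("flat:       ", minutes_until_reuse(accumulated, **FLAT), "min")
    print("progressive:", minutes_until_reuse(accumulated, **PROGRESSIVE_24), "min")
```

Under these assumptions the same burst of flaps keeps a route suppressed for about 30 minutes with the flat parameters, but for about 2 hours with the hypothetical progressive ones, which is the order of magnitude reported above.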

The default values for dampening parameters as they are found in Cisco routers are based upon some experiments approx one year ago. These experiments led to recommendations by the IETF last year. Nevertheless, many ISPs have moved away from the default values and are using their own parameters.

Because of the urgent need for coordination of these values CP will try to collect related recommendations and the outcome of similar discussions. This is an activity of the RIPE Routing WG, therefore everybody who is aware of related efforts (IETF, NANOG, ...) should come back to the Routing WG list with hints and pointers!

New Action 26.R4 on Christian Panigl
  To collect reasonable route flap dampening parameter values and to present them at the next RIPE meeting in the Routing WG.

Further reading:

  ftp://ftp.ripe.net/ripe/minutes/ripe-m-24.ps
  ftp://ftp.ripe.net/ripe/minutes/ripe-m-25.ps
  http://www.ripe.net/wg/routing/r25-routing.html
  ftp://ftp.ripe.net/ripe/presentations/ripe-m25-tbarber-bgp-damp.html

Christian Panigl, ACOnet/UniVie wrote:
> Dear Routing WG,
> and I'm now asking all participants to come back with comments.
> In the observed case all /24 customer networks were cut off from parts of the Internet for more than 2 hours and were no longer able to reach for instance the root nameservers. By the way, many, even top- and second-level nameservers are sitting in /24 (192/TWD) prefixes themselves and could easily be "victims" of such a progressive dampening policy!
We do not apply any dampening to the route-server networks or to certain networks of strategic importance. This is easily done using a 'deny x.x.x.x' entry in the access list applied to the BGP dampening. This seems like an obvious way to protect against cutting *ourselves* off from important resources.
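
The logic of such an exemption list can be sketched in a few lines of Python. This is only an illustration of the "deny first, dampen everything else" idea, not router configuration. The prefixes below are documentation ranges standing in for the protected networks, and the names `EXEMPT` and `is_dampening_candidate` are hypothetical.

```python
import ipaddress

# Hypothetical exempted networks (documentation prefixes used as stand-ins
# for route-server or other strategically important networks).
EXEMPT = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_dampening_candidate(prefix: str) -> bool:
    """Return False for routes inside an exempted network, True otherwise.

    Mirrors an access list whose 'deny' entries come first and whose last
    entry permits everything else.
    """
    net = ipaddress.ip_network(prefix)
    return not any(net.subnet_of(exempt) for exempt in EXEMPT)


if __name__ == "__main__":
    for route in ("192.0.2.0/24", "192.0.2.128/25", "203.0.113.0/24"):
        print(route, "dampen" if is_dampening_candidate(route) else "exempt")
```

Only routes that pass this check would then have flap penalties applied to them.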
> CP wasn't branding route flap dampening itself, but the aggressiveness of some of the implemented "progressive" parameters and was questioning the real usefulness of progressive dampening at all.
If it encourages sensible aggregation, surely this is a good thing?
> * Does flapping really depend on the prefix length?
>   - To the knowledge of people attending the BOF session no measurements exist. Although several items were already measured by Merit on the stability of routes (as seen in the presentation by G.Winters in the Routing WG) they did not include a stability analysis with regard to the prefix length. If flapping does not necessarily depend on the prefix length longer prefixes should not be punished by more aggressive dampening.
While I have no statistics to back this up, personal experience suggests that the prefix length *is* related to the flap propensity.
> * Which networks or prefixes are "important"?
>   - Long prefixes need not be unstable. There are discussions to use long prefix routes ("golden networks") for root nameservers or for other Internet structure servers (even for application servers as news, etc). It can be well assumed that these routes are more stable than others and they must not be dampened too aggressively in order not to tackle the functionality of the Internet itself.
This is easily avoided - see the comments above re filters. :)
> parameters need to be coordinated throughout the Internet in order to
>   - allow efficient dampening and easy clearing after repair
>   - dampen flaps at their source by keeping them from spreading in the network
>   - flat (default) dampening: because it's very hard to make a distinction between less and more "important", not to say "golden" networks, all prefixes should be treated equally. Efforts should be focussed on the propagation of dampening throughout the Internet.
> should come back to the Routing WG list with hints and pointers!
AS1849 (UUNET UK) uses the parameters shown at:

  ftp://ftp.ripe.net/ripe/presentations/ripe-m25-tbarber-bgp-damp.html

These are (more or less) the config used by us in the last few months and have proved to work happily. We have had *very* few enquiries about loss of connectivity and our network has been very very stable (not implying it is because of dampening of course ;-).

We have for some time been thinking about changing the policies such that they fall in line with accepted Regional registry policies once these are all aligned. I.e. it would be nice to apply zero or minimal penalisation to any /19 or shorter. This means that any well aggregated route will be less affected than ones which are not. This still leaves us with the legacy of PI space, 192, Holes and multi-homed site prefixes.

If the community really does want to move away from such legacies, perhaps this kind of wide-reaching co-operative action will be a good motivation? If RIPE-routing-wg can come up with a best common practice paper, would most members fall in line? My guess is they probably would.

One last comment: Dampening beats the hell out of filtering > /19 !

Regards
--Tony