As part of a broader organisational restructure, data networking research at Swinburne University of Technology has moved from the Centre for Advanced Internet Architecture (CAIA) to the Internet For Things (I4T) Research Lab.

Although CAIA no longer exists, this website reflects CAIA's activities and outputs between March 2002 and February 2017, and is being maintained as a service to the broader data networking research community.

Reducing BGP Update Noise

Overview

The underlying issues with scalability of the Internet’s inter-domain routing system and the Border Gateway Protocol (BGP) contain a number of persistent themes. These themes relate to the overall stability of the routing protocol and the performance of the packet forwarding subsystem when dealing with ever-larger routing domains, finer granularity of information and ever-denser levels of element interconnection.

The study of scalability of routing breaks into the sub-topics of memory, processing and bandwidth. The larger the routing system grows the larger the required search space in order to perform the correct lookup operation for packet forwarding, inferring a requirement for larger memory for forwarding lookups. The combination of a larger routing space and a denser set of interconnections also increases the memory requirements for the operation of the routing protocol itself. The combinations of the absolute size of the routing space and the level of dynamic update, or stability, have processing implications. The higher the level of dynamic changes in the routing state, then the greater the processing requirement that is imposed on each routing element. If the processing element can fall behind in processing the update load then there are implications in terms of the accuracy of the routing state, and potential implications in increased memory demands to hold the backlogged queues of unprocessed updates. And because routing is a distributed protocol, the greater the size of the routing space, the more densely interconnected the network and, more importantly, the greater the level of dynamic change, then the larger the amount of generated routing traffic and, by implication the higher the bandwidth demands for routing itself. So as the routing system grows so does the demand for additional memory processing capacity and bandwidth.

Of the three sub-topics, bandwidth is perhaps the lesser of these scalability concerns, leaving memory and processing loads as consistent themes in studies of the routing system over many years. Early efforts to reduce the routing update load focussed on the perception at the time that this was caused by an unstable edge link that caused a route object to be announced, then withdrawn, then announced again, and so on for extended periods of time, and often at quite high frequency. A response to this type of load in BGP was the introduction of Route Flap Damping. Subsequent investigation has revealed that this form of edge behaviour is relatively rare and current operational advice is to disable this damping response in BGP routers.

Subsequent efforts in attempting to mitigate the BGP Update load include use of the Minimum Route Advertisement Interval (MRAI) timer to enforce a minimum period of 30 seconds between successive BGP advertisements of the same route object to the same BGP peer, It has been noted that the heterogeneous deployment of this timer in commonly deployed BGP configurations has lead to some degree of amplification of prefix instability in certain cases and extended convergence time intervals. Another recent effort is combining TCP blocking with output queue compression, so that when the output queue builds up the BGP speaker ensures that only a single entry for each route object is in the queue. Enqueuing a new route object has a side effect of removing any previously queued update that refers to the same route object.

These measures are relatively general in scope, and appear to be still somewhat ineffectual in addressing overall growth in routing protocol update processing loads.

We propose to undertake a detailed analysis of BGP updates collected at various locations in the network in order to determine the relationship of the update sequences to likely root cause events. The particular sequences that are of interest are here those that relate to the behaviour of the BGP distance vector algorithm undertaking path hunting prior to a complete withdrawal of the prefix, and update sequences that relate to BGP path hunting to the preferred path, or “path affinity”.

In this work program we propose to undertake a analysis of BGP Update logs gathered from a number of aggregated route views collectors, augmented by a number of more specific views of the default-free routing system. The intended approach in this analysis is to identify sequences of updates that refer to the same prefix, and that are closely clustered in time, and to associate with the profile of such update sequences a likely root cause and the outcome of routing convergence.

We then propose to analyse the likely effects of modification of the MRAI timer to investigate the outcome of applying a set of heuristics to the both the application of the timer and the length of the timer for each route update.

We then propose to investigate the interactions of this form of BGP update processing behaviour with commonly deployed BGP implementations when operating as a default-free router in the public Internet.

To achieve this we propose to implement output queue compression in the public domain implementation of BGP, Quagga. We then propose to implement a modified MRAI implementation that includes heuristics that identify update sequences and perform selective suppression of BGP updates in those cases where the update sequence matches the heuristics of withdrawal-triggered path exploration, and where there appears to be a match of a path exploration to route that has strong local affinity.



Last Updated: Friday 4-May-2012 19:49:40 AEST | Maintained by: Mattia Rossi (mrossi@swin.edu.au) | Authorised by: Grenville Armitage ( garmitage@swin.edu.au)