Fast Convergence
------------------------------------

This is an excellent description of the topic:

`BGP PIC <https://tools.ietf.org/html/draft-ietf-rtgwg-bgp-pic-12>`_

but if you're interested in my take keep reading...

First some definitions:

- Convergence: When a FIB is forwarding all packets correctly based
  on the network topology (i.e. doing what the routing control plane
  has instructed it to do), then it is said to be 'converged'.
  Not being in a converged state is [hopefully] a transient state,
  when either the topology change (e.g. a link failure) has not yet been
  observed or processed by the routing control plane, or the FIB
  is still processing routing updates. Convergence is the act of
  getting to the converged state.
- Fast: In the shortest time possible. There are no absolute limits
  placed on how short this must be, although there is one number often
  mentioned. Apparently the human ear can detect loss/delay/jitter in
  VOIP of 50ms, therefore network failures should last no longer than
  this, and some technologies (notably loop-free alternate fast
  reroute) are designed to converge within this time. However, it is
  generally accepted that it is not possible to converge a FIB with
  tens of millions of routes in this time scale; the industry
  'standard' is sub-second.

Converging the FIB quickly is thus a matter of:

- discovering that something is down
- updating as few objects as possible
- determining which objects to update as efficiently as possible
- updating each object as quickly as possible

We'll discuss each in turn.

All output came from VPP version 21.01rc0. In what follows I use IPv4
prefixes, addresses and IPv4 host length masks; however, exactly the
same applies to IPv6.

The two common forms (we'll see others later on) of failure detection
are link down and BFD. The FIB needs to hook into these notifications
to trigger convergence.

Whenever an interface goes down, VPP issues a callback to all
registered clients. The adjacency code is such a client. The adjacency
is a leaf node in the FIB control-plane graph (containing fib_path_t,
fib_entry_t, etc.). A back-walk from the adjacency will trigger a
re-resolution of the paths.

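If you want to see the shape of that hook, here's a minimal sketch (my
illustration, not the adjacency module's actual code) using VPP's
interface state registration macro:

.. code-block:: c

  #include <vnet/vnet.h>

  /* Illustrative only: register for admin up/down notifications the
   * way any VPP client (such as the adjacency module) can. */
  static clib_error_t *
  example_sw_interface_up_down (vnet_main_t * vnm,
                                u32 sw_if_index, u32 flags)
  {
    if (!(flags & VNET_SW_INTERFACE_FLAG_ADMIN_UP))
      {
        /* The interface went down: this is where a back-walk from
         * the affected adjacencies would be triggered so that the
         * dependent paths re-resolve. */
      }
    return NULL;
  }

  VNET_SW_INTERFACE_ADMIN_UP_DOWN_FUNCTION (example_sw_interface_up_down);
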
FIB is a client of BFD in order to receive BFD notifications. BFD
comes in two flavours: single and multi hop. Single hop is to protect
a specific peer on an interface; such peers are modelled by an
adjacency. Multi hop is to protect a peer on an unspecified interface
(i.e. a remote peer); this peer is represented by a host-prefix
**fib_entry_t**. In both cases FIB will add a delegate to the
**ip_adjacency_t** or **fib_entry_t** that represents the association
to the BFD session. If the BFD session signals up/down then a back-walk
can be triggered from the object to trigger re-resolution and hence
convergence.

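Conceptually the delegate is just glue between the protected object
and the BFD session. A sketch, with entirely hypothetical names (the
real thing lives in the FIB delegate code), might be:

.. code-block:: c

  /* Hypothetical sketch, not VPP's actual delegate API: associate a
   * BFD session with the object that represents the peer, so that a
   * session state change can start a back-walk from that object. */
  typedef enum { EXAMPLE_BFD_UP, EXAMPLE_BFD_DOWN } example_bfd_state_t;

  typedef struct
  {
    void *protected_object;      /* the ip_adjacency_t or fib_entry_t */
    example_bfd_state_t state;   /* last signalled session state */
  } example_bfd_delegate_t;

  static void
  example_back_walk (void *obj)
  {
    /* placeholder: in VPP this would be a walk over obj's children */
    (void) obj;
  }

  static void
  example_bfd_notify (example_bfd_delegate_t * del,
                      example_bfd_state_t new_state)
  {
    if (del->state != new_state)
      {
        del->state = new_state;
        /* the state changed: back-walk from the protected object so
         * its dependents re-resolve their paths, i.e. converge */
        example_back_walk (del->protected_object);
      }
  }
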
In order to talk about what 'a few' is we have to leave the realm of
the FIB as an abstract graph-based object DB and move into the
concrete representation of forwarding in a large network. Large
networks are built in layers; it's how you scale them. We'll take
here a hypothetical service provider (SP) network, but the concepts
apply equally to data center leaf-spines. This is a rudimentary
description, but it should serve our purpose.

An SP manages a BGP autonomous system (AS). The SP's goal is both to
attract traffic into its network to serve its customers, and to
serve transit traffic passing through it; we'll consider the latter here.
The SP's network is all the devices in that AS. These
devices are split into those at the edge (provider edge (PE) routers),
which peer with routers in other SP networks,
and those in the core (termed provider (P) routers). Both the PE and P
routers run the IGP (usually OSPF or ISIS). Only the reachability of the devices
in the AS is advertised in the IGP - thus the scale (i.e. the number
of routes) in the IGP is 'small' - only the number of
devices that the SP has (typically not more than a few tens of thousands).
PE routers run BGP; they have external BGP sessions to devices in
other ASs and internal BGP sessions to devices in the same AS. BGP is
used to advertise the routes to *all* networks on the internet - at
the time of writing this number is approaching 900k IPv4 routes; hopefully by
the time you are reading this the number of IPv6 routes has caught up ...
If we include the additional routes the SP carries to offer VPN services to its
customers, the number of BGP routes can grow to the tens of millions.

BGP scale thus exceeds IGP scale by two orders of magnitude... pause for
a moment and let that sink in...

A comparison of BGP and an IGP is way, way beyond the scope of this
documentation (and frankly beyond me) so we'll note only the
difference in the form of the routes they present to FIB. A routing
protocol will produce routes that specify the prefixes that are
reachable through its peers. A good IGP
is link state based; it forms peerings to other devices over these
links, hence its routes specify links/interfaces. In
FIB nomenclature this means an IGP produces routes that are
attached-nexthop, e.g.:

.. code-block:: console

  ip route add 1.1.1.1/32 via 10.0.0.1 GigEthernet0/0/0

BGP on the other hand forms peerings only to neighbours; it does not
know, nor care, what interface is used to reach the peer. In FIB
nomenclature therefore BGP produces recursive routes, e.g.:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1

where 1.1.1.1 is the BGP peer. It's no accident in this example that
1.1.1.1/32 happens to be the route the IGP advertised... BGP installs
routes for prefixes reachable via other BGP peers, and the IGP installs
the routes to those BGP peers.

This has been a very long-winded way of describing why the scale of
recursive routes is therefore two orders of magnitude greater than
non-recursive/attached-nexthop routes.

If we step back for a moment and recall why we've crawled down this
rabbit hole: we're trying to determine what 'a few' updates means.
Does it include all those recursive routes? Probably not ... let's
see.

We started this chapter with an abstract description of convergence;
let's now make that more real. In the event of a network failure an SP
is interested in moving to an alternate forwarding path as quickly as
possible. If there is no alternate path, and a converged FIB will drop
the packet, then who cares how fast it converges? In other words the
interesting convergence scenarios are the scenarios where the network has
alternate paths.

First let's consider alternate paths in the IGP, e.g.:

.. code-block:: console

  ip route add 1.1.1.1/32 via 10.0.0.2 GigEthernet0/0/0
  ip route add 1.1.1.1/32 via 10.0.1.2 GigEthernet0/0/1

This gives us in the FIB:

.. code-block:: console

  DBGvpp# sh ip fib 1.1.1.1/32
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, default-route:1, ]
  1.1.1.1/32 fib:0 index:15 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[23] locks:2 flags:shared, uPRF-list:22 len:2 itfs:[1, 2, ]
        path:[27] pl-index:23 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
          10.0.0.2 GigEthernet0/0/0
        [@0]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001111111111dead000000000800
        path:[28] pl-index:23 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
          10.0.1.2 GigEthernet0/0/1
        [@0]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

    forwarding: unicast-ip4-chain
    [@0]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:22 to:[0:0]]
      [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001111111111dead000000000800
      [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

There is ECMP across the two paths. Note that the instance/index of the
load-balance present in the forwarding graph is 17.

Let's add a BGP route via this peer:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1

which gives us:

.. code-block:: console

  DBGvpp# sh ip fib 8.0.0.0/16
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:1, default-route:1, ]
  8.0.0.0/16 fib:0 index:18 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[24] locks:2 flags:shared, uPRF-list:21 len:2 itfs:[1, 2, ]
        path:[29] pl-index:24 ip4 weight=1 pref=0 recursive: oper-flags:resolved,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]

    forwarding: unicast-ip4-chain
    [@0]: dpo-load-balance: [proto:ip4 index:20 buckets:1 uRPF:21 to:[0:0]]
      [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:22 to:[0:0]]
        [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001111111111dead000000000800
        [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

The load-balance object used by this route is index 20, but note that
the next load-balance in the chain is index 17, i.e. it is exactly
the same instance that appears in the forwarding chain for the IGP
route. So in the forwarding plane the packet first encounters
load-balance object 20 (which it will use in ip4-lookup) and then
number 17 (in ip4-load-balance).

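To make that chaining concrete, here's a toy model (hypothetical
types, nothing like the real DPO structures) of what those two graph
hops do to a packet:

.. code-block:: c

  #include <stdio.h>

  /* Toy model of chained load-balances: a bucket either points at
   * another load-balance (a recursive hop) or terminates (rewrite
   * and transmit, modelled here as NULL). */
  typedef struct toy_lb toy_lb_t;
  struct toy_lb
  {
    int index;                   /* e.g. 20 (BGP route) or 17 (IGP route) */
    int n_buckets;
    const toy_lb_t *buckets[2];
  };

  static void
  toy_forward (const toy_lb_t * lb, unsigned flow_hash)
  {
    while (lb != NULL)
      {
        printf ("via dpo-load-balance:%d\n", lb->index);
        lb = lb->buckets[flow_hash % lb->n_buckets];
      }
  }

  int
  main (void)
  {
    toy_lb_t lb17 = { 17, 2, { NULL, NULL } };  /* IGP route: ECMP */
    toy_lb_t lb20 = { 20, 1, { &lb17 } };       /* BGP route: 1 bucket */
    toy_forward (&lb20, 0x1234);                /* sees 20, then 17 */
    return 0;
  }
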
What's the significance? Let's shut down one of those IGP paths:

.. code-block:: console

  DBGvpp# set in state GigEthernet0/0/0 down

The resulting update to the IGP route is:

.. code-block:: console

  DBGvpp# sh ip fib 1.1.1.1/32
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:1, default-route:1, ]
  1.1.1.1/32 fib:0 index:15 locks:4
    API refs:1 src-flags:added,contributing,active,
      path-list:[23] locks:2 flags:shared, uPRF-list:25 len:2 itfs:[1, 2, ]
        path:[27] pl-index:23 ip4 weight=1 pref=0 attached-nexthop:
          10.0.0.2 GigEthernet0/0/0
        [@0]: arp-ipv4: via 10.0.0.2 GigEthernet0/0/0
        path:[28] pl-index:23 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
          10.0.1.2 GigEthernet0/0/1
        [@0]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

    recursive-resolution refs:1 src-flags:added, cover:-1

    forwarding: unicast-ip4-chain
    [@0]: dpo-load-balance: [proto:ip4 index:17 buckets:1 uRPF:25 to:[0:0]]
      [0] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

Notice that the path via 10.0.0.2 is no longer flagged as resolved,
and the forwarding chain does not contain this path as a
choice. However, the key thing to note is that the load-balance
instance is still index 17, i.e. it has been modified, not
exchanged. In the FIB vernacular we say it has been 'in-place
modified', a somewhat linguistically redundant expression, but one that serves
to emphasise that it was changed whilst still being part of the graph; it
was never at any point removed from the graph and re-added, and it was
modified without the worker barrier lock held.

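As a conceptual sketch (hypothetical structures, not the actual
dpo-load-balance code), an in-place modify overwrites the live
object's buckets and then publishes the new size, so the parents'
pointers into the graph never dangle and workers keep forwarding
throughout:

.. code-block:: c

  /* Hypothetical sketch of an in-place modify: shrink a load-balance
   * by overwriting the dead bucket and then publishing the new count,
   * so a concurrent reader always sees a consistent, usable object. */
  typedef struct
  {
    void *buckets[16];
    unsigned n_buckets;
  } sketch_lb_t;

  static void
  sketch_lb_remove_bucket (sketch_lb_t * lb, unsigned dead)
  {
    /* overwrite the failed choice with the last live one... */
    __atomic_store_n (&lb->buckets[dead],
                      lb->buckets[lb->n_buckets - 1], __ATOMIC_RELEASE);
    /* ...then shrink; a reader racing with this sees either the old
     * or the new bucket set, both of which contain only live paths */
    __atomic_store_n (&lb->n_buckets, lb->n_buckets - 1, __ATOMIC_RELEASE);
  }
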
Still don't see the significance? In order to converge around the
failure of the IGP link it was not necessary to update load-balance
object number 20! It was not necessary to update the recursive
route, i.e. convergence is achieved without updating any recursive
routes; it is only necessary to update the affected IGP routes. This is
the definition of 'a few'. We call this 'prefix independent
convergence' (PIC), which should really be called 'recursive prefix
independent convergence', but it isn't...

How was the trick done? As with all problems in computer science, it
was solved by a layer of misdirection, I mean indirection. The
indirection is the load-balance that belongs to the IGP route. By
keeping this object in the forwarding graph and updating it in place,
we get PIC. The alternative design would be to collapse the two layers of
load-balancing into one, which would improve forwarding performance
but would come at the cost of prefix dependent convergence. No doubt
there are situations where the VPP deployment would favour forwarding
performance over convergence; you know the drill, contributions welcome.

This failure scenario is known as PIC core, since it's one of the IGP's
core links that has failed.

Next, let's consider alternate paths in BGP, e.g.:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1
  ip route add 8.0.0.0/16 via 1.1.1.2

The 8.0.0.0/16 prefix is now reachable via two BGP next-hops (two PEs).

Our FIB now also contains:

.. code-block:: console

  DBGvpp# sh ip fib 8.0.0.0/16
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:2, default-route:1, ]
  8.0.0.0/16 fib:0 index:18 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:2 flags:shared, uPRF-list:11 len:2 itfs:[1, 2, ]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-load-balance:12]

    forwarding: unicast-ip4-chain
    [@0]: dpo-load-balance: [proto:ip4 index:20 buckets:2 uRPF:11 to:[0:0]]
      [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:1 uRPF:25 to:[0:0]]
        [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
        [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800
      [1] [@12]: dpo-load-balance: [proto:ip4 index:12 buckets:1 uRPF:13 to:[0:0]]
        [0] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

The first load-balance (LB) in the forwarding graph is index 20 (the astute
reader will note this is the same index as in the previous
section; I am adding paths to the same route, so the load-balance is
in-place modified again). Each choice in LB 20 is another LB
contributed by the IGP route through which the route's paths recurse.

So what's the equivalent in BGP to a link down in the IGP? An IGP link
down means the IGP loses its peering out of that link, so the equivalent in
BGP is the loss of the peering and thus the loss of reachability to
the peer. This is signaled by the IGP withdrawing the route to the
peer. But "Wait wait wait", I hear you say ... "just because the IGP
withdraws 1.1.1.1/32 doesn't mean I can't reach 1.1.1.1, perhaps there
is a less specific route that gives reachability to 1.1.1.1". Indeed
there may be. So a little more on BGP network design. I know it's like
a bad detective novel where the author drip feeds you the plot... When
describing iBGP peerings one 'always' describes the peer using one of
its loopback addresses. Why? A loopback interface
never goes down (unless you admin it down yourself), and some muppet can't
accidentally cut through the loopback cable whilst digging up the
street. And what subnet mask length does a prefix have on a loopback
interface? It's 'always' a /32. Why? Because there's no cable to connect
any other devices. This choice justifies there 'always' being a /32
route for the BGP peer. But what prevents there also being a less
specific route that covers the peer's loopback? Nothing.

Now clearly if the BGP peer crashes then the /32 for its loopback is
going to be removed from the IGP, but what will withdraw the less
specific route? Nothing.

So in order to make use of this trick of relying on the withdrawal of
the /32 for the peer to signal that the peer is down, and thus the
signal to converge the FIB, we need to force FIB to recurse only via
the /32 and not via a less specific. This is called a 'recursion
constraint'. In this case the constraint is 'recurse via host',
i.e. for IPv4 use a /32.
So we need to update our route additions from before:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1 resolve-via-host
  ip route add 8.0.0.0/16 via 1.1.1.2 resolve-via-host

Checking the FIB output is left as an exercise for the reader. I hope
you're doing these configs as you read. There's little change in the
output; you'll see some extra flags on the paths
(cfg-flags:resolve-host).

Now let's add the less specific, just for fun:

.. code-block:: console

  ip route add 1.1.1.0/28 via 10.0.0.2 GigEthernet0/0/0

Nothing changes in the resolution of 8.0.0.0/16.

Now withdraw the route to 1.1.1.2/32:

.. code-block:: console

  ip route del 1.1.1.2/32 via 10.0.0.2 GigEthernet0/0/0

The 8.0.0.0/16 route is now:

.. code-block:: console

  DBGvpp# sh ip fib 8.0.0.0/32
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:2, default-route:1, ]
  8.0.0.0/16 fib:0 index:18 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:2 flags:shared, uPRF-list:13 len:2 itfs:[1, 2, ]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: cfg-flags:resolve-host,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-drop:0]

    forwarding: unicast-ip4-chain
    [@0]: dpo-load-balance: [proto:ip4 index:20 buckets:1 uRPF:13 to:[0:0]]
      [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:27 to:[0:0]]
        [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
        [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

The path via 1.1.1.2 is unresolved, because the recursion constraint
prevents the path resolving via 1.1.1.0/28. LB index 20
has been updated to remove the unresolved path.

Job done? Not quite! Why not?

Let's re-examine the goals of this chapter. We wanted to update 'a
few' objects, which we have defined as not all the millions of
recursive routes. Did we do that here? We sure did, when we
modified LB index 20. So WTF?? Where's the indirection object that can
be modified so that the LBs for the recursive routes are not
modified - it's not there.... WTF?

OK, so the great detective has assembled all the suspects in the
drawing room and only now does he drop the bomb: the FIB knows the
scale. We talked above about what the scale **can** be, worst case
scenario, but that's not necessarily what it is in this hypothetical
(your) deployment. It knows how many recursive routes there are that
depend on a /32, so it can make its own determination of the
definition of 'a few'. In other words, if there are only 'a few'
recursive prefixes that depend on a /32 then it will update them
synchronously (and we'll discuss what synchronously means a bit more later).

So what does FIB consider to be 'a few'? Let's add more routes and
see:

.. code-block:: console

  DBGvpp# ip route add 8.1.0.0/16 via 1.1.1.2 resolve-via-host via 1.1.1.1 resolve-via-host
  ...
  DBGvpp# ip route add 8.63.0.0/16 via 1.1.1.2 resolve-via-host via 1.1.1.1 resolve-via-host

and check one of them:

.. code-block:: console

  DBGvpp# sh ip fib 8.8.0.0
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:4, default-route:1, ]
  8.8.0.0/16 fib:0 index:77 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:128 flags:shared,popular, uPRF-list:28 len:2 itfs:[1, 2, ]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-load-balance:12]

    forwarding: unicast-ip4-chain
    [@0]: dpo-load-balance: [proto:ip4 index:79 buckets:2 uRPF:28 flags:[uses-map] to:[0:0]]
        load-balance-map: index:0 buckets:2
           index:    0    1
             map:    0    1
      [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:27 to:[0:0]]
        [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
        [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800
      [1] [@12]: dpo-load-balance: [proto:ip4 index:12 buckets:1 uRPF:18 to:[0:0]]
        [0] [@3]: arp-ipv4: via 10.0.1.2 GigEthernet0/0/0

Two elements to note here: the path-list has the 'popular' flag and
there is a load-balance map in the forwarding path.

'popular' in this case means that the path-list has passed the limit
of 'a few' in the number of children it has.

Here are the children:

.. code-block:: console

  DBGvpp# sh fib path-list 15
  path-list:[15] locks:128 flags:shared,popular, uPRF-list:28 len:2 itfs:[1, 2, ]
    path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
      via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
    path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
      via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-load-balance:12]
  children:{entry:18}{entry:21}{entry:22}{entry:23}{entry:25}{entry:26}{entry:27}{entry:28}{entry:29}{entry:30}{entry:31}{entry:32}{entry:33}{entry:34}{entry:35}{entry:36}{entry:37}{entry:38}{entry:39}{entry:40}{entry:41}{entry:42}{entry:43}{entry:44}{entry:45}{entry:46}{entry:47}{entry:48}{entry:49}{entry:50}{entry:51}{entry:52}{entry:53}{entry:54}{entry:55}{entry:56}{entry:57}{entry:58}{entry:59}{entry:60}{entry:61}{entry:62}{entry:63}{entry:64}{entry:65}{entry:66}{entry:67}{entry:68}{entry:69}{entry:70}{entry:71}{entry:72}{entry:73}{entry:74}{entry:75}{entry:76}{entry:77}{entry:78}{entry:79}{entry:80}{entry:81}{entry:82}{entry:83}{entry:84}

64 children makes it popular. The number is fixed (there is no API to
change it). Its choice is an attempt to balance the forwarding
performance cost of the indirection against the convergence gain.

Popular path-lists contribute the load-balance map; this is the
missing indirection object. Its indirection happens when choosing the
bucket in the LB. The packet's flow-hash is taken 'mod number of
buckets' to give the 'candidate' bucket, and the map then translates
the candidate into the bucket to actually use. You can see in the example
above that no change occurs, i.e. if the flow-hash mod n chooses bucket 1
then it gets bucket 1.

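A sketch of that selection step (hypothetical function, not the real
dpo internals):

.. code-block:: c

  /* Hypothetical sketch of bucket selection through a load-balance
   * map: the map redirects a candidate bucket that has failed to one
   * that is still usable; an identity map (as shown above) changes
   * nothing. */
  static inline unsigned
  example_lb_select_bucket (const unsigned *map,
                            unsigned flow_hash, unsigned n_buckets)
  {
    unsigned candidate = flow_hash % n_buckets;  /* pick a candidate */
    return map[candidate];                       /* translate via map */
  }
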
Why is this useful? The path-list is shared (you can convince
yourself of this if you look at each of the 8.x.0.0/16 routes we
added) and all of these routes use the same load-balance map; therefore, to
converge all the recursive routes, we need only change the map and
we're good; we again get PIC.

OK, who's still awake... if you're thinking there's more to this story,
you're right. Keep reading.

This failure scenario is called iBGP PIC edge. It's 'edge' because it
refers to the loss of an edge device, and iBGP because the device was
an iBGP peer (we learn iBGP peers in the IGP). There is a similar eBGP
PIC edge scenario, but this is left as an exercise for the reader (hint:
there are other recursion constraints - see the RFC).

The next topic on our list of how to converge quickly was to
efficiently find the objects that need to be updated when a convergence
event happens. If you haven't realised by now that the FIB is an
object graph, then can I politely suggest you go back and start from
the beginning...

Finding the objects affected by a change is simply a matter of walking
from the parent (the object affected) to its children. These
dependencies are maintained for exactly this reason.

So is fast convergence just a matter of walking the graph? Yes and
no. The question to ask yourself is this: "in the case of iBGP PIC edge,
when the /32 is withdrawn, what is the list of objects that need to be
updated, and particularly in what order should they be updated in
order to obtain the best convergence time?" Think breadth v. depth first.

... ponder for a while ...

For iBGP PIC edge we said it's the path-list that provides the
indirection through the load-balance map. Hence once all path-lists
are updated we are converged; thereafter, at our leisure, we can
update the child recursive prefixes. Is that breadth or depth first?

Breadth first walks are achieved by spawning an async walk of the
branch of the graph that we don't want to traverse now. Withdrawing the /32
triggers a synchronous walk of the children of the /32 route; we want
a synchronous walk because we want to converge ASAP. This synchronous
walk will encounter the path-lists in the /32 route's child dependent list.
These path-lists (and their LB maps) will be updated. If a path-list is
popular, it will spawn an async walk of the path-list's child
dependent routes; if not, it will walk those routes synchronously. So the walk
effectively proceeds breadth first across the path-lists, then returns
to the start to do the affected routes.

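In sketch form (hypothetical names throughout, not the fib_walk API),
the ordering is:

.. code-block:: c

  /* Hypothetical sketch of the walk ordering described above:
   * converge all dependent path-lists synchronously, and defer the
   * potentially huge set of child routes of popular path-lists to
   * asynchronous walks. */
  enum { EXAMPLE_POPULAR_THRESHOLD = 64 };  /* the fixed 'popular' limit */

  typedef struct
  {
    int n_children;      /* dependent recursive routes of this path-list */
  } example_path_list_t;

  static void example_update_lb_map (example_path_list_t * pl) { (void) pl; }
  static void example_walk_children_now (example_path_list_t * pl) { (void) pl; }
  static void example_spawn_async_walk (example_path_list_t * pl) { (void) pl; }

  static void
  example_converge_on_withdraw (example_path_list_t * pls, int n_pls)
  {
    /* synchronous pass: once every path-list's map is fixed,
     * forwarding is converged, even though the child routes still
     * reference the old resolution */
    for (int i = 0; i < n_pls; i++)
      {
        example_update_lb_map (&pls[i]);

        if (pls[i].n_children >= EXAMPLE_POPULAR_THRESHOLD)
          example_spawn_async_walk (&pls[i]);   /* children done later */
        else
          example_walk_children_now (&pls[i]);  /* few: do them now */
      }
  }
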
Now the story is complete. The murderer is revealed.

Let's withdraw one of the IGP routes:

.. code-block:: console

  DBGvpp# ip route del 1.1.1.2/32 via 10.0.1.2 GigEthernet0/0/1

  DBGvpp# sh ip fib 8.8.0.0
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:4, default-route:1, ]
  8.8.0.0/16 fib:0 index:77 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:128 flags:shared,popular, uPRF-list:18 len:2 itfs:[1, 2, ]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: cfg-flags:resolve-host,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-drop:0]

    forwarding: unicast-ip4-chain
    [@0]: dpo-load-balance: [proto:ip4 index:79 buckets:1 uRPF:18 to:[0:0]]
      [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:27 to:[0:0]]
        [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
        [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

The LB map has gone, since the prefix now only has one path. You'll
need to be a CLI ninja if you want to catch the output showing the LB
map in its transient state of:

.. code-block:: console

  load-balance-map: index:0 buckets:2
     index:    0    1
       map:    0    0

but it happens. Trust me. I've got tests and everything.

On the final topic of how to converge quickly: 'make each update fast'.