docs/gettingstarted/developers/vlib.md

VLIB (Vector Processing Library)
================================

The files associated with vlib are located in the ./src/{vlib,
vlibapi, vlibmemory} folders. These libraries provide vector
processing support including graph-node scheduling, reliable multicast
support, ultra-lightweight cooperative multi-tasking threads, a CLI,
plug-in .DLL support, physical memory and Linux epoll support. Parts of
this library embody US Patent 7,961,636.

Init function discovery
-----------------------

vlib applications register for various \[initialization\] events by
placing structures and \_\_attribute\_\_((constructor)) functions into
the image. At appropriate times, the vlib framework walks
constructor-generated singly-linked structure lists, performs a
topological sort based on specified constraints, and calls the
indicated functions. Vlib applications create graph nodes, add CLI
functions, start cooperative multi-tasking threads, etc. using
this mechanism.
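
The registration idea can be tried outside of vlib. The following is a minimal sketch, not the vlib implementation: it assumes GCC or Clang for `__attribute__((constructor))`, and every name in it (`init_reg_t`, `REGISTER_INIT`, `run_all_inits`) is hypothetical. It only shows how constructors can build a singly-linked list before main() runs, and how a framework can walk that list later.

```c
#include <stddef.h>

/* Hypothetical registration record; vlib's real structures differ. */
typedef struct init_reg
{
  struct init_reg *next;        /* singly-linked list of registrations */
  const char *name;
  int (*fn) (void);             /* returns 0 on success */
} init_reg_t;

static init_reg_t *init_list_head;

/* Emit a registration record plus a constructor that links the record
   into the list before main() runs. */
#define REGISTER_INIT(f)                                        \
  static init_reg_t f##_reg = { NULL, #f, f };                  \
  static void __attribute__ ((constructor)) f##_register (void) \
  {                                                             \
    f##_reg.next = init_list_head;                              \
    init_list_head = &f##_reg;                                  \
  }

static int n_calls;
static int init_a (void) { n_calls++; return 0; }
static int init_b (void) { n_calls++; return 0; }
REGISTER_INIT (init_a)
REGISTER_INIT (init_b)

/* Walk the constructor-built list; stop at the first failure. */
static int
run_all_inits (void)
{
  for (init_reg_t *r = init_list_head; r; r = r->next)
    {
      int rv = r->fn ();
      if (rv)
        return rv;
    }
  return 0;
}
```

The sketch calls the functions in list order. It leaves out the ordering step entirely, which is why a sort over declared constraints is needed in a real framework.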

vlib applications invariably include a number of VLIB\_INIT\_FUNCTION
(my\_init\_function) macros.

Each init / configure / etc. function has the return type clib\_error\_t
\*. Make sure that the function returns 0 if all is well, otherwise the
framework will announce an error and exit.

vlib applications must link against vppinfra, and often link against
other libraries such as VNET. In the latter case, it may be necessary to
explicitly reference symbol(s); otherwise large portions of the library
may be missing at runtime.

### Init function construction and constraint specification

It's easy to add an init function:

```c
static clib_error_t *my_init_function (vlib_main_t *vm)
{
   /* ... initialize things ... */

   return 0; /* or return clib_error_return (0, "BROKEN!"); */
}
VLIB_INIT_FUNCTION(my_init_function);
```

As given, my_init_function will be executed "at some point," but with
no ordering guarantees.

Specifying ordering constraints is easy:

```c
VLIB_INIT_FUNCTION(my_init_function) =
{
   .runs_before = VLIB_INITS("we_run_before_function_1",
                             "we_run_before_function_2"),
   .runs_after = VLIB_INITS("we_run_after_function_1",
                            "we_run_after_function_2"),
};
```

It's also easy to specify bulk ordering constraints of the form "a
then b then c then d":

```c
VLIB_INIT_FUNCTION(my_init_function) =
{
   .init_order = VLIB_INITS("a", "b", "c", "d"),
};
```

It's OK to specify all three sorts of ordering constraints for a
single init function, although it's hard to imagine why it would be
necessary.
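
All three constraint forms boil down to pairs of the form "x runs before y", and a topological sort turns such pairs into a call order. Here is a minimal sketch of that step using Kahn's algorithm. It is an illustration under assumptions, not the vlib code: the types and function names (`init_graph_t`, `add_constraint`, `sort_inits`) are hypothetical, and the fixed-size matrix is chosen only for brevity.

```c
#include <string.h>

#define MAX_INITS 16

/* before[a][b] != 0 means "a must run before b". */
typedef struct
{
  int n;
  const char *names[MAX_INITS];
  unsigned char before[MAX_INITS][MAX_INITS];
} init_graph_t;

static int
find_init (const init_graph_t *g, const char *name)
{
  for (int i = 0; i < g->n; i++)
    if (strcmp (g->names[i], name) == 0)
      return i;
  return -1;
}

/* Record "first runs before second"; returns -1 on unknown names. */
static int
add_constraint (init_graph_t *g, const char *first, const char *second)
{
  int a = find_init (g, first), b = find_init (g, second);
  if (a < 0 || b < 0)
    return -1;
  g->before[a][b] = 1;
  return 0;
}

/* Kahn's algorithm. Fills order[] with indices and returns 0, or
   returns -1 if the constraints contain a cycle. */
static int
sort_inits (const init_graph_t *g, int *order)
{
  int indegree[MAX_INITS] = { 0 };
  int done[MAX_INITS] = { 0 };
  int n_out = 0;

  for (int a = 0; a < g->n; a++)
    for (int b = 0; b < g->n; b++)
      indegree[b] += g->before[a][b];

  while (n_out < g->n)
    {
      int pick = -1;
      for (int i = 0; i < g->n; i++)
        if (!done[i] && indegree[i] == 0)
          {
            pick = i;
            break;
          }
      if (pick < 0)
        return -1;              /* every remaining function is blocked */
      done[pick] = 1;
      order[n_out++] = pick;
      for (int b = 0; b < g->n; b++)
        indegree[b] -= g->before[pick][b];
    }
  return 0;
}
```

In this model a "runs_after" entry is the same edge with the roles swapped, and a bulk "a then b then c then d" list becomes the edges a-b, b-c and c-d.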

Node Graph Initialization
-------------------------

vlib packet-processing applications invariably define a set of graph
nodes to process packets.

One constructs a vlib\_node\_registration\_t, most often via the
VLIB\_REGISTER\_NODE macro. At runtime, the framework processes the set
of such registrations into a directed graph. It is easy enough to add
nodes to the graph at runtime. The framework does not support removing
nodes.

vlib provides several types of vector-processing graph nodes, primarily
to control framework dispatch behaviors. The type member of the
vlib\_node\_registration\_t functions as follows:

-   VLIB\_NODE\_TYPE\_PRE\_INPUT - run before all other node types
-   VLIB\_NODE\_TYPE\_INPUT - run as often as possible, after pre\_input
    nodes
-   VLIB\_NODE\_TYPE\_INTERNAL - only when explicitly made runnable by
    adding pending frames for processing
-   VLIB\_NODE\_TYPE\_PROCESS - only when explicitly made runnable.
    "Process" nodes are actually cooperative multi-tasking threads. They
    **must** explicitly suspend after a reasonably short period of time.

For a precise understanding of the graph node dispatcher, please read
./src/vlib/main.c:vlib\_main\_loop.

Graph node dispatcher
---------------------

Vlib\_main\_loop() dispatches graph nodes. The basic vector processing
algorithm is diabolically simple, but may not be obvious from even a
long stare at the code. Here's how it works: some input node, or set of
input nodes, produce a vector of work to process. The graph node
dispatcher pushes the work vector through the directed graph,
subdividing it as needed, until the original work vector has been
completely processed. At that point, the process recurs.

This scheme yields a stable equilibrium in frame size, by construction.
Here's why: as the frame size increases, the per-frame-element
processing time decreases. There are several related forces at work; the
simplest to describe is the effect of vector processing on the CPU L1
I-cache. The first frame element \[packet\] processed by a given node
warms up the node dispatch function in the L1 I-cache. All subsequent
frame elements profit. As we increase the number of frame elements, the
cost per element goes down.
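
To put numbers on the amortization argument, here is a toy cost model. It is an assumption made for illustration, not a measurement of vpp: a node pays a fixed warm-up cost once per frame plus a constant cost per element. The function name and the clock counts in the note below are hypothetical.

```c
/* Hypothetical cost model: one fixed warm-up cost per frame (for
   example, I-cache misses taken on the first packet) plus a constant
   cost for each element. n_elements must be greater than zero. */
static double
clocks_per_element (double warmup_clocks, double per_element_clocks,
                    unsigned n_elements)
{
  return (warmup_clocks + per_element_clocks * n_elements) / n_elements;
}
```

With a 1000-clock warm-up and 50 clocks per element, a 1-element frame costs 1050 clocks per element while a 250-element frame costs 54. As frames grow the node gets cheaper per packet, which is the force that pushes the frame size toward an equilibrium.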

Under light load, it is a crazy waste of CPU cycles to run the graph
node dispatcher flat-out. So, the graph node dispatcher arranges to wait
for work by sitting in a timed epoll wait if the prevailing frame size
is low. The scheme has a certain amount of hysteresis to avoid
constantly toggling back and forth between interrupt and polling mode.
Although the graph dispatcher supports interrupt and polling modes, our
current default device drivers do not.

The graph node scheduler uses a hierarchical timer wheel to reschedule
process nodes upon timer expiration.

Graph dispatcher internals
--------------------------

This section may be safely skipped. It's not necessary to understand
graph dispatcher internals to create graph nodes.

Vector Data Structure
---------------------

In vpp / vlib, we represent vectors as instances of the vlib_frame_t type:

```c
typedef struct vlib_frame_t
{
  /* Frame flags. */
  u16 flags;

  /* Number of scalar bytes in arguments. */
  u8 scalar_size;

  /* Number of bytes per vector argument. */
  u8 vector_size;

  /* Number of vector elements currently in frame. */
  u16 n_vectors;

  /* Scalar and vector arguments to next node. */
  u8 arguments[0];
} vlib_frame_t;
```

Note that one _could_ construct all kinds of vectors - including
vectors with some associated scalar data - using this structure. In
the vpp application, vectors typically use a 4-byte vector element
size, and zero bytes' worth of associated per-frame scalar data.

Frames are always allocated on CLIB_CACHE_LINE_BYTES boundaries.
Frames have u32 indices which make use of the alignment property, so
the maximum feasible main heap offset of a frame is
CLIB_CACHE_LINE_BYTES * 0xFFFFFFFF. With 64-byte cache lines, that is
64 bytes * 4G indices = 256 Gbytes.
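
The index arithmetic behind that limit is short enough to write down. This is a sketch under stated assumptions rather than the vlib code: it assumes a 64-byte cache line, and the helper names `offset_to_index` and `index_to_offset` are hypothetical.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 64     /* assumed value of CLIB_CACHE_LINE_BYTES */

/* Convert a cache-line-aligned heap offset to a 32-bit index. The
   alignment guarantees that the low 6 bits carry no information. */
static uint32_t
offset_to_index (uint64_t heap_offset)
{
  return (uint32_t) (heap_offset / CACHE_LINE_BYTES);
}

/* Recover the heap offset from the index. */
static uint64_t
index_to_offset (uint32_t index)
{
  return (uint64_t) index * CACHE_LINE_BYTES;
}
```

The largest index, 0xFFFFFFFF, maps to an offset just under 2^38 bytes, which is where the 256 Gbyte figure comes from.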

Scheduling Vectors
------------------

As you can see, vectors are not directly associated with graph
nodes. We represent that association in a couple of ways.  The
simplest is the vlib\_pending\_frame\_t:

```c
/* A frame pending dispatch by main loop. */
typedef struct
{
  /* Node and runtime for this frame. */
  u32 node_runtime_index;

  /* Frame index (in the heap). */
  u32 frame_index;

  /* Start of next frames for this node. */
  u32 next_frame_index;

  /* Special value for next_frame_index when there is no next frame. */
#define VLIB_PENDING_FRAME_NO_NEXT_FRAME ((u32) ~0)
} vlib_pending_frame_t;
```

Here is the code in .../src/vlib/main.c:vlib_main_or_worker_loop()
which processes frames:

```c
  /*
   * Input nodes may have added work to the pending vector.
   * Process pending vector until there is nothing left.
   * All pending vectors will be processed from input -> output.
   */
  for (i = 0; i < _vec_len (nm->pending_frames); i++)
    cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
  /* Reset pending vector for next iteration. */
```

The pending frame node_runtime_index associates the frame with the
node which will process it.
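
One detail of that loop is easy to miss: the loop bound is re-read on every pass, so work that a node enqueues while it is being dispatched is picked up in the same main-loop iteration. The toy model below shows only that property. It is a sketch, not vlib code; `toy_main_t`, `pending_t` and the "hops left" counter are hypothetical stand-ins for real frames and graph arcs.

```c
#define MAX_PENDING 32

/* Toy pending entry: which node runs, and how many more hops the work
   makes before it leaves the graph. Both fields are hypothetical. */
typedef struct
{
  int node_index;
  int hops_left;
} pending_t;

typedef struct
{
  pending_t pending[MAX_PENDING];
  int n_pending;
  int n_dispatched;
} toy_main_t;

static void
add_pending (toy_main_t *tm, int node_index, int hops_left)
{
  if (tm->n_pending < MAX_PENDING)
    tm->pending[tm->n_pending++] = (pending_t) { node_index, hops_left };
}

/* "Dispatch" one entry: a node with hops left enqueues to the next
   node, which appends a new entry to the pending vector. */
static void
dispatch_pending (toy_main_t *tm, int i)
{
  pending_t p = tm->pending[i]; /* copy: the vector may grow */
  tm->n_dispatched++;
  if (p.hops_left > 0)
    add_pending (tm, p.node_index + 1, p.hops_left - 1);
}

/* One main-loop iteration: entries added during dispatch are processed
   before the loop ends because n_pending is re-read on each pass. */
static void
run_pending (toy_main_t *tm)
{
  for (int i = 0; i < tm->n_pending; i++)
    dispatch_pending (tm, i);
  tm->n_pending = 0;            /* reset pending vector for next iteration */
}
```

Seeding the model with one entry that has three hops left results in four dispatches in a single call to `run_pending`.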

Complications
-------------

Fasten your seatbelt. Here's where the story - and the data
structures - become quite complicated...

At 100,000 feet: vpp uses a directed graph, not a directed _acyclic_
graph. It's really quite normal for a packet to visit ip\[46\]-lookup
multiple times. The worst-case: a graph node which enqueues packets to
itself.

To deal with this issue, the graph dispatcher must force allocation of
a new frame if the current graph node's dispatch function happens to
enqueue a packet back to itself.

There are no guarantees that a pending frame will be processed
immediately, which means that more packets may be added to the
underlying vlib_frame_t after it has been attached to a
vlib_pending_frame_t. Care must be taken to allocate new
frames and pending frames if a (pending\_frame, frame) pair fills.

Next frames, next frame ownership
---------------------------------

The vlib\_next\_frame\_t is the last key graph dispatcher data structure:

```c
typedef struct
{
  /* Frame index. */
  u32 frame_index;

  /* Node runtime for this next. */
  u32 node_runtime_index;

  /* Next frame flags. */
  u32 flags;

  /* Reflects node frame-used flag for this next. */
#define VLIB_FRAME_NO_FREE_AFTER_DISPATCH \
  VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH

  /* This next frame owns enqueue to node
     corresponding to node_runtime_index. */
#define VLIB_FRAME_OWNER (1 << 15)

  /* Set when frame has been allocated for this next. */
#define VLIB_FRAME_IS_ALLOCATED     VLIB_NODE_FLAG_IS_OUTPUT

  /* Set when frame has been added to pending vector. */
#define VLIB_FRAME_PENDING VLIB_NODE_FLAG_IS_DROP

  /* Set when frame is to be freed after dispatch. */
#define VLIB_FRAME_FREE_AFTER_DISPATCH VLIB_NODE_FLAG_IS_PUNT

  /* Set when frame has traced packets. */
#define VLIB_FRAME_TRACE VLIB_NODE_FLAG_TRACE

  /* Number of vectors enqueue to this next since last overflow. */
  u32 vectors_since_last_overflow;
} vlib_next_frame_t;
```

Graph node dispatch functions call vlib\_get\_next\_frame (...)  to
set "(u32 \*)to_next" to the right place in the vlib_frame_t
corresponding to the ith arc (aka next0) from the current node to the
indicated next node.

After some scuffling around - two levels of macros - processing
reaches vlib\_get\_next\_frame_internal (...). Get-next-frame-internal
digs up the vlib\_next\_frame\_t corresponding to the desired graph
arc.

The next frame data structure amounts to a graph-arc-centric frame
cache. Once a node finishes adding elements to a frame, the frame will
acquire a vlib_pending_frame_t and end up on the graph dispatcher's
run-queue. But there's no guarantee that more vector elements won't be
added to the underlying frame from the same (source\_node,
next\_index) arc or from a different (source\_node, next\_index) arc.

Maintaining consistency of the arc-to-frame cache is necessary. The
first step in maintaining consistency is to make sure that only one
graph node at a time thinks it "owns" the target vlib\_frame\_t.

Back to the graph node dispatch function. In the usual case, a certain
number of packets will be added to the vlib\_frame\_t acquired by
calling vlib\_get\_next\_frame (...).

Before a dispatch function returns, it's required to call
vlib\_put\_next\_frame (...) for all of the graph arcs it actually
used.  This action adds a vlib\_pending\_frame\_t to the graph
dispatcher's pending frame vector.

Vlib\_put\_next\_frame makes a note in the pending frame of the frame
index, and also of the vlib\_next\_frame\_t index.

dispatch\_pending\_node actions
-------------------------------

The main graph dispatch loop calls dispatch\_pending\_node as shown
above.

Dispatch\_pending\_node recovers the pending frame, and the graph node
runtime / dispatch function. Further, it recovers the next\_frame
currently associated with the vlib\_frame\_t, and detaches the
vlib\_frame\_t from the next\_frame.

In .../src/vlib/main.c:dispatch\_pending\_node(...), note this stanza:

```c
  /* Force allocation of new frame while current frame is being
     dispatched. */
  restore_frame_index = ~0;
  if (nf->frame_index == p->frame_index)
    {
      nf->frame_index = ~0;
      nf->flags &= ~VLIB_FRAME_IS_ALLOCATED;
      if (!(n->flags & VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH))
        restore_frame_index = p->frame_index;
    }
```

dispatch\_pending\_node is worth a hard stare due to the several
second-order optimizations it implements. Almost as an afterthought,
it calls dispatch_node which actually calls the graph node dispatch
function.

Process / thread model
----------------------

vlib provides an ultra-lightweight cooperative multi-tasking thread
model. The graph node scheduler invokes these processes in much the same
way as traditional vector-processing run-to-completion graph nodes;
plus-or-minus a setjmp/longjmp pair required to switch stacks. Simply
set the vlib\_node\_registration\_t type field to
VLIB\_NODE\_TYPE\_PROCESS. Yes, process is a misnomer. These are
cooperative multi-tasking threads.

As of this writing, the default stack size is 1<<15 = 32 KB.
Initialize the node registration's process\_log2\_n\_stack\_bytes member
as needed. The graph node dispatcher makes some effort to detect stack
overrun, e.g. by mapping a no-access page below each thread stack.

Process node dispatch functions are expected to be "while(1) { }" loops
which suspend when not otherwise occupied, and which must not run for
unreasonably long periods of time.

"Unreasonably long" is an application-dependent concept. Over the years,
we have constructed frame-size sensitive control-plane nodes which will
use a much higher fraction of the available CPU bandwidth when the frame
size is low. The classic example: modifying forwarding tables. So long
as the table-builder leaves the forwarding tables in a valid state, one
can suspend the table builder to avoid dropping packets as a result of
control-plane activity.

Process nodes can suspend for fixed amounts of time, or until another
entity signals an event, or both. See the next section for a description
of the vlib process event mechanism.

When running in vlib process context, one must pay strict attention to
loop invariant issues. If one walks a data structure and calls a
function which may suspend, one had best know by construction that it
cannot change. Often, it's best to simply make a snapshot copy of a data
structure, walk the copy at leisure, then free the copy.
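
The snapshot pattern looks roughly like this in plain C. This is a sketch with hypothetical names (`table_t`, `walk_snapshot`, `sum_and_clear`); real vlib code would more likely copy a vppinfra vector or pool. The visitor stands in for any function that may suspend and let other code modify the live table.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table that other code may modify whenever the walker
   suspends. */
typedef struct
{
  int *items;
  size_t n_items;
} table_t;

typedef void (*visit_fn_t) (table_t *live, int item);

/* Walk a private snapshot so that changes to the live table cannot
   invalidate the loop. Returns 0 on success, -1 if allocation fails. */
static int
walk_snapshot (table_t *live, visit_fn_t visit)
{
  size_t n = live->n_items;
  int *copy = malloc (n ? n * sizeof (int) : 1);
  if (!copy)
    return -1;
  if (n)
    memcpy (copy, live->items, n * sizeof (int));

  for (size_t i = 0; i < n; i++)
    visit (live, copy[i]);

  free (copy);
  return 0;
}

/* Example visitor: records the item, then empties the live table to
   mimic a change made while the walker was suspended. */
static int visited_sum;

static void
sum_and_clear (table_t *live, int item)
{
  visited_sum += item;
  live->n_items = 0;
}
```

Even though the visitor empties the live table on its first call, the walk still visits every item that was present when the snapshot was taken.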

Process events
--------------

The vlib process event mechanism API is extremely lightweight and easy
to use. Here is a typical example:

```c
vlib_main_t *vm = &vlib_global_main;
uword event_type, * event_data = 0;

while (1)
{
   vlib_process_wait_for_event_or_clock (vm, 5.0 /* seconds */);

   event_type = vlib_process_get_events (vm, &event_data);

   switch (event_type) {
   case EVENT1:
       handle_event1s (event_data);
       break;

   case EVENT2:
       handle_event2s (event_data);
       break;

   case ~0: /* 5-second idle/periodic */
       handle_idle ();
       break;

   default: /* bug! */
       ASSERT (0);
   }

   vec_reset_length(event_data);
}
```

In this example, the VLIB process node waits for an event to occur, or
for 5 seconds to elapse. The code demuxes on the event type, calling
the appropriate handler function. Each call to
vlib\_process\_get\_events returns a vector of per-event-type data
passed to successive vlib\_process\_signal\_event calls; it is a
serious error to process only event\_data\[0\].

Resetting the event\_data vector-length to 0 \[instead of calling
vec\_free\] means that the event scheme doesn't burn cycles continuously
allocating and freeing the event data vector. This is a common vppinfra
/ vlib coding pattern, well worth using when appropriate.
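
The saving is easy to see with a stripped-down growable vector. The sketch below is not vppinfra: `evec_t` and its helpers are hypothetical, and the `n_allocs` counter exists only to make the effect visible.

```c
#include <stdlib.h>

/* Minimal growable vector of event data; vppinfra vectors are far more
   general. */
typedef struct
{
  unsigned long *data;
  size_t len;                   /* elements in use */
  size_t cap;                   /* elements allocated */
  unsigned n_allocs;            /* times storage was (re)allocated */
} evec_t;

static int
evec_add (evec_t *v, unsigned long x)
{
  if (v->len == v->cap)
    {
      size_t new_cap = v->cap ? 2 * v->cap : 4;
      unsigned long *p = realloc (v->data, new_cap * sizeof (*p));
      if (!p)
        return -1;
      v->data = p;
      v->cap = new_cap;
      v->n_allocs++;
    }
  v->data[v->len++] = x;
  return 0;
}

/* Keep the storage, forget the contents. */
static void
evec_reset_length (evec_t *v)
{
  v->len = 0;
}

/* Release the storage; the next add must allocate again. */
static void
evec_free (evec_t *v)
{
  free (v->data);
  v->data = NULL;
  v->len = v->cap = 0;
}
```

Filling and resetting this vector over many rounds allocates once, on the first add. Calling `evec_free` at the end of every round would instead allocate once per round.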

Signaling an event is easy, for example:

```c
vlib_process_signal_event (vm, process_node_index, EVENT1,
    (uword)arbitrary_event1_data); /* and so forth */
```

One can either know the process node index by construction - dig it out
of the appropriate vlib\_node\_registration\_t - or by finding the
vlib\_node\_t with vlib\_get\_node\_by\_name(...).

Buffers
-------

vlib buffering solves the usual set of packet-processing problems,
albeit at high performance. Key in terms of performance: one ordinarily
allocates / frees N buffers at a time rather than one at a time. Except
when operating directly on a specific buffer, one deals with buffers by
index, not by pointer.

Packet-processing frames are u32\[\] arrays, not
vlib\_buffer\_t\[\] arrays.

Packets comprise one or more vlib buffers, chained together as required.
Multiple particle sizes are supported; hardware input nodes simply ask
for the required size(s). Coalescing support is available. For obvious
reasons one is discouraged from writing one's own wild and wacky buffer
chain traversal code.

vlib buffer headers are allocated immediately prior to the buffer data
area. In typical packet processing this saves a dependent read wait:
given a buffer's address, one can prefetch the buffer header
\[metadata\] at the same time as the first cache line of buffer data.

Buffer header metadata (vlib\_buffer\_t) includes the usual rewrite
expansion space, a current\_data offset, RX and TX interface indices,
packet trace information, and opaque areas.

The opaque data is intended to control packet processing in arbitrary
subgraph-dependent ways. The programmer shoulders responsibility for
data lifetime analysis, type-checking, etc.

Buffers have reference-counts in support of e.g. multicast replication.

Shared-memory message API
-------------------------

Local control-plane and application processes interact with the vpp
dataplane via asynchronous message-passing in shared memory over
unidirectional queues. The same application APIs are available via
sockets.

Capturing API traces and replaying them in a simulation environment
requires a disciplined approach to the problem. This seems like a
make-work task, but it is not. When something goes wrong in the
control-plane after 300,000 or 3,000,000 operations, high-speed replay
of the events leading up to the accident is a huge win.

The shared-memory message API message allocator vl\_api\_msg\_alloc uses
a particularly cute trick. Since messages are processed in order, we try
to allocate message buffering from a set of fixed-size, preallocated
rings. Each ring item has a "busy" bit. Freeing one of the preallocated
message buffers merely requires the message consumer to clear the busy
bit. No locking required.
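
The busy-bit idea can be sketched in a few lines. This is an illustration only, not vl\_api\_msg\_alloc: the names (`msg_ring_t`, `ring_alloc`, `ring_free`) and sizes are hypothetical, and the single-threaded sketch ignores the memory-ordering details a real shared-memory allocator has to handle.

```c
#include <stddef.h>

#define RING_SLOTS 4
#define SLOT_BYTES 64

typedef struct
{
  int busy;                     /* set by the allocator, cleared by the consumer */
  unsigned char data[SLOT_BYTES];
} ring_slot_t;

typedef struct
{
  ring_slot_t slots[RING_SLOTS];
  unsigned next;                /* next slot the allocator will try */
} msg_ring_t;

/* Allocate the next slot in ring order. Returns NULL if that slot is
   still busy; a real allocator would fall back to a slower path. */
static void *
ring_alloc (msg_ring_t *r)
{
  ring_slot_t *s = &r->slots[r->next];
  if (s->busy)
    return NULL;
  s->busy = 1;
  r->next = (r->next + 1) % RING_SLOTS;
  return s->data;
}

/* Freeing only clears the busy bit; no lock or free list is touched. */
static void
ring_free (void *msg)
{
  ring_slot_t *s =
    (ring_slot_t *) ((unsigned char *) msg - offsetof (ring_slot_t, data));
  s->busy = 0;
}
```

Because messages are consumed in the order they were allocated, the slot the allocator wants next is normally the one that has been free the longest, so the busy check rarely fails.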

Debug CLI
---------

Adding debug CLI commands to VLIB applications is very simple.

Here is a complete example:

```c
static clib_error_t *
show_ip_tuple_match (vlib_main_t * vm,
                     unformat_input_t * input,
                     vlib_cli_command_t * cmd)
{
    vlib_cli_output (vm, "%U\n", format_ip_tuple_match_tables, &routing_main);
    return 0;
}

/* *INDENT-OFF* */
static VLIB_CLI_COMMAND (show_ip_tuple_command) =
{
    .path = "show ip tuple match",
    .short_help = "Show ip 5-tuple match-and-broadcast tables",
    .function = show_ip_tuple_match,
};
/* *INDENT-ON* */
```

This example implements the "show ip tuple match" debug CLI
command. In ordinary usage, the vlib CLI is available via the "vppctl"
application, which sends traffic to a named pipe. One can configure
debug CLI telnet access on a configurable port.

The CLI implementation has an output redirection facility which makes it
simple to deliver CLI output via shared-memory API messaging.

Particularly for debug or "show tech support" type commands, it would be
wasteful to write vlib application code to pack binary data, write more
code elsewhere to unpack the data and finally print the answer. If a
certain CLI command has the potential to hurt packet processing
performance by running for too long, do the work incrementally in a
process node. The client can wait.

### Macro expansion

The vpp debug CLI engine includes a recursive macro expander. This
is quite useful for factoring out address and/or interface name
specifics:

```
define ip1 192.168.1.1/24
define ip2 192.168.2.1/24
define iface1 GigabitEthernet3/0/0
define iface2 loop1

set int ip address $iface1 $ip1
set int ip address $iface2 $(ip2)

undefine ip1
undefine ip2
undefine iface1
undefine iface2
```

Each socket (or telnet) debug CLI session has its own macro
tables. All debug CLI sessions which use CLI_INBAND binary API
messages share a single table.

The macro expander recognizes circular definitions:

```
define foo \$(bar)
define bar \$(mumble)
define mumble \$(foo)
```

At 8 levels of recursion, the macro expander throws up its hands and
replies "CIRCULAR."
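
A depth counter is all it takes to catch circular definitions this way. The sketch below is a simplified stand-in, not the vpp macro expander: it handles only the `$(name)` form, uses a fixed table and output buffer, and every name in it (`macro_t`, `expand`, `expand_or_circular`) is hypothetical. Exactly how vpp counts its 8 levels is not modeled; the sketch simply gives up when nesting reaches `MAX_DEPTH`.

```c
#include <string.h>

#define MAX_DEPTH 8             /* recursion limit quoted in the text */
#define OUT_BYTES 256

typedef struct
{
  const char *name;
  const char *value;
} macro_t;

static const char *
macro_lookup (const macro_t *tab, int n, const char *name, size_t len)
{
  for (int i = 0; i < n; i++)
    if (strlen (tab[i].name) == len && strncmp (tab[i].name, name, len) == 0)
      return tab[i].value;
  return NULL;
}

/* Append the expansion of src to out, a NUL-terminated buffer of
   OUT_BYTES. Returns 0 on success, -1 when the depth limit is reached,
   -2 when the output buffer is full. */
static int
expand (const macro_t *tab, int n, const char *src, char *out, int depth)
{
  if (depth >= MAX_DEPTH)
    return -1;
  while (*src)
    {
      const char *close;
      if (src[0] == '$' && src[1] == '(' && (close = strchr (src, ')')))
        {
          const char *val =
            macro_lookup (tab, n, src + 2, (size_t) (close - (src + 2)));
          if (val)
            {
              int rv = expand (tab, n, val, out, depth + 1);
              if (rv < 0)
                return rv;
            }
          src = close + 1;      /* undefined macros expand to nothing */
        }
      else
        {
          size_t used = strlen (out);
          if (used + 2 > OUT_BYTES)
            return -2;
          out[used] = *src++;
          out[used + 1] = '\0';
        }
    }
  return 0;
}

/* Top-level entry point: reply "CIRCULAR" when the depth limit is hit,
   NULL when the output does not fit. */
static const char *
expand_or_circular (const macro_t *tab, int n, const char *src, char *out)
{
  int rv;
  out[0] = '\0';
  rv = expand (tab, n, src, out, 0);
  if (rv == -1)
    return "CIRCULAR";
  return rv == 0 ? out : NULL;
}
```

With the foo / bar / mumble definitions from the example above, expanding `$(foo)` never terminates on its own; the depth counter is what turns it into a "CIRCULAR" reply.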

### Macro-related debug CLI commands

In addition to the "define" and "undefine" debug CLI commands, use
"show macro [noevaluate]" to dump the macro table. The "echo" debug
CLI command will evaluate and print its argument:

```
vpp# define foo This\ Is\ Foo
vpp# echo $foo
This Is Foo
```

Handing off buffers between threads
-----------------------------------

Vlib includes an easy-to-use mechanism for handing off buffers between
worker threads. A typical use-case: software ingress flow hashing. At
a high level, one creates a per-worker-thread queue which sends packets
to a specific graph node in the indicated worker thread. With the
queue in hand, enqueue packets to the worker thread of your choice.

### Initialize a handoff queue

Simply call vlib_frame_queue_main_init:

```c
main_ptr->frame_queue_index
    = vlib_frame_queue_main_init (dest_node.index, frame_queue_size);
```

Frame_queue_size means what it says: the number of frames which may be
queued. Since frames contain 1...256 packets, frame_queue_size should
be a reasonably small number (32...64). If the frame queue producer(s)
are faster than the frame queue consumer(s), congestion will
occur. We suggest letting the enqueue operation deal with queue
congestion, as shown in the enqueue example below.

Under the floorboards, vlib_frame_queue_main_init creates an input queue
for each worker thread.

Please do NOT create frame queues until it's clear that they will be
used. Although the main dispatch loop is reasonably smart about how
often it polls the (entire set of) frame queues, polling unused frame
queues is a waste of clock cycles.

### Hand off packets

The actual handoff mechanics are simple, and integrate nicely with
a typical graph-node dispatch function:

```c
always_inline uword
do_handoff_inline (vlib_main_t * vm,
                   vlib_node_runtime_t * node, vlib_frame_t * frame,
                   int is_ip4, int is_trace)
{
  u32 n_left_from, *from;
  vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
  u16 thread_indices [VLIB_FRAME_SIZE];
  u16 nexts[VLIB_FRAME_SIZE], *next;
  u32 n_enq;
  htest_main_t *hmp = &htest_main;
  int i;

  from = vlib_frame_vector_args (frame);
  n_left_from = frame->n_vectors;

  vlib_get_buffers (vm, from, bufs, n_left_from);
  next = nexts;
  b = bufs;

  /*
   * Typical frame traversal loop, details vary with
   * use case. Make sure to set thread_indices[i] with
   * the desired destination thread index. You may
   * or may not bother to set next[i].
   */

  for (i = 0; i < frame->n_vectors; i++)
    {
      /* <snip> */
      /* Pick a thread to handle this packet */
      thread_indices[i] = f (packet_data_or_whatever);
      /* <snip> */

      b += 1;
      next += 1;
      n_left_from -= 1;
    }

  /* Enqueue buffers to threads */
  n_enq =
    vlib_buffer_enqueue_to_thread (vm, node, hmp->frame_queue_index,
                                   from, thread_indices, frame->n_vectors,
                                   1 /* drop on congestion */);
  /* Typical counters */
  if (n_enq < frame->n_vectors)
    vlib_node_increment_counter (vm, node->node_index,
                                 XXX_ERROR_CONGESTION_DROP,
                                 frame->n_vectors - n_enq);
  vlib_node_increment_counter (vm, node->node_index,
                               XXX_ERROR_HANDED_OFF, n_enq);
  return frame->n_vectors;
}
```

Notes about calling vlib_buffer_enqueue_to_thread(...):

* If you pass "drop on congestion" non-zero, all packets in the
inbound frame will be consumed one way or the other. This is the
recommended setting.

* In the drop-on-congestion case, please don't try to "help" in the
enqueue node by freeing dropped packets, or by pushing them to
"error-drop." Either of those actions would be a severe error.

* It's perfectly OK to enqueue packets to the current thread.
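
The "pick a thread" step is left open in the example above. For the flow-hashing use case, one possible shape is sketched below. Everything in it is an assumption made for illustration: the key layout, the FNV-1a hash and the names `flow_key_t`, `flow_hash` and `pick_thread` are hypothetical, and a real node would pull the key from the packet headers and would probably use a hash vpp already provides.

```c
#include <stdint.h>

/* Hypothetical flow key; a real node would extract these fields from
   the packet headers. */
typedef struct
{
  uint32_t src_addr, dst_addr;
  uint16_t src_port, dst_port;
  uint8_t protocol;
} flow_key_t;

/* 32-bit FNV-1a over the key fields. Any reasonable hash will do; this
   one is chosen only because it is short. */
static uint32_t
flow_hash (const flow_key_t *k)
{
  uint32_t words[4] = {
    k->src_addr, k->dst_addr,
    ((uint32_t) k->src_port << 16) | k->dst_port, k->protocol
  };
  uint32_t h = 2166136261u;
  for (int w = 0; w < 4; w++)
    for (int shift = 0; shift < 32; shift += 8)
      {
        h ^= (words[w] >> shift) & 0xff;
        h *= 16777619u;
      }
  return h;
}

/* Map a flow to one of n_workers worker threads (n_workers > 0).
   first_worker_index is the thread index of the first worker, assumed
   to be 1 when thread 0 is the main thread. */
static uint16_t
pick_thread (const flow_key_t *k, uint16_t first_worker_index,
             uint16_t n_workers)
{
  return (uint16_t) (first_worker_index + flow_hash (k) % n_workers);
}
```

The property that matters is that every packet of a given flow maps to the same worker, so per-flow state never has to be shared between threads.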

Handoff Demo Plugin
-------------------

Check out the sample (plugin) example in
.../src/examples/handoffdemo. If you want to build the handoff demo plugin:

```
$ cd .../src/plugins
$ ln -s ../examples/handoffdemo
```

This plugin provides a simple example of how to hand off packets
between threads. We used it to debug packet-tracer handoff tracing
support.

### Packet generator input script

```
packet-generator new {
   name x
   limit 5
   size 128-128
   interface local0
   node handoffdemo-1
   data {
       incrementing 30
   }
}
```

### Start vpp with 2 worker threads

The demo plugin hands packets from worker 1 to worker 2.

### Enable tracing, and start the packet generator

```
trace add pg-input 100
packet-generator enable
```

### Sample Run

```
DBGvpp# ex /tmp/pg_input_script
DBGvpp# pa en
DBGvpp# sh err
 Count                    Node                  Reason
       5              handoffdemo-1             packets handed off processed
       5              handoffdemo-2             completed packets
DBGvpp# show run
Thread 1 vpp_wk_0 (lcore 0)
Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
  vector rates in 3.7331e-2, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
handoffdemo-1                    active                  1               5               0          4.76e3            5.00
pg-input                        disabled                 2               5               0          5.58e4            2.50
unix-epoll-input                 polling             22760               0               0          2.14e7            0.00
---------------
Thread 2 vpp_wk_1 (lcore 2)
Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 3.7331e-2, punt 0.0000e0
             Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
drop                             active                  1               5               0          1.35e4            5.00
error-drop                       active                  1               5               0          2.52e4            5.00
handoffdemo-2                    active                  1               5               0          2.56e4            5.00
unix-epoll-input                 polling             22406               0               0          2.18e7            0.00
```

Enable the packet tracer and run it again...

```
DBGvpp# trace add pg-input 100
DBGvpp# pa en
DBGvpp# sh trace
sh trace
------------------- Start of thread 0 vpp_main -------------------
No packets in trace buffer
------------------- Start of thread 1 vpp_wk_0 -------------------
Packet 1

00:06:50:520688: pg-input
  stream x, 128 bytes, 0 sw_if_index
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000000
  00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
  00000020: 0000000000000000000000000000000000000000000000000000000000000000
  00000040: 0000000000000000000000000000000000000000000000000000000000000000
  00000060: 0000000000000000000000000000000000000000000000000000000000000000
00:06:50:520762: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 2

00:06:50:520688: pg-input
  stream x, 128 bytes, 0 sw_if_index
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000001
  00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
  00000020: 0000000000000000000000000000000000000000000000000000000000000000
  00000040: 0000000000000000000000000000000000000000000000000000000000000000
  00000060: 0000000000000000000000000000000000000000000000000000000000000000
00:06:50:520762: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 3

00:06:50:520688: pg-input
  stream x, 128 bytes, 0 sw_if_index
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000002
  00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
  00000020: 0000000000000000000000000000000000000000000000000000000000000000
  00000040: 0000000000000000000000000000000000000000000000000000000000000000
  00000060: 0000000000000000000000000000000000000000000000000000000000000000
00:06:50:520762: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 4

00:06:50:520688: pg-input
  stream x, 128 bytes, 0 sw_if_index
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000003
  00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
  00000020: 0000000000000000000000000000000000000000000000000000000000000000
  00000040: 0000000000000000000000000000000000000000000000000000000000000000
  00000060: 0000000000000000000000000000000000000000000000000000000000000000
00:06:50:520762: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 5

00:06:50:520688: pg-input
  stream x, 128 bytes, 0 sw_if_index
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000004
 828     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
 829     00000020: 0000000000000000000000000000000000000000000000000000000000000000
 830     00000040: 0000000000000000000000000000000000000000000000000000000000000000
 831     00000060: 0000000000000000000000000000000000000000000000000000000000000000
 832   00:06:50:520762: handoffdemo-1
 833     HANDOFFDEMO: current thread 1
 834
 835   ------------------- Start of thread 2 vpp_wk_1 -------------------
 836   Packet 1
 837
 838   00:06:50:520796: handoff_trace
 839     HANDED-OFF: from thread 1 trace index 0
 840   00:06:50:520796: handoffdemo-2
 841     HANDOFFDEMO: current thread 2
 842   00:06:50:520867: error-drop
 843     rx:local0
 844   00:06:50:520914: drop
 845     handoffdemo-2: completed packets
 846
 847   Packet 2
 848
 849   00:06:50:520796: handoff_trace
 850     HANDED-OFF: from thread 1 trace index 1
 851   00:06:50:520796: handoffdemo-2
 852     HANDOFFDEMO: current thread 2
 853   00:06:50:520867: error-drop
 854     rx:local0
 855   00:06:50:520914: drop
 856     handoffdemo-2: completed packets
 857
 858   Packet 3
 859
 860   00:06:50:520796: handoff_trace
 861     HANDED-OFF: from thread 1 trace index 2
 862   00:06:50:520796: handoffdemo-2
 863     HANDOFFDEMO: current thread 2
 864   00:06:50:520867: error-drop
 865     rx:local0
 866   00:06:50:520914: drop
 867     handoffdemo-2: completed packets
 868
 869   Packet 4
 870
 871   00:06:50:520796: handoff_trace
 872     HANDED-OFF: from thread 1 trace index 3
 873   00:06:50:520796: handoffdemo-2
 874     HANDOFFDEMO: current thread 2
 875   00:06:50:520867: error-drop
 876     rx:local0
 877   00:06:50:520914: drop
 878     handoffdemo-2: completed packets
 879
 880   Packet 5
 881
 882   00:06:50:520796: handoff_trace
 883     HANDED-OFF: from thread 1 trace index 4
 884   00:06:50:520796: handoffdemo-2
 885     HANDOFFDEMO: current thread 2
 886   00:06:50:520867: error-drop
 887     rx:local0
 888   00:06:50:520914: drop
 889     handoffdemo-2: completed packets
 890  DBGvpp#
 891 ```