VLIB (Vector Processing Library)
================================

The files associated with vlib are located in the ./src/{vlib, vlibapi,
vlibmemory} folders. These libraries provide vector processing support
including graph-node scheduling, reliable multicast support,
ultra-lightweight cooperative multi-tasking threads, a CLI, plug in .DLL
support, physical memory and Linux epoll support. Parts of this library
embody US Patent 7,961,636.

Init function discovery
-----------------------

vlib applications register for various [initialization] events by
placing structures and \__attribute__((constructor)) functions into the
image. At appropriate times, the vlib framework walks
constructor-generated singly-linked structure lists, performs a
topological sort based on specified constraints, and calls the indicated
functions. Vlib applications create graph nodes, add CLI functions,
start cooperative multi-tasking threads, etc. etc. using this mechanism.
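
The registration mechanism described above can be illustrated with a
minimal, self-contained sketch. It is not the vlib implementation: the
names init_reg_t, TOY_INIT_FUNCTION, init_a, init_b and call_all_inits
are hypothetical, the sketch assumes a compiler which supports
\__attribute__((constructor)) (GCC or Clang), and it leaves out the
topological sort entirely.

.. code:: c

   #include <stddef.h>

   /* Hypothetical registration record; not the vlib structure. */
   typedef struct init_reg
   {
     struct init_reg *next;   /* next registration in the list */
     const char *name;        /* function name, for diagnostics */
     int (*fn) (void);        /* returns 0 on success */
   } init_reg_t;

   static init_reg_t *init_list;   /* head of the constructor-built list */
   static int n_called;            /* how many init functions have run */

   /* Declare an init function and link it into init_list before main(). */
   #define TOY_INIT_FUNCTION(f)                                      \
     static init_reg_t f##_reg = { NULL, #f, f };                    \
     static void __attribute__ ((constructor)) f##_register (void)   \
     {                                                               \
       f##_reg.next = init_list;                                     \
       init_list = &f##_reg;                                         \
     }

   static int init_a (void) { n_called++; return 0; }
   TOY_INIT_FUNCTION (init_a)

   static int init_b (void) { n_called++; return 0; }
   TOY_INIT_FUNCTION (init_b)

   /* Walk the list and call each function; stop at the first failure. */
   static int
   call_all_inits (void)
   {
     for (init_reg_t *r = init_list; r; r = r->next)
       if (r->fn () != 0)
         return -1;
     return 0;
   }

Calling call_all_inits() from main() runs both registered functions. The
list order in this sketch simply reflects the order in which the
constructors happened to run, which is the reason a real framework needs
explicit ordering constraints.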

vlib applications invariably include a number of VLIB_INIT_FUNCTION
(my_init_function) macros.

Each init / configure / etc. function has the return type clib_error_t
\*. Make sure that the function returns 0 if all is well, otherwise the
framework will announce an error and exit.

vlib applications must link against vppinfra, and often link against
other libraries such as VNET. In the latter case, it may be necessary to
explicitly reference symbol(s) otherwise large portions of the library
may be AWOL at runtime.

Init function construction and constraint specification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It’s easy to add an init function:

.. code:: c

   static clib_error_t *my_init_function (vlib_main_t *vm)
   {
      /* ... initialize things ... */

      return 0; // or return clib_error_return (0, "BROKEN!");
   }
   VLIB_INIT_FUNCTION(my_init_function);

As given, my_init_function will be executed “at some point,” but with no
ordering guarantees.

Specifying ordering constraints is easy:

.. code:: c

   VLIB_INIT_FUNCTION(my_init_function) =
   {
      .runs_before = VLIB_INITS("we_run_before_function_1",
                                "we_run_before_function_2"),
      .runs_after = VLIB_INITS("we_run_after_function_1",
                               "we_run_after_function_2"),
   };

It’s also easy to specify bulk ordering constraints of the form “a then
b then c then d”:

.. code:: c

   VLIB_INIT_FUNCTION(my_init_function) =
   {
      .init_order = VLIB_INITS("a", "b", "c", "d"),
   };

It’s OK to specify all three sorts of ordering constraints for a single
init function, although it’s hard to imagine why it would be necessary.

Node Graph Initialization
-------------------------

vlib packet-processing applications invariably define a set of graph
nodes to process packets.

One constructs a vlib_node_registration_t, most often via the
VLIB_REGISTER_NODE macro. At runtime, the framework processes the set of
such registrations into a directed graph. It is easy enough to add nodes
to the graph at runtime. The framework does not support removing nodes.

vlib provides several types of vector-processing graph nodes, primarily
to control framework dispatch behaviors. The type member of the
vlib_node_registration_t functions as follows:

-  VLIB_NODE_TYPE_PRE_INPUT - run before all other node types
-  VLIB_NODE_TYPE_INPUT - run as often as possible, after pre_input
   nodes
-  VLIB_NODE_TYPE_INTERNAL - only when explicitly made runnable by
   adding pending frames for processing
-  VLIB_NODE_TYPE_PROCESS - only when explicitly made runnable.
   “Process” nodes are actually cooperative multi-tasking threads. They
   **must** explicitly suspend after a reasonably short period of time.

For a precise understanding of the graph node dispatcher, please read
./src/vlib/main.c:vlib_main_loop.

Graph node dispatcher
---------------------

Vlib_main_loop() dispatches graph nodes. The basic vector processing
algorithm is diabolically simple, but may not be obvious from even a
long stare at the code. Here’s how it works: some input node, or set of
input nodes, produce a vector of work to process. The graph node
dispatcher pushes the work vector through the directed graph,
subdividing it as needed, until the original work vector has been
completely processed. At that point, the process recurs.

This scheme yields a stable equilibrium in frame size, by construction.
Here’s why: as the frame size increases, the per-frame-element
processing time decreases. There are several related forces at work; the
simplest to describe is the effect of vector processing on the CPU L1
I-cache. The first frame element [packet] processed by a given node
warms up the node dispatch function in the L1 I-cache. All subsequent
frame elements profit. As we increase the number of frame elements, the
cost per element goes down.

Under light load, it is a crazy waste of CPU cycles to run the graph
node dispatcher flat-out. So, the graph node dispatcher arranges to wait
for work by sitting in a timed epoll wait if the prevailing frame size
is low. The scheme has a certain amount of hysteresis to avoid
constantly toggling back and forth between interrupt and polling mode.
Although the graph dispatcher supports interrupt and polling modes, our
current default device drivers do not.

The graph node scheduler uses a hierarchical timer wheel to reschedule
process nodes upon timer expiration.

Graph dispatcher internals
--------------------------

This section may be safely skipped. It’s not necessary to understand
graph dispatcher internals to create graph nodes.

Vector Data Structure
---------------------

In vpp / vlib, we represent vectors as instances of the vlib_frame_t
type:

.. code:: c

   typedef struct vlib_frame_t
   {
     /* Frame flags. */
     u16 flags;

     /* Number of scalar bytes in arguments. */
     u8 scalar_size;

     /* Number of bytes per vector argument. */
     u8 vector_size;

     /* Number of vector elements currently in frame. */
     u16 n_vectors;

     /* Scalar and vector arguments to next node. */
     u8 arguments[0];
   } vlib_frame_t;

Note that one *could* construct all kinds of vectors - including vectors
with some associated scalar data - using this structure. In the vpp
application, vectors typically use a 4-byte vector element size, and
zero bytes’ worth of associated per-frame scalar data.

Frames are always allocated on CLIB_CACHE_LINE_BYTES boundaries. Frames
have u32 indices which make use of the alignment property, so the
maximum feasible main heap offset of a frame is CLIB_CACHE_LINE_BYTES \*
0xFFFFFFFF. With 64-byte cache lines, that is 64 \* 4G = 256 Gbytes.
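
The index arithmetic can be checked with a small sketch. It assumes a
64-byte cache line; TOY_CACHE_LINE_BYTES, toy_frame_index and
toy_frame_offset are hypothetical names used only for illustration, not
vlib functions.

.. code:: c

   #include <stdint.h>

   #define TOY_CACHE_LINE_BYTES 64   /* assumed cache line size */

   /* Convert a cache-line-aligned heap offset to a 32-bit frame index. */
   static inline uint32_t
   toy_frame_index (uint64_t heap_offset)
   {
     return (uint32_t) (heap_offset / TOY_CACHE_LINE_BYTES);
   }

   /* Convert a 32-bit frame index back to a heap offset. */
   static inline uint64_t
   toy_frame_offset (uint32_t frame_index)
   {
     return (uint64_t) frame_index * TOY_CACHE_LINE_BYTES;
   }

Because the low six bits of an aligned offset are always zero, dropping
them loses nothing, and the largest index (0xFFFFFFFF) maps to an offset
just under 256 Gbytes.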

Scheduling Vectors
------------------

As you can see, vectors are not directly associated with graph nodes. We
represent that association in a couple of ways. The simplest is the
vlib_pending_frame_t:

.. code:: c

   /* A frame pending dispatch by main loop. */
   typedef struct
   {
     /* Node and runtime for this frame. */
     u32 node_runtime_index;

     /* Frame index (in the heap). */
     u32 frame_index;

     /* Start of next frames for this node. */
     u32 next_frame_index;

     /* Special value for next_frame_index when there is no next frame. */
   #define VLIB_PENDING_FRAME_NO_NEXT_FRAME ((u32) ~0)
   } vlib_pending_frame_t;

Here is the code in …/src/vlib/main.c:vlib_main_or_worker_loop() which
processes frames:

.. code:: c

   /*
    * Input nodes may have added work to the pending vector.
    * Process pending vector until there is nothing left.
    * All pending vectors will be processed from input -> output.
    */
   for (i = 0; i < _vec_len (nm->pending_frames); i++)
     cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
   /* Reset pending vector for next iteration. */

The pending frame node_runtime_index associates the frame with the node
which will process it.
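
The important property of that loop is that the vector length is
re-read on every pass, so frames added while a node runs are picked up
in the same iteration. The following toy model sketches the idea under
simplifying assumptions: a fixed-size array instead of a vppinfra
vector, three hypothetical nodes chained 0 -> 1 -> 2, and no real
frames. None of the toy\_ names exist in vlib.

.. code:: c

   #include <stdint.h>

   #define TOY_MAX_PENDING 16

   typedef struct
   {
     uint32_t node_index;   /* which toy node will process the frame */
     uint32_t n_vectors;    /* number of elements in the frame */
   } toy_pending_frame_t;

   static toy_pending_frame_t pending[TOY_MAX_PENDING];
   static uint32_t n_pending;
   static uint32_t n_processed[3];   /* per-node element counters */

   static void
   toy_add_pending (uint32_t node_index, uint32_t n_vectors)
   {
     if (n_pending < TOY_MAX_PENDING)
       pending[n_pending++] = (toy_pending_frame_t) { node_index, n_vectors };
   }

   /* Node 0 forwards to node 1, node 1 forwards to node 2, node 2 is a sink. */
   static void
   toy_dispatch (uint32_t i)
   {
     toy_pending_frame_t p = pending[i];   /* copy: the array may grow */

     n_processed[p.node_index] += p.n_vectors;
     if (p.node_index < 2)
       toy_add_pending (p.node_index + 1, p.n_vectors);
   }

   /* One main-loop iteration: drain the pending vector, then reset it. */
   static void
   toy_main_loop_once (void)
   {
     for (uint32_t i = 0; i < n_pending; i++)   /* length re-read each pass */
       toy_dispatch (i);
     n_pending = 0;
   }

Seeding the model with toy_add_pending (0, 5) and calling
toy_main_loop_once () pushes the five elements through all three toy
nodes before the pending vector is reset.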

Complications
-------------

Fasten your seatbelt. Here’s where the story - and the data structures -
become quite complicated…

At 100,000 feet: vpp uses a directed graph, not a directed *acyclic*
graph. It’s really quite normal for a packet to visit ip[46]-lookup
multiple times. The worst-case: a graph node which enqueues packets to
itself.

To deal with this issue, the graph dispatcher must force allocation of a
new frame if the current graph node’s dispatch function happens to
enqueue a packet back to itself.

There are no guarantees that a pending frame will be processed
immediately, which means that more packets may be added to the
underlying vlib_frame_t after it has been attached to a
vlib_pending_frame_t. Care must be taken to allocate new frames and
pending frames if a (pending_frame, frame) pair fills.

Next frames, next frame ownership
---------------------------------

The vlib_next_frame_t is the last key graph dispatcher data structure:

.. code:: c

   typedef struct
   {
     /* Frame index. */
     u32 frame_index;

     /* Node runtime for this next. */
     u32 node_runtime_index;

     /* Next frame flags. */
     u32 flags;

     /* Reflects node frame-used flag for this next. */
   #define VLIB_FRAME_NO_FREE_AFTER_DISPATCH \
     VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH

     /* This next frame owns enqueue to node
        corresponding to node_runtime_index. */
   #define VLIB_FRAME_OWNER (1 << 15)

     /* Set when frame has been allocated for this next. */
   #define VLIB_FRAME_IS_ALLOCATED VLIB_NODE_FLAG_IS_OUTPUT

     /* Set when frame has been added to pending vector. */
   #define VLIB_FRAME_PENDING VLIB_NODE_FLAG_IS_DROP

     /* Set when frame is to be freed after dispatch. */
   #define VLIB_FRAME_FREE_AFTER_DISPATCH VLIB_NODE_FLAG_IS_PUNT

     /* Set when frame has traced packets. */
   #define VLIB_FRAME_TRACE VLIB_NODE_FLAG_TRACE

     /* Number of vectors enqueue to this next since last overflow. */
     u32 vectors_since_last_overflow;
   } vlib_next_frame_t;

Graph node dispatch functions call vlib_get_next_frame (…) to set “(u32
\*)to_next” to the right place in the vlib_frame_t corresponding to the
ith arc (aka next0) from the current node to the indicated next node.

After some scuffling around - two levels of macros - processing reaches
vlib_get_next_frame_internal (…). Get-next-frame-internal digs up the
vlib_next_frame_t corresponding to the desired graph arc.

The next frame data structure amounts to a graph-arc-centric frame
cache. Once a node finishes adding elements to a frame, it will acquire a
vlib_pending_frame_t and end up on the graph dispatcher’s run-queue. But
there’s no guarantee that more vector elements won’t be added to the
underlying frame from the same (source_node, next_index) arc or from a
different (source_node, next_index) arc.

Maintaining consistency of the arc-to-frame cache is necessary. The
first step in maintaining consistency is to make sure that only one
graph node at a time thinks it “owns” the target vlib_frame_t.

Back to the graph node dispatch function. In the usual case, a certain
number of packets will be added to the vlib_frame_t acquired by calling
vlib_get_next_frame (…).

Before a dispatch function returns, it’s required to call
vlib_put_next_frame (…) for all of the graph arcs it actually used. This
action adds a vlib_pending_frame_t to the graph dispatcher’s pending
frame vector.

Vlib_put_next_frame makes a note in the pending frame of the frame
index, and also of the vlib_next_frame_t index.

dispatch_pending_node actions
-----------------------------

The main graph dispatch loop calls dispatch_pending_node as shown above.

Dispatch_pending_node recovers the pending frame, and the graph node
runtime / dispatch function. Further, it recovers the next_frame
currently associated with the vlib_frame_t, and detaches the
vlib_frame_t from the next_frame.

In …/src/vlib/main.c:dispatch_pending_node(…), note this stanza:

.. code:: c

   /* Force allocation of new frame while current frame is being
      dispatched. */
   restore_frame_index = ~0;
   if (nf->frame_index == p->frame_index)
     {
       nf->frame_index = ~0;
       nf->flags &= ~VLIB_FRAME_IS_ALLOCATED;
       if (!(n->flags & VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH))
         restore_frame_index = p->frame_index;
     }

dispatch_pending_node is worth a hard stare due to the several
second-order optimizations it implements. Almost as an afterthought, it
calls dispatch_node which actually calls the graph node dispatch
function.

Process / thread model
----------------------

vlib provides an ultra-lightweight cooperative multi-tasking thread
model. The graph node scheduler invokes these processes in much the same
way as traditional vector-processing run-to-completion graph nodes;
plus-or-minus a setjmp/longjmp pair required to switch stacks. Simply
set the vlib_node_registration_t type field to VLIB_NODE_TYPE_PROCESS.
Yes, process is a misnomer. These are cooperative multi-tasking threads.

As of this writing, the default stack size is 1<<15 = 32 KB. Initialize
the node registration’s process_log2_n_stack_bytes member as needed. The
graph node dispatcher makes some effort to detect stack overrun, e.g. by
mapping a no-access page below each thread stack.

Process node dispatch functions are expected to be “while(1) { }” loops
which suspend when not otherwise occupied, and which must not run for
unreasonably long periods of time.

“Unreasonably long” is an application-dependent concept. Over the years,
we have constructed frame-size sensitive control-plane nodes which will
use a much higher fraction of the available CPU bandwidth when the frame
size is low. The classic example: modifying forwarding tables. So long
as the table-builder leaves the forwarding tables in a valid state, one
can suspend the table builder to avoid dropping packets as a result of
control-plane activity.

Process nodes can suspend for fixed amounts of time, or until another
entity signals an event, or both. See the next section for a description
of the vlib process event mechanism.

When running in vlib process context, one must pay strict attention to
loop invariant issues. If one walks a data structure and calls a
function which may suspend, one had best know by construction that it
cannot change. Often, it’s best to simply make a snapshot copy of a data
structure, walk the copy at leisure, then free the copy.
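
The snapshot pattern can be sketched in plain C. This is only an
illustration: the table, toy_may_suspend and toy_walk_snapshot are
hypothetical, and the "suspend" is simulated by a function which
modifies the shared table directly, standing in for whatever other code
might run while a real process node is suspended.

.. code:: c

   #include <stdlib.h>
   #include <string.h>

   /* Shared table that other code may modify while we are suspended. */
   static int table[8] = { 1, 2, 3, 4 };
   static size_t table_len = 4;

   /* Stand-in for a call that may suspend: here it mutates the table. */
   static void
   toy_may_suspend (void)
   {
     if (table_len < 8)
       table[table_len++] = 0;
     table[0] = 100;
   }

   /* Sum the table as it was on entry, despite changes during the walk. */
   static int
   toy_walk_snapshot (void)
   {
     size_t n = table_len;
     int *copy = malloc (n * sizeof (copy[0]));
     int sum = 0;

     if (copy == NULL)
       return -1;
     memcpy (copy, table, n * sizeof (copy[0]));   /* take the snapshot */

     for (size_t i = 0; i < n; i++)
       {
         toy_may_suspend ();   /* the original may change here */
         sum += copy[i];       /* the snapshot does not */
       }
     free (copy);
     return sum;
   }

Walking the live table instead of the copy would see both the modified
first element and a length that grows underneath the loop.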

Process events
--------------

The vlib process event mechanism API is extremely lightweight and easy
to use. Here is a typical example:

.. code:: c

   vlib_main_t *vm = &vlib_global_main;
   uword event_type, * event_data = 0;

   while (1)
   {
      vlib_process_wait_for_event_or_clock (vm, 5.0 /* seconds */);

      event_type = vlib_process_get_events (vm, &event_data);

      switch (event_type) {
      case EVENT1:
          handle_event1s (event_data);
          break;

      case EVENT2:
          handle_event2s (event_data);
          break;

      case ~0: /* 5-second idle/periodic */
          handle_idle ();
          break;

      default: /* bug! */
          ASSERT (0);
      }

      vec_reset_length(event_data);
   }

In this example, the VLIB process node waits for an event to occur, or
for 5 seconds to elapse. The code demuxes on the event type, calling the
appropriate handler function. Each call to vlib_process_get_events
returns a vector of per-event-type data passed to successive
vlib_process_signal_event calls; it is a serious error to process only
event_data[0].

Resetting the event_data vector-length to 0 [instead of calling
vec_free] means that the event scheme doesn’t burn cycles continuously
allocating and freeing the event data vector. This is a common vppinfra
/ vlib coding pattern, well worth using when appropriate.
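
To see why resetting the length pays off, consider a toy growable
vector. This is a sketch of the general pattern only: toy_vec_t,
toy_vec_add and toy_vec_reset_length are hypothetical stand-ins, not
the vppinfra vector implementation.

.. code:: c

   #include <stdlib.h>

   typedef struct
   {
     unsigned long *data;
     size_t len;   /* elements in use */
     size_t cap;   /* elements allocated */
   } toy_vec_t;

   static unsigned n_allocs;   /* counts calls that had to (re)allocate */

   /* Append one element, growing the storage only when it is full. */
   static int
   toy_vec_add (toy_vec_t *v, unsigned long x)
   {
     if (v->len == v->cap)
       {
         size_t cap = v->cap ? 2 * v->cap : 4;
         unsigned long *p = realloc (v->data, cap * sizeof (p[0]));

         if (p == NULL)
           return -1;
         v->data = p;
         v->cap = cap;
         n_allocs++;
       }
     v->data[v->len++] = x;
     return 0;
   }

   /* Forget the contents but keep the storage for the next round. */
   static void
   toy_vec_reset_length (toy_vec_t *v)
   {
     v->len = 0;
   }

Filling the vector with four elements and resetting it, a hundred times
over, allocates once in this model. Freeing the storage after every
round would allocate a hundred times.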

Signaling an event is easy, for example:

.. code:: c

   vlib_process_signal_event (vm, process_node_index, EVENT1,
       (uword)arbitrary_event1_data); /* and so forth */

One can either know the process node index by construction - dig it out
of the appropriate vlib_node_registration_t - or by finding the
vlib_node_t with vlib_get_node_by_name(…).

Buffers
-------

vlib buffering solves the usual set of packet-processing problems,
albeit at high performance. Key in terms of performance: one ordinarily
allocates / frees N buffers at a time rather than one at a time. Except
when operating directly on a specific buffer, one deals with buffers by
index, not by pointer.

Packet-processing frames are u32[] arrays, not vlib_buffer_t[] arrays.

Packets comprise one or more vlib buffers, chained together as required.
Multiple particle sizes are supported; hardware input nodes simply ask
for the required size(s). Coalescing support is available. For obvious
reasons one is discouraged from writing one’s own wild and wacky buffer
chain traversal code.

vlib buffer headers are allocated immediately prior to the buffer data
area. In typical packet processing this saves a dependent read wait:
given a buffer’s address, one can prefetch the buffer header [metadata]
at the same time as the first cache line of buffer data.

Buffer header metadata (vlib_buffer_t) includes the usual rewrite
expansion space, a current_data offset, RX and TX interface indices,
packet trace information, and opaque areas.

The opaque data is intended to control packet processing in arbitrary
subgraph-dependent ways. The programmer shoulders responsibility for
data lifetime analysis, type-checking, etc.

Buffers have reference-counts in support of e.g. multicast replication.

Shared-memory message API
-------------------------

Local control-plane and application processes interact with the vpp
dataplane via asynchronous message-passing in shared memory over
unidirectional queues. The same application APIs are available via
sockets.

Capturing API traces and replaying them in a simulation environment
requires a disciplined approach to the problem. This seems like a
make-work task, but it is not. When something goes wrong in the
control-plane after 300,000 or 3,000,000 operations, high-speed replay
of the events leading up to the accident is a huge win.

The shared-memory message API message allocator vl_api_msg_alloc uses a
particularly cute trick. Since messages are processed in order, we try
to allocate message buffering from a set of fixed-size, preallocated
rings. Each ring item has a “busy” bit. Freeing one of the preallocated
message buffers merely requires the message consumer to clear the busy
bit. No locking required.
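
A single-ring, single-threaded sketch of the busy-bit idea follows. It
is an illustration under stated assumptions, not the vl_api_msg_alloc
code: all toy\_ names are hypothetical, there is one ring of one item
size, and a NULL return stands for "fall back to some other allocator".
A real cross-process version would also need to think about memory
ordering, which the sketch ignores.

.. code:: c

   #include <stddef.h>

   #define TOY_RING_SLOTS 4
   #define TOY_MSG_BYTES 64

   typedef struct
   {
     volatile int busy;   /* set by the allocator, cleared by the consumer */
     unsigned char data[TOY_MSG_BYTES];
   } toy_ring_item_t;

   typedef struct
   {
     toy_ring_item_t items[TOY_RING_SLOTS];
     unsigned cursor;   /* next slot to try */
   } toy_ring_t;

   /* Take the next slot if it is free; NULL means the ring is congested. */
   static void *
   toy_ring_alloc (toy_ring_t *r)
   {
     toy_ring_item_t *it = &r->items[r->cursor];

     if (it->busy)
       return NULL;
     it->busy = 1;
     r->cursor = (r->cursor + 1) % TOY_RING_SLOTS;
     return it->data;
   }

   /* Consumer side: clearing the busy flag is the whole "free". */
   static void
   toy_ring_free (void *msg)
   {
     toy_ring_item_t *it = (toy_ring_item_t *)
       ((unsigned char *) msg - offsetof (toy_ring_item_t, data));

     it->busy = 0;
   }

Because messages are consumed in order, the slot under the cursor is
the oldest one, so it is the first to become free again.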

Debug CLI
---------

Adding debug CLI commands to VLIB applications is very simple.

Here is a complete example:

.. code:: c

   static clib_error_t *
   show_ip_tuple_match (vlib_main_t * vm,
                        unformat_input_t * input,
                        vlib_cli_command_t * cmd)
   {
       vlib_cli_output (vm, "%U\n", format_ip_tuple_match_tables, &routing_main);
       return 0;
   }

   static VLIB_CLI_COMMAND (show_ip_tuple_command) =
   {
       .path = "show ip tuple match",
       .short_help = "Show ip 5-tuple match-and-broadcast tables",
       .function = show_ip_tuple_match,
   };

This example implements the “show ip tuple match” debug cli command. In
ordinary usage, the vlib cli is available via the “vppctl” application,
which sends traffic to a named pipe. One can configure debug CLI telnet
access on a configurable port.

The cli implementation has an output redirection facility which makes it
simple to deliver cli output via shared-memory API messaging.

Particularly for debug or “show tech support” type commands, it would be
wasteful to write vlib application code to pack binary data, write more
code elsewhere to unpack the data and finally print the answer. If a
certain cli command has the potential to hurt packet processing
performance by running for too long, do the work incrementally in a
process node. The client can wait.

Macro expansion
~~~~~~~~~~~~~~~

The vpp debug CLI engine includes a recursive macro expander. This is
quite useful for factoring out address and/or interface name specifics:

::

   define ip1 192.168.1.1/24
   define ip2 192.168.2.1/24
   define iface1 GigabitEthernet3/0/0
   define iface2 loop1

   set int ip address $iface1 $ip1
   set int ip address $iface2 $(ip2)

   undefine ip1
   undefine ip2
   undefine iface1
   undefine iface2

Each socket (or telnet) debug CLI session has its own macro tables. All
debug CLI sessions which use CLI_INBAND binary API messages share a
single table.

The macro expander recognizes circular definitions:

::

   define foo \$(bar)
   define bar \$(mumble)
   define mumble \$(foo)

At 8 levels of recursion, the macro expander throws up its hands and
replies “CIRCULAR.”
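
A depth-limited recursive expander is easy to sketch. The version below
is a toy, not the vpp macro engine: it handles only the $(name) form,
expands undefined names to nothing (an arbitrary choice made for this
sketch), and all toy\_ names are hypothetical. It does borrow the
8-level limit and the “CIRCULAR” reply from the text above.

.. code:: c

   #include <string.h>

   typedef struct
   {
     const char *name, *value;
   } toy_macro_t;

   /* Look a name up in a NULL-terminated table; NULL if undefined. */
   static const char *
   toy_lookup (const toy_macro_t *tab, const char *name, size_t len)
   {
     for (; tab->name; tab++)
       if (strlen (tab->name) == len && strncmp (tab->name, name, len) == 0)
         return tab->value;
     return NULL;
   }

   /* Append at most len bytes of s to the C string in out[0..cap). */
   static void
   toy_append (char *out, size_t cap, const char *s, size_t len)
   {
     size_t n = strlen (out);

     if (len > cap - 1 - n)
       len = cap - 1 - n;
     memcpy (out + n, s, len);
     out[n + len] = 0;
   }

   /* Append the expansion of 'in' to 'out', recursing into macro values.
      At depth 8 the expander gives up and emits "CIRCULAR" instead. */
   static void
   toy_expand (const toy_macro_t *tab, const char *in, char *out, size_t cap,
               int depth)
   {
     if (depth >= 8)
       {
         toy_append (out, cap, "CIRCULAR", 8);
         return;
       }
     while (*in)
       {
         const char *end;

         if (in[0] == '$' && in[1] == '(' && (end = strchr (in + 2, ')')))
           {
             const char *v = toy_lookup (tab, in + 2, (size_t) (end - in - 2));

             if (v)
               toy_expand (tab, v, out, cap, depth + 1);
             in = end + 1;
           }
         else
           toy_append (out, cap, in++, 1);
       }
   }

The caller passes an empty string in out and a depth of 0. With the
three circular definitions shown above in the table, expanding $(foo)
bottoms out at the depth limit rather than recursing forever.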

Macro-related debug CLI commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to the “define” and “undefine” debug CLI commands, use “show
macro [noevaluate]” to dump the macro table. The “echo” debug CLI
command will evaluate and print its argument:

::

   vpp# define foo This\ Is\ Foo
   vpp# echo $foo
   This Is Foo

Handing off buffers between threads
-----------------------------------

Vlib includes an easy-to-use mechanism for handing off buffers between
worker threads. A typical use-case: software ingress flow hashing. At a
high level, one creates a per-worker-thread queue which sends packets to
a specific graph node in the indicated worker thread. With the queue in
hand, enqueue packets to the worker thread of your choice.

Initialize a handoff queue
~~~~~~~~~~~~~~~~~~~~~~~~~~

Simple enough: call vlib_frame_queue_main_init.

.. code:: c

   main_ptr->frame_queue_index
       = vlib_frame_queue_main_init (dest_node.index, frame_queue_size);

Frame_queue_size means what it says: the number of frames which may be
queued. Since frames contain 1…256 packets, frame_queue_size should be a
reasonably small number (32…64). If the frame queue producer(s) are
faster than the frame queue consumer(s), congestion will occur. We
suggest letting the enqueue operator deal with queue congestion, as
shown in the enqueue example below.
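
What "congestion" means for a bounded queue can be shown with a toy
model. The sketch is single-threaded and uses hypothetical toy\_ names;
the real frame queues are shared between threads and are not
implemented this way. The point is only that an enqueue call reports
how many frames fit, and the caller accounts for the rest.

.. code:: c

   #include <stdint.h>

   #define TOY_QUEUE_SIZE 4   /* frames; deliberately tiny for the example */

   typedef struct
   {
     uint32_t frames[TOY_QUEUE_SIZE];   /* stand-in for queued frames */
     uint32_t head;                     /* total frames dequeued */
     uint32_t tail;                     /* total frames enqueued */
   } toy_frame_queue_t;

   /* Enqueue up to n frames; return how many fit. The caller counts the
      remainder as congestion drops. */
   static uint32_t
   toy_enqueue (toy_frame_queue_t *q, const uint32_t *frames, uint32_t n)
   {
     uint32_t n_enq = 0;

     while (n_enq < n && q->tail - q->head < TOY_QUEUE_SIZE)
       {
         q->frames[q->tail % TOY_QUEUE_SIZE] = frames[n_enq++];
         q->tail++;
       }
     return n_enq;
   }

   /* Consumer side: returns 1 and stores a frame, or 0 if the queue is empty. */
   static int
   toy_dequeue (toy_frame_queue_t *q, uint32_t *frame)
   {
     if (q->head == q->tail)
       return 0;
     *frame = q->frames[q->head % TOY_QUEUE_SIZE];
     q->head++;
     return 1;
   }

Offering six frames to an empty four-slot queue enqueues four; once the
consumer removes one, exactly one more fits.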

Under the floorboards, vlib_frame_queue_main_init creates an input queue
for each worker thread.

Please do NOT create frame queues until it’s clear that they will be
used. Although the main dispatch loop is reasonably smart about how
often it polls the (entire set of) frame queues, polling unused frame
queues is a waste of clock cycles.

Hand off packets
~~~~~~~~~~~~~~~~

The actual handoff mechanics are simple, and integrate nicely with a
typical graph-node dispatch function:

.. code:: c

   always_inline uword
   do_handoff_inline (vlib_main_t * vm,
                      vlib_node_runtime_t * node, vlib_frame_t * frame,
                      int is_ip4, int is_trace)
   {
     u32 n_left_from, *from;
     vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
     u16 thread_indices [VLIB_FRAME_SIZE];
     u16 nexts[VLIB_FRAME_SIZE], *next;
     u32 n_enq;
     htest_main_t *hmp = &htest_main;
     int i;

     from = vlib_frame_vector_args (frame);
     n_left_from = frame->n_vectors;

     vlib_get_buffers (vm, from, bufs, n_left_from);
     next = nexts;
     b = bufs;

     /*
      * Typical frame traversal loop, details vary with
      * use case. Make sure to set thread_indices[i] with
      * the desired destination thread index. You may
      * or may not bother to set next[i].
      */

     for (i = 0; i < frame->n_vectors; i++)
       {
         /* <snip> */
         /* Pick a thread to handle this packet */
         thread_indices[i] = f (packet_data_or_whatever);
         /* <snip> */

         b += 1;
         next += 1;
         n_left_from -= 1;
       }

     /* Enqueue buffers to threads */
     n_enq =
       vlib_buffer_enqueue_to_thread (vm, node, hmp->frame_queue_index,
                                      from, thread_indices, frame->n_vectors,
                                      1 /* drop on congestion */);
     /* Typical counters */
     if (n_enq < frame->n_vectors)
       vlib_node_increment_counter (vm, node->node_index,
                                    XXX_ERROR_CONGESTION_DROP,
                                    frame->n_vectors - n_enq);
     vlib_node_increment_counter (vm, node->node_index,
                                  XXX_ERROR_HANDED_OFF, n_enq);
     return frame->n_vectors;
   }

Notes about calling vlib_buffer_enqueue_to_thread(…):

-  If you pass “drop on congestion” non-zero, all packets in the inbound
   frame will be consumed one way or the other. This is the recommended
   setting.

-  In the drop-on-congestion case, please don’t try to “help” in the
   enqueue node by freeing dropped packets, or by pushing them to
   “error-drop.” Either of those actions would be a severe error.

-  It’s perfectly OK to enqueue packets to the current thread.

Handoff Demo Plugin
-------------------

Check out the sample (plugin) example in …/src/examples/handoffdemo. If
you want to build the handoff demo plugin:

::

   $ cd .../src/plugins
   $ ln -s ../examples/handoffdemo

This plugin provides a simple example of how to hand off packets between
threads. We used it to debug packet-tracer handoff tracing support.

Packet generator input script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   packet-generator new {
      name x
      limit 5
      size 128-128
      interface local0
      node handoffdemo-1
      data {
          incrementing 30
      }
   }

Start vpp with 2 worker threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The demo plugin hands packets from worker 1 to worker 2.

Enable tracing, and start the packet generator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   trace add pg-input 100
   packet-generator enable

Sample Run
~~~~~~~~~~

::

   DBGvpp# ex /tmp/pg_input_script
   DBGvpp# pa en
   DBGvpp# sh err
    Count                    Node                  Reason
          5              handoffdemo-1             packets handed off processed
          5              handoffdemo-2             completed packets
   DBGvpp# show run
   Thread 1 vpp_wk_0 (lcore 0)
   Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
     vector rates in 3.7331e-2, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
                Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
   handoffdemo-1                    active                  1               5               0          4.76e3            5.00
   pg-input                        disabled                 2               5               0          5.58e4            2.50
   unix-epoll-input                 polling             22760               0               0          2.14e7            0.00
   ---------------
   Thread 2 vpp_wk_1 (lcore 2)
   Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
     vector rates in 0.0000e0, out 0.0000e0, drop 3.7331e-2, punt 0.0000e0
                Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
   drop                             active                  1               5               0          1.35e4            5.00
   error-drop                       active                  1               5               0          2.52e4            5.00
   handoffdemo-2                    active                  1               5               0          2.56e4            5.00
   unix-epoll-input                 polling             22406               0               0          2.18e7            0.00

Enable the packet tracer and run it again…

::

   DBGvpp# trace add pg-input 100
   DBGvpp# pa en
   DBGvpp# sh trace
   sh trace
   ------------------- Start of thread 0 vpp_main -------------------
   No packets in trace buffer
   ------------------- Start of thread 1 vpp_wk_0 -------------------
   Packet 1

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000000
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   Packet 2

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000001
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   Packet 3

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000002
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1
 806      00:06:50:520762: handoffdemo-1
 807        HANDOFFDEMO: current thread 1
 808
 809      Packet 4
 810
 811      00:06:50:520688: pg-input
 812        stream x, 128 bytes, 0 sw_if_index
 813        current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000003
 814        00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
 815        00000020: 0000000000000000000000000000000000000000000000000000000000000000
 816        00000040: 0000000000000000000000000000000000000000000000000000000000000000
 817        00000060: 0000000000000000000000000000000000000000000000000000000000000000
 818      00:06:50:520762: handoffdemo-1
 819        HANDOFFDEMO: current thread 1
 820
 821      Packet 5
 822
 823      00:06:50:520688: pg-input
 824        stream x, 128 bytes, 0 sw_if_index
 825        current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000004
 826        00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
 827        00000020: 0000000000000000000000000000000000000000000000000000000000000000
 828        00000040: 0000000000000000000000000000000000000000000000000000000000000000
 829        00000060: 0000000000000000000000000000000000000000000000000000000000000000
 830      00:06:50:520762: handoffdemo-1
 831        HANDOFFDEMO: current thread 1
 832
 833      ------------------- Start of thread 2 vpp_wk_1 -------------------
 834      Packet 1
 835
 836      00:06:50:520796: handoff_trace
 837        HANDED-OFF: from thread 1 trace index 0
 838      00:06:50:520796: handoffdemo-2
 839        HANDOFFDEMO: current thread 2
 840      00:06:50:520867: error-drop
 841        rx:local0
 842      00:06:50:520914: drop
 843        handoffdemo-2: completed packets
 844
 845      Packet 2
 846
 847      00:06:50:520796: handoff_trace
 848        HANDED-OFF: from thread 1 trace index 1
 849      00:06:50:520796: handoffdemo-2
 850        HANDOFFDEMO: current thread 2
 851      00:06:50:520867: error-drop
 852        rx:local0
 853      00:06:50:520914: drop
 854        handoffdemo-2: completed packets
 855
 856      Packet 3
 857
 858      00:06:50:520796: handoff_trace
 859        HANDED-OFF: from thread 1 trace index 2
 860      00:06:50:520796: handoffdemo-2
 861        HANDOFFDEMO: current thread 2
 862      00:06:50:520867: error-drop
 863        rx:local0
 864      00:06:50:520914: drop
 865        handoffdemo-2: completed packets
 866
 867      Packet 4
 868
 869      00:06:50:520796: handoff_trace
 870        HANDED-OFF: from thread 1 trace index 3
 871      00:06:50:520796: handoffdemo-2
 872        HANDOFFDEMO: current thread 2
 873      00:06:50:520867: error-drop
 874        rx:local0
 875      00:06:50:520914: drop
 876        handoffdemo-2: completed packets
 877
 878      Packet 5
 879
 880      00:06:50:520796: handoff_trace
 881        HANDED-OFF: from thread 1 trace index 4
 882      00:06:50:520796: handoffdemo-2
 883        HANDOFFDEMO: current thread 2
 884      00:06:50:520867: error-drop
 885        rx:local0
 886      00:06:50:520914: drop
 887        handoffdemo-2: completed packets
 888      DBGvpp#