Graph dispatcher internals
--------------------------
This section may be safely skipped. It's not necessary to understand
graph dispatcher internals to create graph nodes.
Vector Data Structure
---------------------
The main dispatch loop, .../src/vlib/main.c:vlib\_main\_or\_worker\_loop(...),
includes this stanza, which processes frames:
```c
  /*
   * Input nodes may have added work to the pending vector.
   * Process pending vector until there is nothing left.
   * All pending vectors will be processed from input -> output.
   */
  for (i = 0; i < _vec_len (nm->pending_frames); i++)
    cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
```
After some scuffling around - two levels of macros - processing
reaches vlib\_get\_next\_frame\_internal (...). Get-next-frame-internal
digs up the vlib\_next\_frame\_t corresponding to the desired graph
arc.
The next frame data structure amounts to a graph-arc-centric frame
cache. Once a node finishes adding elements to a frame, it will acquire
a vlib\_pending\_frame\_t and end up on the graph dispatcher's
run-queue. But there's no guarantee that more vector elements won't be
added to the underlying frame from the same (source\_node,
next\_index) arc or from a different (source\_node, next\_index) arc.
Maintaining consistency of the arc-to-frame cache is necessary. The
first step in maintaining consistency is to make sure that only one
graph node at a time thinks it "owns" the target vlib\_frame\_t.
Back to the graph node dispatch function. In the usual case, a certain
number of packets will be added to the vlib\_frame\_t acquired by
calling vlib\_get\_next\_frame (...).
Before a dispatch function returns, it's required to call
vlib\_put\_next\_frame (...) for all of the graph arcs it actually
used.
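
To make the get/put pattern concrete, here is a minimal sketch of a
single-arc dispatch function. The node name, the hard-coded arc 0, and
the absence of tracing and error handling are simplifications for
illustration:

```c
    static uword
    sample_node_fn (vlib_main_t * vm, vlib_node_runtime_t * node,
                    vlib_frame_t * frame)
    {
      u32 *from = vlib_frame_vector_args (frame);
      u32 n_left_from = frame->n_vectors;
      u32 next_index = 0;       /* hypothetical: this node's only arc */

      while (n_left_from > 0)
        {
          u32 *to_next, n_left_to_next;

          /* Acquire (or revisit) the frame cached for this arc */
          vlib_get_next_frame (vm, node, next_index, to_next,
                               n_left_to_next);

          while (n_left_from > 0 && n_left_to_next > 0)
            {
              /* Copy buffer indices into the downstream frame */
              to_next[0] = from[0];
              to_next += 1;
              from += 1;
              n_left_to_next -= 1;
              n_left_from -= 1;
            }

          /* Tell the dispatcher how many slots were actually used */
          vlib_put_next_frame (vm, node, next_index, n_left_to_next);
        }
      return frame->n_vectors;
    }
```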
dispatch\_pending\_node actions
-------------------------------
The main graph dispatch loop calls dispatch pending node as shown
above.
Dispatch\_pending\_node recovers the pending frame, and the graph node
runtime / dispatch function. Further, it recovers the next\_frame
currently associated with the vlib\_frame\_t, and detaches the
vlib\_frame\_t from the next\_frame.
For the gory details, see .../src/vlib/main.c:dispatch\_pending\_node(...).

Process / thread model
----------------------

vlib provides an ultra-lightweight cooperative multi-tasking thread
model. A typical process node main loop waits for an event, a timeout,
or both. For example:
```c
    vlib_main_t *vm = &vlib_global_main;
    uword event_type, *event_data = 0;

    while (1)
      {
        /* Sleep until an event arrives, or for at most 5 seconds */
        vlib_process_wait_for_event_or_clock (vm, 5.0 /* seconds */);

        event_type = vlib_process_get_events (vm, &event_data);

        switch (event_type)
          {
          case EVENT2:
            handle_event2s (event_data);
            break;

          case ~0:              /* 5-second idle/periodic */
            handle_idle ();
            break;

          default:              /* bug! */
            ASSERT (0);
          }

        vec_reset_length (event_data);
      }
```
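
Events are sent to a process node with vlib_process_signal_event. A
minimal sketch; my_process_node is a hypothetical process node
registration, and EVENT2 matches the switch statement above:

```c
    /* From another node, thread, or CLI handler: wake the process */
    vlib_process_signal_event (vm, my_process_node.index,
                               EVENT2, 0 /* event data */);
```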
Debug CLI
---------

Debug CLI commands are declared with the VLIB\_CLI\_COMMAND macro:

```c
    /* *INDENT-OFF* */
    static VLIB_CLI_COMMAND (show_ip_tuple_command) =
    {
        .path = "show ip tuple match",
        .short_help = "Show ip 5-tuple match-and-broadcast tables",
        .function = show_ip_tuple_match,
    };
    /* *INDENT-ON* */
```
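
The .function member points at a handler with the standard CLI handler
signature. A minimal sketch of the matching handler; the
format\_ip\_tuple\_match function and the tm table pointer are
hypothetical:

```c
    static clib_error_t *
    show_ip_tuple_match (vlib_main_t * vm,
                         unformat_input_t * input, vlib_cli_command_t * cmd)
    {
      /* format_ip_tuple_match and tm are hypothetical helpers */
      vlib_cli_output (vm, "%U\n", format_ip_tuple_match, tm);
      return 0;
    }
```

If a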
certain cli command has the potential to hurt packet processing
performance by running for too long, do the work incrementally in a
process node. The client can wait.

Handing off buffers between threads
-----------------------------------

Vlib includes an easy-to-use mechanism for handing off buffers between
worker threads. A typical use-case: software ingress flow hashing. At
a high level, one creates a per-worker-thread queue which sends packets
to a specific graph node in the indicated worker thread. With the
queue in hand, enqueue packets to the worker thread of your choice.

### Initialize a handoff queue

Simple enough, call vlib_frame_queue_main_init:

```c
    main_ptr->frame_queue_index
      = vlib_frame_queue_main_init (dest_node.index, frame_queue_size);
```
+
+Frame_queue_size means what it says: the number of frames which may be
+queued. Since frames contain 1...256 packets, frame_queue_size should
+be a reasonably small number (32...64). If the frame queue producer(s)
+are faster than the frame queue consumer(s), congestion will
+occur. Suggest letting the enqueue operator deal with queue
+congestion, as shown in the enqueue example below.
+
+Under the floorboards, vlib_frame_queue_main_init creates an input queue
+for each worker thread.
+
+Please do NOT create frame queues until it's clear that they will be
+used. Although the main dispatch loop is reasonably smart about how
+often it polls the (entire set of) frame queues, polling unused frame
+queues is a waste of clock cycles.
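
One way to honor this advice is to create the queue lazily, on first
use. A minimal sketch, assuming hmp->frame_queue_index was initialized
to ~0 and htest_node is the destination node registration (both names
hypothetical):

```c
    /* Create the frame queue on first use, not at init time */
    if (hmp->frame_queue_index == ~0)
      hmp->frame_queue_index =
        vlib_frame_queue_main_init (htest_node.index, 32 /* frames */);
```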

### Hand off packets

The actual handoff mechanics are simple, and integrate nicely with
a typical graph-node dispatch function:

```c
  always_inline uword
  do_handoff_inline (vlib_main_t * vm,
                     vlib_node_runtime_t * node, vlib_frame_t * frame,
                     int is_ip4, int is_trace)
  {
    u32 n_left_from, *from;
    vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
    u16 thread_indices[VLIB_FRAME_SIZE];
    u16 nexts[VLIB_FRAME_SIZE], *next;
    u32 n_enq;
    htest_main_t *hmp = &htest_main;
    int i;

    from = vlib_frame_vector_args (frame);
    n_left_from = frame->n_vectors;

    vlib_get_buffers (vm, from, bufs, n_left_from);
    next = nexts;
    b = bufs;

    /*
     * Typical frame traversal loop, details vary with
     * use case. Make sure to set thread_indices[i] with
     * the desired destination thread index. You may
     * or may not bother to set next[i].
     */

    for (i = 0; i < frame->n_vectors; i++)
      {
        <snip>
        /* Pick a thread to handle this packet */
        thread_indices[i] = f (packet_data_or_whatever);
        <snip>

        b += 1;
        next += 1;
        n_left_from -= 1;
      }

    /* Enqueue buffers to threads */
    n_enq =
      vlib_buffer_enqueue_to_thread (vm, hmp->frame_queue_index,
                                     from, thread_indices, frame->n_vectors,
                                     1 /* drop on congestion */);

    /* Typical counters */
    if (n_enq < frame->n_vectors)
      vlib_node_increment_counter (vm, node->node_index,
                                   XXX_ERROR_CONGESTION_DROP,
                                   frame->n_vectors - n_enq);
    vlib_node_increment_counter (vm, node->node_index,
                                 XXX_ERROR_HANDED_OFF, n_enq);
    return frame->n_vectors;
  }
```

Notes about calling vlib_buffer_enqueue_to_thread (...):

* If you pass "drop on congestion" non-zero, all packets in the
  inbound frame will be consumed one way or the other. This is the
  recommended setting.

* In the drop-on-congestion case, please don't try to "help" in the
  enqueue node by freeing dropped packets, or by pushing them to
  "error-drop." Either of those actions would be a severe error.

* It's perfectly OK to enqueue packets to the current thread.
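
For completeness: the handoff destination is an ordinary internal
graph node running in the worker thread; the .index member of its
registration is what gets passed to vlib_frame_queue_main_init. A
minimal sketch of such a registration, with hypothetical names:

```c
    VLIB_REGISTER_NODE (htest_node) =
    {
      .function = htest_node_fn,        /* ordinary dispatch function */
      .name = "htest-handoff-dest",
      .vector_size = sizeof (u32),
      .type = VLIB_NODE_TYPE_INTERNAL,
      .n_next_nodes = 1,
      .next_nodes = {
        [0] = "error-drop",
      },
    };
```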