docs/developer/corearchitecture/vnet.rst

VNET (VPP Network Stack)
========================

The files associated with the VPP network stack layer are located in the
*./src/vnet* folder. The Network Stack Layer is basically an
instantiation of the code in the other layers. This layer has a vnet
library that provides vectorized layer-2 and 3 networking graph nodes, a
packet generator, and a packet tracer.

In terms of building a packet processing application, vnet provides a
platform-independent subgraph to which one connects a couple of
device-driver nodes.

Typical RX connections include “ethernet-input” [full software
classification, feeds ipv4-input, ipv6-input, arp-input etc.] and
“ipv4-input-no-checksum” [if hardware can classify, perform ipv4 header
checksum].

Effective graph dispatch function coding
----------------------------------------

Over the past 15 years, multiple coding styles have emerged: a
single/dual/quad loop coding model (with variations) and a
fully-pipelined coding model.

Single/dual loops
-----------------

The single/dual/quad loop model variations conveniently solve problems
where the number of items to process is not known in advance: typical
hardware RX-ring processing. This coding style is also very effective
when a given node will not need to cover a complex set of dependent
reads.

Here is a quad/single loop which can leverage up-to-avx512 SIMD vector
units to convert buffer indices to buffer pointers:

.. code:: c

   static uword
   simulated_ethernet_interface_tx (vlib_main_t * vm,
                                    vlib_node_runtime_t * node,
                                    vlib_frame_t * frame)
   {
     u32 n_left_from, *from;
     u32 next_index = 0;
     u32 n_bytes;
     u32 thread_index = vm->thread_index;
     vnet_main_t *vnm = vnet_get_main ();
     vnet_interface_main_t *im = &vnm->interface_main;
     vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
     u16 nexts[VLIB_FRAME_SIZE], *next;

     n_left_from = frame->n_vectors;
     from = vlib_frame_vector_args (frame);

     /*
      * Convert up to VLIB_FRAME_SIZE indices in "from" to
      * buffer pointers in bufs[]
      */
     vlib_get_buffers (vm, from, bufs, n_left_from);
     b = bufs;
     next = nexts;

     /*
      * While we have at least 4 vector elements (pkts) to process..
      */
     while (n_left_from >= 4)
       {
         /* Prefetch next quad-loop iteration. */
         if (PREDICT_TRUE (n_left_from >= 8))
           {
             vlib_prefetch_buffer_header (b[4], STORE);
             vlib_prefetch_buffer_header (b[5], STORE);
             vlib_prefetch_buffer_header (b[6], STORE);
             vlib_prefetch_buffer_header (b[7], STORE);
           }

         /*
          * $$$ Process 4x packets right here...
          * set next[0..3] to send the packets where they need to go
          */
         do_something_to (b[0]);
         do_something_to (b[1]);
         do_something_to (b[2]);
         do_something_to (b[3]);

         /* Advance to the next 4 packets */
         b += 4;
         next += 4;
         n_left_from -= 4;
       }

     /*
      * Clean up 0...3 remaining packets at the end of the incoming frame
      */
     while (n_left_from > 0)
       {
         /*
          * $$$ Process one packet right here...
          * set next[0] to send the packet where it needs to go
          */
         do_something_to (b[0]);

         /* Advance to the next packet */
         b += 1;
         next += 1;
         n_left_from -= 1;
       }

     /*
      * Send the packets along their respective next-node graph arcs
      * Considerable locality of reference is expected, most if not all
      * packets in the inbound vector will traverse the same next-node
      * arc
      */
     vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);

     return frame->n_vectors;
   }
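
The loop shape can also be tried outside of VPP. The following is a minimal standalone sketch, not VPP code: it runs the same quad loop and single-loop cleanup over a plain array. The names ``classify_one`` and ``quad_single_loop`` are hypothetical, and ``__builtin_prefetch`` assumes a GCC- or Clang-compatible compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-item work: even items take arc 0, odd items take arc 1. */
static inline uint16_t
classify_one (uint32_t item)
{
  return (uint16_t) (item & 1);
}

/* Quad loop followed by a single-loop cleanup; returns items processed. */
static size_t
quad_single_loop (const uint32_t *items, uint16_t *nexts, size_t n)
{
  const uint32_t *p = items;
  uint16_t *next = nexts;
  size_t n_left = n;

  while (n_left >= 4)
    {
      /* Prefetch the following quad only when it exists. */
      if (n_left >= 8)
        __builtin_prefetch (p + 4);

      next[0] = classify_one (p[0]);
      next[1] = classify_one (p[1]);
      next[2] = classify_one (p[2]);
      next[3] = classify_one (p[3]);

      p += 4;
      next += 4;
      n_left -= 4;
    }

  /* Clean up the 0..3 items left over. */
  while (n_left > 0)
    {
      next[0] = classify_one (p[0]);
      p += 1;
      next += 1;
      n_left -= 1;
    }

  return n - n_left;
}
```

Calling ``quad_single_loop`` with a count that is not a multiple of 4 exercises both loops, which is the case the cleanup loop exists for.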

Given a packet processing task to implement, it pays to scout around
looking for similar tasks, and think about using the same coding
pattern. It is not uncommon to recode a given graph node dispatch
function several times during performance optimization.

Creating Packets from Scratch
-----------------------------

At times, it’s necessary to create packets from scratch and send them.
Tasks like sending keepalives or actively opening connections come to
mind. It’s not difficult, but accurate buffer metadata setup is required.

Allocating Buffers
~~~~~~~~~~~~~~~~~~

Use vlib_buffer_alloc, which allocates a set of buffer indices. For
low-performance applications, it’s OK to allocate one buffer at a time.
Note that vlib_buffer_alloc(…) does NOT initialize buffer metadata. See
below.

In high-performance cases, allocate a vector of buffer indices, and hand
them out from the end of the vector; decrement \_vec_len(..) as buffer
indices are allocated. See tcp_alloc_tx_buffers(…) and
tcp_get_free_buffer_index(…) for an example.
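
The "hand them out from the end" idea can be sketched without any VPP headers. This is an illustrative standalone model, not the tcp code referenced above: ``index_cache_t``, ``bulk_alloc`` and ``cache_get_index`` are hypothetical names, and ``bulk_alloc`` merely stands in for a bulk buffer allocation.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_CAPACITY 64

typedef struct
{
  uint32_t indices[CACHE_CAPACITY];
  size_t len;			/* valid entries; plays the role of the vector length */
} index_cache_t;

/* Hypothetical bulk allocator: fills out[] with n fresh indices. */
static size_t
bulk_alloc (uint32_t *out, size_t n)
{
  static uint32_t next_index = 100;
  size_t i;

  for (i = 0; i < n; i++)
    out[i] = next_index++;
  return n;
}

/* Hand out one index from the end; refill in bulk only when empty. */
static int
cache_get_index (index_cache_t *c, uint32_t *bi)
{
  if (c->len == 0)
    {
      c->len = bulk_alloc (c->indices, CACHE_CAPACITY);
      if (c->len == 0)
        return -1;
    }

  /* Decrement the length, then return the entry it now points at. */
  *bi = c->indices[--c->len];
  return 0;
}
```

Taking entries from the end means handing one out is a single decrement, and the allocator is visited once per 64 indices rather than once per packet.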

Buffer Initialization Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following example shows the **main points**, but is not to be
blindly cut-’n-pasted.

.. code:: c

   u32 bi0;
   vlib_buffer_t *b0;
   ip4_header_t *ip;
   udp_header_t *udp;
   u8 *data_dst;

   /* Allocate a buffer */
   if (vlib_buffer_alloc (vm, &bi0, 1) != 1)
     return -1;

   b0 = vlib_get_buffer (vm, bi0);

   /* At this point b0->current_data = 0, b0->current_length = 0 */

   /*
    * Copy data into the buffer. This example ASSUMES that data will fit
    * in a single buffer, and is e.g. an ip4 packet.
    */
   if (have_packet_rewrite)
     {
       clib_memcpy (b0->data, data, vec_len (data));
       b0->current_length = vec_len (data);
     }
   else
     {
       /* OR, build a udp-ip packet (for example) */
       ip = vlib_buffer_get_current (b0);
       udp = (udp_header_t *) (ip + 1);
       data_dst = (u8 *) (udp + 1);

       ip->ip_version_and_header_length = 0x45;
       ip->ttl = 254;
       ip->protocol = IP_PROTOCOL_UDP;
       ip->length = clib_host_to_net_u16 (sizeof (*ip) + sizeof (*udp) +
                                          vec_len (udp_data));
       ip->src_address.as_u32 = src_address->as_u32;
       ip->dst_address.as_u32 = dst_address->as_u32;
       udp->src_port = clib_host_to_net_u16 (src_port);
       udp->dst_port = clib_host_to_net_u16 (dst_port);
       /* The UDP length field covers the UDP header plus the payload */
       udp->length = clib_host_to_net_u16 (sizeof (*udp) +
                                           vec_len (udp_data));
       clib_memcpy (data_dst, udp_data, vec_len (udp_data));

       /* Set the buffer length before computing a checksum over the buffer */
       b0->current_length = sizeof (*ip) + sizeof (*udp) +
                            vec_len (udp_data);

       if (compute_udp_checksum)
         {
           /* RFC 7011 section 10.3.2. */
           udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip);
           if (udp->checksum == 0)
             udp->checksum = 0xffff;
         }
     }
   b0->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID;

   /* sw_if_index 0 is the "local" interface, which always exists */
   vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0;

   /* Use the default FIB index for tx lookup. Set non-zero to use another fib */
   vnet_buffer (b0)->sw_if_index[VLIB_TX] = 0;
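
The length and checksum arithmetic for a UDP-over-IPv4 packet is easy to get wrong, so here is a small standalone sketch of it. It does not use VPP types: ``inet_checksum``, ``udp_length_field`` and ``ip4_length_field`` are hypothetical helpers, the IPv4 header is assumed to carry no options (20 bytes), and the values returned are in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define IP4_HEADER_BYTES 20	/* assumes no IPv4 options */
#define UDP_HEADER_BYTES 8

/* RFC 1071 one's-complement checksum over a byte string. */
static uint16_t
inet_checksum (const uint8_t *data, size_t n_bytes)
{
  uint32_t sum = 0;
  size_t i;

  /* Sum the data as big-endian 16-bit words. */
  for (i = 0; i + 1 < n_bytes; i += 2)
    sum += ((uint32_t) data[i] << 8) | data[i + 1];

  /* An odd trailing byte is padded with a zero byte. */
  if (i < n_bytes)
    sum += (uint32_t) data[i] << 8;

  /* Fold the carries back into the low 16 bits. */
  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);

  return (uint16_t) ~sum;
}

/* UDP length covers the UDP header plus the payload. */
static uint16_t
udp_length_field (size_t payload_bytes)
{
  return (uint16_t) (UDP_HEADER_BYTES + payload_bytes);
}

/* IPv4 total length covers both headers plus the payload. */
static uint16_t
ip4_length_field (size_t payload_bytes)
{
  return (uint16_t) (IP4_HEADER_BYTES + UDP_HEADER_BYTES + payload_bytes);
}
```

To compute an IPv4 header checksum with ``inet_checksum``, run it over the 20 header bytes with the checksum field set to zero, then store the result in network byte order.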

If your use-case calls for large packet transmission, use
vlib_buffer_chain_append_data_with_alloc(…) to create the requisite
buffer chain.

Enqueueing packets for lookup and transmission
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The simplest way to send a set of packets is to use
vlib_get_frame_to_node(…) to allocate fresh frame(s) to ip4_lookup_node
or ip6_lookup_node, add the constructed buffer indices, and dispatch the
frame using vlib_put_frame_to_node(…).

.. code:: c

   vlib_frame_t *f;
   u32 *to_next;
   u32 i;

   f = vlib_get_frame_to_node (vm, ip4_lookup_node.index);
   f->n_vectors = vec_len (buffer_indices_to_send);
   to_next = vlib_frame_vector_args (f);

   for (i = 0; i < vec_len (buffer_indices_to_send); i++)
     to_next[i] = buffer_indices_to_send[i];

   vlib_put_frame_to_node (vm, ip4_lookup_node.index, f);

It is inefficient to allocate and schedule single-packet frames. Doing
so is fine when you need to send, say, one packet per second, but it
should **not** occur in a for-loop!

Packet tracer
-------------

Vlib includes a frame element [packet] trace facility, with a simple
debug CLI interface. The CLI is straightforward: “trace add
input-node-name count” to start capturing packet traces.

To trace 100 packets on a typical x86_64 system running the dpdk plugin:
“trace add dpdk-input 100”. When using the packet generator: “trace add
pg-input 100”

To display the packet trace: “show trace”

Each graph node has the opportunity to capture its own trace data. It is
almost always a good idea to do so. The trace capture APIs are simple.

The packet capture APIs snapshot binary data, to minimize processing at
capture time. Each participating graph node initialization provides a
vppinfra format-style user function to pretty-print data when required
by the VLIB “show trace” command.

Set the VLIB node registration “.format_trace” member to the name of the
per-graph node format function.

Here’s a simple example:

.. code:: c

   u8 * my_node_format_trace (u8 * s, va_list * args)
   {
       vlib_main_t * vm = va_arg (*args, vlib_main_t *);
       vlib_node_t * node = va_arg (*args, vlib_node_t *);
       my_node_trace_t * t = va_arg (*args, my_node_trace_t *);

       s = format (s, "My trace data was: %d", t-><whatever>);

       return s;
   }

The trace framework hands the per-node format function the data it
captured as the packet whizzed by. The format function pretty-prints the
data as desired.
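
The split between cheap binary capture and deferred formatting can be illustrated with plain C. The sketch below is a standalone model under stated assumptions, not the vlib trace API: ``demo_trace_t``, ``demo_trace_capture``, ``demo_format_trace`` and ``demo_show_trace`` are hypothetical, and ``snprintf`` stands in for the vppinfra ``format`` function.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Binary snapshot taken as the packet goes by. */
typedef struct
{
  uint32_t sw_if_index;
  uint16_t next_index;
} demo_trace_t;

/* Capture time: copy raw fields only, no string formatting. */
static void
demo_trace_capture (demo_trace_t *t, uint32_t sw_if_index,
                    uint16_t next_index)
{
  t->sw_if_index = sw_if_index;
  t->next_index = next_index;
}

/* Display time: unpack arguments from the va_list, then pretty-print. */
static int
demo_format_trace (char *s, size_t size, va_list *args)
{
  demo_trace_t *t = va_arg (*args, demo_trace_t *);

  return snprintf (s, size, "sw_if_index %u, next index %u",
                   (unsigned) t->sw_if_index, (unsigned) t->next_index);
}

/* Variadic driver standing in for the framework calling the formatter. */
static int
demo_show_trace (char *s, size_t size, ...)
{
  va_list va;
  int n;

  va_start (va, size);
  n = demo_format_trace (s, size, &va);
  va_end (va);
  return n;
}
```

The formatter must pull arguments off the ``va_list`` in the same order and with the same types the caller supplied, which is why a type mismatch in a ``va_arg`` call is worth checking for.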

Graph Dispatcher Pcap Tracing
-----------------------------

The vpp graph dispatcher knows how to capture vectors of packets in pcap
format as they’re dispatched. The pcap captures are as follows:

::

   VPP graph dispatch trace record description:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Major Version | Minor Version | NStrings      | ProtoHint     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Buffer index (big endian)                                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | VPP graph node name ...     ...               | NULL octet    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Buffer Metadata ... ...                       | NULL octet    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Buffer Opaque ... ...                         | NULL octet    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Buffer Opaque 2 ... ...                       | NULL octet    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | VPP ASCII packet trace (if NStrings > 4)      | NULL octet    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Packet data (up to 16K)                                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Graph dispatch records comprise a version stamp, an indication of how
many NULL-terminated strings will follow the record header and precede
packet data, and a protocol hint.

The buffer index is an opaque 32-bit cookie which allows consumers of
these data to easily filter/track single packets as they traverse the
forwarding graph.

Multiple records per packet are normal, and to be expected. Packets will
appear multiple times as they traverse the vpp forwarding graph. In this
way, vpp graph dispatch traces are significantly different from regular
network packet captures from an end-station. This property complicates
stateful packet analysis.

Restricting stateful analysis to records from a single vpp graph node
such as “ethernet-input” seems likely to improve the situation.

As of this writing: major version = 1, minor version = 0. NStrings
SHOULD be 4 or 5. Consumers SHOULD be wary of values less than 4 or
greater than 5. They MAY attempt to display the claimed number of
strings, or they MAY treat the condition as an error.

Here is the current set of protocol hints:

.. code:: c

   typedef enum
     {
       VLIB_NODE_PROTO_HINT_NONE = 0,
       VLIB_NODE_PROTO_HINT_ETHERNET,
       VLIB_NODE_PROTO_HINT_IP4,
       VLIB_NODE_PROTO_HINT_IP6,
       VLIB_NODE_PROTO_HINT_TCP,
       VLIB_NODE_PROTO_HINT_UDP,
       VLIB_NODE_N_PROTO_HINTS,
     } vlib_node_proto_hint_t;

Example: VLIB_NODE_PROTO_HINT_IP6 means that the first octet of packet
data SHOULD be 0x60, and should begin an ipv6 packet header.

Downstream consumers of these data SHOULD pay attention to the protocol
hint. They MUST tolerate inaccurate hints, which MAY occur from time to
time.
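
To make the record layout concrete, here is a hedged sketch of a consumer that decodes one dispatch record. It is a standalone illustration rather than the Wireshark dissector or any VPP code: ``dispatch_record_t`` and ``parse_dispatch_record`` are hypothetical names, and the input is assumed to be the payload of a single pcap record with the pcap framing already removed. The sketch takes the "display the claimed number of strings" option and does not reject unusual NStrings values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
  uint8_t major, minor, n_strings, proto_hint;
  uint32_t buffer_index;
  const uint8_t *packet;	/* first octet of packet data */
  size_t packet_len;
} dispatch_record_t;

/* Returns 0 on success, -1 if the record is truncated. */
static int
parse_dispatch_record (const uint8_t *rec, size_t len, dispatch_record_t *out)
{
  size_t off = 8;		/* fixed header: 4 octets + buffer index */
  unsigned i;

  if (len < 8)
    return -1;

  out->major = rec[0];
  out->minor = rec[1];
  out->n_strings = rec[2];
  out->proto_hint = rec[3];

  /* The buffer index is stored big endian. */
  out->buffer_index = ((uint32_t) rec[4] << 24) | ((uint32_t) rec[5] << 16) |
    ((uint32_t) rec[6] << 8) | (uint32_t) rec[7];

  /* Skip the claimed number of NULL-terminated strings. */
  for (i = 0; i < out->n_strings; i++)
    {
      const uint8_t *nul = memchr (rec + off, 0, len - off);
      if (nul == NULL)
        return -1;
      off = (size_t) (nul - rec) + 1;
    }

  out->packet = rec + off;
  out->packet_len = len - off;
  return 0;
}
```

A consumer could then compare ``proto_hint`` against the enum values above and, for an IP6 hint (value 3 in that enum), check that ``packet[0]`` has the form 0x6n before trusting the hint.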

Dispatch Pcap Trace Debug CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To start a dispatch trace capture of up to 10,000 trace records:

::

   pcap dispatch trace on max 10000 file dispatch.pcap

To start a dispatch trace which will also include standard vpp packet
tracing for packets which originate in dpdk-input:

::

   pcap dispatch trace on max 10000 file dispatch.pcap buffer-trace dpdk-input 1000

To save the pcap trace, e.g. in /tmp/dispatch.pcap:

::

   pcap dispatch trace off

Wireshark dissection of dispatch pcap traces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It almost goes without saying that we built a companion wireshark
dissector to display these traces. As of this writing, we have
upstreamed the wireshark dissector.

Since it will be a while before wireshark/master/latest makes it into
all of the popular Linux distros, please see the “How to build a vpp
dispatch trace aware Wireshark” page for build info.

Here is a sample packet dissection, with some fields omitted for
clarity. The point is that the wireshark dissector accurately displays
**all** of the vpp buffer metadata, and the name of the graph node in
question.

::

   Frame 1: 2216 bytes on wire (17728 bits), 2216 bytes captured (17728 bits)
       Encapsulation type: USER 13 (58)
       [Protocols in frame: vpp:vpp-metadata:vpp-opaque:vpp-opaque2:eth:ethertype:ip:tcp:data]
   VPP Dispatch Trace
       BufferIndex: 0x00036663
       NodeName: ethernet-input
   VPP Buffer Metadata
       Metadata: flags:
       Metadata: current_data: 0, current_length: 102
       Metadata: current_config_index: 0, flow_id: 0, next_buffer: 0
       Metadata: error: 0, n_add_refs: 0, buffer_pool_index: 0
       Metadata: trace_index: 0, recycle_count: 0, len_not_first_buf: 0
       Metadata: free_list_index: 0
       Metadata:
   VPP Buffer Opaque
       Opaque: raw: 00000007 ffffffff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       Opaque: sw_if_index[VLIB_RX]: 7, sw_if_index[VLIB_TX]: -1
       Opaque: L2 offset 0, L3 offset 0, L4 offset 0, feature arc index 0
       Opaque: ip.adj_index[VLIB_RX]: 0, ip.adj_index[VLIB_TX]: 0
       Opaque: ip.flow_hash: 0x0, ip.save_protocol: 0x0, ip.fib_index: 0
       Opaque: ip.save_rewrite_length: 0, ip.rpf_id: 0
       Opaque: ip.icmp.type: 0 ip.icmp.code: 0, ip.icmp.data: 0x0
       Opaque: ip.reass.next_index: 0, ip.reass.estimated_mtu: 0
       Opaque: ip.reass.fragment_first: 0 ip.reass.fragment_last: 0
       Opaque: ip.reass.range_first: 0 ip.reass.range_last: 0
       Opaque: ip.reass.next_range_bi: 0x0, ip.reass.ip6_frag_hdr_offset: 0
       Opaque: mpls.ttl: 0, mpls.exp: 0, mpls.first: 0, mpls.save_rewrite_length: 0, mpls.bier.n_bytes: 0
       Opaque: l2.feature_bitmap: 00000000, l2.bd_index: 0, l2.l2_len: 0, l2.shg: 0, l2.l2fib_sn: 0, l2.bd_age: 0
       Opaque: l2.feature_bitmap_input:   none configured, L2.feature_bitmap_output:   none configured
       Opaque: l2t.next_index: 0, l2t.session_index: 0
       Opaque: l2_classify.table_index: 0, l2_classify.opaque_index: 0, l2_classify.hash: 0x0
       Opaque: policer.index: 0
       Opaque: ipsec.flags: 0x0, ipsec.sad_index: 0
       Opaque: map.mtu: 0
       Opaque: map_t.v6.saddr: 0x0, map_t.v6.daddr: 0x0, map_t.v6.frag_offset: 0, map_t.v6.l4_offset: 0
       Opaque: map_t.v6.l4_protocol: 0, map_t.checksum_offset: 0, map_t.mtu: 0
       Opaque: ip_frag.mtu: 0, ip_frag.next_index: 0, ip_frag.flags: 0x0
       Opaque: cop.current_config_index: 0
       Opaque: lisp.overlay_afi: 0
       Opaque: tcp.connection_index: 0, tcp.seq_number: 0, tcp.seq_end: 0, tcp.ack_number: 0, tcp.hdr_offset: 0, tcp.data_offset: 0
       Opaque: tcp.data_len: 0, tcp.flags: 0x0
       Opaque: sctp.connection_index: 0, sctp.sid: 0, sctp.ssn: 0, sctp.tsn: 0, sctp.hdr_offset: 0
       Opaque: sctp.data_offset: 0, sctp.data_len: 0, sctp.subconn_idx: 0, sctp.flags: 0x0
       Opaque: snat.flags: 0x0
       Opaque:
   VPP Buffer Opaque2
       Opaque2: raw: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       Opaque2: qos.bits: 0, qos.source: 0
       Opaque2: loop_counter: 0
       Opaque2: gbp.flags: 0, gbp.src_epg: 0
       Opaque2: pg_replay_timestamp: 0
       Opaque2:
   Ethernet II, Src: 06:d6:01:41:3b:92 (06:d6:01:41:3b:92), Dst: IntelCor_3d:f6
   Transmission Control Protocol, Src Port: 22432, Dst Port: 54084, Seq: 1, Ack: 1, Len: 36
       Source Port: 22432
       Destination Port: 54084
       TCP payload (36 bytes)
   Data (36 bytes)

   0000  cf aa 8b f5 53 14 d4 c7 29 75 3e 56 63 93 9d 11   ....S...)u>Vc...
   0010  e5 f2 92 27 86 56 4c 21 ce c5 23 46 d7 eb ec 0d   ...'.VL!..#F....
   0020  a8 98 36 5a                                       ..6Z
       Data: cfaa8bf55314d4c729753e5663939d11e5f2922786564c21…
       [Length: 36]

It’s a matter of a couple of mouse-clicks in Wireshark to filter the
trace to a specific buffer index. With that specific kind of filtration,
one can watch a packet walk through the forwarding graph; noting any/all
metadata changes, header checksum changes, and so forth.

This should be of significant value when developing new vpp graph nodes.
If new code mispositions b->current_data, it will be completely obvious
from looking at the dispatch trace in wireshark.

pcap rx, tx, and drop tracing
-----------------------------

vpp also supports rx, tx, and drop packet capture in pcap format,
through the “pcap trace” debug CLI command.

This command is used to start or stop a packet capture, or show the
status of packet capture. Each of “pcap trace rx”, “pcap trace tx”, and
“pcap trace drop” is implemented. Supply one or more of “rx”, “tx”, and
“drop” to enable multiple simultaneous capture types.

These commands have the following optional parameters:

-  rx - trace received packets.

-  tx - trace transmitted packets.

-  drop - trace dropped packets.

-  max *nnnn*\  - file size, number of packet captures. Once *nnnn*
   packets have been received, the trace buffer is flushed to the
   indicated file. Defaults to 1000. Can only be updated if packet
   capture is off.

-  max-bytes-per-pkt *nnnn*\  - maximum number of bytes to trace on a
   per-packet basis. Must be >32 and less than 9000. Default value:
   512.

-  filter - Capture packets which match the pcap rx / tx / drop trace
   filter, which must be configured first. Use classify filter pcap… to
   configure the filter; see the next section. The filter will only be
   executed if the per-interface or any-interface tests fail.

-  intfc *interface* \| *any*\  - Used to specify a given interface, or
   use ‘any’ to run packet capture on all interfaces. ‘any’ is the
   default if not provided. Settings from a previous packet capture are
   preserved, so ‘any’ can be used to reset the interface setting.

-  file *filename*\  - Used to specify the output filename. The file
   will be placed in the ‘/tmp’ directory. If *filename* already exists,
   the file will be overwritten. If no filename is provided,
   ‘/tmp/rx.pcap’ or ‘/tmp/tx.pcap’ will be used, depending on capture
   direction. Can only be updated when pcap capture is off.

-  status - Displays the current status and configured attributes
   associated with a packet capture. If packet capture is in progress,
   ‘status’ also will return the number of packets currently in the
   buffer. Any additional attributes entered on command line with a
   ‘status’ request will be ignored.

packet trace capture filtering
------------------------------

The “classify filter pcap \| <interface> \| trace” debug CLI command
constructs an arbitrary set of packet classifier tables for use with
“pcap trace rx \| tx \| drop,” and with the vpp packet tracer on a
per-interface or system-wide basis.

Packets which match a rule in the classifier table chain will be traced.
The tables are automatically ordered so that matches in the most
specific table are tried first.

It’s reasonably likely that folks will configure a single table with one
or two matches. As a result, we configure 8 hash buckets and 128K of
match rule space by default. One can override the defaults by specifying
“buckets <nnn>” and “memory-size <nnn>” as desired.

To build up complex filter chains, repeatedly issue the classify filter
debug CLI command. Each command must specify the desired mask and match
values. If a classifier table with a suitable mask already exists, the
CLI command adds a match rule to the existing table. If not, the CLI
command adds a new table and the indicated mask rule.
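
The mask-and-match idea, and the "most specific table first" ordering, can be modelled in a few lines of C. This is a deliberately simplified sketch and not the VPP classifier: it compares only IPv4 source and destination addresses, searches rules linearly instead of hashing them into buckets, and every name in it (``addr_pair_t``, ``filter_table_t``, ``more_specific_first``, ``filter_matches``) is hypothetical. Specificity is approximated here by the number of mask bits set.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_RULES 4

typedef struct
{
  uint32_t src, dst;		/* IPv4 addresses, host byte order */
} addr_pair_t;

typedef struct
{
  addr_pair_t mask;		/* which bits take part in the comparison */
  addr_pair_t rules[MAX_RULES];	/* values the masked bits must equal */
  size_t n_rules;
} filter_table_t;

static unsigned
count_bits (uint32_t v)
{
  unsigned n = 0;

  for (; v; v &= v - 1)
    n++;
  return n;
}

/* qsort comparator: tables with more mask bits sort first. */
static int
more_specific_first (const void *a, const void *b)
{
  const filter_table_t *ta = a, *tb = b;
  unsigned na = count_bits (ta->mask.src) + count_bits (ta->mask.dst);
  unsigned nb = count_bits (tb->mask.src) + count_bits (tb->mask.dst);

  return (na < nb) - (na > nb);
}

/* Returns 1 if the packet matches a rule in any table, else 0. */
static int
filter_matches (const filter_table_t *tables, size_t n_tables,
                addr_pair_t pkt)
{
  size_t i, j;

  for (i = 0; i < n_tables; i++)
    {
      uint32_t src = pkt.src & tables[i].mask.src;
      uint32_t dst = pkt.dst & tables[i].mask.dst;

      for (j = 0; j < tables[i].n_rules; j++)
        if (src == tables[i].rules[j].src && dst == tables[i].rules[j].dst)
          return 1;
    }
  return 0;
}
```

In this model, a second command with a mask that already exists would append to ``rules`` of the existing table, while a new mask would add a table and re-sort the chain with ``qsort``.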

Configure a simple pcap classify filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter pcap mask l3 ip4 src match l3 ip4 src 192.168.1.11
   pcap trace rx max 100 filter

Configure a simple per-interface capture filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter GigabitEthernet3/0/0 mask l3 ip4 src match l3 ip4 src 192.168.1.11
   pcap trace rx max 100 intfc GigabitEthernet3/0/0

Note that per-interface capture filters are *always* applied.

Clear per-interface capture filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter GigabitEthernet3/0/0 del

Configure another fairly simple pcap classify filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter pcap mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
   pcap trace tx max 100 filter

Configure a vpp packet tracer filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter trace mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
   trace add dpdk-input 100 filter

Clear all current classifier filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter [pcap | <interface> | trace] del

To inspect the classifier tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   show classify table [verbose]

The verbose form displays all of the match rules, with hit-counters.

Terse description of the “mask” syntax
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   l2 src dst proto tag1 tag2 ignore-tag1 ignore-tag2 cos1 cos2 dot1q dot1ad
   l3 ip4 <ip4-mask> ip6 <ip6-mask>
   <ip4-mask> version hdr_length src[/width] dst[/width]
              tos length fragment_id ttl protocol checksum
   <ip6-mask> version traffic-class flow-label src dst proto
              payload_length hop_limit protocol
   l4 tcp <tcp-mask> udp <udp_mask> src_port dst_port
   <tcp-mask> src dst  # ports
   <udp-mask> src_port dst_port

To construct **matches**, add the values to match after the indicated
keywords in the mask syntax. For example: “… mask l3 ip4 src” -> “…
match l3 ip4 src 192.168.1.11”

VPP Packet Generator
--------------------

We use the VPP packet generator to inject packets into the forwarding
graph. The packet generator can replay pcap traces, and generate packets
out of whole cloth at respectably high performance.

The VPP pg enables quite a variety of use-cases, ranging from functional
testing of new data-plane nodes to regression testing to performance
tuning.

PG setup scripts
----------------

PG setup scripts describe traffic in detail, and leverage vpp debug CLI
mechanisms. It’s reasonably unusual to construct a pg setup script which
doesn’t include a certain amount of interface and FIB configuration.

For example:

::

   loop create
   set int ip address loop0 192.168.1.1/24
   set int state loop0 up

   packet-generator new {
       name pg0
       limit 100
       rate 1e6
       size 300-300
       interface loop0
       node ethernet-input
       data { IP4: 1.2.3 -> 4.5.6
              UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10
              UDP: 1234 -> 2345
              incrementing 286
       }
   }

A packet generator stream definition includes two major sections:

-  Stream Parameter Setup

-  Packet Data

Stream Parameter Setup
~~~~~~~~~~~~~~~~~~~~~~

Given the example above, let’s look at how to set up stream parameters:

-  **name pg0** - Name of the stream, in this case “pg0”.

-  **limit 100** - Number of packets to send when the stream is
   enabled. “limit 0” means send packets continuously.

-  **maxframe <nnn>** - Maximum frame size. Handy for injecting multiple
   frames no larger than <nnn>. Useful for checking dual / quad loop
   code.

-  **rate 1e6** - Packet injection rate, in this case 1 MPPS. When not
   specified, the packet generator injects packets as fast as possible.

-  **size 300-300** - Packet size range, in this case send 300-byte
   packets.

-  **interface loop0** - Packets appear as if they were received on the
   specified interface. This datum is used in multiple ways: to select
   graph arc feature configuration, to select IP FIBs. Configure
   features e.g. on loop0 to exercise those features.

-  **tx-interface <name>** - Packets will be transmitted on the
   indicated interface. Typically required only when injecting packets
   into post-IP-rewrite graph nodes.

-  **pcap <filename>** - Replay packets from the indicated pcap capture
   file. “make test” makes extensive use of this feature: generate
   packets using scapy, save them in a .pcap file, then inject them into
   the vpp graph via a vpp pg “pcap <filename>” stream definition.

-  **worker <nn>** - Generate packets for the stream using the indicated
   vpp worker thread. The vpp pg generates and injects O(10 MPPS /
   core). Use multiple stream definitions and worker threads to generate
   and inject enough traffic to easily fill a 40 gbit pipe with small
   packets.

Data definition
~~~~~~~~~~~~~~~

Packet generator data definitions make use of a layered implementation
strategy. Networking layers are specified in order, and the notation can
seem a bit counter-intuitive. In the example above, the data definition
stanza constructs a set of L2-L4 header layers, and uses an
incrementing fill pattern to round out the requested 300-byte packets.

-  **IP4: 1.2.3 -> 4.5.6** - Construct an L2 (MAC) header with the ip4
   ethertype (0x800), src MAC address of 00:01:00:02:00:03 and dst MAC
   address of 00:04:00:05:00:06. MAC addresses may be specified in
   either *xxxx.xxxx.xxxx* format or *xx:xx:xx:xx:xx:xx* format.

-  **UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10** - Construct an
   incrementing set of L3 (IPv4) headers for successive packets with
   source addresses ranging from .10 to .254. All packets in the stream
   have a constant dest address of 192.168.2.10. Set the protocol field
   to 17, UDP.

-  **UDP: 1234 -> 2345** - Set the UDP source and destination ports to
   1234 and 2345, respectively.

-  **incrementing 286** - Insert up to 286 incrementing data bytes.
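
The mapping from “1.2.3” to 00:01:00:02:00:03 follows from the *xxxx.xxxx.xxxx* notation: each dot-separated group is a 16-bit value that supplies two bytes of the address. The sketch below shows that expansion in standalone C. It is an illustration only, not the packet generator's parser; ``parse_dotted_mac`` is a hypothetical helper, and the groups are assumed to be hexadecimal with optional leading zeros.

```c
#include <stdint.h>
#include <stdio.h>

/* Parse "xxxx.xxxx.xxxx" into 6 bytes. Returns 0 on success, -1 on error. */
static int
parse_dotted_mac (const char *s, uint8_t mac[6])
{
  unsigned a, b, c;
  char trailing;

  /* Exactly three groups must convert; a 4th conversion means junk follows. */
  if (sscanf (s, "%4x.%4x.%4x%c", &a, &b, &c, &trailing) != 3)
    return -1;

  /* Each 16-bit group contributes a high byte and a low byte. */
  mac[0] = (uint8_t) (a >> 8);
  mac[1] = (uint8_t) (a & 0xff);
  mac[2] = (uint8_t) (b >> 8);
  mac[3] = (uint8_t) (b & 0xff);
  mac[4] = (uint8_t) (c >> 8);
  mac[5] = (uint8_t) (c & 0xff);
  return 0;
}
```

With this helper, “1.2.3” and “0001.0002.0003” produce the same six bytes, which is why the short form in the example stanza is enough to describe a full MAC address.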

Obvious variations involve “s/IP4/IP6/” in the above, along with
changing from IPv4 to IPv6 address notation.

The vpp pg can set any / all IPv4 header fields, including tos, packet
length, mf / df / fragment id and offset, ttl, protocol, checksum, and
src/dst addresses. Take a look at ../src/vnet/ip/ip[46]_pg.c for
details.

If all else fails, specify the entire packet data in hex:

-  **hex 0xabcd…** - copy hex data verbatim into the packet

When replaying pcap files (“**pcap <filename>**”), do not specify a data
stanza.

Diagnosing “packet-generator new” parse failures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to inject packets into a brand-new graph node, remember to
tell the packet generator debug CLI how to parse the packet data stanza.

If the node expects L2 Ethernet MAC headers, specify “.unformat_buffer =
unformat_ethernet_header”:

.. code:: c

   VLIB_REGISTER_NODE (ethernet_input_node) =
   {
     /* <snip> */
     .unformat_buffer = unformat_ethernet_header,
     /* <snip> */
   };

Beyond that, it may be necessary to set breakpoints in
…/src/vnet/pg/cli.c. A debug image is suggested.

When debugging new nodes, it may be far simpler to directly inject
ethernet frames - and add a corresponding vlib_buffer_advance in the new
node - than to modify the packet generator.

Debug CLI
---------

The sections above describe the “packet-generator new” debug CLI in
detail.

Additional debug CLI commands include:

::

   vpp# packet-generator enable [<stream-name>]

Enables the named stream, or all streams.

::

   vpp# packet-generator disable [<stream-name>]

Disables the named stream, or all streams.

::

   vpp# packet-generator delete <stream-name>

Deletes the named stream.

::

   vpp# packet-generator configure <stream-name> [limit <nnn>]
        [rate <f64-pps>] [size <nn>-<nn>]

Changes stream parameters without having to recreate the entire stream
definition. Note that re-issuing a “packet-generator new” command will
correctly recreate the named stream.