sflow: add feature-arc at error-drop, drop-monitoring, egress-sampling 43/42543/27
author Neil McKee <neil.mckee@inmon.com>
Sun, 23 Mar 2025 22:57:13 +0000 (15:57 -0700)
committer Dave Wallace <dwallacelf@gmail.com>
Tue, 3 Jun 2025 21:56:19 +0000 (21:56 +0000)
(First submitted as two separate commits, but is now one).

This turns the "error-drop" --> "drop" arc into a feature-arc
so that any plugin can insert a node and examine condemned packets
before they are freed. The immediate goal is for the sflow
plugin to be able to export the headers of dropped packets to the
sflow collector, as per the sflow standard. However, it seems clear
that the existing pcap-drop code could also be moved to a plugin,
which would help to simplify the core vnet/interface_output.c.
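
For reference, inserting a node on the new arc uses the standard
feature-arc registration; this is the sflow-drop registration taken
from the diff below:

  VNET_FEATURE_INIT (sflow_drop, static) = {
    .arc_name = "error-drop",
    .node_name = "sflow-drop",
    .runs_before = VNET_FEATURES ("drop"),
  };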

The sflow agent is rounded out to support egress-sampling and
packet-drop monitoring.  Egress-sampling is achieved by
inserting a node on the interface-output feature arc. Packet
samples taken there are forwarded to the main thread on the same
FIFO as ingress samples, but marked as egress samples so that they
are written to a separate netlink PSAMPLE group number, which
indicates to hsflowd that these are egress samples.
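
The selection happens on the main thread when the sample is sent;
condensed from send_packet_sample() in the diff below, the
sample_type recorded by the worker node picks the PSAMPLE group and
sequence number:

  switch (sample->sample_type)
    {
    case SFLOW_SAMPLETYPE_INGRESS:
      ps_group = SFLOW_VPP_PSAMPLE_GROUP_INGRESS;
      seqNo = ++smp->psample_seq_ingress;
      break;
    case SFLOW_SAMPLETYPE_EGRESS:
      ps_group = SFLOW_VPP_PSAMPLE_GROUP_EGRESS;
      seqNo = ++smp->psample_seq_egress;
      break;
    }
  SFLOWPS_set_attr_int (pst, SFLOWPS_PSAMPLE_ATTR_SAMPLE_GROUP, ps_group);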

Note that when you random-sample both ingress and egress packets at
1:N it is statistically the same as if you were sampling ingress packets
at 1:N and egress packets at 1:N independently. The sflow standard
does not allow a different value of N for ingress and egress on
the same interface, so we are free to take advantage of this.
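
A quick sanity check of the arithmetic (not from the sflow standard
text, just expected counts): if n_rx ingress and n_tx egress packets
cross the interface and each packet is selected independently with
probability 1/N, then

  E[samples] = (n_rx + n_tx) / N = n_rx / N + n_tx / N

which is the same expectation as running two independent 1:N
samplers, one per direction, and every exported sample still carries
the same effective sampling rate N.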

The new node on the newly introduced "error-drop" feature-arc is
responsible for passing the headers of dropped packets to the main
thread too.  These are not sampled.  Instead we use a deliberately
shallow FIFO to ensure that, under chronic conditions of high
packet loss, these discard events will have negligible impact if
the main thread is not servicing the FIFO. The sflow standard
sets a rate-limit for the export of discard events which will
almost certainly be much lower than what this path can sustain
(typically around 100 per second), so even if we can only sustain a
peak of, say, 1000 per second being written to netlink DROPMON,
that is more than enough. The hsflowd mod_dropmon will apply the
configured sflow rate-limit and throw away the excess messages
before they are actually sent to the sflow collector. For this
reason it is not considered necessary for the vpp CLI to set an
explicit rate-limit.
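
The overflow behaviour is visible in the new sflow-drop node
(condensed from the diff below): the drop FIFO is only
SFLOW_DROP_FIFO_DEPTH (4) entries deep, and when it is full the
worker simply counts the event and moves on, so a drop storm costs
one counter increment per packet rather than unbounded queueing:

  sfwk->dsmp++; // drop-samples
  memcpy (discard.header, en, hdr);
  if (PREDICT_FALSE (!sflow_drop_fifo_enqueue (&sfwk->drop_fifo, &discard)))
    sfwk->ddrp++; // drop-sample drops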

The netlink PSAMPLE, DROPMON and USERSOCK code has been refactored
and corrected for style, but the new implementation behaves the
same way: there is no heap allocation, and iovecs are used to
assemble the netlink messages from their headers and attributes.
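
For readers unfamiliar with the pattern, the sketch below shows
iovec-based assembly of a generic-netlink message using only
standard Linux uapi definitions. It is an illustration of the
technique, not the plugin's sflow_netlink.c helpers; the family,
command and attribute values are placeholders.

  #include <linux/netlink.h>
  #include <linux/genetlink.h>
  #include <sys/socket.h>
  #include <stdint.h>

  /* Send one generic-netlink message carrying a single u32 attribute.
     The netlink header, genl header, attribute header and value stay
     in separate buffers; sendmsg() gathers them, so no contiguous
     message buffer is heap-allocated. */
  static int
  nl_send_u32 (int nl_sock, uint16_t family_id, uint8_t cmd,
               uint16_t attr_type, uint32_t value, uint32_t seq)
  {
    struct nlmsghdr nlh = { 0 };
    struct genlmsghdr ge = { .cmd = cmd, .version = 1 };
    struct nlattr ah = { .nla_len = NLA_HDRLEN + sizeof (value),
                         .nla_type = attr_type };
    struct iovec iov[4] = {
      { .iov_base = &nlh, .iov_len = NLMSG_HDRLEN },
      { .iov_base = &ge, .iov_len = GENL_HDRLEN },
      { .iov_base = &ah, .iov_len = NLA_HDRLEN },
      { .iov_base = &value, .iov_len = NLA_ALIGN (sizeof (value)) },
    };
    nlh.nlmsg_type = family_id;
    nlh.nlmsg_flags = NLM_F_REQUEST;
    nlh.nlmsg_seq = seq;
    nlh.nlmsg_len =
      NLMSG_HDRLEN + GENL_HDRLEN + NLA_HDRLEN + NLA_ALIGN (sizeof (value));
    struct sockaddr_nl dest = { .nl_family = AF_NETLINK };
    struct msghdr msg = { .msg_name = &dest,
                          .msg_namelen = sizeof (dest),
                          .msg_iov = iov,
                          .msg_iovlen = 4 };
    return (sendmsg (nl_sock, &msg, 0) < 0) ? -1 : 0;
  }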

New tests are added to confirm that (1) a single dropped packet is
delivered to the sflow-drop node, sent to the DROPMON
netlink channel, and counted correctly when sflow drop-monitoring
is enabled, and that (2) when bidirectional packet-sampling is
enabled, samples are taken both at ingress and at egress.

The terms "rx" for ingress, "tx" for egress and "both" for
bidrectional were settled on for brevity and because they appear
elsewhere in VPP, however the code still uses terms like
"ingress", "egress" and "bidirectional" to be consistent with
sFlow standard documents.

The new CLI options are:

vpp> sflow drop-monitoring enable|disable
vpp> sflow direction rx|tx|both

And the "show sflow" output has been enhanced to reflect this.
The defaults are as follows:

vpp> show sflow
show sflow
sflow sampling-rate 10000
sflow direction rx
sflow polling-interval 20
sflow header-bytes 128
sflow drop-monitoring disable
Status
  interfaces enabled: 0
  packet samples sent: 0
  packet samples dropped: 0
  counter samples sent: 0
  counter samples dropped: 0
  drop samples sent: 0
  drop samples dropped: 0

(rebased on 5/12/2025)

Type: improvement
Change-Id: I831e803fa41874965bc9c32516f655b7ae837719
Signed-off-by: Neil McKee <neil.mckee@inmon.com>
19 files changed:
src/plugins/sflow/CMakeLists.txt
src/plugins/sflow/node.c
src/plugins/sflow/sflow.api
src/plugins/sflow/sflow.c
src/plugins/sflow/sflow.h
src/plugins/sflow/sflow_common.h
src/plugins/sflow/sflow_dropmon.c [new file with mode: 0644]
src/plugins/sflow/sflow_dropmon.h [new file with mode: 0644]
src/plugins/sflow/sflow_netlink.c [new file with mode: 0644]
src/plugins/sflow/sflow_netlink.h [new file with mode: 0644]
src/plugins/sflow/sflow_psample.c
src/plugins/sflow/sflow_psample.h
src/plugins/sflow/sflow_test.c
src/plugins/sflow/sflow_usersock.c
src/plugins/sflow/sflow_usersock.h
src/vnet/interface.h
src/vnet/interface_output.c
test/test_sflow_drop.py [new file with mode: 0644]
test/test_sflow_egress.py [new file with mode: 0644]

index c966fcc..37f6261 100644 (file)
@@ -29,6 +29,10 @@ add_vpp_plugin(sflow
   sflow_psample_fields.h
   sflow_usersock.c
   sflow_usersock.h
+  sflow_dropmon.h
+  sflow_dropmon.c
+  sflow_netlink.c
+  sflow_netlink.h
 
   MULTIARCH_SOURCES
   node.c
index 5182643..47da22c 100644 (file)
@@ -52,6 +52,8 @@ format_sflow_trace (u8 *s, va_list *args)
 }
 
 vlib_node_registration_t sflow_node;
+vlib_node_registration_t sflow_egress_node;
+vlib_node_registration_t sflow_drop_node;
 
 #endif /* CLIB_MARCH_VARIANT */
 
@@ -65,12 +67,14 @@ static char *sflow_error_strings[] = {
 
 typedef enum
 {
-  SFLOW_NEXT_ETHERNET_INPUT,
+  SFLOW_NEXT_ETHERNET_INPUT_OR_INTERFACE_OUTPUT,
   SFLOW_N_NEXT,
 } sflow_next_t;
 
-VLIB_NODE_FN (sflow_node)
-(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
+static_always_inline uword
+sflow_node_ingress_egress (vlib_main_t *vm, vlib_node_runtime_t *node,
+                          vlib_frame_t *frame,
+                          sflow_enum_sample_t sample_type)
 {
   u32 n_left_from, *from, *to_next;
   sflow_next_t next_index;
@@ -107,6 +111,7 @@ VLIB_NODE_FN (sflow_node)
 
          ethernet_header_t *en = vlib_buffer_get_current (bN);
          u32 if_index = vnet_buffer (bN)->sw_if_index[VLIB_RX];
+         u32 if_index_out = 0;
          vnet_hw_interface_t *hw =
            vnet_get_sup_hw_interface (smp->vnet_main, if_index);
          if (hw)
@@ -117,9 +122,20 @@ VLIB_NODE_FN (sflow_node)
              // If so,  should we ignore the sample?
            }
 
+         if (sample_type == SFLOW_SAMPLETYPE_EGRESS)
+           {
+             if_index_out = vnet_buffer (bN)->sw_if_index[VLIB_TX];
+             vnet_hw_interface_t *hw_out =
+               vnet_get_sup_hw_interface (smp->vnet_main, if_index_out);
+             if (hw_out)
+               if_index_out = hw_out->hw_if_index;
+           }
+
          sflow_sample_t sample = {
+           .sample_type = sample_type,
            .samplingN = sfwk->smpN,
            .input_if_index = if_index,
+           .output_if_index = if_index_out,
            .sampled_packet_size =
              bN->current_length + bN->total_length_not_including_first_buffer,
            .header_bytes = hdr
@@ -173,10 +189,10 @@ VLIB_NODE_FN (sflow_node)
 
       while (n_left_from >= 8 && n_left_to_next >= 4)
        {
-         u32 next0 = SFLOW_NEXT_ETHERNET_INPUT;
-         u32 next1 = SFLOW_NEXT_ETHERNET_INPUT;
-         u32 next2 = SFLOW_NEXT_ETHERNET_INPUT;
-         u32 next3 = SFLOW_NEXT_ETHERNET_INPUT;
+         u32 next0 = SFLOW_NEXT_ETHERNET_INPUT_OR_INTERFACE_OUTPUT;
+         u32 next1 = SFLOW_NEXT_ETHERNET_INPUT_OR_INTERFACE_OUTPUT;
+         u32 next2 = SFLOW_NEXT_ETHERNET_INPUT_OR_INTERFACE_OUTPUT;
+         u32 next3 = SFLOW_NEXT_ETHERNET_INPUT_OR_INTERFACE_OUTPUT;
          ethernet_header_t *en0, *en1, *en2, *en3;
          u32 bi0, bi1, bi2, bi3;
          vlib_buffer_t *b0, *b1, *b2, *b3;
@@ -286,7 +302,7 @@ VLIB_NODE_FN (sflow_node)
        {
          u32 bi0;
          vlib_buffer_t *b0;
-         u32 next0 = SFLOW_NEXT_ETHERNET_INPUT;
+         u32 next0 = SFLOW_NEXT_ETHERNET_INPUT_OR_INTERFACE_OUTPUT;
          ethernet_header_t *en0;
 
          /* speculatively enqueue b0 to the current next frame */
@@ -331,6 +347,138 @@ VLIB_NODE_FN (sflow_node)
   return frame->n_vectors;
 }
 
+VLIB_NODE_FN (sflow_node)
+(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
+{
+  return sflow_node_ingress_egress (vm, node, frame, SFLOW_SAMPLETYPE_INGRESS);
+}
+
+VLIB_NODE_FN (sflow_egress_node)
+(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
+{
+  return sflow_node_ingress_egress (vm, node, frame, SFLOW_SAMPLETYPE_EGRESS);
+}
+
+typedef enum
+{
+  SFLOW_DROP_NEXT_DROP,
+  SFLOW_DROP_N_NEXT,
+} sflow_drop_next_t;
+
+static_always_inline void
+buffer_rewind_current (vlib_buffer_t *bN)
+{
+  /*
+   * Typically, we'll need to rewind the buffer
+   * if l2_hdr_offset is valid, make sure to rewind to the start of
+   * the L2 header. This may not be the buffer start in case we pop-ed
+   * vlan tags.
+   * Otherwise, rewind to buffer start and hope for the best.
+   */
+  /*
+   * If the packet was rewritten the start may be somewhere
+   * in buffer->pre_data, which comes before buffer->data. In
+   * other words, the buffer->current_data index can be negative.
+   */
+  if (bN->flags & VNET_BUFFER_F_L2_HDR_OFFSET_VALID)
+    {
+      if (bN->current_data > vnet_buffer (bN)->l2_hdr_offset)
+       vlib_buffer_advance (bN, vnet_buffer (bN)->l2_hdr_offset -
+                                  bN->current_data);
+    }
+  else if (bN->current_data > 0)
+    {
+      vlib_buffer_advance (bN, (word) -bN->current_data);
+    }
+}
+
+VLIB_NODE_FN (sflow_drop_node)
+(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
+{
+  u32 n_left_from, *from, *to_next, n_left_to_next;
+  sflow_drop_next_t next_index;
+
+  from = vlib_frame_vector_args (frame);
+  n_left_from = frame->n_vectors;
+
+  sflow_main_t *smp = &sflow_main;
+  uword thread_index = os_get_thread_index ();
+  sflow_per_thread_data_t *sfwk =
+    vec_elt_at_index (smp->per_thread_data, thread_index);
+
+  for (u32 pkt = n_left_from; pkt > 0; --pkt)
+    {
+      vlib_buffer_t *bN = vlib_get_buffer (vm, from[pkt - 1]);
+      buffer_rewind_current (bN);
+      // drops are subject to header_bytes limit too
+      u32 hdr = bN->current_length;
+      if (hdr > smp->headerB)
+       hdr = smp->headerB;
+      ethernet_header_t *en = vlib_buffer_get_current (bN);
+      // Where did this packet come in originally?
+      // (Doesn't have to be known)
+      u32 if_index = vnet_buffer (bN)->sw_if_index[VLIB_RX];
+      if (if_index)
+       {
+         vnet_hw_interface_t *hw =
+           vnet_get_sup_hw_interface (smp->vnet_main, if_index);
+         if (hw)
+           if_index = hw->hw_if_index;
+       }
+      // queue the discard sample for the main thread
+      sflow_sample_t discard = { .sample_type = SFLOW_SAMPLETYPE_DISCARD,
+                                .input_if_index = if_index,
+                                .sampled_packet_size =
+                                  bN->current_length +
+                                  bN->total_length_not_including_first_buffer,
+                                .header_bytes = hdr,
+                                // .header_protocol = 0,
+                                .drop_reason = bN->error };
+      sfwk->dsmp++; // drop-samples
+      memcpy (discard.header, en, hdr);
+      if (PREDICT_FALSE (
+           !sflow_drop_fifo_enqueue (&sfwk->drop_fifo, &discard)))
+       sfwk->ddrp++; // drop-sample drops
+    }
+
+  /* the rest of this is boilerplate code to pass packets on - typically to
+     "drop" */
+  /* TODO: put back tracing code? */
+  /* TODO: process 2 or 4 at a time? */
+  /* TODO: by using this variant of the pipeline are we assuming that
+     we are in a feature arc where frames are not converging or dividing? Just
+     processing through a linear list of nodes that will each pass the whole
+     frame of buffers on unchanged ("lighting fools the way to dusty death").
+     And if so, how do we make that assumption explicit?
+     To improve the flexibility would we have to go back and change the way
+     that interface_output.c (error-drop) launches the frame along the arc
+     in the first place?
+  */
+  next_index = node->cached_next_index;
+  while (n_left_from > 0)
+    {
+      vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next);
+      while (n_left_from > 0 && n_left_to_next > 0)
+       {
+         u32 bi0;
+         vlib_buffer_t *b0;
+         u32 next0 = SFLOW_DROP_NEXT_DROP;
+         /* enqueue b0 to the current next frame */
+         bi0 = from[0];
+         to_next[0] = bi0;
+         from += 1;
+         to_next += 1;
+         n_left_from -= 1;
+         n_left_to_next -= 1;
+         b0 = vlib_get_buffer (vm, bi0);
+         /* do this to always pass on to the next node on feature arc */
+         vnet_feature_next (&next0, b0);
+       }
+      vlib_put_next_frame (vm, node, next_index, n_left_to_next);
+    }
+  return frame->n_vectors;
+}
+
 #ifndef CLIB_MARCH_VARIANT
 VLIB_REGISTER_NODE (sflow_node) =
 {
@@ -343,7 +491,38 @@ VLIB_REGISTER_NODE (sflow_node) =
   .n_next_nodes = SFLOW_N_NEXT,
   /* edit / add dispositions here */
   .next_nodes = {
-    [SFLOW_NEXT_ETHERNET_INPUT] = "ethernet-input",
+    [SFLOW_NEXT_ETHERNET_INPUT_OR_INTERFACE_OUTPUT] = "ethernet-input",
+  },
+};
+
+VLIB_REGISTER_NODE (sflow_egress_node) =
+{
+  .name = "sflow-egress",
+  .vector_size = sizeof (u32),
+  .format_trace = format_sflow_trace,
+  .type = VLIB_NODE_TYPE_INTERNAL,
+  .n_errors = ARRAY_LEN(sflow_error_strings),
+  .error_strings = sflow_error_strings,
+  .n_next_nodes = SFLOW_N_NEXT,
+  /* edit / add dispositions here */
+  .next_nodes = {
+    [SFLOW_NEXT_ETHERNET_INPUT_OR_INTERFACE_OUTPUT] = "interface-output",
+  },
+};
+
+VLIB_REGISTER_NODE (sflow_drop_node) =
+{
+  .name = "sflow-drop",
+  .vector_size = sizeof (u32),
+  .format_trace = format_sflow_trace,
+  .type = VLIB_NODE_TYPE_INTERNAL,
+  .n_errors = ARRAY_LEN(sflow_error_strings),
+  .error_strings = sflow_error_strings,
+  .n_next_nodes = SFLOW_DROP_N_NEXT,
+  /* edit / add dispositions here */
+  .next_nodes = {
+    //[SFLOW_DROP_NEXT_DROP] = "error-drop",
+    [SFLOW_DROP_NEXT_DROP] = "drop",
   },
 };
 #endif /* CLIB_MARCH_VARIANT */
index e5f3300..57a478b 100644 (file)
@@ -61,7 +61,7 @@ define sflow_sampling_rate_get {
     u32 context;
 };
 
-/** \brief API go the sflow sampling-rate
+/** \brief reply to get the sflow sampling-rate
     @param client_index - opaque cookie to identify the sender
     @param context - sender context, to match reply w/ request
     @param sampling_N - the current 1-in-N sampling rate
@@ -121,7 +121,7 @@ define sflow_polling_interval_get {
     u32 context;
 };
 
-/** \brief API go the sflow polling-interval
+/** \brief reply to get the sflow polling-interval
     @param client_index - opaque cookie to identify the sender
     @param context - sender context, to match reply w/ request
     @param polling_S - current polling interval in seconds
@@ -164,7 +164,7 @@ define sflow_header_bytes_get {
     u32 context;
 };
 
-/** \brief API go the sflow header-bytes
+/** \brief reply to get the sflow header-bytes
     @param client_index - opaque cookie to identify the sender
     @param context - sender context, to match reply w/ request
     @param header_B - current maximum header length in bytes
@@ -177,6 +177,92 @@ define sflow_header_bytes_get_reply
   option in_progress;
 };
 
+/** @brief API to set sflow direction
+    @param client_index - opaque cookie to identify the sender
+    @param context - sender context, to match reply w/ request
+    @param sampling_D - direction
+*/
+
+autoreply define sflow_direction_set {
+    /* Client identifier, set from api_main.my_client_index */
+    u32 client_index;
+
+    /* Arbitrary context, so client can match reply to request */
+    u32 context;
+
+    /* sampling_D */
+    u32 sampling_D;
+};
+
+/** @brief API to get sflow direction
+    @param client_index - opaque cookie to identify the sender
+    @param context - sender context, to match reply w/ request
+*/
+
+define sflow_direction_get {
+    /* Client identifier, set from api_main.my_client_index */
+    u32 client_index;
+
+    /* Arbitrary context, so client can match reply to request */
+    u32 context;
+};
+
+/** \brief reply to get the sflow direction
+    @param client_index - opaque cookie to identify the sender
+    @param context - sender context, to match reply w/ request
+    @param sampling_D - direction
+*/
+
+define sflow_direction_get_reply
+{
+  u32 context;
+  u32 sampling_D;
+  option in_progress;
+};
+
+/** @brief API to set sflow drop-monitoring
+    @param client_index - opaque cookie to identify the sender
+    @param context - sender context, to match reply w/ request
+    @param drop_M - enable drop monitoring
+*/
+
+autoreply define sflow_drop_monitoring_set {
+    /* Client identifier, set from api_main.my_client_index */
+    u32 client_index;
+
+    /* Arbitrary context, so client can match reply to request */
+    u32 context;
+
+    /* drop_M */
+    u32 drop_M;
+};
+
+/** @brief API to get sflow drop-monitoring
+    @param client_index - opaque cookie to identify the sender
+    @param context - sender context, to match reply w/ request
+*/
+
+define sflow_drop_monitoring_get {
+    /* Client identifier, set from api_main.my_client_index */
+    u32 client_index;
+
+    /* Arbitrary context, so client can match reply to request */
+    u32 context;
+};
+
+/** \brief reply to get the sflow drop-monitoring
+    @param client_index - opaque cookie to identify the sender
+    @param context - sender context, to match reply w/ request
+    @param drop_M - is drop monitoring enabled
+*/
+
+define sflow_drop_monitoring_get_reply
+{
+  u32 context;
+  u32 drop_M;
+  option in_progress;
+};
+
 /** \brief Dump sflow enabled interface(s)
     @param client_index - opaque cookie to identify the sender
     @param hw_if_index - hw_if_index of a specific interface, or -1 (default)
index 14d07d6..3568114 100644 (file)
@@ -24,7 +24,6 @@
 
 #include <sflow/sflow.api_enum.h>
 #include <sflow/sflow.api_types.h>
-#include <sflow/sflow_psample.h>
 #include <sflow/sflow_dlapi.h>
 
 #include <vpp-api/client/stat_client.h>
@@ -178,11 +177,11 @@ retry:
   stat_segment_data_free (res);
   vec_free (stats);
   // send the structure via netlink
-  SFLOWUSSpec spec = {};
-  SFLOWUSSpec_setMsgType (&spec, SFLOW_VPP_MSG_IF_COUNTERS);
-  SFLOWUSSpec_setAttr (&spec, SFLOW_VPP_ATTR_PORTNAME, hw->name,
-                      vec_len (hw->name));
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_IFINDEX, sfif->sw_if_index);
+  SFLOWUS *ust = &smp->sflow_usersock;
+  SFLOWUS_set_msg_type (ust, SFLOW_VPP_MSG_IF_COUNTERS);
+  SFLOWUS_set_attr (ust, SFLOW_VPP_ATTR_PORTNAME, hw->name,
+                   vec_len (hw->name));
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_IFINDEX, sfif->sw_if_index);
 
   if (smp->lcp_itf_pair_get_vif_index_by_phy)
     {
@@ -194,12 +193,11 @@ retry:
     {
       // We know the corresponding Linux ifIndex for this interface, so include
       // that here.
-      SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_OSINDEX,
-                             sfif->linux_if_index);
+      SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_OSINDEX, sfif->linux_if_index);
     }
 
   // Report consistent with vpp-snmp-agent
-  u64 ifSpeed = (hw->link_speed == ~0) ? 0 : (hw->link_speed * 1000);
+  u64 ifSpeed = (hw->link_speed == ~0) ? 0 : ((u64) hw->link_speed * 1000);
   if (startsWith (hw->name, "loop") || startsWith (hw->name, "tap"))
     ifSpeed = 1e9;
 
@@ -215,28 +213,28 @@ retry:
   u32 operUp = (hw->flags & VNET_HW_INTERFACE_FLAG_LINK_UP) ? 1 : 0;
   u32 adminUp = (sw->flags & VNET_SW_INTERFACE_FLAG_ADMIN_UP) ? 1 : 0;
 
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_IFSPEED, ifSpeed);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_IFTYPE, ifType);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_IFDIRECTION, ifDirection);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_OPER_UP, operUp);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_ADMIN_UP, adminUp);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_RX_OCTETS, ifCtrs.rx.byts);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_TX_OCTETS, ifCtrs.tx.byts);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_RX_PKTS, ifCtrs.rx.pkts);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_TX_PKTS, ifCtrs.tx.pkts);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_RX_MCASTS, ifCtrs.rx.m_pkts);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_TX_MCASTS, ifCtrs.tx.m_pkts);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_RX_BCASTS, ifCtrs.rx.b_pkts);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_TX_BCASTS, ifCtrs.tx.b_pkts);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_RX_ERRORS, ifCtrs.rx.errs);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_TX_ERRORS, ifCtrs.tx.errs);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_RX_DISCARDS, ifCtrs.rx.drps);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_TX_DISCARDS, ifCtrs.tx.drps);
-  SFLOWUSSpec_setAttr (&spec, SFLOW_VPP_ATTR_HW_ADDRESS, hw->hw_address,
-                      vec_len (hw->hw_address));
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_IFSPEED, ifSpeed);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_IFTYPE, ifType);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_IFDIRECTION, ifDirection);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_OPER_UP, operUp);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_ADMIN_UP, adminUp);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_RX_OCTETS, ifCtrs.rx.byts);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_TX_OCTETS, ifCtrs.tx.byts);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_RX_PKTS, ifCtrs.rx.pkts);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_TX_PKTS, ifCtrs.tx.pkts);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_RX_MCASTS, ifCtrs.rx.m_pkts);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_TX_MCASTS, ifCtrs.tx.m_pkts);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_RX_BCASTS, ifCtrs.rx.b_pkts);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_TX_BCASTS, ifCtrs.tx.b_pkts);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_RX_ERRORS, ifCtrs.rx.errs);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_TX_ERRORS, ifCtrs.tx.errs);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_RX_DISCARDS, ifCtrs.rx.drps);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_TX_DISCARDS, ifCtrs.tx.drps);
+  SFLOWUS_set_attr (ust, SFLOW_VPP_ATTR_HW_ADDRESS, hw->hw_address,
+                   vec_len (hw->hw_address));
   smp->unixsock_seq++;
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_SEQ, smp->unixsock_seq);
-  if (SFLOWUSSpec_send (&smp->sflow_usersock, &spec) < 0)
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_SEQ, smp->unixsock_seq);
+  if (SFLOWUS_send (ust) < 0)
     smp->csample_send_drops++;
   smp->csample_send++;
 }
@@ -259,14 +257,14 @@ total_drops (sflow_main_t *smp)
 static void
 send_sampling_status_info (sflow_main_t *smp)
 {
-  SFLOWUSSpec spec = {};
+  SFLOWUS *ust = &smp->sflow_usersock;
   u32 all_pipeline_drops = total_drops (smp);
-  SFLOWUSSpec_setMsgType (&spec, SFLOW_VPP_MSG_STATUS);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_UPTIME_S, smp->now_mono_S);
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_DROPS, all_pipeline_drops);
+  SFLOWUS_set_msg_type (ust, SFLOW_VPP_MSG_STATUS);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_UPTIME_S, smp->now_mono_S);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_DROPS, all_pipeline_drops);
   ++smp->unixsock_seq;
-  SFLOWUSSpec_setAttrInt (&spec, SFLOW_VPP_ATTR_SEQ, smp->unixsock_seq);
-  SFLOWUSSpec_send (&smp->sflow_usersock, &spec);
+  SFLOWUS_set_attr_int (ust, SFLOW_VPP_ATTR_SEQ, smp->unixsock_seq);
+  SFLOWUS_send (ust);
 }
 
 static int
@@ -291,8 +289,137 @@ counter_polling_check (sflow_main_t *smp)
   return polled;
 }
 
+static void
+lowercase_and_replace_white (char *str, int len, char replace)
+{
+  if (str)
+    for (int ii = 0; ii < len; ii++)
+      {
+       if (isspace (str[ii]))
+         str[ii] = replace;
+       else
+         str[ii] = tolower (str[ii]);
+      }
+}
+
+static int
+compose_trap_str (char *buf, int buf_len, char *str, int str_len)
+{
+  int prefix_len = strlen (SFLOW_TRAP_PREFIX);
+  int max_cont_len = buf_len - prefix_len - 1;
+  int cont_len = (str_len > max_cont_len) ? max_cont_len : str_len;
+  clib_memcpy_fast (buf, SFLOW_TRAP_PREFIX, prefix_len);
+  clib_memcpy_fast (buf + prefix_len, str, cont_len);
+  lowercase_and_replace_white (buf + prefix_len, cont_len, SFLOW_TRAP_WHITE);
+  buf[prefix_len + cont_len] = '\0';
+  return prefix_len + cont_len;
+}
+
+static int
+send_packet_sample (vlib_main_t *vm, sflow_main_t *smp, sflow_sample_t *sample)
+{
+  if (sample->header_bytes > smp->headerB)
+    {
+      // We get here if header-bytes setting is reduced dynamically
+      // and a sample that was in the FIFO appears with a larger
+      // header.
+      return 0;
+    }
+  SFLOWPS *pst = &smp->sflow_psample;
+  u32 ps_group, seqNo;
+  switch (sample->sample_type)
+    {
+    case SFLOW_SAMPLETYPE_INGRESS:
+      ps_group = SFLOW_VPP_PSAMPLE_GROUP_INGRESS;
+      seqNo = ++smp->psample_seq_ingress;
+      break;
+    case SFLOW_SAMPLETYPE_EGRESS:
+      ps_group = SFLOW_VPP_PSAMPLE_GROUP_EGRESS;
+      seqNo = ++smp->psample_seq_egress;
+      break;
+    default:
+      return 0;
+    }
+  // TODO: is it always ethernet? (affects ifType counter as well)
+  u16 header_protocol = 1; /* ethernet */
+  SFLOWPS_set_attr_int (pst, SFLOWPS_PSAMPLE_ATTR_SAMPLE_GROUP, ps_group);
+  SFLOWPS_set_attr_int (pst, SFLOWPS_PSAMPLE_ATTR_IIFINDEX,
+                       sample->input_if_index);
+  SFLOWPS_set_attr_int (pst, SFLOWPS_PSAMPLE_ATTR_OIFINDEX,
+                       sample->output_if_index);
+  SFLOWPS_set_attr_int (pst, SFLOWPS_PSAMPLE_ATTR_ORIGSIZE,
+                       sample->sampled_packet_size);
+  SFLOWPS_set_attr_int (pst, SFLOWPS_PSAMPLE_ATTR_GROUP_SEQ, seqNo);
+  SFLOWPS_set_attr_int (pst, SFLOWPS_PSAMPLE_ATTR_SAMPLE_RATE,
+                       sample->samplingN);
+  SFLOWPS_set_attr (pst, SFLOWPS_PSAMPLE_ATTR_DATA, sample->header,
+                   sample->header_bytes);
+  SFLOWPS_set_attr_int (pst, SFLOWPS_PSAMPLE_ATTR_PROTO, header_protocol);
+  if (SFLOWPS_send (pst) < 0)
+    return -1;
+  return 1;
+}
+
+static int
+send_discard_sample (vlib_main_t *vm, sflow_main_t *smp,
+                    sflow_sample_t *sample)
+{
+  SFLOWDM *dmt = &smp->sflow_dropmon;
+  if (sample->header_bytes > smp->headerB)
+    {
+      // We get here if header-bytes setting is reduced dynamically
+      // and a sample that was in the FIFO appears with a larger
+      // header.
+      return 0;
+    }
+  if (sample->sample_type != SFLOW_SAMPLETYPE_DISCARD)
+    {
+      SFLOW_ERR ("send_discard_sample sample-sample_type=%u",
+                sample->sample_type);
+      return 0;
+    }
+  vlib_error_main_t *em = &vm->error_main;
+  if (sample->drop_reason >= vec_len (em->counters_heap))
+    return 0;
+  if (sample->drop_reason >= vec_len (vm->node_main.node_by_error))
+    return 0;
+  u32 err_node_idx = vm->node_main.node_by_error[sample->drop_reason];
+  // Are all the ones we want classed as errors, or might some be WARN or INFO?
+  // if (err->severity == VL_COUNTER_SEVERITY_ERROR)
+  // set TRAP_GROUP_NAME to "vpp_<node>"
+  char trap_grp[SFLOW_MAX_TRAP_LEN];
+  char trap[SFLOW_MAX_TRAP_LEN];
+  vlib_node_t *n = vlib_get_node (vm, err_node_idx);
+  int trap_grp_len = compose_trap_str (trap_grp, SFLOW_MAX_TRAP_LEN,
+                                      (char *) n->name, vec_len (n->name));
+  // set TRAP_NAME to "vpp_<error>"
+  vlib_error_desc_t *err = &em->counters_heap[sample->drop_reason];
+  int err_name_len = clib_strnlen (err->name, SFLOW_MAX_TRAP_LEN);
+  int trap_len =
+    compose_trap_str (trap, SFLOW_MAX_TRAP_LEN, err->name, err_name_len);
+  // populate the netlink attributes
+  u16 origin = NET_DM_ORIGIN_SW;
+  SFLOWDM_set_attr_int (dmt, NET_DM_ATTR_ORIGIN, origin);
+  // include NUL termination char in netlink strings.
+  SFLOWDM_set_attr (dmt, NET_DM_ATTR_HW_TRAP_GROUP_NAME, trap_grp,
+                   trap_grp_len + 1);
+  SFLOWDM_set_attr (dmt, NET_DM_ATTR_HW_TRAP_NAME, trap, trap_len + 1);
+  SFLOWDM_set_attr_int (dmt, NET_DM_ATTR_ORIG_LEN,
+                       sample->sampled_packet_size);
+  SFLOWDM_set_attr_int (dmt, NET_DM_ATTR_TRUNC_LEN, sample->header_bytes);
+  // TODO: read from header? (really just needs to be non-zero for hsflowd)
+  u16 proto = 0x0800;
+  SFLOWDM_set_attr_int (dmt, NET_DM_ATTR_PROTO, proto);
+  SFLOWDM_set_attr_int (dmt, NET_DM_ATTR_IN_PORT, sample->input_if_index);
+  SFLOWDM_set_attr (dmt, NET_DM_ATTR_PAYLOAD, sample->header,
+                   sample->header_bytes);
+  if (SFLOWDM_send (dmt) < 0)
+    return -1;
+  return 1;
+}
+
 static u32
-read_worker_fifos (sflow_main_t *smp)
+read_worker_fifos (vlib_main_t *vm, sflow_main_t *smp)
 {
   // Our maximum samples/sec is approximately:
   // (SFLOW_READ_BATCH * smp->total_threads) / SFLOW_POLL_WAIT_S
@@ -321,7 +448,9 @@ read_worker_fifos (sflow_main_t *smp)
   u32 batch = 0;
   for (; batch < SFLOW_READ_BATCH; batch++)
     {
+      u32 psample_found = 0, dropmon_found = 0;
       u32 psample_send = 0, psample_send_fail = 0;
+      u32 dropmon_send = 0, dropmon_send_fail = 0;
       for (clib_thread_index_t thread_index = 0;
           thread_index < smp->total_threads; thread_index++)
        {
@@ -330,48 +459,37 @@ read_worker_fifos (sflow_main_t *smp)
          sflow_sample_t sample;
          if (sflow_fifo_dequeue (&sfwk->fifo, &sample))
            {
-             if (sample.header_bytes > smp->headerB)
-               {
-                 // We get here if header-bytes setting is reduced dynamically
-                 // and a sample that was in the FIFO appears with a larger
-                 // header.
-                 continue;
-               }
-             SFLOWPSSpec spec = {};
-             u32 ps_group = SFLOW_VPP_PSAMPLE_GROUP_INGRESS;
-             u32 seqNo = ++smp->psample_seq_ingress;
-             // TODO: is it always ethernet? (affects ifType counter as well)
-             u16 header_protocol = 1; /* ethernet */
-             SFLOWPSSpec_setAttrInt (&spec, SFLOWPS_PSAMPLE_ATTR_SAMPLE_GROUP,
-                                     ps_group);
-             SFLOWPSSpec_setAttrInt (&spec, SFLOWPS_PSAMPLE_ATTR_IIFINDEX,
-                                     sample.input_if_index);
-             SFLOWPSSpec_setAttrInt (&spec, SFLOWPS_PSAMPLE_ATTR_OIFINDEX,
-                                     sample.output_if_index);
-             SFLOWPSSpec_setAttrInt (&spec, SFLOWPS_PSAMPLE_ATTR_ORIGSIZE,
-                                     sample.sampled_packet_size);
-             SFLOWPSSpec_setAttrInt (&spec, SFLOWPS_PSAMPLE_ATTR_GROUP_SEQ,
-                                     seqNo);
-             SFLOWPSSpec_setAttrInt (&spec, SFLOWPS_PSAMPLE_ATTR_SAMPLE_RATE,
-                                     sample.samplingN);
-             SFLOWPSSpec_setAttr (&spec, SFLOWPS_PSAMPLE_ATTR_DATA,
-                                  sample.header, sample.header_bytes);
-             SFLOWPSSpec_setAttrInt (&spec, SFLOWPS_PSAMPLE_ATTR_PROTO,
-                                     header_protocol);
-             psample_send++;
-             if (SFLOWPSSpec_send (&smp->sflow_psample, &spec) < 0)
+             psample_found++;
+             int sent = send_packet_sample (vm, smp, &sample);
+             if (sent == 1)
+               psample_send++;
+             if (sent == -1)
                psample_send_fail++;
            }
+         if (sflow_drop_fifo_dequeue (&sfwk->drop_fifo, &sample))
+           {
+             dropmon_found++;
+             int sent = send_discard_sample (vm, smp, &sample);
+             if (sent == 1)
+               dropmon_send++;
+             if (sent == -1)
+               dropmon_send_fail++;
+           }
        }
-      if (psample_send == 0)
+      if (psample_found == 0 && dropmon_found == 0)
        {
          // nothing found on FIFOs this time through, so terminate batch early
          break;
        }
       else
        {
-         vlib_node_increment_counter (smp->vlib_main, sflow_node.index,
-                                      SFLOW_ERROR_PSAMPLE_SEND, psample_send);
+         if (psample_send > 0)
+           {
+             vlib_node_increment_counter (smp->vlib_main, sflow_node.index,
+                                          SFLOW_ERROR_PSAMPLE_SEND,
+                                          psample_send);
+             smp->psample_send += psample_send;
+           }
          if (psample_send_fail > 0)
            {
              vlib_node_increment_counter (smp->vlib_main, sflow_node.index,
@@ -379,6 +497,20 @@ read_worker_fifos (sflow_main_t *smp)
                                           psample_send_fail);
              smp->psample_send_drops += psample_send_fail;
            }
+         if (dropmon_send > 0)
+           {
+             vlib_node_increment_counter (smp->vlib_main, sflow_node.index,
+                                          SFLOW_ERROR_DROPMON_SEND,
+                                          dropmon_send);
+             smp->dropmon_send += dropmon_send;
+           }
+         if (dropmon_send_fail > 0)
+           {
+             vlib_node_increment_counter (smp->vlib_main, sflow_node.index,
+                                          SFLOW_ERROR_DROPMON_SEND_FAIL,
+                                          dropmon_send_fail);
+             smp->dropmon_send_drops += dropmon_send_fail;
+           }
        }
     }
   return batch;
@@ -397,6 +529,8 @@ read_node_counters (sflow_main_t *smp, sflow_err_ctrs_t *ctrs)
       ctrs->counters[SFLOW_ERROR_PROCESSED] += sfwk->pool;
       ctrs->counters[SFLOW_ERROR_SAMPLED] += sfwk->smpl;
       ctrs->counters[SFLOW_ERROR_DROPPED] += sfwk->drop;
+      ctrs->counters[SFLOW_ERROR_DIPROCESSED] += sfwk->dsmp;
+      ctrs->counters[SFLOW_ERROR_DIDROPPED] += sfwk->ddrp;
     }
 }
 
@@ -412,9 +546,13 @@ static void
 update_node_counters (sflow_main_t *smp, sflow_err_ctrs_t *prev,
                      sflow_err_ctrs_t *latest)
 {
+  // TODO: is it OK to assess all counters against sflow_node or do we
+  // need to distinguish sflow_drop_node and sflow_egress_node?
   update_node_cntr (smp, prev, latest, SFLOW_ERROR_PROCESSED);
   update_node_cntr (smp, prev, latest, SFLOW_ERROR_SAMPLED);
   update_node_cntr (smp, prev, latest, SFLOW_ERROR_DROPPED);
+  update_node_cntr (smp, prev, latest, SFLOW_ERROR_DIPROCESSED);
+  update_node_cntr (smp, prev, latest, SFLOW_ERROR_DIDROPPED);
   *prev = *latest; // latch for next time
 }
 
@@ -445,12 +583,20 @@ sflow_process_samples (vlib_main_t *vm, vlib_node_runtime_t *node,
 
       // PSAMPLE channel may need extra step (e.g. to learn family_id)
       // before it is ready to send
-      EnumSFLOWPSState psState = SFLOWPS_state (&smp->sflow_psample);
-      if (psState != SFLOWPS_STATE_READY)
+      EnumSFLOWNLState psState = SFLOWPS_state (&smp->sflow_psample);
+      if (psState != SFLOWNL_STATE_READY)
        {
          SFLOWPS_open_step (&smp->sflow_psample);
        }
 
+      // DROPMON channel may need extra step (e.g. to learn family_id)
+      // before it is ready to send
+      EnumSFLOWNLState dmState = SFLOWDM_state (&smp->sflow_dropmon);
+      if (dmState != SFLOWNL_STATE_READY)
+       {
+         SFLOWDM_open_step (&smp->sflow_dropmon);
+       }
+
       // What we want is a monotonic, per-second clock. This seems to do it
       // because it is based on the CPU clock.
       f64 tnow = clib_time_now (&ctm);
@@ -465,7 +611,7 @@ sflow_process_samples (vlib_main_t *vm, vlib_node_runtime_t *node,
          counter_polling_check (smp);
        }
       // process samples from workers
-      read_worker_fifos (smp);
+      read_worker_fifos (vm, smp);
 
       // and sync the global counters
       sflow_err_ctrs_t latest = {};
@@ -511,7 +657,7 @@ sflow_sampling_start (sflow_main_t *smp)
 {
   SFLOW_INFO ("sflow_sampling_start");
 
-  smp->running = 1;
+  smp->running = true;
   // Reset this clock so that the per-second netlink status updates
   // will communicate a restart to hsflowd.  This helps to distinguish:
   // (1) vpp restarted with sFlow off => no status updates (went quiet)
@@ -522,13 +668,22 @@ sflow_sampling_start (sflow_main_t *smp)
   // reset sequence numbers to indicated discontinuity
   smp->psample_seq_ingress = 0;
   smp->psample_seq_egress = 0;
+  smp->psample_send = 0;
   smp->psample_send_drops = 0;
+  smp->csample_send = 0;
+  smp->csample_send_drops = 0;
+  smp->dropmon_send = 0;
+  smp->dropmon_send_drops = 0;
 
   /* open PSAMPLE netlink channel for writing packet samples */
+  SFLOWPS_init (&smp->sflow_psample);
   SFLOWPS_open (&smp->sflow_psample);
   /* open USERSOCK netlink channel for writing counters */
+  SFLOWUS_init (&smp->sflow_usersock);
   SFLOWUS_open (&smp->sflow_usersock);
-  smp->sflow_usersock.group_id = SFLOW_NETLINK_USERSOCK_MULTICAST;
+  /* open DROPMON netlink channel for writing discard events */
+  SFLOWDM_init (&smp->sflow_dropmon);
+  SFLOWDM_open (&smp->sflow_dropmon);
   /* set up (or reset) sampling context for each thread */
   sflow_set_worker_sampling_state (smp);
 }
@@ -537,15 +692,17 @@ static void
 sflow_sampling_stop (sflow_main_t *smp)
 {
   SFLOW_INFO ("sflow_sampling_stop");
-  smp->running = 0;
+  smp->running = false;
   SFLOWPS_close (&smp->sflow_psample);
   SFLOWUS_close (&smp->sflow_usersock);
+  SFLOWDM_close (&smp->sflow_dropmon);
 }
 
 static void
 sflow_sampling_start_stop (sflow_main_t *smp)
 {
-  int run = (smp->samplingN != 0 && smp->interfacesEnabled != 0);
+  int run =
+    ((smp->samplingN != 0 && smp->interfacesEnabled != 0) || smp->dropM);
   if (run != smp->running)
     {
       if (run)
@@ -603,8 +760,75 @@ sflow_header_bytes (sflow_main_t *smp, u32 headerB)
   return 0;
 }
 
+void
+sflow_enable_disable_interface (sflow_main_t *smp,
+                               sflow_per_interface_data_t *sfif)
+{
+  bool ingress_on =
+    sfif->sflow_enabled && (smp->samplingD == SFLOW_DIRN_INGRESS ||
+                           smp->samplingD == SFLOW_DIRN_BOTH);
+  bool egress_on =
+    sfif->sflow_enabled &&
+    (smp->samplingD == SFLOW_DIRN_EGRESS || smp->samplingD == SFLOW_DIRN_BOTH);
+  bool drop_on = sfif->sflow_enabled && smp->dropM;
+  bool ingress_enabled = (vnet_feature_is_enabled ("device-input", "sflow",
+                                                  sfif->sw_if_index) == 1);
+  bool egress_enabled =
+    (vnet_feature_is_enabled ("interface-output", "sflow-egress",
+                             sfif->sw_if_index) == 1);
+  bool drop_enabled = (vnet_feature_is_enabled ("error-drop", "sflow-drop",
+                                               sfif->sw_if_index) == 1);
+
+  if (ingress_on != ingress_enabled)
+    vnet_feature_enable_disable ("device-input", "sflow", sfif->sw_if_index,
+                                ingress_on, 0, 0);
+  if (egress_on != egress_enabled)
+    vnet_feature_enable_disable ("interface-output", "sflow-egress",
+                                sfif->sw_if_index, egress_on, 0, 0);
+  if (drop_on != drop_enabled)
+    vnet_feature_enable_disable ("error-drop", "sflow-drop", sfif->sw_if_index,
+                                smp->dropM, 0, 0);
+}
+
+void
+sflow_enable_disable_all (sflow_main_t *smp)
+{
+  for (int ii = 0; ii < vec_len (smp->per_interface_data); ii++)
+    {
+      sflow_per_interface_data_t *sfif =
+       vec_elt_at_index (smp->per_interface_data, ii);
+      if (sfif && sfif->sflow_enabled)
+       sflow_enable_disable_interface (smp, sfif);
+    }
+}
+
 int
-sflow_enable_disable (sflow_main_t *smp, u32 sw_if_index, int enable_disable)
+sflow_direction (sflow_main_t *smp, sflow_direction_t samplingD)
+{
+  if (samplingD != smp->samplingD)
+    {
+      // direction changed - tell all active interfaces.
+      smp->samplingD = samplingD;
+      sflow_enable_disable_all (smp);
+    }
+  return 0;
+}
+
+int
+sflow_drop_monitoring (sflow_main_t *smp, bool dropM)
+{
+  if (dropM != smp->dropM)
+    {
+      // drop-monitoring changed.
+      smp->dropM = dropM;
+      // Tell all active interfaces.
+      sflow_enable_disable_all (smp);
+    }
+  return 0;
+}
+
+int
+sflow_enable_disable (sflow_main_t *smp, u32 sw_if_index, bool enable_disable)
 {
   vnet_sw_interface_t *sw;
 
@@ -641,8 +865,7 @@ sflow_enable_disable (sflow_main_t *smp, u32 sw_if_index, int enable_disable)
       sfif->hw_if_index = sw->hw_if_index;
       sfif->polled = 0;
       sfif->sflow_enabled = enable_disable;
-      vnet_feature_enable_disable ("device-input", "sflow", sw_if_index,
-                                  enable_disable, 0, 0);
+      sflow_enable_disable_interface (smp, sfif);
       smp->interfacesEnabled += (enable_disable) ? 1 : -1;
     }
 
@@ -746,20 +969,90 @@ sflow_header_bytes_command_fn (vlib_main_t *vm, unformat_input_t *input,
   return 0;
 }
 
+static clib_error_t *
+sflow_direction_command_fn (vlib_main_t *vm, unformat_input_t *input,
+                           vlib_cli_command_t *cmd)
+{
+  sflow_main_t *smp = &sflow_main;
+  u32 sampling_D = SFLOW_DIRN_UNDEFINED;
+
+  int rv;
+
+  while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
+    {
+      if (unformat (input, "rx"))
+       sampling_D = SFLOW_DIRN_INGRESS;
+      else if (unformat (input, "tx"))
+       sampling_D = SFLOW_DIRN_EGRESS;
+      else if (unformat (input, "both"))
+       sampling_D = SFLOW_DIRN_BOTH;
+      else
+       break;
+    }
+
+  if (sampling_D == SFLOW_DIRN_UNDEFINED)
+    return clib_error_return (
+      0, "Please specify a sampling direction (rx|tx|both)...");
+
+  rv = sflow_direction (smp, sampling_D);
+
+  switch (rv)
+    {
+    case 0:
+      break;
+    default:
+      return clib_error_return (0, "sflow_direction returned %d", rv);
+    }
+  return 0;
+}
+
+static clib_error_t *
+sflow_drop_monitoring_command_fn (vlib_main_t *vm, unformat_input_t *input,
+                                 vlib_cli_command_t *cmd)
+{
+  sflow_main_t *smp = &sflow_main;
+  bool drop_M = true;
+
+  int rv;
+
+  while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
+    {
+      if (unformat (input, "disable"))
+       drop_M = false;
+      else if (unformat (input, "enable"))
+       drop_M = true;
+      else
+       break;
+    }
+
+  rv = sflow_drop_monitoring (smp, drop_M);
+
+  switch (rv)
+    {
+    case 0:
+      break;
+    default:
+      return clib_error_return (0, "sflow_drop_monitoring returned %d", rv);
+    }
+  return 0;
+}
+
 static clib_error_t *
 sflow_enable_disable_command_fn (vlib_main_t *vm, unformat_input_t *input,
                                 vlib_cli_command_t *cmd)
 {
   sflow_main_t *smp = &sflow_main;
   u32 sw_if_index = ~0;
-  int enable_disable = 1;
+  int enable_disable = true;
 
   int rv;
 
   while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
     {
       if (unformat (input, "disable"))
-       enable_disable = 0;
+       enable_disable = false;
+      else if (unformat (input, "enable"))
+       enable_disable = true;
       else if (unformat (input, "%U", unformat_vnet_sw_interface,
                         smp->vnet_main, &sw_if_index))
        ;
@@ -793,6 +1086,23 @@ sflow_enable_disable_command_fn (vlib_main_t *vm, unformat_input_t *input,
   return 0;
 }
 
+static const char *
+sflow_direction_str (sflow_direction_t direction)
+{
+  switch (direction)
+    {
+    case SFLOW_DIRN_UNDEFINED:
+      return "undefined";
+    case SFLOW_DIRN_INGRESS:
+      return "rx";
+    case SFLOW_DIRN_EGRESS:
+      return "tx";
+    case SFLOW_DIRN_BOTH:
+      return "both";
+    }
+  return "none";
+}
+
 static clib_error_t *
 show_sflow_command_fn (vlib_main_t *vm, unformat_input_t *input,
                       vlib_cli_command_t *cmd)
@@ -800,9 +1110,12 @@ show_sflow_command_fn (vlib_main_t *vm, unformat_input_t *input,
   sflow_main_t *smp = &sflow_main;
   clib_error_t *error = NULL;
   vlib_cli_output (vm, "sflow sampling-rate %u\n", smp->samplingN);
-  vlib_cli_output (vm, "sflow sampling-direction ingress\n");
+  vlib_cli_output (vm, "sflow direction %s\n",
+                  sflow_direction_str (smp->samplingD));
   vlib_cli_output (vm, "sflow polling-interval %u\n", smp->pollingS);
   vlib_cli_output (vm, "sflow header-bytes %u\n", smp->headerB);
+  vlib_cli_output (vm, "sflow drop-monitoring %s\n",
+                  smp->dropM ? "enable" : "disable");
   u32 itfs_enabled = 0;
   for (int ii = 0; ii < vec_len (smp->per_interface_data); ii++)
     {
@@ -818,12 +1131,14 @@ show_sflow_command_fn (vlib_main_t *vm, unformat_input_t *input,
     }
   vlib_cli_output (vm, "Status\n");
   vlib_cli_output (vm, "  interfaces enabled: %u\n", itfs_enabled);
-  vlib_cli_output (vm, "  packet samples sent: %u\n",
-                  smp->psample_seq_ingress + smp->psample_seq_egress);
+  vlib_cli_output (vm, "  packet samples sent: %u\n", smp->psample_send);
   vlib_cli_output (vm, "  packet samples dropped: %u\n", total_drops (smp));
   vlib_cli_output (vm, "  counter samples sent: %u\n", smp->csample_send);
   vlib_cli_output (vm, "  counter samples dropped: %u\n",
                   smp->csample_send_drops);
+  vlib_cli_output (vm, "  drop samples sent: %u\n", smp->dropmon_send);
+  vlib_cli_output (vm, "  drop samples dropped: %u\n",
+                  smp->dropmon_send_drops);
   return error;
 }
 
@@ -851,6 +1166,18 @@ VLIB_CLI_COMMAND (sflow_header_bytes_command, static) = {
   .function = sflow_header_bytes_command_fn,
 };
 
+VLIB_CLI_COMMAND (sflow_direction_command, static) = {
+  .path = "sflow direction",
+  .short_help = "sflow direction <rx|tx|both>",
+  .function = sflow_direction_command_fn,
+};
+
+VLIB_CLI_COMMAND (sflow_drop_monitoring_command, static) = {
+  .path = "sflow drop-monitoring",
+  .short_help = "sflow drop-monitoring <enable|disable>",
+  .function = sflow_drop_monitoring_command_fn,
+};
+
 VLIB_CLI_COMMAND (show_sflow_command, static) = {
   .path = "show sflow",
   .short_help = "show sflow",
@@ -865,8 +1192,7 @@ vl_api_sflow_enable_disable_t_handler (vl_api_sflow_enable_disable_t *mp)
   sflow_main_t *smp = &sflow_main;
   int rv;
 
-  rv = sflow_enable_disable (smp, ntohl (mp->hw_if_index),
-                            (int) (mp->enable_disable));
+  rv = sflow_enable_disable (smp, ntohl (mp->hw_if_index), mp->enable_disable);
 
   REPLY_MACRO (VL_API_SFLOW_ENABLE_DISABLE_REPLY);
 }
@@ -939,6 +1265,51 @@ vl_api_sflow_header_bytes_get_t_handler (vl_api_sflow_header_bytes_get_t *mp)
                        ({ rmp->header_B = ntohl (smp->headerB); }));
 }
 
+static void
+vl_api_sflow_direction_set_t_handler (vl_api_sflow_direction_set_t *mp)
+{
+  vl_api_sflow_direction_set_reply_t *rmp;
+  sflow_main_t *smp = &sflow_main;
+  int rv;
+
+  rv = sflow_direction (smp, ntohl (mp->sampling_D));
+
+  REPLY_MACRO (VL_API_SFLOW_DIRECTION_SET_REPLY);
+}
+
+static void
+vl_api_sflow_direction_get_t_handler (vl_api_sflow_direction_get_t *mp)
+{
+  vl_api_sflow_direction_get_reply_t *rmp;
+  sflow_main_t *smp = &sflow_main;
+
+  REPLY_MACRO_DETAILS2 (VL_API_SFLOW_DIRECTION_GET_REPLY,
+                       ({ rmp->sampling_D = ntohl (smp->samplingD); }));
+}
+
+static void
+vl_api_sflow_drop_monitoring_set_t_handler (
+  vl_api_sflow_drop_monitoring_set_t *mp)
+{
+  vl_api_sflow_drop_monitoring_set_reply_t *rmp;
+  sflow_main_t *smp = &sflow_main;
+  int rv;
+  rv = sflow_drop_monitoring (smp, ntohl (mp->drop_M));
+
+  REPLY_MACRO (VL_API_SFLOW_DROP_MONITORING_SET_REPLY);
+}
+
+static void
+vl_api_sflow_drop_monitoring_get_t_handler (
+  vl_api_sflow_drop_monitoring_get_t *mp)
+{
+  vl_api_sflow_drop_monitoring_get_reply_t *rmp;
+  sflow_main_t *smp = &sflow_main;
+
+  REPLY_MACRO_DETAILS2 (VL_API_SFLOW_DROP_MONITORING_GET_REPLY,
+                       ({ rmp->drop_M = ntohl (smp->dropM); }));
+}
+
 static void
 send_sflow_interface_details (vpe_api_main_t *am, vl_api_registration_t *reg,
                              u32 context, const u32 hw_if_index)
@@ -1001,6 +1372,8 @@ sflow_init (vlib_main_t *vm)
   smp->samplingN = SFLOW_DEFAULT_SAMPLING_N;
   smp->pollingS = SFLOW_DEFAULT_POLLING_S;
   smp->headerB = SFLOW_DEFAULT_HEADER_BYTES;
+  smp->samplingD = SFLOW_DIRN_INGRESS;
+  smp->dropM = false;
 
   /* Add our API messages to the global name_crc hash table */
   smp->msg_id_base = setup_message_id_table ();
@@ -1030,6 +1403,19 @@ VNET_FEATURE_INIT (sflow, static) = {
   .runs_before = VNET_FEATURES ("ethernet-input"),
 };
 
+VNET_FEATURE_INIT (sflow_egress, static) = {
+  .arc_name = "interface-output",
+  .node_name = "sflow-egress",
+  .runs_before = VNET_FEATURES ("interface-output-arc-end"),
+};
+
+/* Add myself to the feature arc */
+VNET_FEATURE_INIT (sflow_drop, static) = {
+  .arc_name = "error-drop",
+  .node_name = "sflow-drop",
+  .runs_before = VNET_FEATURES ("drop"),
+};
+
 VLIB_PLUGIN_REGISTER () = {
   .version = VPP_BUILD_VER,
   .description = "sFlow random packet sampling",
index 0ec5ac9..ca087e5 100644 (file)
 #include <vppinfra/hash.h>
 #include <vppinfra/error.h>
 #include <sflow/sflow_common.h>
+#include <sflow/sflow_netlink.h>
 #include <sflow/sflow_psample.h>
 #include <sflow/sflow_usersock.h>
+#include <sflow/sflow_dropmon.h>
 
 #define SFLOW_DEFAULT_SAMPLING_N   10000
 #define SFLOW_DEFAULT_POLLING_S           20
@@ -33,6 +35,7 @@
 #define SFLOW_HEADER_BYTES_STEP           32
 
 #define SFLOW_FIFO_DEPTH  2048 // must be power of 2
+#define SFLOW_DROP_FIFO_DEPTH 4           // must be power of 2
 #define SFLOW_POLL_WAIT_S 0.001
 #define SFLOW_READ_BATCH  100
 
   _ (PROCESSED, "sflow packets processed")                                    \
   _ (SAMPLED, "sflow packets sampled")                                        \
   _ (DROPPED, "sflow packets dropped")                                        \
+  _ (DIPROCESSED, "sflow discards processed")                                 \
+  _ (DIDROPPED, "sflow discards dropped")                                     \
   _ (PSAMPLE_SEND, "sflow PSAMPLE sent")                                      \
-  _ (PSAMPLE_SEND_FAIL, "sflow PSAMPLE send failed")
+  _ (PSAMPLE_SEND_FAIL, "sflow PSAMPLE send failed")                          \
+  _ (DROPMON_SEND, "sflow DROPMON sent")                                      \
+  _ (DROPMON_SEND_FAIL, "sflow DROPMON send failed")
 
 typedef enum
 {
@@ -64,15 +71,29 @@ typedef struct
 /* packet sample */
 typedef struct
 {
+  u32 sample_type;
   u32 samplingN;
   u32 input_if_index;
   u32 output_if_index;
   u32 header_protocol;
   u32 sampled_packet_size;
   u32 header_bytes;
+  u32 drop_reason;
   u8 header[SFLOW_MAX_HEADER_BYTES];
 } sflow_sample_t;
 
+typedef enum
+{
+  SFLOW_SAMPLETYPE_UNDEFINED = 0,
+  SFLOW_SAMPLETYPE_INGRESS,
+  SFLOW_SAMPLETYPE_EGRESS,
+  SFLOW_SAMPLETYPE_DISCARD
+} sflow_enum_sample_t;
+
+#define SFLOW_MAX_TRAP_LEN 64
+#define SFLOW_TRAP_WHITE   '_'
+#define SFLOW_TRAP_PREFIX  "vpp_"
+
 // Define SPSC FIFO for sending samples worker-to-main.
 // (I did try to use VPP svm FIFO, but couldn't
 // understand why it was sometimes going wrong).
@@ -104,8 +125,49 @@ sflow_fifo_dequeue (sflow_fifo_t *fifo, sflow_sample_t *sample)
   u32 curr_tx = clib_atomic_load_acq_n (&fifo->tx);
   if (curr_rx == curr_tx)
     return false; // empty
-  memcpy (sample, &fifo->samples[curr_rx], sizeof (*sample));
   u32 next_rx = SFLOW_FIFO_NEXT (curr_rx);
+  memcpy (sample, &fifo->samples[next_rx], sizeof (*sample));
+  clib_atomic_store_rel_n (&fifo->rx, next_rx);
+  return true;
+}
+
+// Define SPSC DROP_FIFO for sending discard events worker-to-main.
+// For now the only difference from the FIFO above is the max depth,
+// but it proved awkward to make depth a variable and this way gives
+// us more freedom to experiment, e.g. with rate-limiting.
+// We also might decide to separate sflow_sample_t into
+// sflow_sample_t and sflow_drop_t if their fields diverge,
+// and doing this keeps that option open.
+typedef struct
+{
+  volatile u32 tx; // can change under consumer's feet
+  volatile u32 rx; // can change under producer's feet
+  sflow_sample_t samples[SFLOW_DROP_FIFO_DEPTH];
+} sflow_drop_fifo_t;
+
+#define SFLOW_DROP_FIFO_NEXT(slot) ((slot + 1) & (SFLOW_DROP_FIFO_DEPTH - 1))
+static inline int
+sflow_drop_fifo_enqueue (sflow_drop_fifo_t *fifo, sflow_sample_t *sample)
+{
+  u32 curr_rx = clib_atomic_load_acq_n (&fifo->rx);
+  u32 curr_tx = fifo->tx; // clib_atomic_load_acq_n(&fifo->tx);
+  u32 next_tx = SFLOW_DROP_FIFO_NEXT (curr_tx);
+  if (next_tx == curr_rx)
+    return false; // full
+  memcpy (&fifo->samples[next_tx], sample, sizeof (*sample));
+  clib_atomic_store_rel_n (&fifo->tx, next_tx);
+  return true;
+}
+
+static inline int
+sflow_drop_fifo_dequeue (sflow_drop_fifo_t *fifo, sflow_sample_t *sample)
+{
+  u32 curr_rx = fifo->rx; // clib_atomic_load_acq_n(&fifo->rx);
+  u32 curr_tx = clib_atomic_load_acq_n (&fifo->tx);
+  if (curr_rx == curr_tx)
+    return false; // empty
+  u32 next_rx = SFLOW_DROP_FIFO_NEXT (curr_rx);
+  memcpy (sample, &fifo->samples[next_rx], sizeof (*sample));
   clib_atomic_store_rel_n (&fifo->rx, next_rx);
   return true;
 }
@@ -119,8 +181,12 @@ typedef struct
   u32 seed;
   u32 smpl;
   u32 drop;
+  u32 dsmp;
+  u32 ddrp;
   CLIB_CACHE_LINE_ALIGN_MARK (_fifo);
   sflow_fifo_t fifo;
+  CLIB_CACHE_LINE_ALIGN_MARK (_drop_fifo);
+  sflow_drop_fifo_t drop_fifo;
 } sflow_per_thread_data_t;
 
 typedef u32 (*IfIndexLookupFn) (u32);
@@ -139,6 +205,8 @@ typedef struct
   u32 samplingN;
   u32 pollingS;
   u32 headerB;
+  sflow_direction_t samplingD;
+  bool dropM;
   u32 total_threads;
   sflow_per_interface_data_t *per_interface_data;
   sflow_per_thread_data_t *per_thread_data;
@@ -147,6 +215,8 @@ typedef struct
   SFLOWPS sflow_psample;
   /* usersock channel (periodic counters) */
   SFLOWUS sflow_usersock;
+  /* dropmon channel (rate-limited discards) */
+  SFLOWDM sflow_dropmon;
 #define SFLOW_NETLINK_USERSOCK_MULTICAST 29
   /* dropmon channel (packet drops) */
   // SFLOWDM sflow_dropmon;
@@ -155,13 +225,16 @@ typedef struct
   u32 now_mono_S;
 
   /* running control */
-  int running;
+  bool running;
   u32 interfacesEnabled;
 
   /* main-thread counters */
   u32 psample_seq_ingress;
   u32 psample_seq_egress;
+  u32 psample_send;
   u32 psample_send_drops;
+  u32 dropmon_send;
+  u32 dropmon_send_drops;
   u32 csample_send;
   u32 csample_send_drops;
   u32 unixsock_seq;
index 26f306b..766033d 100644 (file)
@@ -31,6 +31,15 @@ typedef struct
   int sflow_enabled;
 } sflow_per_interface_data_t;
 
+/* mirror sflow_direction enum in sflow.api */
+typedef enum
+{
+  SFLOW_DIRN_UNDEFINED = 0,
+  SFLOW_DIRN_INGRESS,
+  SFLOW_DIRN_EGRESS,
+  SFLOW_DIRN_BOTH
+} sflow_direction_t;
+
 #endif /* __included_sflow_common_h__ */
 
 /*
diff --git a/src/plugins/sflow/sflow_dropmon.c b/src/plugins/sflow/sflow_dropmon.c
new file mode 100644 (file)
index 0000000..d953c6b
--- /dev/null
@@ -0,0 +1,164 @@
+/*
+ * Copyright (c) 2025 InMon Corp.
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <vlib/vlib.h>
+#include <vnet/vnet.h>
+#include <vnet/pg/pg.h>
+#include <vppinfra/error.h>
+#include <sflow/sflow.h>
+
+#include <fcntl.h>
+#include <asm/types.h>
+#include <sys/socket.h>
+#include <linux/types.h>
+#include <linux/netlink.h>
+#include <linux/genetlink.h>
+#include <linux/net_dropmon.h>
+#include <net/if.h>
+#include <signal.h>
+#include <ctype.h>
+
+#include <sflow/sflow_netlink.h>
+#include <sflow/sflow_dropmon.h>
+
+/*_________________---------------------------__________________
+  _________________       SFLOWDM_init        __________________
+  -----------------___________________________------------------
+*/
+
+EnumSFLOWNLState
+SFLOWDM_init (SFLOWDM *dmt)
+{
+  dmt->nl.id = SFLOWNL_DROPMON;
+  memset (dmt->fam_name, 0, SFLOWDM_FAM_FOOTPRINT);
+  memcpy (dmt->fam_name, SFLOWDM_FAM, SFLOWDM_FAM_LEN);
+  dmt->nl.family_name = dmt->fam_name;
+  dmt->nl.family_len = SFLOWDM_FAM_LEN;
+  dmt->nl.join_group_id = NET_DM_GRP_ALERT;
+  dmt->nl.attr = dmt->attr;
+  dmt->nl.attr_max = SFLOWDM_ATTRS - 1;
+  dmt->nl.iov = dmt->iov;
+  dmt->nl.iov_max = SFLOWDM_IOV_FRAGS - 1;
+  dmt->nl.state = SFLOWNL_STATE_INIT;
+  return dmt->nl.state;
+}
+
+/*_________________---------------------------__________________
+  _________________       SFLOWDM_open        __________________
+  -----------------___________________________------------------
+*/
+
+bool
+SFLOWDM_open (SFLOWDM *dmt)
+{
+  if (dmt->nl.state == SFLOWNL_STATE_UNDEFINED)
+    SFLOWDM_init (dmt);
+  if (dmt->nl.nl_sock == 0)
+    {
+      dmt->nl.nl_sock = sflow_netlink_generic_open (&dmt->nl);
+      if (dmt->nl.nl_sock > 0)
+       sflow_netlink_generic_get_family (&dmt->nl);
+    }
+  return (dmt->nl.nl_sock > 0);
+}
+
+/*_________________---------------------------__________________
+  _________________       SFLOWDM_close       __________________
+  -----------------___________________________------------------
+*/
+
+bool
+SFLOWDM_close (SFLOWDM *dmt)
+{
+  return (sflow_netlink_close (&dmt->nl) == 0);
+}
+
+/*_________________---------------------------__________________
+  _________________       SFLOWDM_state       __________________
+  -----------------___________________________------------------
+*/
+
+EnumSFLOWNLState
+SFLOWDM_state (SFLOWDM *dmt)
+{
+  return dmt->nl.state;
+}
+
+/*_________________---------------------------__________________
+  _________________    SFLOWDM_open_step      __________________
+  -----------------___________________________------------------
+*/
+
+EnumSFLOWNLState
+SFLOWDM_open_step (SFLOWDM *dmt)
+{
+  switch (dmt->nl.state)
+    {
+    case SFLOWNL_STATE_UNDEFINED:
+      SFLOWDM_init (dmt);
+      break;
+    case SFLOWNL_STATE_INIT:
+      SFLOWDM_open (dmt);
+      break;
+    case SFLOWNL_STATE_OPEN:
+      sflow_netlink_generic_get_family (&dmt->nl);
+      break;
+    case SFLOWNL_STATE_WAIT_FAMILY:
+      sflow_netlink_read (&dmt->nl);
+      break;
+    case SFLOWNL_STATE_READY:
+      break;
+    }
+  return dmt->nl.state;
+}
+
+/*_________________---------------------------__________________
+  _________________    SFLOWDMSpec_setAttr    __________________
+  -----------------___________________________------------------
+*/
+
+bool
+SFLOWDM_set_attr (SFLOWDM *dmt, int field, void *val, int len)
+{
+  return sflow_netlink_set_attr (&dmt->nl, field, val, len);
+}
+
+/*_________________---------------------------__________________
+  _________________    SFLOWDMSpec_send       __________________
+  -----------------___________________________------------------
+*/
+
+int
+SFLOWDM_send (SFLOWDM *dmt)
+{
+  dmt->nl.ge.cmd = NET_DM_CMD_PACKET_ALERT;
+  dmt->nl.ge.version = 0; // NET_DM_CFG_VERSION==0 but no NET_DM_CMD_VERSION
+  int status = sflow_netlink_send_attrs (&dmt->nl, true);
+  sflow_netlink_reset_attrs (&dmt->nl);
+  if (status <= 0)
+    {
+      SFLOW_ERR ("DROPMON strerror(errno) = %s; errno = %d\n",
+                strerror (errno), errno);
+    }
+  return status;
+}
+
+/*
+ * fd.io coding-style-patch-verification: ON
+ *
+ * Local Variables:
+ * eval: (c-set-style "gnu")
+ * End:
+ */
diff --git a/src/plugins/sflow/sflow_dropmon.h b/src/plugins/sflow/sflow_dropmon.h
new file mode 100644 (file)
index 0000000..06ff890
--- /dev/null
@@ -0,0 +1,78 @@
+/*
+ * Copyright (c) 2025 InMon Corp.
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef __included_sflow_dropmon_h__
+#define __included_sflow_dropmon_h__
+
+#include <vlib/vlib.h>
+#include <vnet/vnet.h>
+#include <vnet/pg/pg.h>
+#include <vppinfra/error.h>
+#include <sflow/sflow.h>
+
+#include <asm/types.h>
+#include <sys/socket.h>
+#include <linux/types.h>
+#include <linux/netlink.h>
+#include <linux/genetlink.h>
+#include <linux/net_dropmon.h>
+#include <net/if.h>
+#include <signal.h>
+#include <ctype.h>
+
+#include <sflow/sflow_netlink.h>
+
+#define SFLOWDM_DROPMON_READNL_RCV_BUF 8192
+#define SFLOWDM_DROPMON_READNL_SND_BUF 1000000
+
+#ifndef NET_DM_GENL_NAME
+#define NET_DM_GENL_NAME "NET_DM"
+#endif
+
+#define SFLOWDM_FAM          NET_DM_GENL_NAME
+#define SFLOWDM_FAM_LEN              sizeof (SFLOWDM_FAM)
+#define SFLOWDM_FAM_FOOTPRINT NLMSG_ALIGN (SFLOWDM_FAM_LEN)
+#define SFLOWDM_ATTRS	       (NET_DM_ATTR_MAX + 1)
+#define SFLOWDM_IOV_FRAGS     ((2 * SFLOWDM_ATTRS) + 2)
+
+typedef struct _SFLOWDM
+{
+  SFLOWNL nl;
+  char fam_name[SFLOWDM_FAM_FOOTPRINT];
+  SFLOWNLAttr attr[SFLOWDM_ATTRS];
+  struct iovec iov[SFLOWDM_IOV_FRAGS];
+} SFLOWDM;
+
+EnumSFLOWNLState SFLOWDM_init (SFLOWDM *dmt);
+bool SFLOWDM_open (SFLOWDM *dmt);
+bool SFLOWDM_close (SFLOWDM *dmt);
+EnumSFLOWNLState SFLOWDM_state (SFLOWDM *dmt);
+EnumSFLOWNLState SFLOWDM_open_step (SFLOWDM *dmt);
+
+bool SFLOWDM_set_attr (SFLOWDM *dmt, int field, void *buf, int len);
+#define SFLOWDM_set_attr_int(dmt, field, val)                                 \
+  SFLOWDM_set_attr ((dmt), (field), &(val), sizeof (val))
+
+int SFLOWDM_send (SFLOWDM *dmt);
+
+#endif /* __included_sflow_dropmon_h__ */
+
+/*
+ * fd.io coding-style-patch-verification: ON
+ *
+ * Local Variables:
+ * eval: (c-set-style "gnu")
+ * End:
+ */
diff --git a/src/plugins/sflow/sflow_netlink.c b/src/plugins/sflow/sflow_netlink.c
new file mode 100644 (file)
index 0000000..2c8949f
--- /dev/null
@@ -0,0 +1,480 @@
+/*
+ * Copyright (c) 2025 InMon Corp.
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <vlib/vlib.h>
+#include <vnet/vnet.h>
+#include <vppinfra/error.h>
+#include <sflow/sflow.h>
+
+#include <fcntl.h>
+#include <asm/types.h>
+#include <sys/socket.h>
+#include <linux/types.h>
+#include <linux/netlink.h>
+#include <linux/genetlink.h>
+#include <signal.h>
+#include <ctype.h>
+
+#include <sflow/sflow_netlink.h>
+
+/*_________________---------------------------__________________
+  _________________       fcntl utils         __________________
+  -----------------___________________________------------------
+*/
+
+void
+sflow_netlink_set_nonblocking (int fd)
+{
+  // set the socket to non-blocking
+  int fdFlags = fcntl (fd, F_GETFL);
+  fdFlags |= O_NONBLOCK;
+  if (fcntl (fd, F_SETFL, fdFlags) < 0)
+    {
+      SFLOW_ERR ("fcntl(O_NONBLOCK) failed: %s\n", strerror (errno));
+    }
+}
+
+void
+sflow_netlink_set_close_on_exec (int fd)
+{
+  // make sure it doesn't get inherited, e.g. when we fork a script
+  int fdFlags = fcntl (fd, F_GETFD);
+  fdFlags |= FD_CLOEXEC;
+  if (fcntl (fd, F_SETFD, fdFlags) < 0)
+    {
+      SFLOW_ERR ("fcntl(F_SETFD=FD_CLOEXEC) failed: %s\n", strerror (errno));
+    }
+}
+
+int
+sflow_netlink_set_send_buffer (int fd, int requested)
+{
+  int txbuf = 0;
+  socklen_t txbufsiz = sizeof (txbuf);
+  if (getsockopt (fd, SOL_SOCKET, SO_SNDBUF, &txbuf, &txbufsiz) < 0)
+    {
+      SFLOW_ERR ("getsockopt(SO_SNDBUF) failed: %s", strerror (errno));
+    }
+  if (txbuf < requested)
+    {
+      txbuf = requested;
+      if (setsockopt (fd, SOL_SOCKET, SO_SNDBUF, &txbuf, sizeof (txbuf)) < 0)
+       {
+         SFLOW_WARN ("setsockopt(SO_TXBUF=%d) failed: %s", requested,
+                     strerror (errno));
+       }
+      // see what we actually got
+      txbufsiz = sizeof (txbuf);
+      if (getsockopt (fd, SOL_SOCKET, SO_SNDBUF, &txbuf, &txbufsiz) < 0)
+       {
+         SFLOW_ERR ("getsockopt(SO_SNDBUF) failed: %s", strerror (errno));
+       }
+    }
+  return txbuf;
+}
+
+/*_________________---------------------------__________________
+  _________________       generic_pid         __________________
+  -----------------___________________________------------------
+  choose a 32-bit id that is likely to be unique even if more
+  than one module in this process wants to bind a netlink socket
+*/
+
+u32
+sflow_netlink_generic_pid (u32 mod_id)
+{
+  return ((mod_id << 16) + getpid ());
+}
+
+/*_________________---------------------------__________________
+  _________________       generic_open        __________________
+  -----------------___________________________------------------
+*/
+
+int
+sflow_netlink_generic_open (SFLOWNL *nl)
+{
+  nl->nl_sock = socket (AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
+  if (nl->nl_sock < 0)
+    {
+      SFLOW_ERR ("nl_sock open failed: %s\n", strerror (errno));
+      return -1;
+    }
+  // bind to a suitable id
+  struct sockaddr_nl sa = { .nl_family = AF_NETLINK,
+                           .nl_pid = sflow_netlink_generic_pid (nl->id) };
+  if (bind (nl->nl_sock, (struct sockaddr *) &sa, sizeof (sa)) < 0)
+    {
+      SFLOW_ERR ("sflow_netlink_generic_open: bind failed: sa.nl_pid=%u "
+                "sock=%d id=%d: %s\n",
+                sa.nl_pid, nl->nl_sock, nl->id, strerror (errno));
+    }
+  sflow_netlink_set_nonblocking (nl->nl_sock);
+  sflow_netlink_set_close_on_exec (nl->nl_sock);
+  sflow_netlink_set_send_buffer (nl->nl_sock, SFLOWNL_SND_BUF);
+  nl->state = SFLOWNL_STATE_OPEN;
+  return nl->nl_sock;
+}
+
+/*_________________---------------------------__________________
+  _________________      usersock_open        __________________
+  -----------------___________________________------------------
+*/
+
+int
+sflow_netlink_usersock_open (SFLOWNL *nl)
+{
+  nl->nl_sock = socket (AF_NETLINK, SOCK_RAW, NETLINK_USERSOCK);
+  if (nl->nl_sock < 0)
+    {
+      SFLOW_ERR ("nl_sock open failed: %s\n", strerror (errno));
+      return -1;
+    }
+  sflow_netlink_set_nonblocking (nl->nl_sock);
+  sflow_netlink_set_close_on_exec (nl->nl_sock);
+  nl->state = SFLOWNL_STATE_OPEN;
+  return nl->nl_sock;
+}
+
+/*_________________---------------------------__________________
+  _________________        close              __________________
+  -----------------___________________________------------------
+*/
+
+int
+sflow_netlink_close (SFLOWNL *nl)
+{
+  int err = 0;
+  if (nl->nl_sock > 0)
+    {
+      err = close (nl->nl_sock);
+      if (err == 0)
+       {
+         nl->nl_sock = 0;
+       }
+      else
+       {
+         SFLOW_ERR ("sflow_netlink_close: returned %d : %s\n", err,
+                    strerror (errno));
+       }
+    }
+  nl->state = SFLOWNL_STATE_INIT;
+  return err;
+}
+
+/*_________________---------------------------__________________
+  _________________        set_attr           __________________
+  -----------------___________________________------------------
+*/
+
+bool
+sflow_netlink_set_attr (SFLOWNL *nl, int field, void *val, int len)
+{
+  SFLOWNLAttr *psa = &nl->attr[field];
+  if (psa->included)
+    return false;
+  psa->included = true;
+  psa->attr.nla_type = field;
+  psa->attr.nla_len = sizeof (psa->attr) + len;
+  int len_w_pad = NLMSG_ALIGN (len);
+  psa->val.iov_len = len_w_pad;
+  psa->val.iov_base = val;
+  nl->n_attrs++;
+  nl->attrs_len += sizeof (psa->attr);
+  nl->attrs_len += len_w_pad;
+  return true;
+}
+
+/*_________________---------------------------__________________
+  _________________     generic_send_cmd      __________________
+  -----------------___________________________------------------
+*/
+
+int
+sflow_netlink_generic_send_cmd (int sockfd, u32 mod_id, int type, int cmd,
+                               int req_type, void *req, int req_len,
+                               int req_footprint, u32 seqNo)
+{
+  struct nlmsghdr nlh = {};
+  struct genlmsghdr ge = {};
+  struct nlattr attr = {};
+
+  attr.nla_len = sizeof (attr) + req_len;
+  attr.nla_type = req_type;
+
+  ge.cmd = cmd;
+  ge.version = 1;
+
+  nlh.nlmsg_len = NLMSG_LENGTH (req_footprint + sizeof (attr) + sizeof (ge));
+  nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+  nlh.nlmsg_type = type;
+  nlh.nlmsg_seq = seqNo;
+  nlh.nlmsg_pid = sflow_netlink_generic_pid (mod_id);
+
+  struct iovec iov[4] = { { .iov_base = &nlh, .iov_len = sizeof (nlh) },
+                         { .iov_base = &ge, .iov_len = sizeof (ge) },
+                         { .iov_base = &attr, .iov_len = sizeof (attr) },
+                         { .iov_base = req, .iov_len = req_footprint } };
+
+  struct sockaddr_nl sa = { .nl_family = AF_NETLINK };
+  struct msghdr msg = { .msg_name = &sa,
+                       .msg_namelen = sizeof (sa),
+                       .msg_iov = iov,
+                       .msg_iovlen = 4 };
+  return sendmsg (sockfd, &msg, 0);
+}
+
+/*_________________---------------------------__________________
+  _________________       send_attrs          __________________
+  -----------------___________________________------------------
+*/
+
+int
+sflow_netlink_send_attrs (SFLOWNL *nl, bool ge)
+{
+  if (ge)
+    {
+      nl->nlh.nlmsg_len = NLMSG_LENGTH (sizeof (nl->ge) + nl->attrs_len);
+      nl->nlh.nlmsg_type = nl->family_id;
+      nl->nlh.nlmsg_pid = sflow_netlink_generic_pid (nl->id);
+    }
+  else
+    {
+      nl->nlh.nlmsg_len = NLMSG_LENGTH (nl->attrs_len);
+      nl->nlh.nlmsg_pid = getpid ();
+    }
+
+  nl->nlh.nlmsg_flags = 0;
+  nl->nlh.nlmsg_seq = ++nl->nl_seq;
+
+  struct iovec *iov = nl->iov;
+  u32 frag = 0;
+  iov[frag].iov_base = &nl->nlh;
+  iov[frag].iov_len = sizeof (nl->nlh);
+  frag++;
+  if (ge)
+    {
+      iov[frag].iov_base = &nl->ge;
+      iov[frag].iov_len = sizeof (nl->ge);
+      frag++;
+    }
+  int nn = 0;
+  for (u32 ii = 0; ii <= nl->attr_max; ii++)
+    {
+      SFLOWNLAttr *psa = &nl->attr[ii];
+      if (psa->included)
+       {
+         nn++;
+         iov[frag].iov_base = &psa->attr;
+         iov[frag].iov_len = sizeof (psa->attr);
+         frag++;
+         iov[frag] = psa->val; // struct copy
+         frag++;
+       }
+    }
+  ASSERT (nn == nl->n_attrs);
+
+  struct sockaddr_nl da = { .nl_family = AF_NETLINK,
+                           .nl_groups = (1 << (nl->group_id - 1)) };
+
+  struct msghdr msg = { .msg_name = &da,
+                       .msg_namelen = sizeof (da),
+                       .msg_iov = iov,
+                       .msg_iovlen = frag };
+
+  return sendmsg (nl->nl_sock, &msg, 0);
+}
+
+/*_________________---------------------------__________________
+  _________________       reset_attrs         __________________
+  -----------------___________________________------------------
+*/
+
+void
+sflow_netlink_reset_attrs (SFLOWNL *nl)
+{
+  for (u32 ii = 0; ii <= nl->attr_max; ii++)
+    nl->attr[ii].included = false;
+  nl->n_attrs = 0;
+  nl->attrs_len = 0;
+}
+
+/*_________________---------------------------__________________
+  _________________   generic_get_family      __________________
+  -----------------___________________________------------------
+*/
+
+void
+sflow_netlink_generic_get_family (SFLOWNL *nl)
+{
+  int status = sflow_netlink_generic_send_cmd (
+    nl->nl_sock, nl->id, GENL_ID_CTRL, CTRL_CMD_GETFAMILY,
+    CTRL_ATTR_FAMILY_NAME, nl->family_name, nl->family_len,
+    NLMSG_ALIGN (nl->family_len), ++nl->nl_seq);
+  if (status >= 0)
+    nl->state = SFLOWNL_STATE_WAIT_FAMILY;
+}
+
+/*_________________---------------------------__________________
+  _________________       generic_read        __________________
+  -----------------___________________________------------------
+*/
+
+void
+sflow_netlink_generic_read (SFLOWNL *nl, struct nlmsghdr *nlh, int numbytes)
+{
+  int msglen = nlh->nlmsg_len;
+  if (msglen > numbytes)
+    {
+      SFLOW_ERR ("generic read msglen too long\n");
+      return;
+    }
+  if (msglen < (NLMSG_HDRLEN + GENL_HDRLEN + NLA_HDRLEN))
+    {
+      SFLOW_ERR ("generic read msglen too short\n");
+      return;
+    }
+  char *msg = (char *) NLMSG_DATA (nlh);
+  msglen -= NLMSG_HDRLEN;
+  struct genlmsghdr *genl = (struct genlmsghdr *) msg;
+  SFLOW_DBG ("generic netlink CMD = %u\n", genl->cmd);
+  msglen -= GENL_HDRLEN;
+
+  struct nlattr *attr0 = (struct nlattr *) (msg + GENL_HDRLEN);
+  for (int attrs_len = msglen; SFNLA_OK (attr0, attrs_len);
+       attr0 = SFNLA_NEXT (attr0, attrs_len))
+    {
+      switch (attr0->nla_type)
+       {
+       case CTRL_ATTR_VERSION:
+         nl->genetlink_version = *(u32 *) SFNLA_DATA (attr0);
+         break;
+       case CTRL_ATTR_FAMILY_ID:
+         nl->family_id = *(u16 *) SFNLA_DATA (attr0);
+         SFLOW_DBG ("generic family id: %u\n", nl->family_id);
+         break;
+       case CTRL_ATTR_FAMILY_NAME:
+         SFLOW_DBG ("generic family name: %s\n", (char *) SFNLA_DATA (attr0));
+         break;
+       case CTRL_ATTR_MCAST_GROUPS:
+         {
+           struct nlattr *attr1 = (struct nlattr *) SFNLA_DATA (attr0);
+           for (int attr0_len = SFNLA_PAYLOAD (attr0);
+                SFNLA_OK (attr1, attr0_len);
+                attr1 = SFNLA_NEXT (attr1, attr0_len))
+             {
+               char *grp_name = NULL;
+               u32 grp_id = 0;
+               struct nlattr *attr2 = SFNLA_DATA (attr1);
+               for (int attr1_len = SFNLA_PAYLOAD (attr1);
+                    SFNLA_OK (attr2, attr1_len);
+                    attr2 = SFNLA_NEXT (attr2, attr1_len))
+                 {
+
+                   switch (attr2->nla_type)
+                     {
+                     case CTRL_ATTR_MCAST_GRP_NAME:
+                       grp_name = SFNLA_DATA (attr2);
+                       SFLOW_DBG ("netlink multicast group: %s\n", grp_name);
+                       break;
+                     case CTRL_ATTR_MCAST_GRP_ID:
+                       grp_id = *(u32 *) SFNLA_DATA (attr2);
+                       SFLOW_DBG ("netlink multicast group id: %u\n", grp_id);
+                       break;
+                     }
+                 }
+               if (nl->group_id == 0 && grp_name &&
+                   (((nl->join_group_id != 0) &&
+                     grp_id == nl->join_group_id) ||
+                    ((nl->join_group_name != NULL) &&
+                     !strcmp (grp_name, nl->join_group_name))))
+                 {
+                   SFLOW_DBG ("netlink found group %s=%u\n", grp_name,
+                              grp_id);
+                   nl->group_id = grp_id;
+                   // We don't need to actually join the group if we
+                   // are only sending to it.
+                 }
+             }
+         }
+         break;
+       default:
+         SFLOW_DBG ("netlink attr type: %u (nested=%u) len: %u\n",
+                    attr0->nla_type, attr0->nla_type & NLA_F_NESTED,
+                    attr0->nla_len);
+         break;
+       }
+    }
+  if (nl->family_id && nl->group_id)
+    {
+      SFLOW_DBG ("netlink state->READY\n");
+      nl->state = SFLOWNL_STATE_READY;
+    }
+}
+
+/*_________________---------------------------__________________
+  _________________   sflow_netlink_read      __________________
+  -----------------___________________________------------------
+*/
+
+void
+sflow_netlink_read (SFLOWNL *nl)
+{
+  uint8_t recv_buf[SFLOWNL_RCV_BUF];
+  memset (recv_buf, 0, SFLOWNL_RCV_BUF); // for coverity
+  int numbytes = recv (nl->nl_sock, recv_buf, sizeof (recv_buf), 0);
+  if (numbytes <= (int) sizeof (struct nlmsghdr))
+    {
+      SFLOW_ERR ("sflow_netlink_read returned %d : %s\n", numbytes,
+                strerror (errno));
+      return;
+    }
+  for (struct nlmsghdr *nlh = (struct nlmsghdr *) recv_buf;
+       NLMSG_OK (nlh, numbytes); nlh = NLMSG_NEXT (nlh, numbytes))
+    {
+      if (nlh->nlmsg_type == NLMSG_DONE)
+       break;
+      if (nlh->nlmsg_type == NLMSG_ERROR)
+       {
+         struct nlmsgerr *err_msg = (struct nlmsgerr *) NLMSG_DATA (nlh);
+         if (err_msg->error == 0)
+           {
+             SFLOW_DBG ("received Netlink ACK\n");
+           }
+         else
+           {
+             SFLOW_ERR ("error in netlink message: %d : %s\n", err_msg->error,
+                        strerror (-err_msg->error));
+           }
+         return;
+       }
+      if (nlh->nlmsg_type == NETLINK_GENERIC)
+       {
+         sflow_netlink_generic_read (nl, nlh, numbytes);
+       }
+      else if (nlh->nlmsg_type == nl->family_id)
+       {
+         // We are write-only, don't need to read these.
+       }
+    }
+}
+
+/*
+ * fd.io coding-style-patch-verification: ON
+ *
+ * Local Variables:
+ * eval: (c-set-style "gnu")
+ * End:
+ */
diff --git a/src/plugins/sflow/sflow_netlink.h b/src/plugins/sflow/sflow_netlink.h
new file mode 100644 (file)
index 0000000..0330658
--- /dev/null
@@ -0,0 +1,130 @@
+/*
+ * Copyright (c) 2025 InMon Corp.
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef __included_sflow_netlink_h__
+#define __included_sflow_netlink_h__
+
+#include <vlib/vlib.h>
+#include <vnet/vnet.h>
+#include <vppinfra/error.h>
+#include <sflow/sflow.h>
+
+#include <fcntl.h>
+#include <asm/types.h>
+#include <sys/socket.h>
+#include <linux/types.h>
+#include <linux/netlink.h>
+#include <linux/genetlink.h>
+#include <signal.h>
+#include <ctype.h>
+
+#define SFLOWNL_RCV_BUF 8192
+#define SFLOWNL_SND_BUF 1000000
+
+typedef enum
+{
+  SFLOWNL_USERSOCK = 1,
+  SFLOWNL_PSAMPLE,
+  SFLOWNL_DROPMON,
+} EnumSFLOWNLMod;
+
+typedef enum
+{
+  SFLOWNL_STATE_UNDEFINED = 0,
+  SFLOWNL_STATE_INIT,
+  SFLOWNL_STATE_OPEN,
+  SFLOWNL_STATE_WAIT_FAMILY,
+  SFLOWNL_STATE_READY
+} EnumSFLOWNLState;
+
+typedef struct _SFLOWNLAttr
+{
+  bool included : 1;
+  struct nlattr attr;
+  struct iovec val;
+} SFLOWNLAttr;
+
+typedef struct _SFLOWNL
+{
+  // connect
+  EnumSFLOWNLState state;
+  EnumSFLOWNLMod id;
+  int nl_sock;
+  u32 nl_seq;
+  u32 genetlink_version;
+  u16 family_id;
+  u32 group_id;
+  // setup
+  char *family_name;
+  u32 family_len;
+  u32 join_group_id;
+  char *join_group_name;
+  // msg
+  struct nlmsghdr nlh;
+  struct genlmsghdr ge;
+  SFLOWNLAttr *attr;
+  u32 attr_max;
+  u32 n_attrs;
+  u32 attrs_len;
+  u32 iov_max;
+  struct iovec *iov;
+} SFLOWNL;
+
+void sflow_netlink_set_nonblocking (int fd);
+void sflow_netlink_set_close_on_exec (int fd);
+int sflow_netlink_set_send_buffer (int fd, int requested);
+u32 sflow_netlink_generic_pid (u32 mod_id);
+int sflow_netlink_generic_open (SFLOWNL *nl);
+int sflow_netlink_usersock_open (SFLOWNL *nl);
+int sflow_netlink_close (SFLOWNL *nl);
+bool sflow_netlink_set_attr (SFLOWNL *nl, int field, void *val, int len);
+
+#define sflow_netlink_set_attr_int(nl, field, val)                            \
+  sflow_netlink_set_attr ((nl), (field), &(val), sizeof (val))
+
+int sflow_netlink_generic_send_cmd (int sockfd, u32 mod_id, int type, int cmd,
+                                   int req_type, void *req, int req_len,
+                                   int req_footprint, u32 seqNo);
+int sflow_netlink_send_attrs (SFLOWNL *nl, bool ge);
+void sflow_netlink_reset_attrs (SFLOWNL *nl);
+void sflow_netlink_generic_get_family (SFLOWNL *nl);
+void sflow_netlink_generic_read (SFLOWNL *nl, struct nlmsghdr *nlh,
+                                int numbytes);
+void sflow_netlink_read (SFLOWNL *nl);
+
+/* Provide the netlink attribute-walking macros that are strangely
+ * missing from netlink.h, so we can walk attributes the same way
+ * as we walk messages (and satisfy static-analysis algorithms that
+ * are wary of looping over "tainted" input).
+ */
+#define SFNLA_OK(nla, len)                                                    \
+  ((len) > 0 && (nla)->nla_len >= sizeof (struct nlattr) &&                   \
+   (nla)->nla_len <= (len))
+#define SFNLA_NEXT(nla, attrlen)                                              \
+  ((attrlen) -= NLA_ALIGN ((nla)->nla_len),                                   \
+   (struct nlattr *) (((char *) (nla)) + NLA_ALIGN ((nla)->nla_len)))
+#define SFNLA_LENGTH(len)  (NLA_ALIGN (sizeof (struct nlattr)) + (len))
+#define SFNLA_SPACE(len)   (NLA_ALIGN (SFNLA_LENGTH (len)))
+#define SFNLA_DATA(nla)           ((void *) (((char *) (nla)) + SFNLA_LENGTH (0)))
+#define SFNLA_PAYLOAD(nla) ((int) ((nla)->nla_len) - SFNLA_LENGTH (0))
+#endif /* __included_sflow_netlink_h__ */
+
+/*
+ * fd.io coding-style-patch-verification: ON
+ *
+ * Local Variables:
+ * eval: (c-set-style "gnu")
+ * End:
+ */
index 41df454..4684338 100644 (file)
 #include <signal.h>
 #include <ctype.h>
 
+#include <sflow/sflow_netlink.h>
 #include <sflow/sflow_psample.h>
 
-  /*_________________---------------------------__________________
-    _________________       fcntl utils         __________________
-    -----------------___________________________------------------
-  */
-
-  static void
-  setNonBlocking (int fd)
-  {
-    // set the socket to non-blocking
-    int fdFlags = fcntl (fd, F_GETFL);
-    fdFlags |= O_NONBLOCK;
-    if (fcntl (fd, F_SETFL, fdFlags) < 0)
-      {
-       SFLOW_ERR ("fcntl(O_NONBLOCK) failed: %s\n", strerror (errno));
-      }
-  }
-
-  static void
-  setCloseOnExec (int fd)
-  {
-    // make sure it doesn't get inherited, e.g. when we fork a script
-    int fdFlags = fcntl (fd, F_GETFD);
-    fdFlags |= FD_CLOEXEC;
-    if (fcntl (fd, F_SETFD, fdFlags) < 0)
-      {
-       SFLOW_ERR ("fcntl(F_SETFD=FD_CLOEXEC) failed: %s\n", strerror (errno));
-      }
-  }
-
-  static int
-  setSendBuffer (int fd, int requested)
-  {
-    int txbuf = 0;
-    socklen_t txbufsiz = sizeof (txbuf);
-    if (getsockopt (fd, SOL_SOCKET, SO_SNDBUF, &txbuf, &txbufsiz) < 0)
-      {
-       SFLOW_ERR ("getsockopt(SO_SNDBUF) failed: %s", strerror (errno));
-      }
-    if (txbuf < requested)
-      {
-       txbuf = requested;
-       if (setsockopt (fd, SOL_SOCKET, SO_SNDBUF, &txbuf, sizeof (txbuf)) < 0)
-         {
-           SFLOW_WARN ("setsockopt(SO_TXBUF=%d) failed: %s", requested,
-                       strerror (errno));
-         }
-       // see what we actually got
-       txbufsiz = sizeof (txbuf);
-       if (getsockopt (fd, SOL_SOCKET, SO_SNDBUF, &txbuf, &txbufsiz) < 0)
-         {
-           SFLOW_ERR ("getsockopt(SO_SNDBUF) failed: %s", strerror (errno));
-         }
-      }
-    return txbuf;
-  }
-
-  /*_________________---------------------------__________________
-    _________________        generic_pid        __________________
-    -----------------___________________________------------------
-    choose a 32-bit id that is likely to be unique even if more
-    than one module in this process wants to bind a netlink socket
-  */
-
-  static u32
-  generic_pid (u32 mod_id)
-  {
-    return (mod_id << 16) | getpid ();
-  }
-
-  /*_________________---------------------------__________________
-    _________________        generic_open       __________________
-    -----------------___________________________------------------
-  */
-
-  static int
-  generic_open (u32 mod_id)
-  {
-    int nl_sock = socket (AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
-    if (nl_sock < 0)
-      {
-       SFLOW_ERR ("nl_sock open failed: %s\n", strerror (errno));
-       return -1;
-      }
-    // bind to a suitable id
-    struct sockaddr_nl sa = { .nl_family = AF_NETLINK,
-                             .nl_pid = generic_pid (mod_id) };
-    if (bind (nl_sock, (struct sockaddr *) &sa, sizeof (sa)) < 0)
-      SFLOW_ERR ("generic_open: bind failed: %s\n", strerror (errno));
-    setNonBlocking (nl_sock);
-    setCloseOnExec (nl_sock);
-    return nl_sock;
-  }
-
-  /*_________________---------------------------__________________
-    _________________       generic_send        __________________
-    -----------------___________________________------------------
-  */
-
-  static int
-  generic_send (int sockfd, u32 mod_id, int type, int cmd, int req_type,
-               void *req, int req_len, int req_footprint, u32 seqNo)
-  {
-    struct nlmsghdr nlh = {};
-    struct genlmsghdr ge = {};
-    struct nlattr attr = {};
-
-    attr.nla_len = sizeof (attr) + req_len;
-    attr.nla_type = req_type;
-
-    ge.cmd = cmd;
-    ge.version = 1;
-
-    nlh.nlmsg_len = NLMSG_LENGTH (req_footprint + sizeof (attr) + sizeof (ge));
-    nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
-    nlh.nlmsg_type = type;
-    nlh.nlmsg_seq = seqNo;
-    nlh.nlmsg_pid = generic_pid (mod_id);
-
-    struct iovec iov[4] = { { .iov_base = &nlh, .iov_len = sizeof (nlh) },
-                           { .iov_base = &ge, .iov_len = sizeof (ge) },
-                           { .iov_base = &attr, .iov_len = sizeof (attr) },
-                           { .iov_base = req, .iov_len = req_footprint } };
-
-    struct sockaddr_nl sa = { .nl_family = AF_NETLINK };
-    struct msghdr msg = { .msg_name = &sa,
-                         .msg_namelen = sizeof (sa),
-                         .msg_iov = iov,
-                         .msg_iovlen = 4 };
-    return sendmsg (sockfd, &msg, 0);
-  }
-
-  /*_________________---------------------------__________________
-    _________________    getFamily_PSAMPLE      __________________
-    -----------------___________________________------------------
-  */
-
-  static void
-  getFamily_PSAMPLE (SFLOWPS *pst)
-  {
-#define SFLOWPS_FAM_LEN              sizeof (PSAMPLE_GENL_NAME)
-#define SFLOWPS_FAM_FOOTPRINT NLMSG_ALIGN (SFLOWPS_FAM_LEN)
-    char fam_name[SFLOWPS_FAM_FOOTPRINT] = {};
-    memcpy (fam_name, PSAMPLE_GENL_NAME, SFLOWPS_FAM_LEN);
-    generic_send (pst->nl_sock, pst->id, GENL_ID_CTRL, CTRL_CMD_GETFAMILY,
-                 CTRL_ATTR_FAMILY_NAME, fam_name, SFLOWPS_FAM_LEN,
-                 SFLOWPS_FAM_FOOTPRINT, ++pst->nl_seq);
-    pst->state = SFLOWPS_STATE_WAIT_FAMILY;
-  }
-
-  /*_________________---------------------------__________________
-    _________________  processNetlink_GENERIC   __________________
-    -----------------___________________________------------------
-  */
-
-  static void
-  processNetlink_GENERIC (SFLOWPS *pst, struct nlmsghdr *nlh)
-  {
-    char *msg = (char *) NLMSG_DATA (nlh);
-    int msglen = nlh->nlmsg_len - NLMSG_HDRLEN;
-    struct genlmsghdr *genl = (struct genlmsghdr *) msg;
-    SFLOW_DBG ("generic netlink CMD = %u\n", genl->cmd);
-
-    for (int offset = GENL_HDRLEN; offset < msglen;)
-      {
-       struct nlattr *attr = (struct nlattr *) (msg + offset);
-       if (attr->nla_len == 0 || (attr->nla_len + offset) > msglen)
-         {
-           SFLOW_ERR ("processNetlink_GENERIC attr parse error\n");
-           break; // attr parse error
-         }
-       char *attr_datap = (char *) attr + NLA_HDRLEN;
-       switch (attr->nla_type)
-         {
-         case CTRL_ATTR_VERSION:
-           pst->genetlink_version = *(u32 *) attr_datap;
-           break;
-         case CTRL_ATTR_FAMILY_ID:
-           pst->family_id = *(u16 *) attr_datap;
-           SFLOW_DBG ("generic family id: %u\n", pst->family_id);
-           break;
-         case CTRL_ATTR_FAMILY_NAME:
-           SFLOW_DBG ("generic family name: %s\n", attr_datap);
-           break;
-         case CTRL_ATTR_MCAST_GROUPS:
-           for (int grp_offset = NLA_HDRLEN; grp_offset < attr->nla_len;)
-             {
-               struct nlattr *grp_attr =
-                 (struct nlattr *) (msg + offset + grp_offset);
-               if (grp_attr->nla_len == 0 ||
-                   (grp_attr->nla_len + grp_offset) > attr->nla_len)
-                 {
-                   SFLOW_ERR (
-                     "processNetlink_GENERIC grp_attr parse error\n");
-                   break;
-                 }
-               char *grp_name = NULL;
-               u32 grp_id = 0;
-               for (int gf_offset = NLA_HDRLEN;
-                    gf_offset < grp_attr->nla_len;)
-                 {
-                   struct nlattr *gf_attr =
-                     (struct nlattr *) (msg + offset + grp_offset +
-                                        gf_offset);
-                   if (gf_attr->nla_len == 0 ||
-                       (gf_attr->nla_len + gf_offset) > grp_attr->nla_len)
-                     {
-                       SFLOW_ERR (
-                         "processNetlink_GENERIC gf_attr parse error\n");
-                       break;
-                     }
-                   char *grp_attr_datap = (char *) gf_attr + NLA_HDRLEN;
-                   switch (gf_attr->nla_type)
-                     {
-                     case CTRL_ATTR_MCAST_GRP_NAME:
-                       grp_name = grp_attr_datap;
-                       SFLOW_DBG ("psample multicast group: %s\n", grp_name);
-                       break;
-                     case CTRL_ATTR_MCAST_GRP_ID:
-                       grp_id = *(u32 *) grp_attr_datap;
-                       SFLOW_DBG ("psample multicast group id: %u\n", grp_id);
-                       break;
-                     }
-                   gf_offset += NLMSG_ALIGN (gf_attr->nla_len);
-                 }
-               if (pst->group_id == 0 && grp_name && grp_id &&
-                   !strcmp (grp_name, PSAMPLE_NL_MCGRP_SAMPLE_NAME))
-                 {
-                   SFLOW_DBG ("psample found group %s=%u\n", grp_name,
-                              grp_id);
-                   pst->group_id = grp_id;
-                   // We don't need to join the group if we are only sending
-                   // to it.
-                 }
-
-               grp_offset += NLMSG_ALIGN (grp_attr->nla_len);
-             }
-           break;
-         default:
-           SFLOW_DBG ("psample attr type: %u (nested=%u) len: %u\n",
-                      attr->nla_type, attr->nla_type & NLA_F_NESTED,
-                      attr->nla_len);
-           break;
-         }
-       offset += NLMSG_ALIGN (attr->nla_len);
-      }
-    if (pst->family_id && pst->group_id)
-      {
-       SFLOW_DBG ("psample state->READY\n");
-       pst->state = SFLOWPS_STATE_READY;
-      }
-  }
-
-  // TODO: we can take out the fns for reading PSAMPLE here
-
-  /*_________________---------------------------__________________
-    _________________      processNetlink       __________________
-    -----------------___________________________------------------
-  */
-
-  static void
-  processNetlink (SFLOWPS *pst, struct nlmsghdr *nlh)
-  {
-    if (nlh->nlmsg_type == NETLINK_GENERIC)
-      {
-       processNetlink_GENERIC (pst, nlh);
-      }
-    else if (nlh->nlmsg_type == pst->family_id)
-      {
-       // We are write-only, don't need to read these.
-      }
-  }
-
-  /*_________________---------------------------__________________
-    _________________   readNetlink_PSAMPLE     __________________
-    -----------------___________________________------------------
-  */
-
-  static void
-  readNetlink_PSAMPLE (SFLOWPS *pst, int fd)
-  {
-    uint8_t recv_buf[SFLOWPS_PSAMPLE_READNL_RCV_BUF];
-    int numbytes = recv (fd, recv_buf, sizeof (recv_buf), 0);
-    if (numbytes <= 0)
-      {
-       SFLOW_ERR ("readNetlink_PSAMPLE returned %d : %s\n", numbytes,
-                  strerror (errno));
-       return;
-      }
-    struct nlmsghdr *nlh = (struct nlmsghdr *) recv_buf;
-    while (NLMSG_OK (nlh, numbytes))
-      {
-       if (nlh->nlmsg_type == NLMSG_DONE)
-         break;
-       if (nlh->nlmsg_type == NLMSG_ERROR)
-         {
-           struct nlmsgerr *err_msg = (struct nlmsgerr *) NLMSG_DATA (nlh);
-           if (err_msg->error == 0)
-             {
-               SFLOW_DBG ("received Netlink ACK\n");
-             }
-           else
-             {
-               SFLOW_ERR ("error in netlink message: %d : %s\n",
-                          err_msg->error, strerror (-err_msg->error));
-             }
-           return;
-         }
-       processNetlink (pst, nlh);
-       nlh = NLMSG_NEXT (nlh, numbytes);
-      }
-  }
-
-  /*_________________---------------------------__________________
-    _________________       SFLOWPS_open        __________________
-    -----------------___________________________------------------
-  */
-
-  bool
-  SFLOWPS_open (SFLOWPS *pst)
-  {
-    if (pst->nl_sock == 0)
-      {
-       pst->nl_sock = generic_open (pst->id);
-       if (pst->nl_sock > 0)
-         {
-           pst->state = SFLOWPS_STATE_OPEN;
-           setSendBuffer (pst->nl_sock, SFLOWPS_PSAMPLE_READNL_SND_BUF);
-           getFamily_PSAMPLE (pst);
-         }
-      }
-    return (pst->nl_sock > 0);
-  }
-
-  /*_________________---------------------------__________________
-    _________________       SFLOWPS_close       __________________
-    -----------------___________________________------------------
-  */
-
-  bool
-  SFLOWPS_close (SFLOWPS *pst)
-  {
-    if (pst->nl_sock > 0)
-      {
-       int err = close (pst->nl_sock);
-       if (err == 0)
-         {
-           pst->nl_sock = 0;
-           return true;
-         }
-       else
-         {
-           SFLOW_ERR ("SFLOWPS_close: returned %d : %s\n", err,
-                      strerror (errno));
-         }
-      }
-    return false;
-  }
-
-  /*_________________---------------------------__________________
-    _________________       SFLOWPS_state       __________________
-    -----------------___________________________------------------
-  */
-
-  EnumSFLOWPSState
-  SFLOWPS_state (SFLOWPS *pst)
-  {
-    return pst->state;
-  }
-
-  /*_________________---------------------------__________________
-    _________________    SFLOWPS_open_step      __________________
-    -----------------___________________________------------------
-  */
-
-  EnumSFLOWPSState
-  SFLOWPS_open_step (SFLOWPS *pst)
-  {
-    switch (pst->state)
-      {
-      case SFLOWPS_STATE_INIT:
-       SFLOWPS_open (pst);
-       break;
-      case SFLOWPS_STATE_OPEN:
-       getFamily_PSAMPLE (pst);
-       break;
-      case SFLOWPS_STATE_WAIT_FAMILY:
-       readNetlink_PSAMPLE (pst, pst->nl_sock);
-       break;
-      case SFLOWPS_STATE_READY:
-       break;
-      }
-    return pst->state;
-  }
-
-  /*_________________---------------------------__________________
-    _________________    SFLOWPSSpec_setAttr    __________________
-    -----------------___________________________------------------
-  */
-
-  bool
-  SFLOWPSSpec_setAttr (SFLOWPSSpec *spec, EnumSFLOWPSAttributes field,
-                      void *val, int len)
-  {
-    SFLOWPSAttr *psa = &spec->attr[field];
-    if (psa->included)
+/*_________________---------------------------__________________
+  _________________       SFLOWPS_init        __________________
+  -----------------___________________________------------------
+*/
+
+EnumSFLOWNLState
+SFLOWPS_init (SFLOWPS *pst)
+{
+  pst->nl.id = SFLOWNL_PSAMPLE;
+  memset (pst->fam_name, 0, SFLOWPS_FAM_FOOTPRINT);
+  memcpy (pst->fam_name, SFLOWPS_FAM, SFLOWPS_FAM_LEN);
+  pst->nl.family_name = pst->fam_name;
+  pst->nl.family_len = SFLOWPS_FAM_LEN;
+  pst->nl.join_group_name = PSAMPLE_NL_MCGRP_SAMPLE_NAME;
+  pst->nl.attr = pst->attr;
+  pst->nl.attr_max = __SFLOWPS_PSAMPLE_ATTRS - 1;
+  pst->nl.iov = pst->iov;
+  pst->nl.iov_max = SFLOWPS_IOV_FRAGS - 1;
+  pst->nl.state = SFLOWNL_STATE_INIT;
+  return pst->nl.state;
+}
+
+/*_________________---------------------------__________________
+  _________________       SFLOWPS_open        __________________
+  -----------------___________________________------------------
+*/
+
+bool
+SFLOWPS_open (SFLOWPS *pst)
+{
+  if (pst->nl.state == SFLOWNL_STATE_UNDEFINED)
+    SFLOWPS_init (pst);
+  if (pst->nl.nl_sock == 0)
+    {
+      pst->nl.nl_sock = sflow_netlink_generic_open (&pst->nl);
+      if (pst->nl.nl_sock > 0)
+       sflow_netlink_generic_get_family (&pst->nl);
+    }
+  return (pst->nl.nl_sock > 0);
+}
+
+/*_________________---------------------------__________________
+  _________________       SFLOWPS_close       __________________
+  -----------------___________________________------------------
+*/
+
+bool
+SFLOWPS_close (SFLOWPS *pst)
+{
+  return (sflow_netlink_close (&pst->nl) == 0);
+}
+
+/*_________________---------------------------__________________
+  _________________       SFLOWPS_state       __________________
+  -----------------___________________________------------------
+*/
+
+EnumSFLOWNLState
+SFLOWPS_state (SFLOWPS *pst)
+{
+  return pst->nl.state;
+}
+
+/*_________________---------------------------__________________
+  _________________    SFLOWPS_open_step      __________________
+  -----------------___________________________------------------
+*/
+
+EnumSFLOWNLState
+SFLOWPS_open_step (SFLOWPS *pst)
+{
+  switch (pst->nl.state)
+    {
+    case SFLOWNL_STATE_UNDEFINED:
+      SFLOWPS_init (pst);
+      break;
+    case SFLOWNL_STATE_INIT:
+      SFLOWPS_open (pst);
+      break;
+    case SFLOWNL_STATE_OPEN:
+      sflow_netlink_generic_get_family (&pst->nl);
+      break;
+    case SFLOWNL_STATE_WAIT_FAMILY:
+      sflow_netlink_read (&pst->nl);
+      break;
+    case SFLOWNL_STATE_READY:
+      break;
+    }
+  return pst->nl.state;
+}
+
+/*_________________---------------------------__________________
+  _________________    SFLOWPS_set_attr       __________________
+  -----------------___________________________------------------
+*/
+
+bool
+SFLOWPS_set_attr (SFLOWPS *pst, EnumSFLOWPSAttributes field, void *val,
+                 int len)
+{
+  int expected_len = SFLOWPS_Fields[field].len;
+  if (expected_len && expected_len != len)
+    {
+      SFLOW_ERR ("SFLOWPS_set_attr(%s) length=%u != expected: %u\n",
+                SFLOWPS_Fields[field].descr, len, expected_len);
       return false;
-    psa->included = true;
-    int expected_len = SFLOWPS_Fields[field].len;
-    if (expected_len && expected_len != len)
-      {
-       SFLOW_ERR ("SFLOWPSSpec_setAttr(%s) length=%u != expected: %u\n",
-                  SFLOWPS_Fields[field].descr, len, expected_len);
-       return false;
-      }
-    psa->attr.nla_type = field;
-    psa->attr.nla_len = sizeof (psa->attr) + len;
-    int len_w_pad = NLMSG_ALIGN (len);
-    psa->val.iov_len = len_w_pad;
-    psa->val.iov_base = val;
-    spec->n_attrs++;
-    spec->attrs_len += sizeof (psa->attr);
-    spec->attrs_len += len_w_pad;
-    return true;
-  }
-
-  /*_________________---------------------------__________________
-    _________________    SFLOWPSSpec_send       __________________
-    -----------------___________________________------------------
-  */
-
-  int
-  SFLOWPSSpec_send (SFLOWPS *pst, SFLOWPSSpec *spec)
-  {
-    spec->nlh.nlmsg_len = NLMSG_LENGTH (sizeof (spec->ge) + spec->attrs_len);
-    spec->nlh.nlmsg_flags = 0;
-    spec->nlh.nlmsg_type = pst->family_id;
-    spec->nlh.nlmsg_seq = ++pst->nl_seq;
-    spec->nlh.nlmsg_pid = generic_pid (pst->id);
-
-    spec->ge.cmd = PSAMPLE_CMD_SAMPLE;
-    spec->ge.version = PSAMPLE_GENL_VERSION;
-
-#define MAX_IOV_FRAGMENTS (2 * __SFLOWPS_PSAMPLE_ATTR_MAX) + 2
-
-    struct iovec iov[MAX_IOV_FRAGMENTS];
-    u32 frag = 0;
-    iov[frag].iov_base = &spec->nlh;
-    iov[frag].iov_len = sizeof (spec->nlh);
-    frag++;
-    iov[frag].iov_base = &spec->ge;
-    iov[frag].iov_len = sizeof (spec->ge);
-    frag++;
-    int nn = 0;
-    for (u32 ii = 0; ii < __SFLOWPS_PSAMPLE_ATTR_MAX; ii++)
-      {
-       SFLOWPSAttr *psa = &spec->attr[ii];
-       if (psa->included)
-         {
-           nn++;
-           iov[frag].iov_base = &psa->attr;
-           iov[frag].iov_len = sizeof (psa->attr);
-           frag++;
-           iov[frag] = psa->val; // struct copy
-           frag++;
-         }
-      }
-    ASSERT (nn == spec->n_attrs);
+    }
+  return sflow_netlink_set_attr (&pst->nl, field, val, len);
+}
+
+/*_________________---------------------------__________________
+  _________________        SFLOWPS_send       __________________
+  -----------------___________________________------------------
+*/
+
+int
+SFLOWPS_send (SFLOWPS *pst)
+{
+  pst->nl.ge.cmd = PSAMPLE_CMD_SAMPLE;
+  pst->nl.ge.version = PSAMPLE_GENL_VERSION;
+  int status = sflow_netlink_send_attrs (&pst->nl, true);
+  sflow_netlink_reset_attrs (&pst->nl);
+  if (status <= 0)
+    {
+      SFLOW_ERR ("PSAMPLE strerror(errno) = %s; errno = %d\n",
+                strerror (errno), errno);
+    }
+  return status;
+}
 
-    struct sockaddr_nl da = { .nl_family = AF_NETLINK,
-                             .nl_groups = (1 << (pst->group_id - 1)) };
-
-    struct msghdr msg = { .msg_name = &da,
-                         .msg_namelen = sizeof (da),
-                         .msg_iov = iov,
-                         .msg_iovlen = frag };
-
-    int status = sendmsg (pst->nl_sock, &msg, 0);
-    if (status <= 0)
-      {
-       SFLOW_ERR ("strerror(errno) = %s; errno = %d\n", strerror (errno),
-                  errno);
-       return -1;
-      }
-    return 0;
-  }
+/*
+ * fd.io coding-style-patch-verification: ON
+ *
+ * Local Variables:
+ * eval: (c-set-style "gnu")
+ * End:
+ */
index 5d49442..5b6e7c2 100644 (file)
@@ -31,6 +31,8 @@
 #include <signal.h>
 #include <ctype.h>
 
+#include <sflow/sflow_netlink.h>
+
 // #define SFLOWPS_DEBUG
 
 #define SFLOWPS_PSAMPLE_READNL_RCV_BUF 8192
@@ -45,7 +47,7 @@ typedef enum
 #define SFLOWPS_FIELDDATA(field, len, descr) field,
 #include "sflow/sflow_psample_fields.h"
 #undef SFLOWPS_FIELDDATA
-  __SFLOWPS_PSAMPLE_ATTR_MAX
+  __SFLOWPS_PSAMPLE_ATTRS
 } EnumSFLOWPSAttributes;
 
 typedef struct _SFLOWPS_field_t
@@ -61,51 +63,38 @@ static const SFLOWPS_field_t SFLOWPS_Fields[] = {
 #undef SFLOWPS_FIELDDATA
 };
 
-typedef enum
-{
-  SFLOWPS_STATE_INIT,
-  SFLOWPS_STATE_OPEN,
-  SFLOWPS_STATE_WAIT_FAMILY,
-  SFLOWPS_STATE_READY
-} EnumSFLOWPSState;
+#define SFLOWPS_FAM          PSAMPLE_GENL_NAME
+#define SFLOWPS_FAM_LEN              sizeof (SFLOWPS_FAM)
+#define SFLOWPS_FAM_FOOTPRINT NLMSG_ALIGN (SFLOWPS_FAM_LEN)
+#define SFLOWPS_IOV_FRAGS     ((2 * __SFLOWPS_PSAMPLE_ATTRS) + 2)
 
 typedef struct _SFLOWPS
 {
-  EnumSFLOWPSState state;
-  u32 id;
-  int nl_sock;
-  u32 nl_seq;
-  u32 genetlink_version;
-  u16 family_id;
-  u32 group_id;
+  SFLOWNL nl;
+  char fam_name[SFLOWPS_FAM_FOOTPRINT];
+  SFLOWNLAttr attr[__SFLOWPS_PSAMPLE_ATTRS];
+  struct iovec iov[SFLOWPS_IOV_FRAGS];
 } SFLOWPS;
 
-typedef struct _SFLOWPSAttr
-{
-  bool included : 1;
-  struct nlattr attr;
-  struct iovec val;
-} SFLOWPSAttr;
-
-typedef struct _SFLOWPSSpec
-{
-  struct nlmsghdr nlh;
-  struct genlmsghdr ge;
-  SFLOWPSAttr attr[__SFLOWPS_PSAMPLE_ATTR_MAX];
-  int n_attrs;
-  int attrs_len;
-} SFLOWPSSpec;
-
+EnumSFLOWNLState SFLOWPS_init (SFLOWPS *pst);
 bool SFLOWPS_open (SFLOWPS *pst);
 bool SFLOWPS_close (SFLOWPS *pst);
-EnumSFLOWPSState SFLOWPS_state (SFLOWPS *pst);
-EnumSFLOWPSState SFLOWPS_open_step (SFLOWPS *pst);
+EnumSFLOWNLState SFLOWPS_state (SFLOWPS *pst);
+EnumSFLOWNLState SFLOWPS_open_step (SFLOWPS *pst);
 
-bool SFLOWPSSpec_setAttr (SFLOWPSSpec *spec, EnumSFLOWPSAttributes field,
-                         void *buf, int len);
-#define SFLOWPSSpec_setAttrInt(spec, field, val)                              \
-  SFLOWPSSpec_setAttr ((spec), (field), &(val), sizeof (val))
+bool SFLOWPS_set_attr (SFLOWPS *pst, EnumSFLOWPSAttributes field, void *buf,
+                      int len);
+#define SFLOWPS_set_attr_int(pst, field, val)                                 \
+  SFLOWPS_set_attr ((pst), (field), &(val), sizeof (val))
 
-int SFLOWPSSpec_send (SFLOWPS *pst, SFLOWPSSpec *spec);
+int SFLOWPS_send (SFLOWPS *pst);
 
 #endif /* __included_sflow_psample_h__ */
+
+/*
+ * fd.io coding-style-patch-verification: ON
+ *
+ * Local Variables:
+ * eval: (c-set-style "gnu")
+ * End:
+ */
index 5548066..cdf5a7f 100644 (file)
@@ -27,6 +27,9 @@ uword unformat_sw_if_index (unformat_input_t *input, va_list *args);
 #include <sflow/sflow.api_enum.h>
 #include <sflow/sflow.api_types.h>
 
+/* for token names */
+#include <sflow/sflow_common.h>
+
 typedef struct
 {
   /* API message ID base */
@@ -40,7 +43,7 @@ static int
 api_sflow_enable_disable (vat_main_t *vam)
 {
   unformat_input_t *i = vam->input;
-  int enable_disable = 1;
+  int enable_disable = true;
   u32 hw_if_index = ~0;
   vl_api_sflow_enable_disable_t *mp;
   int ret;
@@ -51,7 +54,9 @@ api_sflow_enable_disable (vat_main_t *vam)
       if (unformat (i, "%U", unformat_sw_if_index, vam, &hw_if_index))
        ;
       else if (unformat (i, "disable"))
-       enable_disable = 0;
+       enable_disable = false;
+      else if (unformat (i, "enable"))
+       enable_disable = true;
       else
        break;
     }
@@ -258,6 +263,128 @@ api_sflow_header_bytes_set (vat_main_t *vam)
   return ret;
 }
 
+static void
+vl_api_sflow_direction_get_reply_t_handler (
+  vl_api_sflow_direction_get_reply_t *mp)
+{
+  vat_main_t *vam = sflow_test_main.vat_main;
+  clib_warning ("sflow direction: %d", ntohl (mp->sampling_D));
+  vam->result_ready = 1;
+}
+
+static int
+api_sflow_direction_get (vat_main_t *vam)
+{
+  vl_api_sflow_direction_get_t *mp;
+  int ret;
+
+  /* Construct the API message */
+  M (SFLOW_DIRECTION_GET, mp);
+
+  /* send it... */
+  S (mp);
+
+  /* Wait for a reply... */
+  W (ret);
+  return ret;
+}
+
+static int
+api_sflow_direction_set (vat_main_t *vam)
+{
+  unformat_input_t *i = vam->input;
+  sflow_direction_t sampling_D = SFLOW_DIRN_UNDEFINED;
+  vl_api_sflow_direction_set_t *mp;
+  int ret;
+
+  /* Parse args required to build the message */
+  while (unformat_check_input (i) != UNFORMAT_END_OF_INPUT)
+    {
+      if (unformat (i, "sampling_D rx"))
+       sampling_D = SFLOW_DIRN_INGRESS;
+      else if (unformat (i, "sampling_D tx"))
+	sampling_D = SFLOW_DIRN_EGRESS;
+      else if (unformat (i, "sampling_D both"))
+       sampling_D = SFLOW_DIRN_BOTH;
+      else
+       break;
+    }
+
+  if (sampling_D == SFLOW_DIRN_UNDEFINED)
+    {
+      errmsg ("missing sampling_D direction\n");
+      return -99;
+    }
+
+  /* Construct the API message */
+  M (SFLOW_DIRECTION_SET, mp);
+  mp->sampling_D = ntohl (sampling_D);
+
+  /* send it... */
+  S (mp);
+
+  /* Wait for a reply... */
+  W (ret);
+  return ret;
+}
+
+static void
+vl_api_sflow_drop_monitoring_get_reply_t_handler (
+  vl_api_sflow_drop_monitoring_get_reply_t *mp)
+{
+  vat_main_t *vam = sflow_test_main.vat_main;
+  clib_warning ("sflow drop_M: %d", mp->drop_M);
+  vam->result_ready = 1;
+}
+
+static int
+api_sflow_drop_monitoring_get (vat_main_t *vam)
+{
+  vl_api_sflow_drop_monitoring_get_t *mp;
+  int ret;
+
+  /* Construct the API message */
+  M (SFLOW_DROP_MONITORING_GET, mp);
+
+  /* send it... */
+  S (mp);
+
+  /* Wait for a reply... */
+  W (ret);
+  return ret;
+}
+
+static int
+api_sflow_drop_monitoring_set (vat_main_t *vam)
+{
+  unformat_input_t *i = vam->input;
+  u32 drop_M = 1;
+  vl_api_sflow_drop_monitoring_set_t *mp;
+  int ret;
+
+  /* Parse args required to build the message */
+  while (unformat_check_input (i) != UNFORMAT_END_OF_INPUT)
+    {
+      if (unformat (i, "disable"))
+       drop_M = 0;
+      else if (unformat (i, "enable"))
+       drop_M = 1;
+      else
+       break;
+    }
+
+  /* Construct the API message */
+  M (SFLOW_DROP_MONITORING_SET, mp);
+  mp->drop_M = ntohl (drop_M);
+
+  /* send it... */
+  S (mp);
+
+  /* Wait for a reply... */
+  W (ret);
+  return ret;
+}
+
 static void
 vl_api_sflow_interface_details_t_handler (vl_api_sflow_interface_details_t *mp)
 {
index 0ccb947..a81fd02 100644 (file)
  * limitations under the License.
  */
 
-#if defined(__cplusplus)
-extern "C"
-{
-#endif
-
 #include <vlib/vlib.h>
 #include <vnet/vnet.h>
 #include <vnet/pg/pg.h>
@@ -32,191 +27,134 @@ extern "C"
 #include <signal.h>
 #include <ctype.h>
 
+#include <sflow/sflow_netlink.h>
 #include <sflow/sflow_usersock.h>
 
-  /*_________________---------------------------__________________
-    _________________       fcntl utils         __________________
-    -----------------___________________________------------------
-  */
-
-  static void
-  setNonBlocking (int fd)
-  {
-    // set the socket to non-blocking
-    int fdFlags = fcntl (fd, F_GETFL);
-    fdFlags |= O_NONBLOCK;
-    if (fcntl (fd, F_SETFL, fdFlags) < 0)
-      {
-       SFLOW_ERR ("fcntl(O_NONBLOCK) failed: %s\n", strerror (errno));
-      }
-  }
-
-  static void
-  setCloseOnExec (int fd)
-  {
-    // make sure it doesn't get inherited, e.g. when we fork a script
-    int fdFlags = fcntl (fd, F_GETFD);
-    fdFlags |= FD_CLOEXEC;
-    if (fcntl (fd, F_SETFD, fdFlags) < 0)
-      {
-       SFLOW_ERR ("fcntl(F_SETFD=FD_CLOEXEC) failed: %s\n", strerror (errno));
-      }
-  }
-
-  /*_________________---------------------------__________________
-    _________________       usersock_open       __________________
-    -----------------___________________________------------------
-  */
-
-  static int
-  usersock_open (void)
-  {
-    int nl_sock = socket (AF_NETLINK, SOCK_RAW, NETLINK_USERSOCK);
-    if (nl_sock < 0)
-      {
-       SFLOW_ERR ("nl_sock open failed: %s\n", strerror (errno));
-       return -1;
-      }
-    setNonBlocking (nl_sock);
-    setCloseOnExec (nl_sock);
-    return nl_sock;
-  }
-
-  /*_________________---------------------------__________________
-    _________________       SFLOWUS_open        __________________
-    -----------------___________________________------------------
-  */
-
-  bool
-  SFLOWUS_open (SFLOWUS *ust)
-  {
-    if (ust->nl_sock == 0)
-      {
-       ust->nl_sock = usersock_open ();
-      }
-    return true;
-  }
-
-  /*_________________---------------------------__________________
-    _________________       SFLOWUS_close       __________________
-    -----------------___________________________------------------
-  */
-
-  bool
-  SFLOWUS_close (SFLOWUS *ust)
-  {
-    if (ust->nl_sock != 0)
-      {
-       int err = close (ust->nl_sock);
-       if (err == 0)
-         {
-           ust->nl_sock = 0;
-           return true;
-         }
-       else
-         {
-           SFLOW_WARN ("SFLOWUS_close: returned %d : %s\n", err,
-                       strerror (errno));
-         }
-      }
-    return false;
-  }
-
-  /*_________________---------------------------__________________
-    _________________  SFLOWUSSpec_setMsgType   __________________
-    -----------------___________________________------------------
-  */
-
-  bool
-  SFLOWUSSpec_setMsgType (SFLOWUSSpec *spec, EnumSFlowVppMsgType msgType)
-  {
-    spec->nlh.nlmsg_type = msgType;
-    return true;
-  }
-
-  /*_________________---------------------------__________________
-    _________________    SFLOWUSSpec_setAttr    __________________
-    -----------------___________________________------------------
-  */
-
-  bool
-  SFLOWUSSpec_setAttr (SFLOWUSSpec *spec, EnumSFlowVppAttributes field,
-                      void *val, int len)
-  {
-    SFLOWUSAttr *usa = &spec->attr[field];
-    if (usa->included)
-      return false;
-    usa->included = true;
-    usa->attr.nla_type = field;
-    usa->attr.nla_len = sizeof (usa->attr) + len;
-    int len_w_pad = NLMSG_ALIGN (len);
-    usa->val.iov_len = len_w_pad;
-    usa->val.iov_base = val;
-    spec->n_attrs++;
-    spec->attrs_len += sizeof (usa->attr);
-    spec->attrs_len += len_w_pad;
-    return true;
-  }
-
-  /*_________________---------------------------__________________
-    _________________    SFLOWUSSpec_send       __________________
-    -----------------___________________________------------------
-  */
-
-  int
-  SFLOWUSSpec_send (SFLOWUS *ust, SFLOWUSSpec *spec)
-  {
-    spec->nlh.nlmsg_len = NLMSG_LENGTH (spec->attrs_len);
-    spec->nlh.nlmsg_flags = 0;
-    spec->nlh.nlmsg_seq = ++ust->nl_seq;
-    spec->nlh.nlmsg_pid = getpid ();
-
-#define MAX_IOV_FRAGMENTS (2 * __SFLOW_VPP_ATTR_MAX) + 2
-
-    struct iovec iov[MAX_IOV_FRAGMENTS];
-    u32 frag = 0;
-    iov[frag].iov_base = &spec->nlh;
-    iov[frag].iov_len = sizeof (spec->nlh);
-    frag++;
-    int nn = 0;
-    for (u32 ii = 0; ii < __SFLOW_VPP_ATTR_MAX; ii++)
-      {
-       SFLOWUSAttr *usa = &spec->attr[ii];
-       if (usa->included)
-         {
-           nn++;
-           iov[frag].iov_base = &usa->attr;
-           iov[frag].iov_len = sizeof (usa->attr);
-           frag++;
-           iov[frag] = usa->val; // struct copy
-           frag++;
-         }
-      }
-    ASSERT (nn == spec->n_attrs);
-
-    struct sockaddr_nl da = {
-      .nl_family = AF_NETLINK,
-      .nl_groups = (1 << (ust->group_id - 1)) // for multicast to the group
-      // .nl_pid = 1e9+6343 // for unicast to receiver bound to netlink socket
-      // with that "pid"
-    };
-
-    struct msghdr msg = { .msg_name = &da,
-                         .msg_namelen = sizeof (da),
-                         .msg_iov = iov,
-                         .msg_iovlen = frag };
-
-    int status = sendmsg (ust->nl_sock, &msg, 0);
-    if (status <= 0)
-      {
-       // Linux replies with ECONNREFUSED when
-       // a multicast is sent via NETLINK_USERSOCK, but
-       // it's not an error so we can just ignore it here.
-       if (errno != ECONNREFUSED)
-         {
-           SFLOW_DBG ("USERSOCK strerror(errno) = %s\n", strerror (errno));
-           return -1;
-         }
-      }
-    return 0;
-  }
+/*_________________---------------------------__________________
+  _________________       SFLOWUS_init        __________________
+  -----------------___________________________------------------
+*/
+
+EnumSFLOWNLState
+SFLOWUS_init (SFLOWUS *ust)
+{
+  ust->nl.id = SFLOWNL_USERSOCK;
+  ust->nl.group_id = SFLOW_NETLINK_USERSOCK_MULTICAST;
+  ust->nl.attr = ust->attr;
+  ust->nl.attr_max = SFLOWUS_ATTRS - 1;
+  ust->nl.iov = ust->iov;
+  ust->nl.iov_max = SFLOWUS_IOV_FRAGS - 1;
+  ust->nl.state = SFLOWNL_STATE_INIT;
+  return ust->nl.state;
+}
+
+/*_________________---------------------------__________________
+  _________________       SFLOWUS_open        __________________
+  -----------------___________________________------------------
+*/
+
+bool
+SFLOWUS_open (SFLOWUS *ust)
+{
+  if (ust->nl.state == SFLOWNL_STATE_UNDEFINED)
+    SFLOWUS_init (ust);
+  if (ust->nl.nl_sock == 0)
+    sflow_netlink_usersock_open (&ust->nl);
+  if (ust->nl.nl_sock > 0)
+    {
+      ust->nl.state = SFLOWNL_STATE_READY;
+      return true;
+    }
+  return false;
+}
+
+/*_________________---------------------------__________________
+  _________________       SFLOWUS_close       __________________
+  -----------------___________________________------------------
+*/
+
+bool
+SFLOWUS_close (SFLOWUS *ust)
+{
+  return (sflow_netlink_close (&ust->nl) == 0);
+}
+
+/*_________________---------------------------__________________
+  _________________  SFLOWUS_set_msg_type     __________________
+  -----------------___________________________------------------
+*/
+
+bool
+SFLOWUS_set_msg_type (SFLOWUS *ust, EnumSFlowVppMsgType msgType)
+{
+  ust->nl.nlh.nlmsg_type = msgType;
+  return true;
+}
+
+/*_________________---------------------------__________________
+  _________________    SFLOWUS_open_step      __________________
+  -----------------___________________________------------------
+*/
+
+EnumSFLOWNLState
+SFLOWUS_open_step (SFLOWUS *ust)
+{
+  switch (ust->nl.state)
+    {
+    case SFLOWNL_STATE_UNDEFINED:
+      SFLOWUS_init (ust);
+      break;
+    case SFLOWNL_STATE_INIT:
+      SFLOWUS_open (ust);
+      break;
+    case SFLOWNL_STATE_OPEN:
+    case SFLOWNL_STATE_WAIT_FAMILY:
+    case SFLOWNL_STATE_READY:
+      break;
+    }
+  return ust->nl.state;
+}
+
+/*_________________---------------------------__________________
+  _________________    SFLOWUS_set_attr       __________________
+  -----------------___________________________------------------
+*/
+
+bool
+SFLOWUS_set_attr (SFLOWUS *ust, EnumSFlowVppAttributes field, void *val,
+                 int len)
+{
+  return sflow_netlink_set_attr (&ust->nl, field, val, len);
+}
+
+/*_________________---------------------------__________________
+  _________________        SFLOWUS_send       __________________
+  -----------------___________________________------------------
+*/
+
+int
+SFLOWUS_send (SFLOWUS *ust)
+{
+  int status = sflow_netlink_send_attrs (&ust->nl, false);
+  sflow_netlink_reset_attrs (&ust->nl);
+  if (status <= 0)
+    {
+      // Linux replies with ECONNREFUSED when
+      // a multicast is sent via NETLINK_USERSOCK, but
+      // it's not an error so we can just ignore it here.
+      if (errno != ECONNREFUSED)
+       {
+         SFLOW_DBG ("USERSOCK strerror(errno) = %s\n", strerror (errno));
+         return -1;
+       }
+    }
+  return 0;
+}
+
+/*
+ * fd.io coding-style-patch-verification: ON
+ *
+ * Local Variables:
+ * eval: (c-set-style "gnu")
+ * End:
+ */
index d663899..607d087 100644
@@ -29,6 +29,8 @@
 #include <signal.h>
 #include <ctype.h>
 
+#include <sflow/sflow_netlink.h>
+
 // ==================== shared with hsflowd mod_vpp =========================
 // See https://github.com/sflow/host-sflow
 
@@ -67,7 +69,7 @@ typedef enum
   SFLOW_VPP_ATTR_DROPS,              /* u32 all FIFO and netlink sendmsg drops */
   SFLOW_VPP_ATTR_SEQ,        /* u32 send seq no */
   /* enum shared with hsflowd, so only add here */
-  __SFLOW_VPP_ATTR_MAX
+  __SFLOW_VPP_ATTRS
 } EnumSFlowVppAttributes;
 
 #define SFLOW_VPP_PSAMPLE_GROUP_INGRESS 3
@@ -96,38 +98,35 @@ typedef struct _SFLOWUS_field_t
   int len;
 } SFLOWUS_field_t;
 
+#define SFLOWUS_ATTRS __SFLOW_VPP_ATTRS
+#define SFLOWUS_IOV_FRAGS                                                     \
+  ((2 * SFLOWUS_ATTRS) + 2) // TODO: may only be +1 -- no genl header?
+
 typedef struct _SFLOWUS
 {
-  u32 id;
-  int nl_sock;
-  u32 nl_seq;
-  u32 group_id;
+  SFLOWNL nl;
+  SFLOWNLAttr attr[__SFLOW_VPP_ATTRS];
+  struct iovec iov[SFLOWUS_IOV_FRAGS];
 } SFLOWUS;
 
-typedef struct _SFLOWUSAttr
-{
-  bool included : 1;
-  struct nlattr attr;
-  struct iovec val;
-} SFLOWUSAttr;
-
-typedef struct _SFLOWUSSpec
-{
-  struct nlmsghdr nlh;
-  SFLOWUSAttr attr[__SFLOW_VPP_ATTR_MAX];
-  int n_attrs;
-  int attrs_len;
-} SFLOWUSSpec;
-
+EnumSFLOWNLState SFLOWUS_init (SFLOWUS *ust);
 bool SFLOWUS_open (SFLOWUS *ust);
 bool SFLOWUS_close (SFLOWUS *ust);
 
-bool SFLOWUSSpec_setMsgType (SFLOWUSSpec *spec, EnumSFlowVppMsgType type);
-bool SFLOWUSSpec_setAttr (SFLOWUSSpec *spec, EnumSFlowVppAttributes field,
-                         void *buf, int len);
-#define SFLOWUSSpec_setAttrInt(spec, field, val)                              \
-  SFLOWUSSpec_setAttr ((spec), (field), &(val), sizeof (val))
+bool SFLOWUS_set_msg_type (SFLOWUS *ust, EnumSFlowVppMsgType type);
+bool SFLOWUS_set_attr (SFLOWUS *ust, EnumSFlowVppAttributes field, void *buf,
+                      int len);
+#define SFLOWUS_set_attr_int(ust, field, val)                                 \
+  SFLOWUS_set_attr ((ust), (field), &(val), sizeof (val))
 
-int SFLOWUSSpec_send (SFLOWUS *ust, SFLOWUSSpec *spec);
+int SFLOWUS_send (SFLOWUS *ust);
 
 #endif /* __included_sflow_usersock_h__ */
+
+/*
+ * fd.io coding-style-patch-verification: ON
+ *
+ * Local Variables:
+ * eval: (c-set-style "gnu")
+ * End:
+ */
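
For orientation, a minimal usage sketch of the refactored USERSOCK API declared above follows. It is illustrative only: the message-type value is a placeholder, the calling context is assumed, and only the attribute names taken from the enum earlier in this header (SFLOW_VPP_ATTR_SEQ, SFLOW_VPP_ATTR_DROPS) are real.

  /* sketch: open lazily, stage attributes, send one netlink message */
  SFLOWUS us = { 0 };
  if (SFLOWUS_open (&us)) /* calls SFLOWUS_init() on first use */
    {
      u32 seq = 1, drops = 0;
      EnumSFlowVppMsgType msg_type = 1; /* placeholder, not a real constant */
      SFLOWUS_set_msg_type (&us, msg_type);
      SFLOWUS_set_attr_int (&us, SFLOW_VPP_ATTR_SEQ, seq);
      SFLOWUS_set_attr_int (&us, SFLOW_VPP_ATTR_DROPS, drops);
      SFLOWUS_send (&us); /* staged attributes are reset after each send */
      SFLOWUS_close (&us);
    }

As in the SFLOWUSSpec code this replaces, attribute values are referenced via iovecs rather than copied, so they must remain valid until the send completes.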
index 0cb4b53..8c5c5b1 100644
@@ -1070,6 +1070,7 @@ typedef struct
 
   /* feature_arc_index */
   u8 output_feature_arc_index;
+  u8 drop_feature_arc_index;
 
   /* fast lookup tables */
   u32 *hw_if_index_by_sw_if_index;
index 47844dc..8f023a8 100644
@@ -950,7 +950,6 @@ interface_drop_punt (vlib_main_t * vm,
   vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
   u32 sw_if_indices[VLIB_FRAME_SIZE];
   vlib_simple_counter_main_t *cm;
-  u16 nexts[VLIB_FRAME_SIZE];
   u32 n_trace;
   vnet_main_t *vnm;
 
@@ -1002,7 +1001,6 @@ interface_drop_punt (vlib_main_t * vm,
     interface_trace_buffers (vm, node, frame);
 
   /* All going to drop regardless, this is just a counting exercise */
-  clib_memset (nexts, 0, sizeof (nexts));
 
   cm = vec_elt_at_index (vnm->interface_main.sw_if_counters,
                         (disposition == VNET_ERROR_DISPOSITION_PUNT
@@ -1063,7 +1061,12 @@ interface_drop_punt (vlib_main_t * vm,
          (cm, thread_index, sw_if0->sup_sw_if_index, count);
     }
 
-  vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);
+  vnet_interface_main_t *im = &vnm->interface_main;
+  u8 arc_index = im->drop_feature_arc_index;
+  u32 nextAll = 0;
+  vnet_feature_arc_start (arc_index, sw_if_index[0], &nextAll, bufs[0]);
+  vlib_buffer_enqueue_to_single_next (vm, node, from, nextAll,
+                                     frame->n_vectors);
 
   return frame->n_vectors;
 }
@@ -1430,6 +1433,17 @@ vnet_set_interface_output_node (vnet_main_t * vnm,
 }
 #endif /* CLIB_MARCH_VARIANT */
 
+VNET_FEATURE_ARC_INIT (interface_drop, static) = {
+  .arc_name = "error-drop",
+  .start_nodes = VNET_FEATURES ("error-drop"),
+  .last_in_arc = "drop",
+  .arc_index_ptr = &vnet_main.interface_main.drop_feature_arc_index,
+};
+
+VNET_FEATURE_INIT (drop, static) = { .arc_name = "error-drop",
+                                    .node_name = "drop",
+                                    .runs_before = 0 };
+
 /*
  * fd.io coding-style-patch-verification: ON
  *
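
With the two registrations above, "error-drop" starts a feature arc whose last node is "drop", so a plugin can interpose its own node to examine condemned packets before they are freed, which is how the sflow drop-monitoring node is expected to hook in. A minimal sketch under that assumption (the node name "my-drop-monitor" and its enable path are hypothetical, not part of this patch):

  /* sketch: register a plugin node on the "error-drop" arc, ahead of "drop" */
  VNET_FEATURE_INIT (my_drop_monitor, static) = {
    .arc_name = "error-drop",
    .node_name = "my-drop-monitor",
    .runs_before = VNET_FEATURES ("drop"),
  };

  /* enable the feature per interface, e.g. from the plugin's enable handler */
  vnet_feature_enable_disable ("error-drop", "my-drop-monitor",
                               sw_if_index, 1 /* enable */, 0, 0);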
diff --git a/test/test_sflow_drop.py b/test/test_sflow_drop.py
new file mode 100644
index 0000000..d60e2f0
--- /dev/null
@@ -0,0 +1,149 @@
+#!/usr/bin/env python3
+
+import unittest
+from framework import VppTestCase
+from asfframework import VppTestRunner
+from scapy.layers.l2 import Ether
+from scapy.packet import Raw
+from scapy.layers.inet import IP, UDP
+from random import randint
+import re  # for finding counters in "sh errors" output
+
+
+class SFlowDropTestCase(VppTestCase):
+    """sFlow test case"""
+
+    @classmethod
+    def setUpClass(cls):
+        super(SFlowDropTestCase, cls).setUpClass()
+
+    @classmethod
+    def tearDownClass(cls):
+        super(SFlowDropTestCase, cls).tearDownClass()
+
+    def setUp(self):
+        self.create_pg_interfaces(range(2))  #  create pg0 and pg1
+        for i in self.pg_interfaces:
+            i.admin_up()  # put the interface up
+            i.config_ip4()  # configure IPv4 address on the interface
+            i.resolve_arp()  # resolve ARP, so that we know VPP MAC
+
+    def tearDown(self):
+        for i in self.pg_interfaces:
+            i.admin_down()
+            i.unconfig()
+            i.set_table_ip4(0)
+            i.set_table_ip6(0)
+
+    def enable_sflow_via_api(self):
+        ## TEST: Enable both interfaces
+        ret = self.vapi.sflow_enable_disable(hw_if_index=1, enable_disable=True)
+        self.assertEqual(ret.retval, 0)
+        ret = self.vapi.sflow_enable_disable(hw_if_index=2, enable_disable=True)
+        self.assertEqual(ret.retval, 0)
+
+        ## TEST: sflow_sampling_rate_set()
+        self.vapi.sflow_sampling_rate_set(sampling_N=1)
+        ret = self.vapi.sflow_sampling_rate_get()
+        self.assertEqual(ret.sampling_N, 1)
+
+        ## TEST: sflow_drop_monitoring()
+        self.vapi.sflow_drop_monitoring_set(drop_M=1)
+        ret = self.vapi.sflow_drop_monitoring_get()
+        self.assertEqual(ret.drop_M, 1)
+
+    def create_stream(self, src_if, dst_if, count):
+        packets = []
+        for i in range(count):
+            # create packet info stored in the test case instance
+            info = self.create_packet_info(src_if, dst_if)
+            # convert the info into packet payload
+            payload = self.info_to_payload(info)
+            # create the packet itself
+            p = (
+                Ether(dst=src_if.local_mac, src=src_if.remote_mac)
+                / IP(src=src_if.remote_ip4, dst=dst_if.remote_ip4, ttl=i + 1)
+                / UDP(sport=randint(49152, 65535), dport=5678)
+                / Raw(payload)
+            )
+            # store a copy of the packet in the packet info
+            info.data = p.copy()
+            # append the packet to the list
+            packets.append(p)
+            # return the created packet list
+        return packets
+
+    def get_sflow_counter(self, counter):
+        counters = self.vapi.cli("sh errors").split("\n")
+        for i in range(1, len(counters) - 1):
+            results = counters[i].split()
+            if results[1] == "sflow":
+                if re.search(counter, counters[i]) is not None:
+                    return int(results[0])
+        return None
+
+    def verify_sflow(self, count):
+        ctr_pk_proc = "sflow packets processed"
+        ctr_pk_samp = "sflow packets sampled"
+        ctr_pk_drop = "sflow packets dropped"
+        ctr_di_proc = "sflow discards processed"
+        ctr_di_drop = "sflow discards dropped"
+        ctr_ps_sent = "sflow PSAMPLE sent"
+        ctr_ps_fail = "sflow PSAMPLE send failed"
+        ctr_dm_sent = "sflow DROPMON sent"
+        ctr_dm_fail = "sflow DROPMON send failed"
+        pk_proc = self.get_sflow_counter(ctr_pk_proc)
+        pk_samp = self.get_sflow_counter(ctr_pk_samp)
+        pk_drop = self.get_sflow_counter(ctr_pk_drop)
+        di_proc = self.get_sflow_counter(ctr_di_proc)
+        di_drop = self.get_sflow_counter(ctr_di_drop)
+        ps_sent = self.get_sflow_counter(ctr_ps_sent)
+        ps_fail = self.get_sflow_counter(ctr_ps_fail)
+        dm_sent = self.get_sflow_counter(ctr_dm_sent)
+        dm_fail = self.get_sflow_counter(ctr_dm_fail)
+        self.logger.info(ctr_pk_proc + "=" + str(pk_proc))
+        self.logger.info(ctr_pk_samp + "=" + str(pk_samp))
+        self.logger.info(ctr_pk_drop + "=" + str(pk_drop))
+        self.logger.info(ctr_di_proc + "=" + str(di_proc))
+        self.logger.info(ctr_di_drop + "=" + str(di_drop))
+        self.logger.info(ctr_ps_sent + "=" + str(ps_sent))
+        self.logger.info(ctr_ps_fail + "=" + str(ps_fail))
+        self.logger.info(ctr_dm_sent + "=" + str(dm_sent))
+        self.logger.info(ctr_dm_fail + "=" + str(dm_fail))
+        self.assert_equal(pk_proc, count, ctr_pk_proc)
+        self.assert_equal(pk_samp, count, ctr_pk_samp)
+        self.assert_equal(pk_drop, None, ctr_pk_drop)
+        self.assert_equal(di_proc, 1, ctr_di_proc)
+        self.assert_equal(di_drop, None, ctr_di_drop)
+        # sending to PSAMPLE or DROPMON will fail if not
+        # running with root privileges. So do not insist
+        # on these numbers:
+        # self.assert_equal(ps_sent, count, ctr_ps_sent)
+        # self.assert_equal(ps_fail, None, ctr_ps_fail)
+        # self.assert_equal(dm_sent, 1, ctr_dm_sent)
+        # self.assert_equal(dm_fail, None, ctr_dm_fail)
+
+    def test_basic(self):
+        self.enable_sflow_via_api()
+        count = 7
+        # create the packet stream with ttl starting at 1 and incrementing,
+        # so that exactly one packet (the ttl=1 packet) will be dropped
+        packets = self.create_stream(self.pg0, self.pg1, count)
+        # add the stream to the source interface
+        self.pg0.add_stream(packets)
+        # enable capture on both interfaces
+        self.pg0.enable_capture()
+        self.pg1.enable_capture()
+        # start the packet generator
+        self.pg_start()
+        # get capture - the proper count of packets was saved by
+        # create_packet_info() based on dst_if parameter
+        capture = self.pg1.get_capture(count - 1, timeout=2)
+        # expect an ICMP TTL exceeded message back on pg0
+        # and a dropped packet that sflow will write to DROPMON
+        capture0 = self.pg0.get_capture(1, timeout=2)
+        # allow time for the dropped packet to be fully
+        # processed, and for the counters to be updated
+        self.sleep(1.0)
+        # verify sflow counters
+        self.verify_sflow(count)
diff --git a/test/test_sflow_egress.py b/test/test_sflow_egress.py
new file mode 100644
index 0000000..4e2613a
--- /dev/null
@@ -0,0 +1,145 @@
+#!/usr/bin/env python3
+
+import unittest
+from framework import VppTestCase
+from asfframework import VppTestRunner
+from scapy.layers.l2 import Ether
+from scapy.packet import Raw
+from scapy.layers.inet import IP, UDP
+from random import randint
+import re  # for finding counters in "sh errors" output
+
+
+class SFlowEgressTestCase(VppTestCase):
+    """sFlow test case"""
+
+    @classmethod
+    def setUpClass(cls):
+        super(SFlowEgressTestCase, cls).setUpClass()
+
+    @classmethod
+    def tearDownClass(cls):
+        super(SFlowEgressTestCase, cls).tearDownClass()
+
+    def setUp(self):
+        self.create_pg_interfaces(range(2))  #  create pg0 and pg1
+        for i in self.pg_interfaces:
+            i.admin_up()  # put the interface up
+            i.config_ip4()  # configure IPv4 address on the interface
+            i.resolve_arp()  # resolve ARP, so that we know VPP MAC
+
+    def tearDown(self):
+        for i in self.pg_interfaces:
+            i.admin_down()
+            i.unconfig()
+            i.set_table_ip4(0)
+            i.set_table_ip6(0)
+
+    def enable_sflow_via_api(self):
+        ## TEST: Enable both interfaces
+        ret = self.vapi.sflow_enable_disable(hw_if_index=1, enable_disable=True)
+        self.assertEqual(ret.retval, 0)
+        ret = self.vapi.sflow_enable_disable(hw_if_index=2, enable_disable=True)
+        self.assertEqual(ret.retval, 0)
+
+        ## TEST: sflow_sampling_rate_set()
+        self.vapi.sflow_sampling_rate_set(sampling_N=1)
+        ret = self.vapi.sflow_sampling_rate_get()
+        self.assert_equal(ret.sampling_N, 1)
+
+        ## TEST: sflow_direction_set()
+        self.vapi.sflow_direction_set(sampling_D=3)
+        ret = self.vapi.sflow_direction_get()
+        self.assert_equal(ret.sampling_D, 3)
+
+    def create_stream(self, src_if, dst_if, count):
+        packets = []
+        for i in range(count):
+            # create packet info stored in the test case instance
+            info = self.create_packet_info(src_if, dst_if)
+            # convert the info into packet payload
+            payload = self.info_to_payload(info)
+            # create the packet itself
+            p = (
+                Ether(dst=src_if.local_mac, src=src_if.remote_mac)
+                / IP(src=src_if.remote_ip4, dst=dst_if.remote_ip4)
+                / UDP(sport=randint(49152, 65535), dport=5678)
+                / Raw(payload)
+            )
+            # store a copy of the packet in the packet info
+            info.data = p.copy()
+            # append the packet to the list
+            packets.append(p)
+            # return the created packet list
+        return packets
+
+    def get_sflow_counter(self, counter):
+        counters = self.vapi.cli("sh errors").split("\n")
+        for i in range(1, len(counters) - 1):
+            results = counters[i].split()
+            if results[1] == "sflow":
+                if re.search(counter, counters[i]) is not None:
+                    return int(results[0])
+        return None
+
+    def verify_sflow(self, count):
+        ctr_pk_proc = "sflow packets processed"
+        ctr_pk_samp = "sflow packets sampled"
+        ctr_pk_drop = "sflow packets dropped"
+        ctr_di_proc = "sflow discards processed"
+        ctr_di_drop = "sflow discards dropped"
+        ctr_ps_sent = "sflow PSAMPLE sent"
+        ctr_ps_fail = "sflow PSAMPLE send failed"
+        ctr_dm_sent = "sflow DROPMON sent"
+        ctr_dm_fail = "sflow DROPMON send failed"
+        pk_proc = self.get_sflow_counter(ctr_pk_proc)
+        pk_samp = self.get_sflow_counter(ctr_pk_samp)
+        pk_drop = self.get_sflow_counter(ctr_pk_drop)
+        di_proc = self.get_sflow_counter(ctr_di_proc)
+        di_drop = self.get_sflow_counter(ctr_di_drop)
+        ps_sent = self.get_sflow_counter(ctr_ps_sent)
+        ps_fail = self.get_sflow_counter(ctr_ps_fail)
+        dm_sent = self.get_sflow_counter(ctr_dm_sent)
+        dm_fail = self.get_sflow_counter(ctr_dm_fail)
+        self.logger.info(ctr_pk_proc + "=" + str(pk_proc))
+        self.logger.info(ctr_pk_samp + "=" + str(pk_samp))
+        self.logger.info(ctr_pk_drop + "=" + str(pk_drop))
+        self.logger.info(ctr_di_proc + "=" + str(di_proc))
+        self.logger.info(ctr_di_drop + "=" + str(di_drop))
+        self.logger.info(ctr_ps_sent + "=" + str(ps_sent))
+        self.logger.info(ctr_ps_fail + "=" + str(ps_fail))
+        self.logger.info(ctr_dm_sent + "=" + str(dm_sent))
+        self.logger.info(ctr_dm_fail + "=" + str(dm_fail))
+        self.assert_equal(pk_proc, count * 2, ctr_pk_proc)
+        self.assert_equal(pk_samp, count * 2, ctr_pk_samp)
+        self.assert_equal(pk_drop, None, ctr_pk_drop)
+        self.assert_equal(di_proc, None, ctr_di_proc)
+        self.assert_equal(di_drop, None, ctr_di_drop)
+        # sending to PSAMPLE or DROPMON will fail if not
+        # running with root privileges. So do not insist
+        # on these numbers:
+        # self.assert_equal(ps_sent, count * 2, ctr_ps_sent)
+        # self.assert_equal(ps_fail, None, ctr_ps_fail)
+        # self.assert_equal(dm_sent, None, ctr_dm_sent)
+        # self.assert_equal(dm_fail, None, ctr_dm_fail)
+
+    def test_basic(self):
+        self.enable_sflow_via_api()
+        count = 7
+        # create the packet stream; all packets should be
+        # forwarded from pg0 to pg1 (none are dropped)
+        packets = self.create_stream(self.pg0, self.pg1, count)
+        # add the stream to the source interface
+        self.pg0.add_stream(packets)
+        # enable capture on both interfaces
+        self.pg0.enable_capture()
+        self.pg1.enable_capture()
+        # start the packet generator
+        self.pg_start()
+        # get capture - the proper count of packets was saved by
+        # create_packet_info() based on dst_if parameter
+        capture = self.pg1.get_capture(count, timeout=2)
+        # allow time for the counters to be updated
+        self.sleep(1.0)
+        # verify sflow counters
+        self.verify_sflow(count)