New upstream version 17.11-rc3

[deb_dpdk.git] / doc / guides / nics / i40e.rst
diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst

index 934eb02..cd46874 100644 (file)
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -64,6 +64,7 @@ Features of the I40E PMD are:
  - SR-IOV VF
  - Hot plug
  - IEEE1588/802.1AS timestamping
+- VF Daemon (VFD) - EXPERIMENTAL
  
  
  Prerequisites
@@ -77,6 +78,8 @@ Prerequisites
  - To get better performance on Intel platforms, please follow the "How to get best performance with NICs on Intel platforms"
    section of the :ref:`Getting Started Guide for Linux <linux_gsg>`.
  
+- Upgrade the NVM/FW version following the `Intel® Ethernet NVM Update Tool Quick Usage Guide for Linux
+  <https://www-ssl.intel.com/content/www/us/en/embedded/products/networking/nvm-update-tool-quick-linux-usage-guide.html>`_ if needed.
  
  Pre-Installation Configuration
  ------------------------------
@@ -104,11 +107,6 @@ Please note that enabling debugging options may affect system performance.
    Toggle the use of Vector PMD instead of normal RX/TX path.
    To enable vPMD for RX, bulk allocation for Rx must be allowed.
  
-- ``CONFIG_RTE_LIBRTE_I40E_RX_OLFLAGS_ENABLE`` (default ``y``)
-
-  Toggle to enable RX ``olflags``.
-  This is only meaningful when Vector PMD is used.
-
  - ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` (default ``n``)
  
    Toggle to use a 16-byte RX descriptor, by default the RX descriptor is 32 byte.
@@ -130,82 +128,15 @@ Please note that enabling debugging options may affect system performance.
    Interrupt Throttling interval.
  
  
-Driver Compilation
-~~~~~~~~~~~~~~~~~~
-
-To compile the I40E PMD see :ref:`Getting Started Guide for Linux <linux_gsg>` or
-:ref:`Getting Started Guide for FreeBSD <freebsd_gsg>` depending on your platform.
-
-
-Linux
------
-
-
-Running testpmd
-~~~~~~~~~~~~~~~
-
-This section demonstrates how to launch ``testpmd`` with Intel XL710/X710
-devices managed by ``librte_pmd_i40e`` in the Linux operating system.
-
-#. Load ``igb_uio`` or ``vfio-pci`` driver:
-
-   .. code-block:: console
-
-      modprobe uio
-      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
-
-   or
-
-   .. code-block:: console
-
-      modprobe vfio-pci
-
-#. Bind the XL710/X710 adapters to ``igb_uio`` or ``vfio-pci`` loaded in the previous step:
-
-   .. code-block:: console
-
-      ./tools/dpdk_nic_bind.py --bind igb_uio 0000:83:00.0
-
-   Or setup VFIO permissions for regular users and then bind to ``vfio-pci``:
-
-   .. code-block:: console
-
-      ./tools/dpdk_nic_bind.py --bind vfio-pci 0000:83:00.0
-
-#. Start ``testpmd`` with basic parameters:
-
-   .. code-block:: console
-
-      ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4 -w 83:00.0 -- -i
-
-   Example output:
-
-   .. code-block:: console
-
-      ...
-      EAL: PCI device 0000:83:00.0 on NUMA socket 1
-      EAL: probe driver: 8086:1572 rte_i40e_pmd
-      EAL: PCI memory mapped at 0x7f7f80000000
-      EAL: PCI memory mapped at 0x7f7f80800000
-      PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 8000208a
-      Interactive-mode selected
-      Configuring Port 0 (socket 0)
-      ...
-
-      PMD: i40e_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
-      satisfied.Rx Burst Bulk Alloc function will be used on port=0, queue=0.
-
-      ...
-      Port 0: 68:05:CA:26:85:84
-      Checking link statuses...
-      Port 0 Link Up - speed 10000 Mbps - full-duplex
-      Done
+Driver compilation and testing
+------------------------------
  
-      testpmd>
+Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
+for details.
  
  
  SR-IOV: Prerequisites and sample Application Notes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+--------------------------------------------------
  
  #. Load the kernel module:
  
@@ -254,6 +185,37 @@ SR-IOV: Prerequisites and sample Application Notes
  #. Assign VF to VM, and bring up the VM.
     Please see the documentation for the *I40E/IXGBE/IGB Virtual Function Driver*.
  
+#. Running testpmd:
+
+   Follow instructions available in the document
+   :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
+   to run testpmd.
+
+   Example output:
+
+   .. code-block:: console
+
+      ...
+      EAL: PCI device 0000:83:00.0 on NUMA socket 1
+      EAL: probe driver: 8086:1572 rte_i40e_pmd
+      EAL: PCI memory mapped at 0x7f7f80000000
+      EAL: PCI memory mapped at 0x7f7f80800000
+      PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 8000208a
+      Interactive-mode selected
+      Configuring Port 0 (socket 0)
+      ...
+
+      PMD: i40e_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
+      satisfied.Rx Burst Bulk Alloc function will be used on port=0, queue=0.
+
+      ...
+      Port 0: 68:05:CA:26:85:84
+      Checking link statuses...
+      Port 0 Link Up - speed 10000 Mbps - full-duplex
+      Done
+
+      testpmd>
+
  
  Sample Application Notes
  ------------------------
@@ -267,7 +229,7 @@ To start ``testpmd``, and add vlan 10 to port 0:
  
  .. code-block:: console
  
-    ./app/testpmd -c ffff -n 4 -- -i --forward-mode=mac
+    ./app/testpmd -l 0-15 -n 4 -- -i --forward-mode=mac
      ...
  
      testpmd> set promisc 0 off
@@ -302,7 +264,7 @@ Start ``testpmd`` with ``--disable-rss`` and ``--pkt-filter-mode=perfect``:
  
  .. code-block:: console
  
-   ./app/testpmd -c ffff -n 4 -- -i --disable-rss --pkt-filter-mode=perfect \
+   ./app/testpmd -l 0-15 -n 4 -- -i --disable-rss --pkt-filter-mode=perfect \
                   --rxq=8 --txq=8 --nb-cores=8 --nb-ports=1
  
  Add a rule to direct ``ipv4-udp`` packet whose ``dst_ip=2.2.2.5, src_ip=2.2.2.3, src_port=32, dst_port=32`` to queue 1:
@@ -366,3 +328,236 @@ Delete all flow director rules on a port:
  
     testpmd> flush_flow_director 0
  
+Floating VEB
+~~~~~~~~~~~~~
+
+The Intel® Ethernet Controller X710 and XL710 Family support a feature called
+"Floating VEB".
+
+A Virtual Ethernet Bridge (VEB) is an IEEE Edge Virtual Bridging (EVB) term
+for functionality that allows local switching between virtual endpoints within
+a physical endpoint and also with an external bridge/network.
+
+A "Floating" VEB doesn't have an uplink connection to the outside world so all
+switching is done internally and remains within the host. As such, this
+feature provides security benefits.
+
+In addition, a Floating VEB overcomes a limitation of normal VEBs where they
+cannot forward packets when the physical link is down. Floating VEBs don't need
+to connect to the NIC port so they can still forward traffic from VF to VF
+even when the physical link is down.
+
+Therefore, with this feature enabled VFs can be limited to communicating with
+each other but not an outside network, and they can do so even when there is
+no physical uplink on the associated NIC port.
+
+To enable this feature, the user should pass a ``devargs`` parameter to the
+EAL, for example::
+
+    -w 84:00.0,enable_floating_veb=1
+
+In this configuration the PMD will use the floating VEB feature for all the
+VFs created by this PF device.
+
+Alternatively, the user can specify which VFs need to connect to this floating
+VEB using the ``floating_veb_list`` argument::
+
+    -w 84:00.0,enable_floating_veb=1,floating_veb_list=1;3-4
+
+In this example ``VF1``, ``VF3`` and ``VF4`` connect to the floating VEB,
+while other VFs connect to the normal VEB.
+
+The current implementation only supports one floating VEB and one regular
+VEB. VFs can connect to a floating VEB or a regular VEB according to the
+configuration passed on the EAL command line.
+
+The floating VEB functionality requires a NIC firmware version of 5.0
+or greater.
+
+
+Limitations or Known issues
+---------------------------
+
+MPLS packet classification on X710/XL710
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For firmware versions prior to 5.0, MPLS packets are not recognized by the NIC.
+The L2 Payload flow type in flow director can be used to classify MPLS packet
+by using a command in testpmd like:
+
+   testpmd> flow_director_filter 0 mode IP add flow l2_payload ether \
+            0x8847 flexbytes () fwd pf queue <N> fd_id <M>
+
+With the NIC firmware version 5.0 or greater, some limited MPLS support
+is added: Native MPLS (MPLS in Ethernet) skip is implemented, while no
+new packet type, no classification or offload are possible. With this change,
+L2 Payload flow type in flow director cannot be used to classify MPLS packet
+as with previous firmware versions. Meanwhile, the Ethertype filter can be
+used to classify MPLS packet by using a command in testpmd like:
+
+   testpmd> ethertype_filter 0 add mac_ignr 00:00:00:00:00:00 ethertype \
+            0x8847 fwd queue <M>
+
+16 Byte RX Descriptor setting on DPDK VF
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently the VF's RX descriptor mode is decided by PF. There's no PF-VF
+interface for VF to request the RX descriptor mode, also no interface to notify
+VF its own RX descriptor mode.
+For all available versions of the i40e driver, these drivers don't support 16
+byte RX descriptor. If the Linux i40e kernel driver is used as host driver,
+while DPDK i40e PMD is used as the VF driver, DPDK cannot choose 16 byte receive
+descriptor. The reason is that the RX descriptor is already set to 32 byte by
+the i40e kernel driver. That is to say, user should keep
+``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n`` in config file.
+In the future, if the Linux i40e driver supports 16 byte RX descriptor, user
+should make sure the DPDK VF uses the same RX descriptor mode, 16 byte or 32
+byte, as the PF driver.
+
+The same rule for DPDK PF + DPDK VF. The PF and VF should use the same RX
+descriptor mode. Or the VF RX will not work.
+
+Receive packets with Ethertype 0x88A8
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Due to the FW limitation, PF can receive packets with Ethertype 0x88A8
+only when floating VEB is disabled.
+
+Incorrect Rx statistics when packet is oversize
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a packet is over maximum frame size, the packet is dropped.
+However the Rx statistics, when calling `rte_eth_stats_get` incorrectly
+shows it as received.
+
+VF & TC max bandwidth setting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The per VF max bandwidth and per TC max bandwidth cannot be enabled in parallel.
+The dehavior is different when handling per VF and per TC max bandwidth setting.
+When enabling per VF max bandwidth, SW will check if per TC max bandwidth is
+enabled. If so, return failure.
+When enabling per TC max bandwidth, SW will check if per VF max bandwidth
+is enabled. If so, disable per VF max bandwidth and continue with per TC max
+bandwidth setting.
+
+TC TX scheduling mode setting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There're 2 TX scheduling modes for TCs, round robin and strict priority mode.
+If a TC is set to strict priority mode, it can consume unlimited bandwidth.
+It means if APP has set the max bandwidth for that TC, it comes to no
+effect.
+It's suggested to set the strict priority mode for a TC that is latency
+sensitive but no consuming much bandwidth.
+
+VF performance is impacted by PCI extended tag setting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To reach maximum NIC performance in the VF the PCI extended tag must be
+enabled. The DPDK I40E PF driver will set this feature during initialization,
+but the kernel PF driver does not. So when running traffic on a VF which is
+managed by the kernel PF driver, a significant NIC performance downgrade has
+been observed (for 64 byte packets, there is about 25% linerate downgrade for
+a 25G device and about 35% for a 40G device).
+
+For kernel version >= 4.11, the kernel's PCI driver will enable the extended
+tag if it detects that the device supports it. So by default, this is not an
+issue. For kernels <= 4.11 or when the PCI extended tag is disabled it can be
+enabled using the steps below.
+
+#. Get the current value of the PCI configure register::
+
+      setpci -s <XX:XX.X> a8.w
+
+#. Set bit 8::
+
+      value = value | 0x100
+
+#. Set the PCI configure register with new value::
+
+      setpci -s <XX:XX.X> a8.w=<value>
+
+Vlan strip of VF
+~~~~~~~~~~~~~~~~
+
+The VF vlan strip function is only supported in the i40e kernel driver >= 2.1.26.
+
+High Performance of Small Packets on 40G NIC
+--------------------------------------------
+
+As there might be firmware fixes for performance enhancement in latest version
+of firmware image, the firmware update might be needed for getting high performance.
+Check with the local Intel's Network Division application engineers for firmware updates.
+Users should consult the release notes specific to a DPDK release to identify
+the validated firmware version for a NIC using the i40e driver.
+
+Use 16 Bytes RX Descriptor Size
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As i40e PMD supports both 16 and 32 bytes RX descriptor sizes, and 16 bytes size can provide helps to high performance of small packets.
+Configuration of ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` in config files can be changed to use 16 bytes size RX descriptors.
+
+High Performance and per Packet Latency Tradeoff
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Due to the hardware design, the interrupt signal inside NIC is needed for per
+packet descriptor write-back. The minimum interval of interrupts could be set
+at compile time by ``CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL`` in configuration files.
+Though there is a default configuration, the interval could be tuned by the
+users with that configuration item depends on what the user cares about more,
+performance or per packet latency.
+
+Example of getting best performance with l3fwd example
+------------------------------------------------------
+
+The following is an example of running the DPDK ``l3fwd`` sample application to get high performance with an
+Intel server platform and Intel XL710 NICs.
+
+The example scenario is to get best performance with two Intel XL710 40GbE ports.
+See :numref:`figure_intel_perf_test_setup` for the performance test setup.
+
+.. _figure_intel_perf_test_setup:
+
+.. figure:: img/intel_perf_test_setup.*
+
+   Performance Test Setup
+
+
+1. Add two Intel XL710 NICs to the platform, and use one port per card to get best performance.
+   The reason for using two NICs is to overcome a PCIe Gen3's limitation since it cannot provide 80G bandwidth
+   for two 40G ports, but two different PCIe Gen3 x8 slot can.
+   Refer to the sample NICs output above, then we can select ``82:00.0`` and ``85:00.0`` as test ports::
+
+      82:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
+      85:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
+
+2. Connect the ports to the traffic generator. For high speed testing, it's best to use a hardware traffic generator.
+
+3. Check the PCI devices numa node (socket id) and get the cores number on the exact socket id.
+   In this case, ``82:00.0`` and ``85:00.0`` are both in socket 1, and the cores on socket 1 in the referenced platform
+   are 18-35 and 54-71.
+   Note: Don't use 2 logical cores on the same core (e.g core18 has 2 logical cores, core18 and core54), instead, use 2 logical
+   cores from different cores (e.g core18 and core19).
+
+4. Bind these two ports to igb_uio.
+
+5. As to XL710 40G port, we need at least two queue pairs to achieve best performance, then two queues per port
+   will be required, and each queue pair will need a dedicated CPU core for receiving/transmitting packets.
+
+6. The DPDK sample application ``l3fwd`` will be used for performance testing, with using two ports for bi-directional forwarding.
+   Compile the ``l3fwd sample`` with the default lpm mode.
+
+7. The command line of running l3fwd would be something like the following::
+
+      ./l3fwd -l 18-21 -n 4 -w 82:00.0 -w 85:00.0 \
+              -- -p 0x3 --config '(0,0,18),(0,1,19),(1,0,20),(1,1,21)'
+
+   This means that the application uses core 18 for port 0, queue pair 0 forwarding, core 19 for port 0, queue pair 1 forwarding,
+   core 20 for port 1, queue pair 0 forwarding, and core 21 for port 1, queue pair 1 forwarding.
+
+8. Configure the traffic at a traffic generator.
+
+   * Start creating a stream on packet generator.
+
+   * Set the Ethernet II type to 0x0800.