X-Git-Url: https://gerrit.fd.io/r/gitweb?a=blobdiff_plain;f=doc%2Fguides%2Fnics%2Fi40e.rst;h=cd468748f8ae539df25c5cfe26302cd7d41881f3;hb=055c52583a2794da8ba1e85a48cce3832372b12f;hp=934eb02764f28add99604958bba50dde18a08e13;hpb=97f17497d162afdb82c8704bf097f0fee3724b2e;p=deb_dpdk.git diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst index 934eb027..cd468748 100644 --- a/doc/guides/nics/i40e.rst +++ b/doc/guides/nics/i40e.rst @@ -64,6 +64,7 @@ Features of the I40E PMD are: - SR-IOV VF - Hot plug - IEEE1588/802.1AS timestamping +- VF Daemon (VFD) - EXPERIMENTAL Prerequisites @@ -77,6 +78,8 @@ Prerequisites - To get better performance on Intel platforms, please follow the "How to get best performance with NICs on Intel platforms" section of the :ref:`Getting Started Guide for Linux `. +- Upgrade the NVM/FW version following the `Intel® Ethernet NVM Update Tool Quick Usage Guide for Linux + `_ if needed. Pre-Installation Configuration ------------------------------ @@ -104,11 +107,6 @@ Please note that enabling debugging options may affect system performance. Toggle the use of Vector PMD instead of normal RX/TX path. To enable vPMD for RX, bulk allocation for Rx must be allowed. -- ``CONFIG_RTE_LIBRTE_I40E_RX_OLFLAGS_ENABLE`` (default ``y``) - - Toggle to enable RX ``olflags``. - This is only meaningful when Vector PMD is used. - - ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` (default ``n``) Toggle to use a 16-byte RX descriptor, by default the RX descriptor is 32 byte. @@ -130,82 +128,15 @@ Please note that enabling debugging options may affect system performance. Interrupt Throttling interval. -Driver Compilation -~~~~~~~~~~~~~~~~~~ - -To compile the I40E PMD see :ref:`Getting Started Guide for Linux ` or -:ref:`Getting Started Guide for FreeBSD ` depending on your platform. - - -Linux ------ - - -Running testpmd -~~~~~~~~~~~~~~~ - -This section demonstrates how to launch ``testpmd`` with Intel XL710/X710 -devices managed by ``librte_pmd_i40e`` in the Linux operating system. - -#. Load ``igb_uio`` or ``vfio-pci`` driver: - - .. code-block:: console - - modprobe uio - insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko - - or - - .. code-block:: console - - modprobe vfio-pci - -#. Bind the XL710/X710 adapters to ``igb_uio`` or ``vfio-pci`` loaded in the previous step: - - .. code-block:: console - - ./tools/dpdk_nic_bind.py --bind igb_uio 0000:83:00.0 - - Or setup VFIO permissions for regular users and then bind to ``vfio-pci``: - - .. code-block:: console - - ./tools/dpdk_nic_bind.py --bind vfio-pci 0000:83:00.0 - -#. Start ``testpmd`` with basic parameters: - - .. code-block:: console - - ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4 -w 83:00.0 -- -i - - Example output: - - .. code-block:: console - - ... - EAL: PCI device 0000:83:00.0 on NUMA socket 1 - EAL: probe driver: 8086:1572 rte_i40e_pmd - EAL: PCI memory mapped at 0x7f7f80000000 - EAL: PCI memory mapped at 0x7f7f80800000 - PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 8000208a - Interactive-mode selected - Configuring Port 0 (socket 0) - ... - - PMD: i40e_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are - satisfied.Rx Burst Bulk Alloc function will be used on port=0, queue=0. - - ... - Port 0: 68:05:CA:26:85:84 - Checking link statuses... - Port 0 Link Up - speed 10000 Mbps - full-duplex - Done +Driver compilation and testing +------------------------------ - testpmd> +Refer to the document :ref:`compiling and testing a PMD for a NIC ` +for details. SR-IOV: Prerequisites and sample Application Notes -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-------------------------------------------------- #. Load the kernel module: @@ -254,6 +185,37 @@ SR-IOV: Prerequisites and sample Application Notes #. Assign VF to VM, and bring up the VM. Please see the documentation for the *I40E/IXGBE/IGB Virtual Function Driver*. +#. Running testpmd: + + Follow instructions available in the document + :ref:`compiling and testing a PMD for a NIC ` + to run testpmd. + + Example output: + + .. code-block:: console + + ... + EAL: PCI device 0000:83:00.0 on NUMA socket 1 + EAL: probe driver: 8086:1572 rte_i40e_pmd + EAL: PCI memory mapped at 0x7f7f80000000 + EAL: PCI memory mapped at 0x7f7f80800000 + PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 8000208a + Interactive-mode selected + Configuring Port 0 (socket 0) + ... + + PMD: i40e_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are + satisfied.Rx Burst Bulk Alloc function will be used on port=0, queue=0. + + ... + Port 0: 68:05:CA:26:85:84 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Done + + testpmd> + Sample Application Notes ------------------------ @@ -267,7 +229,7 @@ To start ``testpmd``, and add vlan 10 to port 0: .. code-block:: console - ./app/testpmd -c ffff -n 4 -- -i --forward-mode=mac + ./app/testpmd -l 0-15 -n 4 -- -i --forward-mode=mac ... testpmd> set promisc 0 off @@ -302,7 +264,7 @@ Start ``testpmd`` with ``--disable-rss`` and ``--pkt-filter-mode=perfect``: .. code-block:: console - ./app/testpmd -c ffff -n 4 -- -i --disable-rss --pkt-filter-mode=perfect \ + ./app/testpmd -l 0-15 -n 4 -- -i --disable-rss --pkt-filter-mode=perfect \ --rxq=8 --txq=8 --nb-cores=8 --nb-ports=1 Add a rule to direct ``ipv4-udp`` packet whose ``dst_ip=2.2.2.5, src_ip=2.2.2.3, src_port=32, dst_port=32`` to queue 1: @@ -366,3 +328,236 @@ Delete all flow director rules on a port: testpmd> flush_flow_director 0 +Floating VEB +~~~~~~~~~~~~~ + +The Intel® Ethernet Controller X710 and XL710 Family support a feature called +"Floating VEB". + +A Virtual Ethernet Bridge (VEB) is an IEEE Edge Virtual Bridging (EVB) term +for functionality that allows local switching between virtual endpoints within +a physical endpoint and also with an external bridge/network. + +A "Floating" VEB doesn't have an uplink connection to the outside world so all +switching is done internally and remains within the host. As such, this +feature provides security benefits. + +In addition, a Floating VEB overcomes a limitation of normal VEBs where they +cannot forward packets when the physical link is down. Floating VEBs don't need +to connect to the NIC port so they can still forward traffic from VF to VF +even when the physical link is down. + +Therefore, with this feature enabled VFs can be limited to communicating with +each other but not an outside network, and they can do so even when there is +no physical uplink on the associated NIC port. + +To enable this feature, the user should pass a ``devargs`` parameter to the +EAL, for example:: + + -w 84:00.0,enable_floating_veb=1 + +In this configuration the PMD will use the floating VEB feature for all the +VFs created by this PF device. + +Alternatively, the user can specify which VFs need to connect to this floating +VEB using the ``floating_veb_list`` argument:: + + -w 84:00.0,enable_floating_veb=1,floating_veb_list=1;3-4 + +In this example ``VF1``, ``VF3`` and ``VF4`` connect to the floating VEB, +while other VFs connect to the normal VEB. + +The current implementation only supports one floating VEB and one regular +VEB. VFs can connect to a floating VEB or a regular VEB according to the +configuration passed on the EAL command line. + +The floating VEB functionality requires a NIC firmware version of 5.0 +or greater. + + +Limitations or Known issues +--------------------------- + +MPLS packet classification on X710/XL710 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For firmware versions prior to 5.0, MPLS packets are not recognized by the NIC. +The L2 Payload flow type in flow director can be used to classify MPLS packet +by using a command in testpmd like: + + testpmd> flow_director_filter 0 mode IP add flow l2_payload ether \ + 0x8847 flexbytes () fwd pf queue fd_id + +With the NIC firmware version 5.0 or greater, some limited MPLS support +is added: Native MPLS (MPLS in Ethernet) skip is implemented, while no +new packet type, no classification or offload are possible. With this change, +L2 Payload flow type in flow director cannot be used to classify MPLS packet +as with previous firmware versions. Meanwhile, the Ethertype filter can be +used to classify MPLS packet by using a command in testpmd like: + + testpmd> ethertype_filter 0 add mac_ignr 00:00:00:00:00:00 ethertype \ + 0x8847 fwd queue + +16 Byte RX Descriptor setting on DPDK VF +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Currently the VF's RX descriptor mode is decided by PF. There's no PF-VF +interface for VF to request the RX descriptor mode, also no interface to notify +VF its own RX descriptor mode. +For all available versions of the i40e driver, these drivers don't support 16 +byte RX descriptor. If the Linux i40e kernel driver is used as host driver, +while DPDK i40e PMD is used as the VF driver, DPDK cannot choose 16 byte receive +descriptor. The reason is that the RX descriptor is already set to 32 byte by +the i40e kernel driver. That is to say, user should keep +``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n`` in config file. +In the future, if the Linux i40e driver supports 16 byte RX descriptor, user +should make sure the DPDK VF uses the same RX descriptor mode, 16 byte or 32 +byte, as the PF driver. + +The same rule for DPDK PF + DPDK VF. The PF and VF should use the same RX +descriptor mode. Or the VF RX will not work. + +Receive packets with Ethertype 0x88A8 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Due to the FW limitation, PF can receive packets with Ethertype 0x88A8 +only when floating VEB is disabled. + +Incorrect Rx statistics when packet is oversize +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a packet is over maximum frame size, the packet is dropped. +However the Rx statistics, when calling `rte_eth_stats_get` incorrectly +shows it as received. + +VF & TC max bandwidth setting +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The per VF max bandwidth and per TC max bandwidth cannot be enabled in parallel. +The dehavior is different when handling per VF and per TC max bandwidth setting. +When enabling per VF max bandwidth, SW will check if per TC max bandwidth is +enabled. If so, return failure. +When enabling per TC max bandwidth, SW will check if per VF max bandwidth +is enabled. If so, disable per VF max bandwidth and continue with per TC max +bandwidth setting. + +TC TX scheduling mode setting +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There're 2 TX scheduling modes for TCs, round robin and strict priority mode. +If a TC is set to strict priority mode, it can consume unlimited bandwidth. +It means if APP has set the max bandwidth for that TC, it comes to no +effect. +It's suggested to set the strict priority mode for a TC that is latency +sensitive but no consuming much bandwidth. + +VF performance is impacted by PCI extended tag setting +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To reach maximum NIC performance in the VF the PCI extended tag must be +enabled. The DPDK I40E PF driver will set this feature during initialization, +but the kernel PF driver does not. So when running traffic on a VF which is +managed by the kernel PF driver, a significant NIC performance downgrade has +been observed (for 64 byte packets, there is about 25% linerate downgrade for +a 25G device and about 35% for a 40G device). + +For kernel version >= 4.11, the kernel's PCI driver will enable the extended +tag if it detects that the device supports it. So by default, this is not an +issue. For kernels <= 4.11 or when the PCI extended tag is disabled it can be +enabled using the steps below. + +#. Get the current value of the PCI configure register:: + + setpci -s a8.w + +#. Set bit 8:: + + value = value | 0x100 + +#. Set the PCI configure register with new value:: + + setpci -s a8.w= + +Vlan strip of VF +~~~~~~~~~~~~~~~~ + +The VF vlan strip function is only supported in the i40e kernel driver >= 2.1.26. + +High Performance of Small Packets on 40G NIC +-------------------------------------------- + +As there might be firmware fixes for performance enhancement in latest version +of firmware image, the firmware update might be needed for getting high performance. +Check with the local Intel's Network Division application engineers for firmware updates. +Users should consult the release notes specific to a DPDK release to identify +the validated firmware version for a NIC using the i40e driver. + +Use 16 Bytes RX Descriptor Size +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +As i40e PMD supports both 16 and 32 bytes RX descriptor sizes, and 16 bytes size can provide helps to high performance of small packets. +Configuration of ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` in config files can be changed to use 16 bytes size RX descriptors. + +High Performance and per Packet Latency Tradeoff +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Due to the hardware design, the interrupt signal inside NIC is needed for per +packet descriptor write-back. The minimum interval of interrupts could be set +at compile time by ``CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL`` in configuration files. +Though there is a default configuration, the interval could be tuned by the +users with that configuration item depends on what the user cares about more, +performance or per packet latency. + +Example of getting best performance with l3fwd example +------------------------------------------------------ + +The following is an example of running the DPDK ``l3fwd`` sample application to get high performance with an +Intel server platform and Intel XL710 NICs. + +The example scenario is to get best performance with two Intel XL710 40GbE ports. +See :numref:`figure_intel_perf_test_setup` for the performance test setup. + +.. _figure_intel_perf_test_setup: + +.. figure:: img/intel_perf_test_setup.* + + Performance Test Setup + + +1. Add two Intel XL710 NICs to the platform, and use one port per card to get best performance. + The reason for using two NICs is to overcome a PCIe Gen3's limitation since it cannot provide 80G bandwidth + for two 40G ports, but two different PCIe Gen3 x8 slot can. + Refer to the sample NICs output above, then we can select ``82:00.0`` and ``85:00.0`` as test ports:: + + 82:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583] + 85:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583] + +2. Connect the ports to the traffic generator. For high speed testing, it's best to use a hardware traffic generator. + +3. Check the PCI devices numa node (socket id) and get the cores number on the exact socket id. + In this case, ``82:00.0`` and ``85:00.0`` are both in socket 1, and the cores on socket 1 in the referenced platform + are 18-35 and 54-71. + Note: Don't use 2 logical cores on the same core (e.g core18 has 2 logical cores, core18 and core54), instead, use 2 logical + cores from different cores (e.g core18 and core19). + +4. Bind these two ports to igb_uio. + +5. As to XL710 40G port, we need at least two queue pairs to achieve best performance, then two queues per port + will be required, and each queue pair will need a dedicated CPU core for receiving/transmitting packets. + +6. The DPDK sample application ``l3fwd`` will be used for performance testing, with using two ports for bi-directional forwarding. + Compile the ``l3fwd sample`` with the default lpm mode. + +7. The command line of running l3fwd would be something like the following:: + + ./l3fwd -l 18-21 -n 4 -w 82:00.0 -w 85:00.0 \ + -- -p 0x3 --config '(0,0,18),(0,1,19),(1,0,20),(1,1,21)' + + This means that the application uses core 18 for port 0, queue pair 0 forwarding, core 19 for port 0, queue pair 1 forwarding, + core 20 for port 1, queue pair 0 forwarding, and core 21 for port 1, queue pair 1 forwarding. + +8. Configure the traffic at a traffic generator. + + * Start creating a stream on packet generator. + + * Set the Ethernet II type to 0x0800.