X-Git-Url: https://gerrit.fd.io/r/gitweb?p=csit.git;a=blobdiff_plain;f=docs%2Freport%2Fvpp_performance_tests%2Foverview.rst;h=8692b8bf7aef8b13eb1f6398c60d5881e870252b;hp=56ffda03df619fd3cbc3aaa6e5172d931199ebde;hb=b6d7373834eb3fa6e8543be9c8379507d6d91273;hpb=01f9e5ccff93c600c793b78f4b8957289ad3359f diff --git a/docs/report/vpp_performance_tests/overview.rst b/docs/report/vpp_performance_tests/overview.rst index 56ffda03df..8692b8bf7a 100644 --- a/docs/report/vpp_performance_tests/overview.rst +++ b/docs/report/vpp_performance_tests/overview.rst @@ -1,13 +1,14 @@ Overview ======== +.. _tested_physical_topologies: + Tested Physical Topologies -------------------------- CSIT VPP performance tests are executed on physical baremetal servers hosted by -LF FD.io project. Testbed physical topology is shown in the figure below. - -:: +:abbr:`LF (Linux Foundation)` FD.io project. Testbed physical topology is shown +in the figure below.:: +------------------------+ +------------------------+ | | | | @@ -52,20 +53,19 @@ Going forward CSIT project will be looking to add more hardware into FD.io performance labs to address larger scale multi-interface and multi-NIC performance testing scenarios. -For test cases that require DUT (VPP) to communicate with VM(s) over vhost-user -interfaces, N of VM instances are created on SUT1 and SUT2. For N=1 DUT (VPP) -forwards packets between vhostuser and physical interfaces. For N>1 DUT (VPP) a -logical service chain forwarding topology is created on DUT (VPP) by applying L2 -or IPv4/IPv6 configuration depending on the test suite. -DUT (VPP) test topology with N VM instances -is shown in the figure below including applicable packet flow thru the DUTs and -VMs (marked in the figure with ``***``). - -:: +For test cases that require DUT (VPP) to communicate with +VirtualMachines (VMs) / Containers (Linux or Docker Containers) over +vhost-user/memif interfaces, N of VM/Ctr instances are created on SUT1 +and SUT2. 
For N=1 DUT forwards packets between vhost/memif and physical +interfaces. For N>1 DUT a logical service chain forwarding topology is +created on DUT by applying L2 or IPv4/IPv6 configuration depending on +the test suite. DUT test topology with N VM/Ctr instances is shown in +the figure below including applicable packet flow thru the DUTs and +VMs/Ctrs (marked in the figure with ``***``).:: +-------------------------+ +-------------------------+ | +---------+ +---------+ | | +---------+ +---------+ | - | | VM[1] | | VM[N] | | | | VM[1] | | VM[N] | | + | |VM/Ctr[1]| |VM/Ctr[N]| | | |VM/Ctr[1]| |VM/Ctr[N]| | | | ***** | | ***** | | | | ***** | | ***** | | | +--^---^--+ +--^---^--+ | | +--^---^--+ +--^---^--+ | | *| |* *| |* | | *| |* *| |* | @@ -85,34 +85,41 @@ VMs (marked in the figure with ``***``). **********************| |********************** +-----------+ -For VM tests, packets are switched by DUT (VPP) multiple times: twice for a -single VM, three times for two VMs, N+1 times for N VMs. -Hence the external -throughput rates measured by TG and listed in this report must be multiplied -by (N+1) to represent the actual DUT aggregate packet forwarding rate. - -Note that reported VPP performance results are specific to the SUTs tested. -Current LF FD.io SUTs are based on Intel XEON E5-2699v3 2.3GHz CPUs. SUTs with -other CPUs are likely to yield different results. A good rule of thumb, that -can be applied to estimate VPP packet thoughput for Phy-to-Phy (NIC-to-NIC, -PCI-to-PCI) topology, is to expect the forwarding performance to be -proportional to CPU core frequency, assuming CPU is the only limiting factor -and all other SUT parameters equivalent to FD.io CSIT environment. The same rule -of thumb can be also applied for Phy-to-VM-to-Phy (NIC-to-VM-to-NIC) topology, -but due to much higher dependency on intensive memory operations and -sensitivity to Linux kernel scheduler settings and behaviour, this estimation -may not always yield good enough accuracy. 
- -For detailed LF FD.io test bed specification and physical topology please refer -to `LF FDio CSIT testbed wiki page `_. +For VM/Ctr tests, packets are switched by DUT multiple times: twice for +a single VM/Ctr, three times for two VMs/Ctrs, N+1 times for N VMs/Ctrs. +Hence the external throughput rates measured by TG and listed in this +report must be multiplied by (N+1) to represent the actual DUT aggregate +packet forwarding rate. + +Note that reported DUT (VPP) performance results are specific to the SUTs +tested. Current :abbr:`LF (Linux Foundation)` FD.io SUTs are based on Intel +XEON E5-2699v3 2.3GHz CPUs. SUTs with other CPUs are likely to yield different +results. A good rule of thumb, that can be applied to estimate VPP packet +throughput for Phy-to-Phy (NIC-to-NIC, PCI-to-PCI) topology, is to expect +the forwarding performance to be proportional to CPU core frequency, +assuming CPU is the only limiting factor and all other SUT parameters +equivalent to FD.io CSIT environment. The same rule of thumb can also be +applied for Phy-to-VM/Ctr-to-Phy (NIC-to-VM/Ctr-to-NIC) topology, but due to +much higher dependency on intensive memory operations and sensitivity to Linux +kernel scheduler settings and behaviour, this estimation may not always yield +good enough accuracy. + +For detailed FD.io CSIT testbed specification and topology, as well as +configuration and setup of SUTs and DUTs testbeds please refer to +:ref:`test_environment`. + +Similar SUT compute node and DUT VPP settings can be arrived at in a +standalone VPP setup by using a `vpp-config configuration tool +`_ developed within the +VPP project using CSIT recommended settings and scripts. Performance Tests Coverage -------------------------- -Performance tests are split into the two main categories: +Performance tests are split into two main categories: - Throughput discovery - discovery of packet forwarding rate using binary search - in accordance to RFC2544. + in accordance to :rfc:`2544`.
- NDR - discovery of Non Drop Rate packet throughput, at zero packet loss; followed by one-way packet latency measurements at 10%, 50% and 100% of @@ -133,6 +140,9 @@ CSIT |release| includes following performance test suites, listed per NIC type: VLAN tagged Ethernet frames. - **L2BD** - L2 Bridge-Domain switched-forwarding of untagged Ethernet frames with MAC learning; disabled MAC learning i.e. static MAC tests to be added. + - **L2BD Scale** - L2 Bridge-Domain switched-forwarding of untagged Ethernet + frames with MAC learning; disabled MAC learning i.e. static MAC tests to be + added with 20k, 200k and 2M FIB entries. - **IPv4** - IPv4 routed-forwarding. - **IPv6** - IPv6 routed-forwarding. - **IPv4 Scale** - IPv4 routed-forwarding with 20k, 200k and 2M FIB entries. @@ -141,14 +151,19 @@ CSIT |release| includes following performance test suites, listed per NIC type: of 2 VMs using vhost-user interfaces, with VPP forwarding modes incl. L2 Cross-Connect, L2 Bridge-Domain, VXLAN with L2BD, IPv4 routed-forwarding. - **COP** - IPv4 and IPv6 routed-forwarding with COP address security. - - **iACL** - IPv4 and IPv6 routed-forwarding with iACL address security. + - **ACL** - L2 Bridge-Domain switched-forwarding and IPv4 and IPv6 routed- + forwarding with iACL and oACL IP address, MAC address and L4 port security. - **LISP** - LISP overlay tunneling for IPv4-over-IPv4, IPv6-over-IPv4, IPv6-over-IPv6, IPv4-over-IPv6 in IPv4 and IPv6 routed-forwarding modes. - **VXLAN** - VXLAN overlay tunnelling integration with L2XC and L2BD. - **QoS Policer** - ingress packet rate measuring, marking and limiting (IPv4). - - **CGNAT** - Carrier Grade Network Address Translation tests with varying + - **NAT** - (Source) Network Address Translation tests with varying number of users and ports per user. + - **Container memif connections** - VPP memif virtual interface tests to + interconnect VPP instances with L2XC and L2BD. 
+ - **Container Orchestrated Topologies** - Container topologies connected over + the memif virtual interface. - 2port40GE XL710 Intel @@ -184,92 +199,57 @@ CSIT |release| includes following performance test suites, listed per NIC type: Execution of performance tests takes time, especially the throughput discovery tests. Due to limited HW testbed resources available within FD.io labs hosted -by Linux Foundation, the number of tests for NICs other than X520 (a.k.a. -Niantic) has been limited to few baseline tests. Over time we expect the HW -testbed resources to grow, and will be adding complete set of performance -tests for all models of hardware to be executed regularly and(or) -continuously. +by :abbr:`LF (Linux Foundation)`, the number of tests for NICs other than X520 +(a.k.a. Niantic) has been limited to few baseline tests. CSIT team expect the +HW testbed resources to grow over time, so that complete set of performance +tests can be regularly and(or) continuously executed against all models of +hardware present in FD.io labs. Performance Tests Naming ------------------------ -CSIT |release| follows a common structured naming convention for all -performance and system functional tests, introduced in CSIT rls1701. +CSIT |release| follows a common structured naming convention for all performance +and system functional tests, introduced in CSIT |release-1|. -The naming should be intuitive for majority of the tests. Complete -description of CSIT test naming convention is provided on `CSIT test naming wiki +The naming should be intuitive for majority of the tests. Complete description +of CSIT test naming convention is provided on `CSIT test naming wiki `_. -Here few illustrative examples of the new naming usage for performance test -suites: - -#. **Physical port to physical port - a.k.a. 
NIC-to-NIC, Phy-to-Phy, P2P** - - - *PortNICConfig-WireEncapsulation-PacketForwardingFunction- - PacketProcessingFunction1-...-PacketProcessingFunctionN-TestType* - - *10ge2p1x520-dot1q-l2bdbasemaclrn-ndrdisc.robot* => 2 ports of 10GE on - Intel x520 NIC, dot1q tagged Ethernet, L2 bridge-domain baseline switching - with MAC learning, NDR throughput discovery. - - *10ge2p1x520-ethip4vxlan-l2bdbasemaclrn-ndrchk.robot* => 2 ports of 10GE - on Intel x520 NIC, IPv4 VXLAN Ethernet, L2 bridge-domain baseline - switching with MAC learning, NDR throughput discovery. - - *10ge2p1x520-ethip4-ip4base-ndrdisc.robot* => 2 ports of 10GE on Intel - x520 NIC, IPv4 baseline routed forwarding, NDR throughput discovery. - - *10ge2p1x520-ethip6-ip6scale200k-ndrdisc.robot* => 2 ports of 10GE on - Intel x520 NIC, IPv6 scaled up routed forwarding, NDR throughput - discovery. - -#. **Physical port to VM (or VM chain) to physical port - a.k.a. NIC2VM2NIC, - P2V2P, NIC2VMchain2NIC, P2V2V2P** - - - *PortNICConfig-WireEncapsulation-PacketForwardingFunction- - PacketProcessingFunction1-...-PacketProcessingFunctionN-VirtEncapsulation- - VirtPortConfig-VMconfig-TestType* - - *10ge2p1x520-dot1q-l2bdbasemaclrn-eth-2vhost-1vm-ndrdisc.robot* => 2 ports - of 10GE on Intel x520 NIC, dot1q tagged Ethernet, L2 bridge-domain - switching to/from two vhost interfaces and one VM, NDR throughput - discovery. - - *10ge2p1x520-ethip4vxlan-l2bdbasemaclrn-eth-2vhost-1vm-ndrdisc.robot* => 2 - ports of 10GE on Intel x520 NIC, IPv4 VXLAN Ethernet, L2 bridge-domain - switching to/from two vhost interfaces and one VM, NDR throughput - discovery. - - *10ge2p1x520-ethip4vxlan-l2bdbasemaclrn-eth-4vhost-2vm-ndrdisc.robot* => 2 - ports of 10GE on Intel x520 NIC, IPv4 VXLAN Ethernet, L2 bridge-domain - switching to/from four vhost interfaces and two VMs, NDR throughput - discovery. 
- -Methodology: Multi-Thread and Multi-Core ----------------------------------------- - -**HyperThreading** - CSIT |release| performance tests are executed with SUT -servers' Intel XEON CPUs configured in HyperThreading Disabled mode (BIOS -settings). This is the simplest configuration used to establish baseline -single-thread single-core SW packet processing and forwarding performance. -Subsequent releases of CSIT will add performance tests with Intel -HyperThreading Enabled (requires BIOS settings change and hard reboot). - -**Multi-core Test** - CSIT |release| multi-core tests are executed in the +Methodology: Multi-Core and Multi-Threading +------------------------------------------- + +**Intel Hyper-Threading** - CSIT |release| performance tests are executed with +SUT servers' Intel XEON processors configured in Intel Hyper-Threading Disabled +mode (BIOS setting). This is the simplest configuration used to establish +baseline single-thread single-core application packet processing and forwarding +performance. Subsequent releases of CSIT will add performance tests with Intel +Hyper-Threading Enabled (requires BIOS settings change and hard reboot of +server). + +**Multi-core Tests** - CSIT |release| multi-core tests are executed in the following VPP thread and core configurations: #. 1t1c - 1 VPP worker thread on 1 CPU physical core. #. 2t2c - 2 VPP worker threads on 2 CPU physical cores. -Note that in quite a few test cases running VPP on 2 physical cores hits -the tested NIC I/O bandwidth or packets-per-second limit. +VPP worker threads are the data plane threads. VPP control thread is running on +a separate non-isolated core together with other Linux processes. Note that in +quite a few test cases running VPP workers on 2 physical cores hits the tested +NIC I/O bandwidth or packets-per-second limit. 
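To put the NIC packets-per-second limit mentioned above into numbers, the theoretical Ethernet line rate can be derived from the link speed and the frame size. The following is a minimal sketch, not part of CSIT: the function name ``line_rate_pps`` is hypothetical, and it assumes the standard 20 bytes of per-frame wire overhead (7 B preamble, 1 B start-of-frame delimiter, 12 B inter-frame gap) with frame sizes that already include the 4 B FCS.

```python
# Per-frame overhead on the wire that is not part of the Ethernet frame itself:
# 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Return the theoretical maximum packet rate of one port, one direction.

    link_bps    -- nominal link speed in bits per second (e.g. 10e9 for 10GE)
    frame_bytes -- Ethernet frame size including FCS (e.g. 64, 1518)
    """
    bits_per_frame_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame_on_wire


if __name__ == "__main__":
    for size in (64, 1518):
        one_way = line_rate_pps(10e9, size)
        # Two ports, each carrying traffic in one direction of a bidirectional test.
        print(f"{size}B: {one_way / 1e6:.2f} Mpps per port, "
              f"{2 * one_way / 1e6:.2f} Mpps for 2 ports")
```

For a 10GE port and 64B frames this gives about 14.88 Mpps per direction, which is the ceiling a measured rate can reach before the NIC rather than the CPU becomes the limit.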
Methodology: Packet Throughput ------------------------------ Following values are measured and reported for packet throughput tests: -- NDR binary search per RFC2544: +- NDR binary search per :rfc:`2544`: - Packet rate: "RATE: pps (2x )" - Aggregate bandwidth: "BANDWIDTH: Gbps (untagged)" -- PDR binary search per RFC2544: +- PDR binary search per :rfc:`2544`: - Packet rate: "RATE: pps (2x )" @@ -283,6 +263,7 @@ Following values are measured and reported for packet throughput tests: - IPv4: 64B, IMIX_v4_1 (28x64B,16x570B,4x1518B), 1518B, 9000B. - IPv6: 78B, 1518B, 9000B. +All rates are reported from external Traffic Generator perspective. Methodology: Packet Latency --------------------------- @@ -312,23 +293,75 @@ latency values are measured using following methodology: Methodology: KVM VM vhost ------------------------- -CSIT |release| introduced environment configuration changes to KVM Qemu vhost- -user tests in order to more representatively measure VPP-17.04 performance in -configurations with vhost-user interfaces and VMs. - -Current setup of CSIT FD.io performance lab is using tuned settings for more -optimal performance of KVM Qemu: - -- Qemu virtio queue size has been increased from default value of 256 to 1024 - descriptors. -- Adjusted Linux kernel CFS scheduler settings, as detailed on this CSIT wiki - page: https://wiki.fd.io/view/CSIT/csit-perf-env-tuning-ubuntu1604. - -Adjusted Linux kernel CFS settings make the NDR and PDR throughput performance -of VPP+VM system less sensitive to other Linux OS system tasks by reducing -their interference on CPU cores that are designated for critical software -tasks under test, namely VPP worker threads in host and Testpmd threads in -guest dealing with data plan. +CSIT |release| introduced test environment configuration changes to KVM Qemu +vhost-user tests in order to more representatively measure |vpp-release| +performance in configurations with vhost-user interfaces and different Qemu +settings. 
+ +FD.io CSIT performance lab is testing VPP vhost with KVM VMs using following +environment settings: + +- Tests with varying Qemu virtio queue (a.k.a. vring) sizes: [vr256] default 256 + descriptors, [vr1024] 1024 descriptors to optimize for packet throughput; + +- Tests with varying Linux :abbr:`CFS (Completely Fair Scheduler)` settings: + [cfs] default settings, [cfsrr1] CFS RoundRobin(1) policy applied to all data + plane threads handling test packet path including all VPP worker threads and + all Qemu testpmd poll-mode threads; + +- Resulting test cases are all combinations with [vr256,vr1024] and + [cfs,cfsrr1] settings; + +- Adjusted Linux kernel :abbr:`CFS (Completely Fair Scheduler)` scheduler policy + for data plane threads used in CSIT is documented in + `CSIT Performance Environment Tuning wiki `_. + The purpose is to verify performance impact (NDR, PDR throughput) and + same test measurements repeatability, by making VPP and VM data plane + threads less susceptible to other Linux OS system tasks hijacking CPU + cores running those data plane threads. + +Methodology: LXC and Docker Containers memif +-------------------------------------------- + +CSIT |release| introduced additional tests taking advantage of VPP memif +virtual interface (shared memory interface) tests to interconnect VPP +instances. VPP vswitch instance runs in bare-metal user-mode handling +Intel x520 NIC 10GbE interfaces and connecting over memif (Master side) +virtual interfaces to more instances of VPP running in :abbr:`LXC (Linux +Container)` or in Docker Containers, both with memif virtual interfaces +(Slave side). LXCs and Docker Containers run in a privileged mode with +VPP data plane worker threads pinned to dedicated physical CPU cores per +usual CSIT practice. All VPP instances run the same version of software. +This test topology is equivalent to existing tests with vhost-user and +VMs as described earlier in :ref:`tested_physical_topologies`.
+ +More information about CSIT LXC and Docker Container setup and control +is available in :ref:`containter_orchestration_in_csit`. + +Methodology: Container Topologies Orchestrated by K8s +----------------------------------------------------- + +CSIT |release| introduced new tests of Container topologies connected +over the memif virtual interface (shared memory interface). In order to +provide simple topology coding flexibility and extensibility container +orchestration is done with `Kubernetes `_ +using `Docker `_ images for all container +applications including VPP. `Ligato `_ is +used to address the container networking orchestration that is +integrated with K8s, including memif support. + +For these tests VPP vswitch instance runs in a Docker Container handling +Intel x520 NIC 10GbE interfaces and connecting over memif (Master side) +virtual interfaces to more instances of VPP running in Docker Containers +with memif virtual interfaces (Slave side). All Docker Containers run in +a privileged mode with VPP data plane worker threads pinned to dedicated +physical CPU cores per usual CSIT practice. All VPP instances run the +same version of software. This test topology is equivalent to existing +tests with vhost-user and VMs as described earlier in +:ref:`tested_physical_topologies`. + +More information about CSIT Container Topologies Orchestrated by K8s is +available in :ref:`containter_orchestration_in_csit`. Methodology: IPSec with Intel QAT HW cards ------------------------------------------ @@ -356,27 +389,30 @@ specific configuration. TRex is installed and run on the TG compute node. The typical procedure is: - - If the TRex is not already installed on TG, it is installed in the - suite setup phase - see `TRex intallation `_. - - TRex configuration is set in its configuration file:: +- If the TRex is not already installed on TG, it is installed in the + suite setup phase - see `TRex intallation`_.
+- TRex configuration is set in its configuration file + :: - /etc/trex_cfg.yaml + /etc/trex_cfg.yaml - - TRex is started in the background mode:: +- TRex is started in the background mode + :: - sh -c 'cd /opt/trex-core-2.22/scripts/ && sudo nohup ./t-rex-64 -i -c 7 --iom 0 > /dev/null 2>&1 &' > /dev/null + $ sh -c 'cd /opt/trex-core-2.25/scripts/ && sudo nohup ./t-rex-64 -i -c 7 --iom 0 > /dev/null 2>&1 &' > /dev/null - - There are traffic streams dynamically prepared for each test. The traffic - is sent and the statistics obtained using trex_stl_lib.api.STLClient. +- There are traffic streams dynamically prepared for each test, based on traffic + profiles. The traffic is sent and the statistics obtained using + :command:`trex_stl_lib.api.STLClient`. **Measuring packet loss** - - Create an instance of STLClient - - Connect to the client - - Add all streams - - Clear statistics - - Send the traffic for defined time - - Get the statistics +- Create an instance of STLClient +- Connect to the client +- Add all streams +- Clear statistics +- Send the traffic for defined time +- Get the statistics If there is a warm-up phase required, the traffic is sent also before test and the statistics are ignored.
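The packet loss steps listed above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the CSIT implementation: the function ``measure_packet_loss`` and the ``StubClient`` class are hypothetical, traffic is sent in one direction only for brevity, and the port numbers, rate and durations are placeholder values. The method names called on ``client`` (``connect``, ``reset``, ``add_streams``, ``clear_stats``, ``start``, ``wait_on_traffic``, ``get_stats``, ``disconnect``) follow ``trex_stl_lib.api.STLClient``; in a real run ``client`` would be an ``STLClient`` instance and ``streams`` a list of ``STLStream`` objects.

```python
def measure_packet_loss(client, streams, tx_port=0, rx_port=1,
                        rate="1kpps", duration=10, warmup=0):
    """Send traffic for a fixed time and return (sent, received, lost).

    client -- an object with the STLClient method names used below
    warmup -- optional warm-up time in seconds; its statistics are discarded
    """
    client.connect()
    try:
        # Acquire the ports and remove streams left over from earlier tests.
        client.reset(ports=[tx_port, rx_port])
        client.add_streams(streams, ports=[tx_port])

        if warmup > 0:
            # Warm-up traffic is sent before the test proper.
            client.start(ports=[tx_port], mult=rate, duration=warmup)
            client.wait_on_traffic(ports=[tx_port])

        # Clearing here ensures warm-up counters are ignored.
        client.clear_stats()
        client.start(ports=[tx_port], mult=rate, duration=duration)
        client.wait_on_traffic(ports=[tx_port])
        stats = client.get_stats()
    finally:
        client.disconnect()

    sent = stats[tx_port]["opackets"]
    received = stats[rx_port]["ipackets"]
    return sent, received, sent - received


class StubClient:
    """Hypothetical stand-in so the sketch runs without a TRex server."""

    def __init__(self, sent, received):
        self._sent, self._received = sent, received

    def connect(self): pass
    def disconnect(self): pass
    def reset(self, ports=None): pass
    def add_streams(self, streams, ports=None): pass
    def clear_stats(self): pass
    def start(self, ports=None, mult="1", duration=-1): pass
    def wait_on_traffic(self, ports=None): pass

    def get_stats(self):
        # Same per-port counter names as the real statistics dictionary.
        return {0: {"opackets": self._sent, "ipackets": 0},
                1: {"opackets": 0, "ipackets": self._received}}


if __name__ == "__main__":
    print(measure_packet_loss(StubClient(1000, 990), streams=[], warmup=2))
```

Passing the client in as an argument keeps the measurement logic independent of how the TRex connection is created, which also makes it possible to exercise the flow with a stand-in object as shown.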