Changes in |csit-release|
-------------------------
-#. **VPP Performance Tests**
+#. VPP PERFORMANCE TESTS
+
+ - **Intel Xeon 2n-skx, 3n-skx testbeds**: VPP performance test data
+ is now included in this report version. Due to substantial impact
+ of test environment changes (applied during the CSIT-2001
+ development cycle) on the performance of VPP software, a new
+ approach to performance comparison and progression/regression
+ root cause analysis (RCA) has been applied.
+
+ - CSIT test environment is now versioned, with ver. 1 associated
+ with CSIT rls1908 git branch as of 2019-08-21, and ver. 2
+ associated with CSIT rls2001 git branch as of 2020-03-27.
+
+ - To identify performance changes due to CSIT test environment
+ changes from ver. 1 to ver. 2, VPP v19.08.1 has been re-tested in
+ ver. 2 and compared against the past results obtained with
+ testing in ver. 1. Separate RCA1 analysis has been applied to
+ this part. See :ref:`vpp_compare_current_vs_previous_release` and
+ :ref:`vpp_known_issues`.
+
+ - To identify performance changes due to VPP code changes from
+ v19.08.1 to v20.01.0, both have been tested in CSIT environment
+ ver. 2 and compared against each other. Separate RCA2 analysis
+ has been applied to this part. See
+ :ref:`vpp_compare_current_vs_previous_release` and
+ :ref:`vpp_known_issues`.
+
+ - **Intel Xeon 2n-clx testbeds**: VPP performance test data is now
+ included in this report. See :ref:`vpp_known_issues`.
+
+ - **Service density 2n-skx tests**: Added new NF density tests with
+ IPsec encryption between DUTs.
+
+ - **AVF tests**: Full test coverage based on code changes in CSIT
+ core layer (driver/interface awareness) and generated by suite
+ generator (Intel Fortville NICs only).
+
+ - **Hoststack tests**: Major refactor of VPP Hoststack TCP/IP
+ performance tests using WRK generator talking to the VPP HTTP
+ static server plugin measuring connections per second and
+ requests per second. Added new iperf3 with LDPreload tests,
+ iperf3/LDPreload tests with packet loss induced via the VPP NSIM
+ (Network Simulator) plugin, and QUIC/UDP/IP transport tests.
+ All of the new tests measure goodput through the VPP Hoststack
+ from client to server.
+
+ - **Latency HDRHistogram**: Added High Dynamic Range Histogram
+ latency measurements based on the new capability in TRex traffic
+ generator. HDRH latency data is presented in latency packet
+ percentile graphs and in detailed results tables.
+
+ - **Mellanox CX556A-EDAT tests**: Added tests with Mellanox
+ ConnectX5-2p100GE NICs in 2n-clx testbeds using VPP native rdma
+ driver.
+
+ - **IPsec reconfiguration tests**: Added tests measuring the impact
+ of IPsec tunnel creation and removal.
+
+ - **Load Balancer tests**: Added VPP performance tests for Maglev,
+ L3DSR (Direct Server Return), and Layer 4 Load Balancing NAT Mode.
+
+#. TEST FRAMEWORK
+
+ - **CSIT Python3 support**: Full migration of CSIT from Python2.7 to
+ Python3.6. This change includes library migration, PIP dependency
+ upgrade, CSIT container images, infrastructure packages
+ upgrade/installation.
+
+ - **CSIT PAPI support**: Finished conversion of CSIT VAT L1 keywords
+ to PAPI L1 KWs in CSIT using VPP Python bindings (VPP PAPI).
+ Redesign of key components of PAPI Socket Executor and PAPI
+ history. Due to issues with PAPI performance, VAT is still used
+ in CSIT for all VPP scale tests. See known issues below.
+
+ - **Test Suite Generator**: Added capability to generate suites for
+ different drivers per NIC model including DPDK, AVF, RDMA.
+ Extended coverage for all tests.
- - **MRR Throughput**: MRR (Maximum Receive Rate) test code has now
- configurable trial duration and number of consecutive executions.
- Coverage of MRR tests has been extended across more test
- scenarios. MRR tests are used for continuous performance trending
- and for comparison between VPP releases.
+ - **General Code Housekeeping**: Ongoing RF keywords optimizations,
+ removal of redundant RF keywords and aligning of suite/test
+ setup/teardowns.
- - **MLRsearch Throughput**: MLRsearch algorithm has been introduced
- for all NDR and PDR throughput tests. All tests that previously
- used binary search got converted to MLRsearch. Coverage of NDR/PDR
- tests has been extended across more test scenarios.
+#. TEST ENVIRONMENT
- - **L2patch Tests**: Tests measure performance of VPP L2patch, the
- fastest L2 forwarding path implemented in VPP, that cross-links
- RX and TX of two physical interfaces.
+ - **TRex Fortville NIC Performance**: Received FVL fix from Intel
+ resolving TRex low throughput issue. TRex per FVL NIC throughput
+ increased from ~27 Mpps to the nominal ~37 Mpps. For details see
+ `CSIT-1503 <https://jira.fd.io/browse/CSIT-1503>`_ and `TRex-519
+ <https://trex-tgn.cisco.com/youtrack/issue/trex-519>`_.
- - **2-Node Tests**: A new baseline set of 2-node tests covering base
- ip4, ip6, l2patch, l2bd, l2xc, running on new Xeon Skylake
- testbeds.
+ - **New Intel Xeon Cascadelake Testbeds**: Added performance tests
+ for 2-Node-Cascadelake (2n-clx) testbeds with x710, xxv710 and
+ cx556a-edat NIC cards.
- - **Generated tests**: Simplified and unified test structure, semi-
- autogenerated by generator script. Test generator is currently
- able to create test combinations with various frame size and
- cores combinations. All existing test cases were converted to new
- format.
+#. PRESENTATION AND ANALYTICS LAYER
- - **Simultaneous Multi-Threading**: SMT-aware detection of server
- processor operation mode (HyperThreading enabled/disabled) with
- associated compute resource configuration including thread
- affinity, number of Rx queues and DPDK I/O mbufs. Tests are
- automatically tagged during execution to indicate executed thread
- configuration.
+ - **Graphs layout improvements**: Improved performance graphs layout
+ for better readability and maintenance: test grouping, axis
+ labels, descriptions, other informative decoration.
- - **Intel Xeon Skylake Support**: Support for 2-Node and 3-Node
- physical testbed topologies based on the new SuperMirco servers
- each with two Intel Xeon Skylake Platinum processors. Full
- Ansible playbooks refactor for quick server (re)installation and
- reference pointers of configuration.
+ - **Latency graphs**: Min/Avg/Max group bar latency graphs are
+ replaced with packet latency percentile distribution at different
+ background packet loads based on TRex latency hdrhistogram
+ measurements.
-#. **Presentation and Analytics Layer**
+..
+ // Alternative Note for 1st Bullet when bad microcode Skx, Clx results are published
+ - **Intel Xeon 2n-skx, 3n-skx and 2n-clx testbeds**: VPP performance
+ test data is included in this report version, but it shows lower
+ performance and behaviour inconsistency of these systems
+ following the upgrade of processor microcode packages (skx ucode
+ 0x2000064, clx ucode 0x500002c) as part of updating Ubuntu 18.04
+ LTS kernel version. Tested VPP and DPDK applications (L3fwd) are
+ affected. Skx and Clx test data will be corrected in subsequent
+ maintenance report version(s) once the issue is resolved. See
+ :ref:`vpp_known_issues`.
- - **Performance trending**: Further improved continuous performance
- trending with anomaly detection and analysis.
+.. raw:: latex
-#. **Test Framework Optimizations**
+ \clearpage
- - **General Code Housekeeping**: Ongoing RF keywords optimizations,
- removal of redundant RF keywords.
+.. _vpp_known_issues:
Known Issues
------------
List of known issues in |csit-release| for VPP performance tests:
-+---+-----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
-| # | JiraID | Issue Description |
-+===+=========================================+=================================================================================================================================+
-| 1 | `CSIT-570 | Sporadic (1 in 200) NDR discovery test failures on x520. DPDK reporting rx-errors, indicating L1 issue. |
-| | <https://jira.fd.io/browse/CSIT-570>`_ | Suspected issue with HW combination of X710-X520 in LF testbeds. Not observed outside of LF testbeds. |
-+---+-----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
-| 2 | `VPP-1361 | High failure rate of api call sw_interface_set_flags [admin-up|link-up]. |
-| | <https://jira.fd.io/browse/VPP-1361>`_ | Failure rate: 30-40% of tests failing due to interfaces not in link-up state after API call sw_interface_set_flags. |
-+---+-----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
-| 3 | `CSIT-1234 | VPP IPSecHW scale interface mode 1core, low NDR and PDR 64B throughput in 3n-hsw testbeds, in CSIT-18.07 vs. CSIT-18.04. |
-| | <https://jira.fd.io/browse/CSIT-1234>`_ | ip4ipsecscale1000tnl-ip4base-int 1core CSIT-18.07/18.04 relative change: NDR -31%, PDR -32%, MRR -38%. |
-+---+-----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
-| 4 | `CSIT-1242 | VPP xl710 ip4base test 1core, low NDR and PDR 64B throughput in 3n-hsw testbeds, in CSIT-18.07 vs. CSIT-18.04. |
-| | <https://jira.fd.io/browse/CSIT-1242>`_ | xl710 ip4base 1core CSIT-18.07/18.04 relative change: NDR -29%, high stdev. |
-+---+-----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
-| 5 | `CSIT-1243 | VPP nat44 base test 2core, low NDR and PDR 64B throughput in 3n-skx testbeds, compared to 3n-hsw testbeds. |
-| | <https://jira.fd.io/browse/CSIT-1243>`_ | ip4base-nat44 2core 3n-skx/3n-hsw relative change: NDR -19%, PDR -22%. |
-+---+-----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
-| 6 | `CSIT-1244 | VPP lispip4 base test 2core, low NDR and PDR 64B throughput in 3n-skx testbeds, compared to 3n-hsw testbeds. |
-| | <https://jira.fd.io/browse/CSIT-1244>`_ | ip4lispip4-ip4base 2core 3n-skx/3n-hsw relative change: NDR -11%, PDR -18%. |
-+---+-----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
-| 7 | `CSIT-1245 | VPP srv6proxy-stat and srv6proxy-masq, much higher NDR and PDR 78B throughput in 3n-hsw testbeds, in CSIT-18.07 vs. CSIT-18.04. |
-| | <https://jira.fd.io/browse/CSIT-1245>`_ | Due to wrong test suite configuration in dynamic-proxy mode. Artefact of suite code refactoring. |
-+---+-----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| # | JiraID | Issue Description |
++====+=========================================+===========================================================================================================+
+| 1 | `CSIT-570 | Sporadic (1 in 200) NDR discovery test failures on x520. DPDK reporting rx-errors, indicating L1 issue. |
+| | <https://jira.fd.io/browse/CSIT-570>`_ | Suspected issue with HW combination of X710-X520 in LF testbeds. Not observed outside of LF testbeds. |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 2 | `VPP-662 | 9000B packets not supported by NICs VIC1227 and VIC1387. |
+| | <https://jira.fd.io/browse/VPP-662>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 3 | `CSIT-1498 | Memif tests are sporadically failing on initialization of memif connection. |
+| | <https://jira.fd.io/browse/CSIT-1498>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 4 | `VPP-1677 | 9000B ip4 nat44: VPP crash + coredump. |
+| | <https://jira.fd.io/browse/VPP-1677>`_ | VPP crashes very often when NAT44 is configured and has to process IP4 jumbo frames (9000B). |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 5 | `CSIT-1591 | All CSIT scale tests cannot use PAPI due to much slower performance compared to VAT/CLI (it takes much |
+| | <https://jira.fd.io/browse/CSIT-1499>`_ | longer to program VPP). This needs to be addressed on the PAPI side. |
+| +-----------------------------------------+ |
+| | `VPP-1763 | |
+| | <https://jira.fd.io/browse/VPP-1763>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 6 | `VPP-1675 | IPv4 IPSEC 9000B packet tests are failing as no packet is forwarded. |
+| | <https://jira.fd.io/browse/VPP-1675>`_ | Reason: chained buffers are not supported. |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 7 | `CSIT-1593 | IPv4 AVF 9000B packet tests are failing on 3n-skx while passing on 2n-skx. |
+| | <https://jira.fd.io/browse/CSIT-1593>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 8 | `CSIT-1675 | Intel Xeon 2n-skx, 3n-skx and 2n-clx testbeds behaviour and performance became inconsistent following |
+| | <https://jira.fd.io/browse/CSIT-1675>`_ | the upgrade to the latest Ubuntu 18.04 LTS kernel version (4.15.0-72-generic) and associated microcode |
+| | | packages (skx ucode 0x2000064, clx ucode 0x500002c). VPP as well as DPDK L3fwd tests are affected. |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 9 | `CSIT-1679 | All 2n-clx VPP ip4 tests with xxv710 and avf driver are failing. |
+| | <https://jira.fd.io/browse/CSIT-1679>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 10 | `CSIT-1680 | Some 2n-clx cx556a rdma tests are failing. |
+| | <https://jira.fd.io/browse/CSIT-1680>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 11 | `CSIT-1699 | Root Cause Analysis for CSIT-2001. Investigate high stdev of tests with VPP inside VM. |
+| | <https://jira.fd.io/browse/CSIT-1699>`_ | |
+| +-----------------------------------------+ |
+| | `CSIT-1704 | |
+| | <https://jira.fd.io/browse/CSIT-1704>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 12 | `CSIT-1699 | Root Cause Analysis for CSIT-2001. Identify cause of dot1q-l2xcbase progression. |
+| | <https://jira.fd.io/browse/CSIT-1699>`_ | |
+| +-----------------------------------------+ |
+| | `CSIT-1705 | |
+| | <https://jira.fd.io/browse/CSIT-1705>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 13 | `CSIT-1699 | Root Cause Analysis for CSIT-2001. Identify cause of avf-ip4scale regression. |
+| | <https://jira.fd.io/browse/CSIT-1699>`_ | |
+| +-----------------------------------------+ |
+| | `CSIT-1706 | |
+| | <https://jira.fd.io/browse/CSIT-1706>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 14 | `CSIT-1699 | Root Cause Analysis for CSIT-2001. Identify cause of progression in vhost-user tests with testpmd in VM. |
+| | <https://jira.fd.io/browse/CSIT-1699>`_ | |
+| +-----------------------------------------+ |
+| | `CSIT-1707 | |
+| | <https://jira.fd.io/browse/CSIT-1707>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+
+| 15 | `CSIT-1699 | Root Cause Analysis for CSIT-2001. Identify cause for avf-ip4base regression. |
+| | <https://jira.fd.io/browse/CSIT-1699>`_ | |
+| +-----------------------------------------+ |
+| | `CSIT-1708 | |
+| | <https://jira.fd.io/browse/CSIT-1708>`_ | |
++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+