X-Git-Url: https://gerrit.fd.io/r/gitweb?a=blobdiff_plain;f=docs%2Freport%2Fvpp_performance_tests%2Fcsit_release_notes.rst;h=0128c629063b595e6b2e545ec8b09a8f4a233dd2;hb=a8a1c5a78b1c2ac0b9224cc9dbdb1fa0486ae30b;hp=3daff961b4566429d8062e521d90b15286595856;hpb=1911067935bdf6481fc9b89f40d79ca61f2448d0;p=csit.git diff --git a/docs/report/vpp_performance_tests/csit_release_notes.rst b/docs/report/vpp_performance_tests/csit_release_notes.rst index 3daff961b4..0128c62906 100644 --- a/docs/report/vpp_performance_tests/csit_release_notes.rst +++ b/docs/report/vpp_performance_tests/csit_release_notes.rst @@ -1,181 +1,176 @@ -CSIT Release Notes -================== - -Changes in CSIT |release| -------------------------- - -#. Test environment changes in VPP data plane performance tests: - - - Further characterization and optimizations of VPP vhost-user and VM - test methodology and test environment; - - - Tests with varying Qemu virtio queue (a.k.a. vring) sizes: - [vr256] default 256 descriptors, [vr1024] 1024 descriptors to - optimize for packet throughput; - - - Tests with varying Linux CFS (Completely Fair Scheduler) - settings: [cfs] default settings, [cfsrr1] CFS RoundRobin(1) - policy applied to all data plane threads handling test packet - path including all VPP worker threads and all Qemu testpmd - poll-mode threads; - - - Resulting test cases are all combinations with [vr256,vr1024] and - [cfs,cfsrr1] settings; - - - For more detail see performance results observations section in - this report; - -#. Code updates and optimizations in CSIT performance framework: - - - Complete CSIT framework code revision and optimizations as descried - on CSIT wiki page - `Design_Optimizations `_. - - - For more detail see the CSIT Framework Design section in this - report; - -#. 
Changes to CSIT driver for TRex Traffic Generator: - - - Complete refactor of TRex CSIT driver; - - - Introduction of packet traffic profiles to improve usability and - manageability of traffic profiles for a growing number of test - scenarios. - - - Support for packet traffic profiles to test IPv4/IPv6 stateful and - stateless DUT data plane features; - -#. Added VPP performance tests - - - **Linux Container VPP memif virtual interface tests** - - - VPP Memif virtual interface (shared memory interface) tests - interconnecting VPP instances over memif. VPP vswitch - instance runs in bare-metal user-mode handling Intel x520 NIC - 10GbE interfaces and connecting over memif (Master side) virtual - interfaces to another instance of VPP running in bare-metal Linux - Container (LXC) with memif virtual interfaces (Slave side). LXC - runs in a priviliged mode with VPP data plane worker threads - pinned to dedicated physical CPU cores per usual CSIT practice. - Both VPP run the same version of software. This test topology is - equivalent to existing tests with vhost-user and VMs. - - - **Stateful Security Groups** - - - New tests of VPP stateful security-groups a.k.a. acl-plugin - functionally compatible with networking-vpp OpenStack; - - - New tested security-groups access-control-lists (acl) - configuration variants include: [iaclNsl] input acl stateless, - [oaclNsl] output acl stateless, [iaclNsf] input acl stateful - a.k.a. reflect, [oaclNsf] output acl stateful a.k.a. reflect, - where N is number of access-control-entries (ace) in the acl. - - - Testing packet flows transmitted by TG: 100, 10k, 100k, always - hitting the last permit entry in acl. 
- - - **VPP vhost and VM tests** - - - New VPP vhost-user and VM test cases to benchmark performance of - VPP and VM topologies with Qemu and CFS policy combinations of - [vr256,vr1024] x [cfs,cfsrr1]; - - - Statistical analysis of repeatibility of results; - -Performance Improvements ------------------------- - -Substantial improvements in measured packet throughput have been -observed in a number of CSIT |release| tests listed below, with relative -increase of double-digit percentage points. Relative improvements for -this release are calculated against the test results listed in CSIT -|release-1| report. The comparison is calculated between the mean values -based on collected and archived test results' samples for involved VPP -releases. Standard deviation has been also listed for CSIT |release|. -VPP-16.09 and VPP-17.01 numbers are provided for reference. - -NDR Throughput -~~~~~~~~~~~~~~ - -Non-Drop Rate Throughput discovery tests: - -.. csv-table:: - :align: center - :header: VPP Functionality,Test Name,VPP-16.09 [Mpps],VPP-17.01 [Mpps],VPP-17.04 mean [Mpps],VPP-17.07 mean [Mpps],VPP-17.07 stdev [Mpps],17.04 to 17.07 change - :file: ../../../docs/report/vpp_performance_tests/performance_improvements/ndr_throughput.csv - -PDR Throughput -~~~~~~~~~~~~~~ - -Partial Drop Rate thoughput discovery tests with packet Loss Tolerance of 0.5%: - -.. csv-table:: - :align: center - :header: VPP Functionality,Test Name,VPP-16.09 [Mpps],VPP-17.01 [Mpps],VPP-17.04 mean [Mpps],VPP-17.07 mean [Mpps],VPP-17.07 stdev [Mpps],17.04 to 17.07 change - :file: ../../../docs/report/vpp_performance_tests/performance_improvements/pdr_throughput.csv - -Measured improvements are in line with VPP code optimizations listed in -`VPP-17.07 release notes -`_. - -Other Performance Changes -------------------------- - -Other changes in measured packet throughput, with either minor relative -increase or decrease, have been observed in a number of CSIT |release| -tests listed below. 
Relative changes are calculated against the test -results listed in CSIT |release-1| report. - -NDR Throughput -~~~~~~~~~~~~~~ - -Non-Drop Rate Throughput discovery tests: - -.. csv-table:: - :align: center - :header: VPP Functionality,Test Name,VPP-16.09 [Mpps],VPP-17.01 [Mpps],VPP-17.04 mean [Mpps],VPP-17.07 mean [Mpps],VPP-17.07 stdev [Mpps],17.04 to 17.07 change - :file: ../../../docs/report/vpp_performance_tests/performance_improvements/ndr_throughput_others.csv - -PDR Throughput -~~~~~~~~~~~~~~ - -Partial Drop Rate thoughput discovery tests with packet Loss Tolerance of 0.5%: - -.. csv-table:: - :align: center - :header: VPP Functionality,Test Name,VPP-16.09 [Mpps],VPP-17.01 [Mpps],VPP-17.04 mean [Mpps],VPP-17.07 mean [Mpps],VPP-17.07 stdev [Mpps],17.04 to 17.07 change - :file: ../../../docs/report/vpp_performance_tests/performance_improvements/pdr_throughput_others.csv - - -Known Issues ------------- - -Here is the list of known issues in CSIT |release| for VPP performance tests: - -+---+-------------------------------------------------+------------+-----------------------------------------------------------------+ -| # | Issue | Jira ID | Description | -+---+-------------------------------------------------+------------+-----------------------------------------------------------------+ -| 1 | Security-groups acl-plugin scale tests failure | CSIT-xxx | VPP with 2 worker threads crashes during security-groups | -| | with stateful acls if VPP with 2 worker threads | VPP-912 | iaclNsf and oaclNsf tests with 100k flows. | -+---+-------------------------------------------------+------------+-----------------------------------------------------------------+ -| 2 | VPP fails memif tests in 4 worker 2 core setup | CSIT-xxx | VPP with 4 worker threads running on 2 physical cores crashes | -| | | VPP-xxx | during memif tests. 
Initial debugging points to DPDK code | -+---+-------------------------------------------------+------------+-----------------------------------------------------------------+ -| X | NDR discovery test failures 1518B frame size | VPP-663 | VPP reporting errors: dpdk-input Rx ip checksum errors. | -| | for ip4scale200k, ip4scale2m scale IPv4 routed- | | Observed frequency: all test runs. | -| | forwarding tests. ip4scale20k tests are fine. | | | -+---+-------------------------------------------------+------------+-----------------------------------------------------------------+ -| X | Vic1385 and Vic1227 low performance. | VPP-664 | Low NDR performance. | -| | | | | -+---+-------------------------------------------------+------------+-----------------------------------------------------------------+ -| X | Sporadic NDR discovery test failures on x520. | CSIT-750 | Suspected issue with HW settings (BIOS, FW) in LF | -| | | | infrastructure. Issue can't be replicated outside LF. | -+---+-------------------------------------------------+------------+-----------------------------------------------------------------+ -| X | VPP in 2t2c setups - large variation | CSIT-568 | Suspected NIC firmware or DPDK driver issue affecting NDR | -| | of discovered NDR throughput values across | | throughput. Applies to XL710 and X710 NICs, x520 NICs are fine. | -| | multiple test runs with xl710 and x710 NICs. | | | -+---+-------------------------------------------------+------------+-----------------------------------------------------------------+ -| X | Lower than expected NDR and PDR throughput with | CSIT-569 | Suspected NIC firmware or DPDK driver issue affecting NDR and | -| | xl710 and x710 NICs, compared to x520 NICs. | | PDR throughput. Applies to XL710 and X710 NICs. | -+---+-------------------------------------------------+------------+-----------------------------------------------------------------+ - +.. 
_vpp_performance_tests_release_notes: + +Release Notes +============= + +Changes in |csit-release| +------------------------- + +#. VPP PERFORMANCE TESTS + + - **Added new performance testbed 3n-snr** (3 Node SnowRidge, with Intel + Atom processors). + + - **Added GTPU HW offload tests** using VPP GTPU hardware offload + with Intel e810 4p25ge NICs (3n-icx testbeds only). These tests + were already there in CSIT-2206, but were yielding invalid + results due to using TRex v2.97 that was incompatible with e810 + NICs used for those tests. + + - **Added Wireguard tests** using VPP software crypto (3n-icx, 3n-snr + testbeds) and using built-in hardware crypto QAT device (3n-snr testbed + only). + + - **Reduction of tests**: Removed certain test variations executed + iteratively for the report (as well as in daily and weekly + trending) due to physical testbeds overload. + +#. TEST FRAMEWORK + + - CSIT-2210 executes all VPP v22.10 performance tests using VPP ubuntu2204 + images, due to CSIT execution environment change as noted below. This + applies to all performance testbeds except Denverton. Consequently, VPP + v22.06 has not been re-tested in CSIT-2210 environment, as no ubuntu2204 + images are available for that VPP version. Performance comparison + between VPP v22.10 (current version) and VPP v22.06 (previous version) + may be impacted by VPP build environment change (ubuntu2004 to + ubuntu2204) and CSIT environment change. See :ref:`vpp_rca` for + details. + + - **CSIT test environment** version has been updated to ver. 11, see + :ref:`test_environment_versioning`. + + - **TCP TPUT profiles** had to be changed, as newer TRex versions + are not deterministic enough when deciding when to send an ACK. + + - **CSIT PAPI support**: Due to issues with PAPI performance, and + deprecation of VAT, VPP CLI is used in CSIT for many VPP scale + tests. See :ref:`vpp_known_issues`. + + - **General Code Housekeeping**: Ongoing code optimizations and bug + fixes. + +#. 
PRESENTATION AND ANALYTICS LAYER + + - **C-Dash** `performance dashboard `_ got updated UI and + updated backend increasing its performance and robustness. + +.. raw:: latex + + \clearpage + +.. _vpp_known_issues: + +Known Issues +------------ + +New +___ + ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| # | JiraID | Issue Description | ++====+=========================================+===========================================================================================================+ +| 1 | `CSIT-1890 | 3n-alt: Tests failing until 40Ge Interface comes up. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ + +Previous +________ + +Issues reported in previous releases which still affect the current results. + ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| # | JiraID | Issue Description | ++====+=========================================+===========================================================================================================+ +| 1 | `CSIT-1782 | Multicore AVF tests are failing when trying to create interface. | +| | `_ | Frequency is reduced by CSIT workaround, but occasional failures do still happen. | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 2 | `CSIT-1785 | NAT44ED tests failing to establish all TCP sessions. | +| | `_ | At least for max scale, in allotted time (limited by session 500s timeout) due to worse | +| +-----------------------------------------+ slow path performance than previously measured and calibrated for. 
| +| | `VPP-1972 | CSIT removed the max scale NAT tests to avoid this issue. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 3 | `CSIT-1799 | All NAT44-ED 16M sessions CPS scale tests fail while setting NAT44 address range. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 4 | `CSIT-1800 | All Geneve L3 mode scale tests (1024 tunnels) are failing. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 5 | `CSIT-1801 | 9000B payload frames not forwarded over tunnels due to violating supported Max Frame Size (VxLAN, LISP, | +| | `_ | SRv6). | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 6 | `CSIT-1802 | all testbeds: AF-XDP - NDR tests failing from time to time. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 7 | `CSIT-1804 | All testbeds: NDR tests failing from time to time. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 8 | `CSIT-1808 | All tests with 9000B payload frames not forwarded over memif interfaces. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 9 | `CSIT-1827 | 3n-icx, 3n-skx: all AVF crypto tests sporadically fail. 1518B with no traffic, IMIX with excessive | +| | `_ | packet loss. 
| ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 10 | `CSIT-1835 | 3n-icx: QUIC vppecho BPS tests failing on timeout when checking hoststack finished. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 11 | `CSIT-1849 | 2n-skx, 2n-clx, 2n-icx: UDP 16m TPUT tests fail to create all sessions. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 12 | `CSIT-1864 | 2n-clx: half of the packets lost on PDR tests. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 13 | `CSIT-1877 | 3n-tsh: all VM tests failing to boot VM. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 14 | `CSIT-1883 | 3n-snr: All hwasync wireguard tests failing when trying to verify device. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 15 | `CSIT-1884 | 2n-clx, 2n-icx: All NAT44DET NDR PDR IMIX over 1M sessions BIDIR tests failing to create enough sessions. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 16 | `CSIT-1885 | 3n-icx: 9000b ip4 ip6 l2 NDRPDR AVF tests are failing to forward traffic. 
| +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 17 | `CSIT-1886 | 3n-icx: Wireguard tests with 100 and more tunnels are failing PDR criteria. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ + +Fixed +_____ + +Issues reported in previous releases which were fixed in this release: + ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| # | JiraID | Issue Description | ++====+=========================================+===========================================================================================================+ +| 1 | `CSIT-1868 | 2n-clx: ALL ldpreload-nginx tests fails when trying to start nginx. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ +| 2 | `CSIT-1871 | 3n-snr: 25GE interface between SUT and TG/TRex goes down randomly. | +| | `_ | | ++----+-----------------------------------------+-----------------------------------------------------------------------------------------------------------+ + +.. _vpp_rca: + +Root Cause Analysis for Performance Changes +------------------------------------------- + +List of RCAs in |csit-release| for VPP performance changes: + ++----+-----------------------------------------+--------------------------------------------------------------------+ +| # | JiraID | Issue Description | ++====+=========================================+====================================================================+ +| 1 | `CSIT-1887 | rls2210 RCA: ASTF tests | +| | `_ | TRex upgrade decreased TRex performance. 
NAT results not affected, | +| | | except on Denverton due to interference from VPP-2010. | ++----+-----------------------------------------+--------------------------------------------------------------------+ +| 2 | `CSIT-1888 | rls2210 RCA: testbed differences, especially for ipsec | +| | `_ | Not caused by VPP code nor CSIT code. | +| | | Most probable cause is clang-14 behavior. | ++----+-----------------------------------------+--------------------------------------------------------------------+ +| 3 | `CSIT-1889 | rls2210 RCA: policy-outbound-nocrypto | +| | `_ | When VPP added spd fast path matching (Gerrit 36097), | +| | | it decreased MRR of the corresponding tests, at least on 3n-alt. | ++----+-----------------------------------------+--------------------------------------------------------------------+