Tested Physical Topologies
--------------------------

CSIT DPDK performance tests are executed on physical baremetal servers hosted
by :abbr:`LF (Linux Foundation)` FD.io project. Testbed physical topology is
shown in the figure below::

    +------------------------+           +------------------------+
    |                        |           |                        |
    |  +------------------+  |           |  +------------------+  |
    |  |                  |  |           |  |                  |  |
    |  |                  <----------------->                  |  |
    |  |       DUT1       |  |           |  |       DUT2       |  |
    |  +--^---------------+  |           |  +---------------^--+  |
    |     |                  |           |                  |     |
    |     |            SUT1  |           |  SUT2            |     |
    +------------------------+           +------------------^-----+
           |                                                 |
           |                                                 |
           |                  +-----------+                  |
           |                  |           |                  |
           +------------------>    TG     <------------------+
                              |           |
                              +-----------+

SUT1 and SUT2 are two System Under Test servers (Cisco UCS C240, each with two
Intel XEON CPUs), TG is a Traffic Generator (another Cisco UCS C240, with two
Intel XEON CPUs). SUTs run the Testpmd/L3FWD SW application in Linux user-mode
as a Device Under Test (DUT). TG runs the TRex SW application as a packet
Traffic Generator. Physical connectivity between SUTs and to TG is provided
using direct links (no L2 switches) connecting the different NIC models that
need to be tested for performance. Currently installed and tested NIC models
include:

#. 2port10GE X520-DA2 Intel.
#. 2port10GE X710 Intel.
#. 2port10GE VIC1227 Cisco.
#. 2port40GE VIC1385 Cisco.
#. 2port40GE XL710 Intel.

From the SUT and DUT perspective, all performance tests involve forwarding
packets between two physical Ethernet ports (10GE or 40GE). Due to the number
of listed NIC models tested and the available PCI slot capacity in SUT
servers, in all of the above cases both physical ports are located on the
same NIC. In some test cases this results in measured packet throughput being
limited not by the DUT but by either the physical interface or the NIC
capacity.

Going forward, the CSIT project will be looking to add more hardware into
FD.io performance labs to address larger scale multi-interface and multi-NIC
performance testing scenarios.

Note that reported DUT (DPDK) performance results are specific to the SUTs
tested. Current :abbr:`LF (Linux Foundation)` FD.io SUTs are based on Intel
XEON E5-2699v3 2.3GHz CPUs. SUTs with other CPUs are likely to yield
different results.

A good rule of thumb for estimating DPDK packet throughput in Phy-to-Phy
(NIC-to-NIC, PCI-to-PCI) topology is to expect the forwarding performance to
be proportional to CPU core frequency, assuming CPU is the only limiting
factor and all other SUT parameters are equivalent to the FD.io CSIT
environment. The same rule of thumb can also be applied to
Phy-to-VM/LXC-to-Phy (NIC-to-VM/LXC-to-NIC) topology, but due to much higher
dependency on intensive memory operations and sensitivity to Linux kernel
scheduler settings and behaviour, this estimation may not always yield good
enough accuracy.
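As a concrete illustration of this rule of thumb, the short Python sketch
below scales a reference throughput figure by the ratio of CPU core
frequencies. This is not CSIT code; the reference frequency (matching the
E5-2699v3 base clock) and the example NDR value are illustrative assumptions
only::

    # Hedged illustration of the frequency-proportionality rule of thumb.
    # The example NDR figure is a placeholder, not a CSIT result.

    REF_FREQ_GHZ = 2.3  # base frequency of the Intel XEON E5-2699v3 SUTs

    def estimate_pps(ref_pps, target_freq_ghz, ref_freq_ghz=REF_FREQ_GHZ):
        """Estimate Phy-to-Phy forwarding rate at another core frequency,
        assuming the CPU core is the only bottleneck."""
        return ref_pps * (target_freq_ghz / ref_freq_ghz)

    if __name__ == "__main__":
        measured_ndr_pps = 10e6  # hypothetical NDR measured at 2.3 GHz
        print("Estimated NDR at 3.0 GHz: %.0f pps"
              % estimate_pps(measured_ndr_pps, 3.0))

As the rule of thumb itself states, such an estimate only holds while the CPU
core, and not the NIC, PCI bus or memory subsystem, is the limiting factor.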
For detailed :abbr:`LF (Linux Foundation)` FD.io test bed specification and
physical topology please refer to the `LF FD.io CSIT testbed wiki page
<https://wiki.fd.io/view/CSIT/CSIT_LF_testbed>`_.

Performance Tests Coverage
--------------------------

Performance tests are split into two main categories:

- Throughput discovery - discovery of packet forwarding rate using binary
  search in accordance with :rfc:`2544`.

  - NDR - discovery of Non Drop Rate packet throughput, at zero packet loss;
    followed by one-way packet latency measurements at 10%, 50% and 100% of
    discovered NDR throughput.
  - PDR - discovery of Partial Drop Rate, with specified non-zero packet loss
    currently set to 0.5%; followed by one-way packet latency measurements at
    100% of discovered PDR throughput.

- Throughput verification - verification of packet forwarding rate against
  previously discovered throughput rate. These tests are currently done
  against 0.9 of reference NDR, with reference rates updated periodically.

CSIT |release| includes performance test suites for each of the NIC types
listed above. The CSIT project expects the HW testbed resources to grow, and
will be adding a complete set of performance tests for all models of
hardware, to be executed regularly and/or continuously.

Performance Tests Naming
------------------------

CSIT |release| follows a common structured naming convention for all
performance and system functional tests, introduced in CSIT |release-1|.

The naming should be intuitive for the majority of the tests. A complete
description of the CSIT test naming convention is provided on the `CSIT test
naming wiki <https://wiki.fd.io/view/CSIT/csit-test-naming>`_.

Methodology: Multi-Core and Multi-Threading
-------------------------------------------

**Intel Hyper-Threading** - CSIT |release| performance tests are executed
with SUT servers' Intel XEON processors configured in Intel Hyper-Threading
Disabled mode (BIOS setting). This is the simplest configuration used to
establish baseline single-thread single-core application packet processing
and forwarding performance. Subsequent releases of CSIT will add performance
tests with Intel Hyper-Threading Enabled (requires BIOS settings change and
hard reboot of the server).
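For reference, the Hyper-Threading state of a Linux SUT can be sanity-checked
from userspace by inspecting the CPU topology exposed in sysfs. This is a
minimal sketch assuming a standard Linux sysfs layout; it is not part of the
CSIT framework::

    # Minimal sketch: report whether SMT (Hyper-Threading) is enabled by
    # checking if any physical core exposes more than one logical sibling.
    from pathlib import Path

    def ht_enabled():
        cpus = Path("/sys/devices/system/cpu")
        for sib in cpus.glob("cpu[0-9]*/topology/thread_siblings_list"):
            siblings = sib.read_text().strip()
            # HT disabled: one sibling per core, e.g. "3".
            # HT enabled: two siblings, e.g. "3,39" or "3-4".
            if "," in siblings or "-" in siblings:
                return True
        return False

    if __name__ == "__main__":
        print("Hyper-Threading enabled:", ht_enabled())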
**Multi-core Tests** - CSIT |release| multi-core tests are executed in the
following Testpmd/L3FWD thread and core configurations:

#. 1t1c - 1 pmd worker thread on 1 CPU physical core.
#. 2t2c - 2 pmd worker threads on 2 CPU physical cores.

Note that in many tests running Testpmd/L3FWD reaches the tested NIC I/O
bandwidth or packets-per-second limit.

Methodology: Packet Throughput
------------------------------

Following values are measured and reported for packet throughput tests:

- NDR binary search per :rfc:`2544`:

  - Packet rate: "RATE: <aggregate packet rate in pps> pps (2x <per direction
    packets-per-second>)"
  - Aggregate bandwidth: "BANDWIDTH: <aggregate bandwidth in Gbps> Gbps
    (untagged)"

- PDR binary search per :rfc:`2544`:

  - Packet rate: "RATE: <aggregate packet rate in pps> pps (2x <per direction
    packets-per-second>)"

- Frame sizes:

  - IPv4: 64B, 1518B, 9000B.

All rates are reported from external Traffic Generator perspective.

Methodology: Packet Latency
---------------------------

Reported latency values are measured using following methodology:

- TRex setup introduces an always-on error of about 2*2usec per latency
  flow - additional Tx/Rx interface latency induced by TRex SW writing and
  reading packet timestamps on CPU cores without HW acceleration on NICs
  closer to the interface line.

Methodology: TRex Traffic Generator Usage
-----------------------------------------

The `TRex traffic generator <https://trex-tgn.cisco.com>`_ is used for all
CSIT performance tests. TRex stateless mode is used to measure NDR and PDR
throughputs using binary search (NDR and PDR discovery tests) and for quick
checks of DUT performance against the reference NDRs (NDR check tests) for a
specific configuration.

TRex is installed and run on the TG compute node. The typical procedure is:

- If TRex is not already installed on the TG, it is installed in the
  suite setup phase - see `TRex installation`_.
- TRex configuration is set in its configuration file::

    /etc/trex_cfg.yaml

- TRex is started in the background mode::

    $ sh -c 'cd /opt/trex-core-2.25/scripts/ && sudo nohup ./t-rex-64 -i -c 7 --iom 0 > /dev/null 2>&1 &' > /dev/null

- Traffic streams are dynamically prepared for each test, based on traffic
  profiles. The traffic is sent and the statistics obtained using
  ``trex_stl_lib.api.STLClient``.

**Measuring packet loss**

- Create an instance of STLClient
- Connect to the TRex server
- Add all streams
- Clear statistics
- Send the traffic for a defined time
- Get the statistics

If a warm-up phase is required, the traffic is also sent before the test and
the resulting statistics are ignored.

**Measuring latency**

If measurement of latency is requested, two more packet streams are created
(one for each direction) with the TRex ``flow_stats`` parameter set to
``STLFlowLatencyStats``. In that case, the returned statistics will also
include min/avg/max latency values.
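The packet-loss measurement sequence above can be sketched with the TRex
stateless Python API roughly as follows. This is a minimal illustration
rather than the actual CSIT driver code; the server address, ports, packet
contents, rate and duration are hypothetical placeholders::

    # Sketch of the measurement steps described above, using trex_stl_lib.
    # Not CSIT code; all concrete parameter values are illustrative.
    from trex_stl_lib.api import *  # TRex STL API, brings in scapy classes

    client = STLClient(server="127.0.0.1")  # TRex runs locally on the TG
    try:
        client.connect()                    # connect to the TRex server
        client.reset(ports=[0, 1])          # acquire ports, clear old config

        base_pkt = Ether() / IP(src="10.0.0.1", dst="20.0.0.1") / UDP(dport=12)
        stream = STLStream(packet=STLPktBuilder(pkt=base_pkt),
                           mode=STLTXCont())  # continuous transmit mode
        client.add_streams([stream], ports=[0])

        client.clear_stats()
        client.start(ports=[0], mult="50%", duration=10)  # defined time
        client.wait_on_traffic(ports=[0])

        stats = client.get_stats()
        lost = stats[0]["opackets"] - stats[1]["ipackets"]
        print("packets lost:", lost)
    finally:
        client.disconnect()

For the latency variant, the two extra per-direction streams would
additionally pass ``flow_stats=STLFlowLatencyStats(pg_id=...)`` to
``STLStream``, making min/avg/max latency values appear in the returned
statistics.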