docs/report/vpp_performance_tests/overview.rst

   1 Overview
   2 ========
   3
   4 For a description of the physical testbeds used for VPP performance tests,
   5 please refer to :ref:`physical_testbeds`.
   6
   7 Logical Topologies
   8 ------------------
   9
  10 .. _tested_logical_topologies:
  11
  12 CSIT VPP performance tests are executed on physical testbeds described
  13 in :ref:`physical_testbeds`. Based on the packet path through server SUTs,
  14 three distinct logical topology types are used for VPP DUT data plane
  15 testing:
  16
  17 #. NIC-to-NIC switching topologies.
  18 #. VM service switching topologies.
  19 #. Container service switching topologies.
  20
  21 NIC-to-NIC Switching
  22 ~~~~~~~~~~~~~~~~~~~~
  23
  24 The simplest logical topology for a software data plane application like
  25 VPP is NIC-to-NIC switching. Tested topologies for 2-Node and 3-Node
  26 testbeds are shown in the figures below.
  27
  28 .. only:: latex
  29
  30     .. raw:: latex
  31
  32         \begin{figure}[H]
  33         \centering
  34             \includesvg[width=0.90\textwidth]{../_tmp/src/vpp_performance_tests/logical-2n-nic2nic}
  35             \label{fig:logical-2n-nic2nic}
  36         \end{figure}
  37
  38 .. only:: html
  39
  40     .. figure:: logical-2n-nic2nic.svg
  41         :alt: logical-2n-nic2nic
  42         :align: center
  43
  44
  45 .. only:: latex
  46
  47     .. raw:: latex
  48
  49         \begin{figure}[H]
  50         \centering
  51             \includesvg[width=0.90\textwidth]{../_tmp/src/vpp_performance_tests/logical-3n-nic2nic}
  52             \label{fig:logical-3n-nic2nic}
  53         \end{figure}
  54
  55 .. only:: html
  56
  57     .. figure:: logical-3n-nic2nic.svg
  58         :alt: logical-3n-nic2nic
  59         :align: center
  60
  61 Server Systems Under Test (SUTs) run the VPP application in Linux user mode
  62 as a Device Under Test (DUT). The server Traffic Generator (TG) runs the T-Rex
  63 application. Physical connectivity between SUTs and TG is provided using
  64 different drivers and NIC models that need to be tested for performance
  65 (packet/bandwidth throughput and latency).
  66
  67 From SUT and DUT perspectives, all performance tests involve forwarding
  68 packets between two (or more) physical Ethernet ports (10GE, 25GE, 40GE,
  69 100GE). In most cases both physical ports on the SUT are located on the same
  70 NIC. The only exceptions are link bonding and 100GE tests. In the latter
  71 case only one port per NIC can be driven at line rate due to PCIe Gen3
  72 x16 slot bandwidth limitations. 100GE NICs are not supported in PCIe Gen3
  73 x8 slots.
  74
  75 Note that reported VPP DUT performance results are specific to the SUTs
  76 tested. SUTs with processors other than the ones used in the FD.io lab are
  77 likely to yield different results. A good rule of thumb for estimating
  78 VPP packet throughput in the NIC-to-NIC switching topology is to expect
  79 the forwarding performance to be proportional to processor core
  80 frequency for the same processor architecture, assuming the processor is
  81 the only limiting factor and all other SUT parameters are equivalent to
  82 the FD.io CSIT environment.
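
The rule of thumb above amounts to a linear scaling by core frequency. The
following is a minimal sketch only: the function name and the sample numbers
are hypothetical and are not taken from CSIT results, and the estimate holds
only under the stated assumption that the processor is the sole limiting
factor and the processor architecture is the same.

.. code-block:: python

    def estimate_throughput_mpps(reference_mpps, reference_ghz, target_ghz):
        """Scale a measured packet rate linearly with core frequency.

        Hypothetical helper, valid only for the same processor architecture
        and when the processor is the only limiting factor.
        """
        if reference_ghz <= 0 or target_ghz <= 0:
            raise ValueError("core frequencies must be positive")
        return reference_mpps * target_ghz / reference_ghz


    # Hypothetical example: 10.0 Mpps measured at 2.0 GHz, target runs at 2.5 GHz.
    estimate = estimate_throughput_mpps(10.0, 2.0, 2.5)
    print(estimate)  # 12.5

Treat the result as a first-order estimate; memory, PCIe and NIC limits can
make real results deviate from it.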
  83
  84 VM Service Switching
  85 ~~~~~~~~~~~~~~~~~~~~
  86
  87 VM service switching topology test cases require VPP DUT to communicate
  88 with Virtual Machines (VMs) over vhost-user virtual interfaces.
  89
  90 Two types of VM service topologies are tested in CSIT |release|:
  91
  92 #. "Parallel" topology with packets flowing within SUT from NIC(s) via
  93    VPP DUT to VM, back to VPP DUT, then out through NIC(s).
  94
  95 #. "Chained" topology (a.k.a. "Snake") with packets flowing within SUT
  96    from NIC(s) via VPP DUT to VM, back to VPP DUT, then to the next VM,
  97    back to VPP DUT, and so on until the last VM in the chain, then back
  98    to VPP DUT and out through NIC(s).
  99
 100 For each of the above topologies, VPP DUT is tested in a range of L2
 101 or IPv4/IPv6 configurations depending on the test suite. Sample VPP DUT
 102 "Chained" VM service topologies for 2-Node and 3-Node testbeds, with each
 103 SUT running N VM instances, are shown in the figures below.
 104
 105 .. only:: latex
 106
 107     .. raw:: latex
 108
 109         \begin{figure}[H]
 110         \centering
 111             \includesvg[width=0.90\textwidth]{../_tmp/src/vpp_performance_tests/logical-2n-vm-vhost}
 112             \label{fig:logical-2n-vm-vhost}
 113         \end{figure}
 114
 115 .. only:: html
 116
 117     .. figure:: logical-2n-vm-vhost.svg
 118         :alt: logical-2n-vm-vhost
 119         :align: center
 120
 121
 122 .. only:: latex
 123
 124     .. raw:: latex
 125
 126         \begin{figure}[H]
 127         \centering
 128             \includesvg[width=0.90\textwidth]{../_tmp/src/vpp_performance_tests/logical-3n-vm-vhost}
 129             \label{fig:logical-3n-vm-vhost}
 130         \end{figure}
 131
 132 .. only:: html
 133
 134     .. figure:: logical-3n-vm-vhost.svg
 135         :alt: logical-3n-vm-vhost
 136         :align: center
 137
 138 In "Chained" VM topologies, packets are switched by VPP DUT multiple
 139 times: twice for a single VM, three times for two VMs, N+1 times for N
 140 VMs. Hence the external throughput rates measured by TG and listed in
 141 this report must be multiplied by N+1 to represent the actual VPP DUT
 142 aggregate packet forwarding rate.
 143
 144 In the "Parallel" service topology, packets are always switched twice by
 145 VPP DUT per service chain.
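
The relationship between the externally measured rate and the VPP DUT
aggregate forwarding rate can be sketched as follows. This is an illustrative
helper, not part of CSIT: the function names and the topology strings are
assumptions made for this example, and the sample rate is hypothetical.

.. code-block:: python

    def switching_passes(topology, instances):
        """Return how many times VPP DUT switches each packet per chain."""
        if instances < 1:
            raise ValueError("at least one VM instance is required")
        if topology == "chained":
            # N+1 passes for N VMs: twice for one VM, three times for two.
            return instances + 1
        if topology == "parallel":
            # Always twice per service chain.
            return 2
        raise ValueError(f"unknown topology: {topology}")


    def aggregate_forwarding_rate(external_rate, topology, instances):
        """Convert a TG-measured rate into the VPP DUT aggregate rate."""
        return external_rate * switching_passes(topology, instances)


    # Hypothetical example: 2.0 Mpps measured by TG through a chain of 3 VMs.
    print(aggregate_forwarding_rate(2.0, "chained", 3))   # 8.0
    print(aggregate_forwarding_rate(2.0, "parallel", 3))  # 4.0
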
 146
 147 Note that reported VPP DUT performance results are specific to the SUTs
 148 tested. SUTs with processors other than the ones used in the FD.io lab are
 149 likely to yield different results. As with the NIC-to-NIC switching
 150 topology, one can expect the forwarding performance to be
 151 proportional to processor core frequency for the same processor
 152 architecture, assuming the processor is the only limiting factor. However,
 153 due to the much higher dependency on intensive memory operations in VM
 154 service chained topologies and the sensitivity to Linux scheduler settings
 155 and behaviour, this estimate may not always be sufficiently
 156 accurate.
 157
 158 Container Service Switching
 159 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 160
 161 Container service switching topology test cases require VPP DUT to
 162 communicate with Containers (Ctrs) over memif virtual interfaces.
 163
 164 Three types of Container service topologies are tested in CSIT |release|:
 165
 166 #. "Parallel" topology with packets flowing within SUT from NIC(s) via
 167    VPP DUT to Container, back to VPP DUT, then out through NIC(s).
 168
 169 #. "Chained" topology (a.k.a. "Snake") with packets flowing within SUT
 170    from NIC(s) via VPP DUT to Container, back to VPP DUT, then to the
 171    next Container, back to VPP DUT, and so on until the last Container
 172    in the chain, then back to VPP DUT and out through NIC(s).
 173
 174 #. "Horizontal" topology with packets flowing within SUT from NIC(s) via
 175    VPP DUT to Container, then via "horizontal" memif to the next
 176    Container, and so on until the last Container, then back to VPP DUT
 177    and out through NIC(s).
 178
 179 For each of the above topologies, VPP DUT is tested in a range of L2
 180 or IPv4/IPv6 configurations depending on the test suite. Sample VPP DUT
 181 "Chained" Container service topologies for 2-Node and 3-Node testbeds,
 182 with each SUT running N Container instances, are shown in the figures
 183 below.
 184
 185 .. only:: latex
 186
 187     .. raw:: latex
 188
 189         \begin{figure}[H]
 190         \centering
 191             \includesvg[width=0.90\textwidth]{../_tmp/src/vpp_performance_tests/logical-2n-container-memif}
 192             \label{fig:logical-2n-container-memif}
 193         \end{figure}
 194
 195 .. only:: html
 196
 197     .. figure:: logical-2n-container-memif.svg
 198         :alt: logical-2n-container-memif
 199         :align: center
 200
 201
 202 .. only:: latex
 203
 204     .. raw:: latex
 205
 206         \begin{figure}[H]
 207         \centering
 208             \includesvg[width=0.90\textwidth]{../_tmp/src/vpp_performance_tests/logical-3n-container-memif}
 209             \label{fig:logical-3n-container-memif}
 210         \end{figure}
 211
 212 .. only:: html
 213
 214     .. figure:: logical-3n-container-memif.svg
 215         :alt: logical-3n-container-memif
 216         :align: center
 217
 218 In "Chained" Container topologies, packets are switched by VPP DUT
 219 multiple times: twice for a single Container, three times for two
 220 Containers, N+1 times for N Containers. Hence the external throughput
 221 rates measured by TG and listed in this report must be multiplied by N+1
 222 to represent the actual VPP DUT aggregate packet forwarding rate.
 223
 224 In the "Parallel" and "Horizontal" service topologies, packets are always
 225 switched by VPP DUT twice per service chain.
 226
 227 Note that reported VPP DUT performance results are specific to the SUTs
 228 tested. SUTs with processors other than the ones used in the FD.io lab are
 229 likely to yield different results. As with the NIC-to-NIC switching
 230 topology, one can expect the forwarding performance to be
 231 proportional to processor core frequency for the same processor
 232 architecture, assuming the processor is the only limiting factor. However,
 233 due to the much higher dependency on intensive memory operations in
 234 Container service chained topologies and the sensitivity to Linux scheduler
 235 settings and behaviour, this estimate may not always be sufficiently
 236 accurate.
 237
 238 Performance Tests Coverage
 239 --------------------------
 240
 241 Performance tests measure the following metrics for tested VPP DUT
 242 topologies and configurations:
 243
 244 - Packet Throughput: measured in accordance with :rfc:`2544`, using
 245   FD.io CSIT Multiple Loss Ratio search (MLRsearch), an optimized binary
 246   search algorithm, producing throughput at different Packet Loss Ratio
 247   (PLR) values:
 248
 249   - Non Drop Rate (NDR): packet throughput at PLR=0%.
 250   - Partial Drop Rate (PDR): packet throughput at PLR=0.5%.
 251
 252 - One-Way Packet Latency: measured at different offered packet loads:
 253
 254   - 100% of discovered NDR throughput.
 255   - 100% of discovered PDR throughput.
 256
 257 - Maximum Receive Rate (MRR): packet forwarding rate measured under the
 258   maximum load offered by the traffic generator over a set trial duration,
 259   regardless of packet loss. The maximum load for a specified Ethernet frame
 260   size is set to the bi-directional link rate.
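
As an illustration of the maximum load used for MRR, the sketch below
computes the theoretical Ethernet frame rate at link rate for a given frame
size. It assumes that the frame size includes the FCS and that each frame
carries 20 bytes of additional on-wire overhead (7-byte preamble, 1-byte
start-of-frame delimiter, 12-byte inter-frame gap). The function name is
hypothetical and the calculation is not taken from the CSIT code.

.. code-block:: python

    # On-wire overhead per frame: preamble (7) + SFD (1) + inter-frame gap (12).
    ETHERNET_OVERHEAD_BYTES = 20


    def max_frame_rate_pps(link_rate_bps, frame_size_bytes, bidirectional=True):
        """Return the theoretical maximum frame rate in packets per second."""
        if link_rate_bps <= 0 or frame_size_bytes <= 0:
            raise ValueError("link rate and frame size must be positive")
        bits_per_frame = (frame_size_bytes + ETHERNET_OVERHEAD_BYTES) * 8
        per_direction = link_rate_bps / bits_per_frame
        return 2 * per_direction if bidirectional else per_direction


    # 64B frames on 10GE: about 14.88 Mpps per direction,
    # about 29.76 Mpps bi-directional.
    print(max_frame_rate_pps(10e9, 64, bidirectional=False))
    print(max_frame_rate_pps(10e9, 64))
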
 261
 262 CSIT |release| includes the following performance test areas, covered across
 263 a range of NIC drivers and NIC models:
 264
 265 +-----------------------+----------------------------------------------+
 266 | Test Area             |  Description                                 |
 267 +=======================+==============================================+
 268 | ACL                   | L2 Bridge-Domain switching and               |
 269 |                       | IPv4 and IPv6 routing with iACL and oACL IP  |
 270 |                       | address, MAC address and L4 port security.   |
 271 +-----------------------+----------------------------------------------+
 272 | COP                   | IPv4 and IPv6 routing with COP address       |
 273 |                       | security.                                    |
 274 +-----------------------+----------------------------------------------+
 275 | IPv4                  | IPv4 routing.                                |
 276 +-----------------------+----------------------------------------------+
 277 | IPv6                  | IPv6 routing.                                |
 278 +-----------------------+----------------------------------------------+
 279 | IPv4 Scale            | IPv4 routing with 20k, 200k and 2M FIB       |
 280 |                       | entries.                                     |
 281 +-----------------------+----------------------------------------------+
 282 | IPv6 Scale            | IPv6 routing with 20k, 200k and 2M FIB       |
 283 |                       | entries.                                     |
 284 +-----------------------+----------------------------------------------+
 285 | IPSecHW               | IPSec encryption with AES-GCM, CBC-SHA1      |
 286 |                       | ciphers, in combination with IPv4 routing.   |
 287 |                       | Intel QAT HW acceleration.                   |
 288 +-----------------------+----------------------------------------------+
 289 | IPSec+LISP            | IPSec encryption with CBC-SHA1 ciphers, in   |
 290 |                       | combination with LISP-GPE overlay tunneling  |
 291 |                       | for IPv4-over-IPv4.                          |
 292 +-----------------------+----------------------------------------------+
 293 | IPSecSW               | IPSec encryption with AES-GCM, CBC-SHA1      |
 294 |                       | ciphers, in combination with IPv4 routing.   |
 295 +-----------------------+----------------------------------------------+
 296 | K8s Containers Memif  | K8s orchestrated container VPP service chain |
 297 |                       | topologies connected over the memif virtual  |
 298 |                       | interface.                                   |
 299 +-----------------------+----------------------------------------------+
 300 | KVM VMs vhost-user    | Virtual topologies with service              |
 301 |                       | chains of 1 and 2 VMs using vhost-user       |
 302 |                       | interfaces, with different VPP forwarding    |
 303 |                       | modes incl. L2XC, L2BD, VXLAN with L2BD,     |
 304 |                       | IPv4 routing.                                |
 305 +-----------------------+----------------------------------------------+
 306 | L2BD                  | L2 Bridge-Domain switching of untagged       |
 307 |                       | Ethernet frames with MAC learning; disabled  |
 308 |                       | MAC learning i.e. static MAC tests to be     |
 309 |                       | added.                                       |
 310 +-----------------------+----------------------------------------------+
 311 | L2BD Scale            | L2 Bridge-Domain switching of untagged       |
 312 |                       | Ethernet frames with MAC learning; disabled  |
 313 |                       | MAC learning i.e. static MAC tests to be     |
 314 |                       | added with 20k, 200k and 2M FIB entries.     |
 315 +-----------------------+----------------------------------------------+
 316 | L2XC                  | L2 Cross-Connect switching of untagged,      |
 317 |                       | dot1q, dot1ad VLAN tagged Ethernet frames.   |
 318 +-----------------------+----------------------------------------------+
 319 | LISP                  | LISP overlay tunneling for IPv4-over-IPv4,   |
 320 |                       | IPv6-over-IPv4, IPv6-over-IPv6,              |
 321 |                       | IPv4-over-IPv6 in IPv4 and IPv6 routing      |
 322 |                       | modes.                                       |
 323 +-----------------------+----------------------------------------------+
 324 | LXC/DRC Containers    | Container VPP memif virtual interface tests  |
 325 | Memif                 | with different VPP forwarding modes incl.    |
 326 |                       | L2XC, L2BD.                                  |
 327 +-----------------------+----------------------------------------------+
 328 | NAT                   | (Source) Network Address Translation tests   |
 329 |                       | with varying number of users and ports per   |
 330 |                       | user.                                        |
 331 +-----------------------+----------------------------------------------+
 332 | QoS Policer           | Ingress packet rate measuring, marking and   |
 333 |                       | limiting (IPv4).                             |
 334 +-----------------------+----------------------------------------------+
 335 | SRv6 Routing          | Segment Routing IPv6 tests.                  |
 336 +-----------------------+----------------------------------------------+
 337 | VPP TCP/IP stack      | Tests of VPP TCP/IP stack used with VPP      |
 338 |                       | built-in HTTP server.                        |
 339 +-----------------------+----------------------------------------------+
 340 | VXLAN                 | VXLAN overlay tunneling integration with     |
 341 |                       | L2XC and L2BD.                               |
 342 +-----------------------+----------------------------------------------+
 343
 344 Execution of performance tests takes time, especially the throughput
 345 tests. Due to limited HW testbed resources available within FD.io labs
 346 hosted by :abbr:`LF (Linux Foundation)`, the number of tests for some
 347 NIC models has been limited to a few baseline tests.
 348
 349 Performance Tests Naming
 350 ------------------------
 351
 352 FD.io CSIT |release| follows a common structured naming convention for
 353 all performance and system functional tests, introduced in CSIT rls1701.
 354
 355 The naming should be intuitive for the majority of the tests. A complete
 356 description of the FD.io CSIT test naming convention is provided in
 357 :ref:`csit_test_naming`.