docs/report/vpp_performance_tests/overview.rst

   1 Overview
   2 ========
   3
   4 VPP performance test results are reported for all three physical testbed
   5 types present in FD.io labs: 3-Node Xeon Haswell (3n-hsw), 3-Node Xeon
   6 Skylake (3n-skx), 2-Node Xeon Skylake (2n-skx) and installed NIC models.
   7 For a description of the physical testbeds used for VPP performance
   8 tests, please refer to :ref:`tested_physical_topologies`.
   9
  10 .. _tested_logical_topologies:
  11
  12 Logical Topologies
  13 ------------------
  14
  15 CSIT VPP performance tests are executed on physical testbeds described
  16 in :ref:`tested_physical_topologies`. Based on the packet path thru
  17 server SUTs, three distinct logical topology types are used for VPP DUT
  18 data plane testing:
  19
  20 #. NIC-to-NIC switching topologies.
  21 #. VM service switching topologies.
  22 #. Container service switching topologies.
  23
  24 NIC-to-NIC Switching
  25 ~~~~~~~~~~~~~~~~~~~~
  26
  27 The simplest logical topology for a software data plane application
  28 like VPP is NIC-to-NIC switching. Tested topologies for 2-Node and
  29 3-Node testbeds are shown in the figures below.
  30
  31 .. only:: latex
  32
  33     .. raw:: latex
  34
  35         \begin{figure}[H]
  36             \centering
  37                 \graphicspath{{../_tmp/src/vpp_performance_tests/}}
  38                 \includegraphics[width=0.90\textwidth]{logical-2n-nic2nic}
  39                 \label{fig:logical-2n-nic2nic}
  40         \end{figure}
  41
  42 .. only:: html
  43
  44     .. figure:: logical-2n-nic2nic.svg
  45         :alt: logical-2n-nic2nic
  46         :align: center
  47
  48
  49 .. only:: latex
  50
  51     .. raw:: latex
  52
  53         \begin{figure}[H]
  54             \centering
  55                 \graphicspath{{../_tmp/src/vpp_performance_tests/}}
  56                 \includegraphics[width=0.90\textwidth]{logical-3n-nic2nic}
  57                 \label{fig:logical-3n-nic2nic}
  58         \end{figure}
  59
  60 .. only:: html
  61
  62     .. figure:: logical-3n-nic2nic.svg
  63         :alt: logical-3n-nic2nic
  64         :align: center
  65
  66 Server Systems Under Test (SUT) run the VPP application in Linux
  67 user-mode as a Device Under Test (DUT). The server Traffic Generator
  68 (TG) runs the T-Rex application. Physical connectivity between SUTs and
  69 TG is provided using different drivers and NIC models that need to be
  70 tested for performance (packet/bandwidth throughput and latency).
  71
  72 From SUT and DUT perspectives, all performance tests involve forwarding
  73 packets between two (or more) physical Ethernet ports (10GE, 25GE, 40GE,
  74 100GE). In most cases both physical ports on SUT are located on the same
  75 NIC. The only exceptions are link bonding and 100GE tests. In the latter
  76 case only one port per NIC can be driven at line rate due to PCIe Gen3
  77 x16 slot bandwidth limitations. 100GE NICs are not supported in PCIe Gen3
  78 x8 slots.
  79
  80 Note that reported VPP DUT performance results are specific to the SUTs
  81 tested. SUTs with processors other than the ones used in FD.io labs are
  82 likely to yield different results. A good rule of thumb for estimating
  83 VPP packet throughput in the NIC-to-NIC switching topology is to expect
  84 the forwarding performance to be proportional to processor core
  85 frequency for the same processor architecture, assuming the processor is
  86 the only limiting factor and all other SUT parameters are equivalent to
  87 the FD.io CSIT environment.
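
The rule of thumb above can be written as a simple linear scaling. The
following is a minimal sketch, not part of CSIT: the function name and
the numbers are hypothetical, and it assumes the same processor
architecture with the processor as the only limiting factor.

```python
def estimate_throughput_mpps(measured_mpps: float,
                             reference_ghz: float,
                             target_ghz: float) -> float:
    """Scale a measured packet rate linearly with core frequency.

    Only meaningful for the same processor architecture, and only if
    the processor is the sole limiting factor.
    """
    if reference_ghz <= 0 or target_ghz <= 0:
        raise ValueError("core frequencies must be positive")
    return measured_mpps * target_ghz / reference_ghz


# Hypothetical numbers for illustration only (not CSIT results):
# 10.0 Mpps measured at 2.5 GHz, estimated for a 2.0 GHz core.
estimate = estimate_throughput_mpps(measured_mpps=10.0,
                                    reference_ghz=2.5,
                                    target_ghz=2.0)
print(f"{estimate:.1f} Mpps")
```

Treat the result as a first-order estimate. Differences in NICs, memory
configuration or system tuning can easily outweigh the frequency ratio.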
  88
  89 VM Service Switching
  90 ~~~~~~~~~~~~~~~~~~~~
  91
  92 VM service switching topology test cases require VPP DUT to communicate
  93 with Virtual Machines (VMs) over vhost-user virtual interfaces.
  94
  95 Two types of VM service topologies are tested in |csit-release|:
  96
  97 #. "Parallel" topology with packets flowing within SUT from NIC(s) via
  98    VPP DUT to VM, back to VPP DUT, then out thru NIC(s).
  99
 100 #. "Chained" topology (a.k.a. "Snake") with packets flowing within SUT
 101    from NIC(s) via VPP DUT to VM, back to VPP DUT, then to the next VM,
 102    back to VPP DUT and so on and so forth until the last VM in a chain,
 103    then back to VPP DUT and out thru NIC(s).
 104
 105 For each of the above topologies, VPP DUT is tested in a range of L2
 106 or IPv4/IPv6 configurations depending on the test suite. Sample VPP DUT
 107 "Chained" VM service topologies for 2-Node and 3-Node testbeds with each
 108 SUT running N VM instances are shown in the figures below.
 109
 110 .. only:: latex
 111
 112     .. raw:: latex
 113
 114         \begin{figure}[H]
 115             \centering
 116                 \graphicspath{{../_tmp/src/vpp_performance_tests/}}
 117                 \includegraphics[width=0.90\textwidth]{logical-2n-vm-vhost}
 118                 \label{fig:logical-2n-vm-vhost}
 119         \end{figure}
 120
 121 .. only:: html
 122
 123     .. figure:: logical-2n-vm-vhost.svg
 124         :alt: logical-2n-vm-vhost
 125         :align: center
 126
 127
 128 .. only:: latex
 129
 130     .. raw:: latex
 131
 132         \begin{figure}[H]
 133             \centering
 134                 \graphicspath{{../_tmp/src/vpp_performance_tests/}}
 135                 \includegraphics[width=0.90\textwidth]{logical-3n-vm-vhost}
 136                 \label{fig:logical-3n-vm-vhost}
 137         \end{figure}
 138
 139 .. only:: html
 140
 141     .. figure:: logical-3n-vm-vhost.svg
 142         :alt: logical-3n-vm-vhost
 143         :align: center
 144
 145 In "Chained" VM topologies, packets are switched by VPP DUT multiple
 146 times: twice for a single VM, three times for two VMs, N+1 times for N
 147 VMs. Hence the external throughput rates measured by TG and listed in
 148 this report must be multiplied by N+1 to represent the actual VPP DUT
 149 aggregate packet forwarding rate.
 150
 151 For the "Parallel" service topology, packets are always switched twice
 152 by VPP DUT per service chain.
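
The conversion from the externally measured TG rate to the VPP DUT
aggregate forwarding rate can be sketched as follows. The helper name
and the sample values are hypothetical. The factor of two for the
"Parallel" case is inferred from the statement that packets are
switched twice per service chain.

```python
def dut_aggregate_rate(tg_rate_mpps: float, n_vms: int, topology: str) -> float:
    """Return the aggregate VPP DUT forwarding rate in Mpps.

    tg_rate_mpps: external throughput measured by the TG.
    n_vms:        number of VMs (N) per SUT.
    topology:     "chained" or "parallel".
    """
    if n_vms < 1:
        raise ValueError("at least one VM is required")
    if topology == "chained":
        # Each packet crosses VPP once per VM, plus once more on exit.
        passes = n_vms + 1
    elif topology == "parallel":
        # Each packet visits one VM only: NIC -> VM and VM -> NIC.
        passes = 2
    else:
        raise ValueError(f"unknown topology: {topology}")
    return tg_rate_mpps * passes


# Hypothetical example: TG reports 3.0 Mpps with N=4 VMs per SUT.
print(dut_aggregate_rate(3.0, 4, "chained"))   # 3.0 * (4 + 1)
print(dut_aggregate_rate(3.0, 4, "parallel"))  # 3.0 * 2
```

With a single VM both topologies give the same factor of two, which
matches the "twice for a single VM" case above.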
 153
 154 Note that reported VPP DUT performance results are specific to the SUTs
 155 tested. SUTs with processors other than the ones used in FD.io labs are
 156 likely to yield different results. As with the NIC-to-NIC switching
 157 topology, one can expect the forwarding performance to be
 158 proportional to processor core frequency for the same processor
 159 architecture, assuming the processor is the only limiting factor. However,
 160 due to the much higher dependency on intensive memory operations in VM
 161 service chained topologies and the sensitivity to Linux scheduler
 162 settings and behaviour, this estimate may not always be accurate
 163 enough.
 164
 165 Container Service Switching
 166 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 167
 168 Container service switching topology test cases require VPP DUT to
 169 communicate with Containers (Ctrs) over memif virtual interfaces.
 170
 171 Three types of Container service topologies are tested in |csit-release|:
 172
 173 #. "Parallel" topology with packets flowing within SUT from NIC(s) via
 174    VPP DUT to Container, back to VPP DUT, then out thru NIC(s).
 175
 176 #. "Chained" topology (a.k.a. "Snake") with packets flowing within SUT
 177    from NIC(s) via VPP DUT to Container, back to VPP DUT, then to the
 178    next Container, back to VPP DUT and so on and so forth until the
 179    last Container in a chain, then back to VPP DUT and out thru NIC(s).
 180
 181 #. "Horizontal" topology with packets flowing within SUT from NIC(s) via
 182    VPP DUT to Container, then via "horizontal" memif to the next
 183    Container, and so on and so forth until the last Container, then
 184    back to VPP DUT and out thru NIC(s).
 185
 186 For each of the above topologies, VPP DUT is tested in a range of L2
 187 or IPv4/IPv6 configurations depending on the test suite. Sample VPP DUT
 188 "Chained" Container service topologies for 2-Node and 3-Node testbeds
 189 with each SUT running N Container instances are shown in the figures
 190 below.
 191
 192 .. only:: latex
 193
 194     .. raw:: latex
 195
 196         \begin{figure}[H]
 197             \centering
 198                 \graphicspath{{../_tmp/src/vpp_performance_tests/}}
 199                 \includegraphics[width=0.90\textwidth]{logical-2n-container-memif}
 200                 \label{fig:logical-2n-container-memif}
 201         \end{figure}
 202
 203 .. only:: html
 204
 205     .. figure:: logical-2n-container-memif.svg
 206         :alt: logical-2n-container-memif
 207         :align: center
 208
 209
 210 .. only:: latex
 211
 212     .. raw:: latex
 213
 214         \begin{figure}[H]
 215             \centering
 216                 \graphicspath{{../_tmp/src/vpp_performance_tests/}}
 217                 \includegraphics[width=0.90\textwidth]{logical-3n-container-memif}
 218                 \label{fig:logical-3n-container-memif}
 219         \end{figure}
 220
 221 .. only:: html
 222
 223     .. figure:: logical-3n-container-memif.svg
 224         :alt: logical-3n-container-memif
 225         :align: center
 226
 227 In "Chained" Container topologies, packets are switched by VPP DUT
 228 multiple times: twice for a single Container, three times for two
 229 Containers, N+1 times for N Containers. Hence the external throughput
 230 rates measured by TG and listed in this report must be multiplied by N+1
 231 to represent the actual VPP DUT aggregate packet forwarding rate.
 232
 233 For "Parallel" and "Horizontal" service topologies, packets are always
 234 switched by VPP DUT twice per service chain.
 235
 236 Note that reported VPP DUT performance results are specific to the SUTs
 237 tested. SUTs with processors other than the ones used in FD.io labs are
 238 likely to yield different results. As with the NIC-to-NIC switching
 239 topology, one can expect the forwarding performance to be
 240 proportional to processor core frequency for the same processor
 241 architecture, assuming the processor is the only limiting factor. However,
 242 due to the much higher dependency on intensive memory operations in
 243 Container service chained topologies and the sensitivity to Linux
 244 scheduler settings and behaviour, this estimate may not always be
 245 accurate enough.
 246
 247 Performance Tests Coverage
 248 --------------------------
 249
 250 Performance tests measure the following metrics for tested VPP DUT
 251 topologies and configurations:
 252
 253 - Packet Throughput: measured in accordance with :rfc:`2544`, using
 254   FD.io CSIT Multiple Loss Ratio search (MLRsearch), an optimized binary
 255   search algorithm, producing throughput at different Packet Loss Ratio
 256   (PLR) values:
 257
 258   - Non Drop Rate (NDR): packet throughput at PLR=0%.
 259   - Partial Drop Rate (PDR): packet throughput at PLR=0.5%.
 260
 261 - One-Way Packet Latency: measured at different offered packet loads:
 262
 263   - 100% of discovered NDR throughput.
 264   - 100% of discovered PDR throughput.
 265
 266 - Maximum Receive Rate (MRR): packet forwarding rate measured under the
 267   maximum load offered by the traffic generator over a set trial
 268   duration, regardless of packet loss. Maximum load for the specified
 269   Ethernet frame size is set to the bi-directional link rate.
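
To make these metrics concrete, the sketch below computes the maximum
bi-directional frame rate for a given link speed and frame size, then
runs a plain bisection for the highest rate whose loss ratio stays
within a target PLR. This is not MLRsearch, which is a more elaborate
algorithm. All names are hypothetical, and ``toy_dut`` stands in for a
real TG trial. The 20-byte per-frame overhead is the standard Ethernet
preamble, start-of-frame delimiter and minimum inter-frame gap.

```python
from typing import Callable

# Per-frame overhead on the wire: preamble (7 B) + start-of-frame
# delimiter (1 B) + minimum inter-frame gap (12 B).
ETHERNET_OVERHEAD_BYTES = 20


def max_frame_rate_pps(link_bps: float, frame_size_bytes: int,
                       bidirectional: bool = True) -> float:
    """Maximum frame rate a link can carry for a given frame size."""
    per_direction = link_bps / ((frame_size_bytes + ETHERNET_OVERHEAD_BYTES) * 8)
    return 2 * per_direction if bidirectional else per_direction


def search_throughput(measure_loss_ratio: Callable[[float], float],
                      max_rate_pps: float, target_plr: float,
                      precision_pps: float) -> float:
    """Plain bisection for the highest rate with loss ratio <= target_plr."""
    if measure_loss_ratio(max_rate_pps) <= target_plr:
        return max_rate_pps  # the DUT keeps up even at maximum load
    lo, hi = 0.0, max_rate_pps
    while hi - lo > precision_pps:
        mid = (lo + hi) / 2.0
        if measure_loss_ratio(mid) <= target_plr:
            lo = mid  # target met, try a higher rate
        else:
            hi = mid  # too much loss, back off
    return lo


def toy_dut(offered_pps: float, capacity_pps: float = 12e6) -> float:
    """Hypothetical DUT model: forwards up to capacity_pps, drops the rest."""
    if offered_pps <= capacity_pps:
        return 0.0
    return (offered_pps - capacity_pps) / offered_pps


max_rate = max_frame_rate_pps(10e9, 64)  # 10GE, 64B frames, both directions
ndr = search_throughput(toy_dut, max_rate, target_plr=0.0, precision_pps=1000.0)
pdr = search_throughput(toy_dut, max_rate, target_plr=0.005, precision_pps=1000.0)
print(f"max load: {max_rate / 1e6:.2f} Mpps")
print(f"NDR: {ndr / 1e6:.2f} Mpps, PDR: {pdr / 1e6:.2f} Mpps")
```

In this model the MRR counterpart is the rate the DUT forwards when
offered ``max_rate``, which is the toy capacity. Real trials also have
a duration and run-to-run variance, which the sketch ignores.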
 270
 271 |csit-release| includes the following VPP data plane functionality,
 272 performance tested across a range of NIC drivers and NIC models:
 273
 274 +-----------------------+----------------------------------------------+
 275 | Functionality         |  Description                                 |
 276 +=======================+==============================================+
 277 | ACL                   | L2 Bridge-Domain switching and               |
  278 |                       | IPv4 and IPv6 routing with iACL and oACL IP  |
 279 |                       | address, MAC address and L4 port security.   |
 280 +-----------------------+----------------------------------------------+
 281 | COP                   | IPv4 and IPv6 routing with COP address       |
 282 |                       | security.                                    |
 283 +-----------------------+----------------------------------------------+
 284 | IPv4                  | IPv4 routing.                                |
 285 +-----------------------+----------------------------------------------+
 286 | IPv6                  | IPv6 routing.                                |
 287 +-----------------------+----------------------------------------------+
 288 | IPv4 Scale            | IPv4 routing with 20k, 200k and 2M FIB       |
 289 |                       | entries.                                     |
 290 +-----------------------+----------------------------------------------+
 291 | IPv6 Scale            | IPv6 routing with 20k, 200k and 2M FIB       |
 292 |                       | entries.                                     |
 293 +-----------------------+----------------------------------------------+
 294 | IPSecHW               | IPSec encryption with AES-GCM, CBC-SHA-256   |
 295 |                       | ciphers, in combination with IPv4 routing.   |
 296 |                       | Intel QAT HW acceleration.                   |
 297 +-----------------------+----------------------------------------------+
 298 | IPSec+LISP            | IPSec encryption with CBC-SHA1 ciphers, in   |
 299 |                       | combination with LISP-GPE overlay tunneling  |
 300 |                       | for IPv4-over-IPv4.                          |
 301 +-----------------------+----------------------------------------------+
 302 | IPSecSW               | IPSec encryption with AES-GCM, CBC-SHA-256   |
 303 |                       | ciphers, in combination with IPv4 routing.   |
 304 +-----------------------+----------------------------------------------+
 305 | KVM VMs vhost-user    | Virtual topologies with service              |
 306 |                       | chains of 1 VM using vhost-user              |
 307 |                       | interfaces, with different VPP forwarding    |
 308 |                       | modes incl. L2XC, L2BD, VXLAN with L2BD,     |
 309 |                       | IPv4 routing.                                |
 310 +-----------------------+----------------------------------------------+
 311 | L2BD                  | L2 Bridge-Domain switching of untagged       |
 312 |                       | Ethernet frames with MAC learning; disabled  |
 313 |                       | MAC learning i.e. static MAC tests to be     |
 314 |                       | added.                                       |
 315 +-----------------------+----------------------------------------------+
 316 | L2BD Scale            | L2 Bridge-Domain switching of untagged       |
 317 |                       | Ethernet frames with MAC learning; disabled  |
 318 |                       | MAC learning i.e. static MAC tests to be     |
 319 |                       | added with 20k, 200k and 2M FIB entries.     |
 320 +-----------------------+----------------------------------------------+
 321 | L2XC                  | L2 Cross-Connect switching of untagged,      |
 322 |                       | dot1q, dot1ad VLAN tagged Ethernet frames.   |
 323 +-----------------------+----------------------------------------------+
 324 | LISP                  | LISP overlay tunneling for IPv4-over-IPv4,   |
 325 |                       | IPv6-over-IPv4, IPv6-over-IPv6,              |
 326 |                       | IPv4-over-IPv6 in IPv4 and IPv6 routing      |
 327 |                       | modes.                                       |
 328 +-----------------------+----------------------------------------------+
 329 | LXC/DRC Containers    | Container VPP memif virtual interface tests  |
 330 | Memif                 | with different VPP forwarding modes incl.    |
 331 |                       | L2XC, L2BD.                                  |
 332 +-----------------------+----------------------------------------------+
 333 | NAT                   | (Source) Network Address Translation tests   |
 334 |                       | with varying number of users and ports per   |
 335 |                       | user.                                        |
 336 +-----------------------+----------------------------------------------+
 337 | QoS Policer           | Ingress packet rate measuring, marking and   |
 338 |                       | limiting (IPv4).                             |
 339 +-----------------------+----------------------------------------------+
 340 | SRv6 Routing          | Segment Routing IPv6 tests.                  |
 341 +-----------------------+----------------------------------------------+
 342 | VPP TCP/IP stack      | Tests of VPP TCP/IP stack used with VPP      |
 343 |                       | built-in HTTP server.                        |
 344 +-----------------------+----------------------------------------------+
 345 | VTS                   | Virtual Topology System use case tests       |
 346 |                       | combining VXLAN overlay tunneling with L2BD, |
 347 |                       | ACL and KVM VM vhost-user features.          |
 348 +-----------------------+----------------------------------------------+
  349 | VXLAN                 | VXLAN overlay tunneling integration with     |
 350 |                       | L2XC and L2BD.                               |
 351 +-----------------------+----------------------------------------------+
 352
 353 Execution of performance tests takes time, especially the throughput
 354 tests. Due to limited HW testbed resources available within FD.io labs
 355 hosted by :abbr:`LF (Linux Foundation)`, the number of tests for some
 356 NIC models has been limited to a few baseline tests.
 357
 358 Performance Tests Naming
 359 ------------------------
 360
 361 FD.io |csit-release| follows a common structured naming convention for
 362 all performance and system functional tests, introduced in CSIT-17.01.
 363
 364 The naming should be intuitive for the majority of the tests. A complete
 365 description of the FD.io CSIT test naming convention is provided in
 366 :ref:`csit_test_naming`.