report: Methodology section, added forwarded modes, tunnel encaps and features.

diff --git a/docs/report/introduction/methodology.rst b/docs/report/introduction/methodology.rst

index ff5714c..16d3eda 100644
--- a/docs/report/introduction/methodology.rst
+++ b/docs/report/introduction/methodology.rst
@@ -1,26 +1,240 @@
-Performance Test Methodology
-============================
  
-Throughput
-----------
+.. _test_methodology:
  
-Packet and bandwidth throughput are measured in accordance with
-:rfc:`2544`, using FD.io CSIT Multiple Loss Ratio search (MLRsearch), an
-optimized binary search algorithm, that measures SUT/DUT throughput at
-different Packet Loss Ratio (PLR) values.
+Test Methodology
+================
+
+VPP Forwarding Modes
+--------------------
+
+VPP is tested in a number of L2 and IP packet lookup and forwarding
+modes. Within each mode, baseline and scale tests are executed, the
+latter with a varying number of lookup entries.
+
+L2 Ethernet Switching
+~~~~~~~~~~~~~~~~~~~~~
+
+VPP is tested in three L2 forwarding modes:
+
+- *l2patch*: L2 patch, the fastest point-to-point L2 path that loops
+  packets between two interfaces without any Ethernet frame checks or
+  lookups.
+- *l2xc*: L2 cross-connect, point-to-point L2 path with all Ethernet
+  frame checks, but no MAC learning and no MAC lookup.
+- *l2bd*: L2 bridge-domain, multipoint-to-multipoint L2 path with all
+  Ethernet frame checks, with MAC learning (unless static MACs are used)
+  and MAC lookup.
+
+l2bd tests are executed in baseline and scale configurations:
+
+- *l2bdbase*: a low number of L2 flows (253 per direction) is switched
+  by VPP. These flows determine the MAC FIB size (506 total MAC
+  entries).
+  Both source and destination MAC addresses are incremented on a packet
+  by packet basis.
+
+- *l2bdscale*: a high number of L2 flows is switched by VPP. Tested MAC
+  FIB sizes include: i) 10k (5k unique flows per direction), ii) 100k
+  (2x 50k flows) and iii) 1M (2x 500k). Both source and destination MAC
+  addresses are incremented on a packet by packet basis, ensuring new
+  entries are learned, refreshed and looked up at every packet, making
+  it the worst case scenario.
+
+Ethernet wire encapsulations tested include: untagged, dot1q, dot1ad.
+
+IPv4 Routing
+~~~~~~~~~~~~
+
+IPv4 routing tests are executed in baseline and scale configurations:
+
+- *ip4base*: a low number of IPv4 flows (253 per direction) is routed
+  by VPP. These flows determine the IPv4 FIB size (506 total /32
+  prefixes).
+  Destination IPv4 addresses are incremented on a packet by packet
+  basis.
+
+- *ip4scale*: a high number of IPv4 flows is routed by VPP. Tested IPv4
+  FIB sizes of /32 prefixes include: i) 20k (10k unique flows per
+  direction), ii) 200k (2x 100k flows) and iii) 2M (2x 1M). Destination
+  IPv4 addresses are incremented on a packet by packet basis, ensuring
+  new FIB entries are looked up at every packet, making it the worst
+  case scenario.
+
+IPv6 Routing
+~~~~~~~~~~~~
+
+IPv6 routing tests are executed in baseline and scale configurations:
+
+- *ip6base*: a low number of IPv6 flows (253 per direction) is routed
+  by VPP. These flows determine the IPv6 FIB size (506 total /128
+  prefixes).
+  Destination IPv6 addresses are incremented on a packet by packet
+  basis.
+
+- *ip6scale*: a high number of IPv6 flows is routed by VPP. Tested IPv6
+  FIB sizes of /128 prefixes include: i) 20k (10k unique flows per
+  direction), ii) 200k (2x 100k flows) and iii) 2M (2x 1M). Destination
+  IPv6 addresses are incremented on a packet by packet basis, ensuring
+  new FIB entries are looked up at every packet, making it the worst
+  case scenario.
+
+SRv6 Routing
+~~~~~~~~~~~~
+
+SRv6 routing tests are executed in a number of baseline configurations.
+In each case, an SR policy and a steering policy are configured for one
+direction, and one (or two) SR behaviours (functions) for the other
+direction:
+
+- *srv6enc1sid*: One SID (no SRH present), one SR function - End.
+- *srv6enc2sids*: Two SIDs (SRH present), two SR functions - End and
+  End.DX6.
+- *srv6enc2sids-nodecaps*: Two SIDs (SRH present) without decapsulation,
+  one SR function - End.
+- *srv6proxy-dyn*: Dynamic SRv6 proxy, one SR function - End.AD.
+- *srv6proxy-masq*: Masquerading SRv6 proxy, one SR function - End.AM.
+- *srv6proxy-stat*: Static SRv6 proxy, one SR function - End.AS.
+
+In all listed cases, a low number of IPv6 flows (253 per direction) is
+routed by VPP.
+
+Tunnel Encapsulations
+---------------------
+
+Tunnel encapsulations testing is grouped based on the type of outer
+header: IPv4 or IPv6.
+
+IPv4 Tunnels
+~~~~~~~~~~~~
+
+VPP is tested in the following IPv4 tunnel baseline configurations:
+
+- *ip4vxlan-l2bdbase*: VXLAN over IPv4 tunnels with L2 bridge-domain MAC
+  switching.
+- *ip4vxlan-l2xcbase*: VXLAN over IPv4 tunnels with L2 cross-connect.
+- *ip4lispip4-ip4base*: LISP over IPv4 tunnels with IPv4 routing.
+- *ip4lispip6-ip6base*: LISP over IPv4 tunnels with IPv6 routing.
+
+In all cases listed above, a low number of MAC, IPv4 or IPv6 flows (253
+per direction) is switched or routed by VPP.
+
+In addition selected IPv4 tunnels are tested at scale:
+
+- *dot1q--ip4vxlanscale-l2bd*: VXLAN over IPv4 tunnels with L2
+  bridge-domain MAC switching, with scaled up dot1q VLANs (10, 100, 1k),
+  mapped to scaled up L2 bridge-domains (10, 100, 1k), that are in turn
+  mapped to (10, 100, 1k) VXLAN tunnels. 64.5k flows are transmitted per
+  direction.
+
+IPv6 Tunnels
+~~~~~~~~~~~~
+
+VPP is tested in the following IPv6 tunnel baseline configurations:
+
+- *ip6lispip4-ip4base*: LISP over IPv6 tunnels with IPv4 routing.
+- *ip6lispip6-ip6base*: LISP over IPv6 tunnels with IPv6 routing.
+
+In all cases listed above, a low number of IPv4 or IPv6 flows (253 per
+direction) is routed by VPP.
+
+VPP Features
+------------
+
+VPP is tested in a number of data plane feature configurations across
+different forwarding modes. The following sections list the features
+tested.
+
+ACL Security-Groups
+~~~~~~~~~~~~~~~~~~~
+
+Both stateless and stateful access control lists (ACL), also known as
+security-groups, are supported by VPP.
+
+The following ACL configurations are tested for MAC switching with L2
+bridge-domains:
+
+- *l2bdbasemaclrn-iacl{E}sl-{F}flows*: Input stateless ACL, with {E}
+  entries and {F} flows.
+- *l2bdbasemaclrn-oacl{E}sl-{F}flows*: Output stateless ACL, with {E}
+  entries and {F} flows.
+- *l2bdbasemaclrn-iacl{E}sf-{F}flows*: Input stateful ACL, with {E}
+  entries and {F} flows.
+- *l2bdbasemaclrn-oacl{E}sf-{F}flows*: Output stateful ACL, with {E}
+  entries and {F} flows.
+
+The following ACL configurations are tested with IPv4 routing:
+
+- *ip4base-iacl{E}sl-{F}flows*: Input stateless ACL, with {E} entries
+  and {F} flows.
+- *ip4base-oacl{E}sl-{F}flows*: Output stateless ACL, with {E} entries
+  and {F} flows.
+- *ip4base-iacl{E}sf-{F}flows*: Input stateful ACL, with {E} entries and
+  {F} flows.
+- *ip4base-oacl{E}sf-{F}flows*: Output stateful ACL, with {E} entries
+  and {F} flows.
+
+ACL tests are executed with the following combinations of ACL entries
+and number of flows:
+
+- ACL entry definitions
+
+  - flow non-matching deny entry: (src-ip4, dst-ip4, src-port, dst-port).
+  - flow matching permit ACL entry: (src-ip4, dst-ip4).
+
+- {E} - number of non-matching deny ACL entries, {E} = [1, 10, 50].
+- {F} - number of UDP flows with different tuple (src-ip4, dst-ip4,
+  src-port, dst-port), {F} = [100, 10k, 100k].
+- All {E}x{F} combinations are tested per ACL type, total of 9.
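As a rough sketch of how the {E} and {F} values above combine, the following Python snippet expands one of the listed name templates into its nine variants. The helper name ``expand_acl_names`` is hypothetical, and the output is only the name fragment used in this section, not necessarily a complete CSIT test name.

```python
from itertools import product

# Values of {E} and {F} as listed in the text above.
ACL_ENTRIES = [1, 10, 50]
FLOWS = ["100", "10k", "100k"]


def expand_acl_names(template):
    """Substitute every {E} x {F} combination into a name template.

    Hypothetical helper for illustration; not part of CSIT.
    """
    return [
        template.replace("{E}", str(entries)).replace("{F}", flows)
        for entries, flows in product(ACL_ENTRIES, FLOWS)
    ]


names = expand_acl_names("ip4base-iacl{E}sl-{F}flows")
print(len(names))
for name in names:
    print(name)
```

The same expansion applies to the other templates listed above (input or output, stateless or stateful), giving nine combinations per ACL type.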
+
+ACL MAC-IP
+~~~~~~~~~~
+
+MAC-IP binding ACLs are tested for MAC switching with L2 bridge-domains:
+
+- *l2bdbasemaclrn-macip-iacl{E}sl-{F}flows*: Input stateless ACL, with
+  {E} entries and {F} flows.
+
+MAC-IP ACL tests are executed with the following combinations of ACL
+entries and number of flows:
+
+- ACL entry definitions
+
+  - flow non-matching deny entry: (dst-ip4, dst-mac, bit-mask).
+  - flow matching permit ACL entry: (dst-ip4, dst-mac, bit-mask).
+
+- {E} - number of non-matching deny ACL entries, {E} = [1, 10, 50].
+- {F} - number of UDP flows with different tuple (dst-ip4, dst-mac),
+  {F} = [100, 10k, 100k].
+- All {E}x{F} combinations are tested per ACL type, total of 9.
+
+NAT44
+~~~~~
+
+NAT44 is tested in baseline and scale configurations with IPv4 routing:
+
+- *ip4base-nat44*: baseline test with single NAT entry (addr, port),
+  single UDP flow.
+- *ip4base-udpsrcscale{U}-nat44*: baseline test with {U} NAT entries
+  (addr, {U}ports), {U}=15.
+- *ip4scale{R}-udpsrcscale{U}-nat44*: scale tests with {R}*{U} NAT
+  entries ({R}addr, {U}ports), {R}=[100, 1k, 2k, 4k], {U}=15.
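The {R}*{U} arithmetic for the scale tests above can be sketched as follows. This is an illustration only: it assumes "1k" means 1000 addresses, and the helper name ``nat_entries`` is hypothetical.

```python
# {R} and {U} values from the list above.
# Assumption: "1k" denotes 1000 (not 1024) addresses.
ADDRESSES = {"100": 100, "1k": 1_000, "2k": 2_000, "4k": 4_000}
PORTS_PER_ADDRESS = 15


def nat_entries(addresses, ports=PORTS_PER_ADDRESS):
    """Number of NAT entries for a given number of addresses and ports."""
    return addresses * ports


for label, count in ADDRESSES.items():
    name = f"ip4scale{label}-udpsrcscale{PORTS_PER_ADDRESS}-nat44"
    print(f"{name}: {nat_entries(count)} NAT entries")
```

Under that assumption, the scale tests cover between 1,500 and 60,000 NAT entries.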
+
+Data Plane Throughput
+---------------------
+
+Network data plane packet and bandwidth throughput are measured in
+accordance with :rfc:`2544`, using FD.io CSIT Multiple Loss Ratio search
+(MLRsearch), an optimized throughput search algorithm, that measures
+SUT/DUT packet throughput rates at different Packet Loss Ratio (PLR)
+values.
  
  Following MLRsearch values are measured across a range of L2 frame sizes
  and reported:
  
-- **Non Drop Rate (NDR)**: packet and bandwidth throughput at PLR=0%.
+- NON DROP RATE (NDR): packet and bandwidth throughput at PLR=0%.
  
    - **Aggregate packet rate**: NDR_LOWER <bi-directional packet rate>
      pps.
    - **Aggregate bandwidth rate**: NDR_LOWER <bi-directional bandwidth
      rate> Gbps.
  
-- **Partial Drop Rate (PDR)**: packet and bandwidth throughput at
-  PLR=0.5%.
+- PARTIAL DROP RATE (PDR): packet and bandwidth throughput at PLR=0.5%.
  
    - **Aggregate packet rate**: PDR_LOWER <bi-directional packet rate>
      pps.
@@ -30,20 +244,22 @@ and reported:
  NDR and PDR are measured for the following L2 frame sizes (untagged
  Ethernet):
  
-- IPv4 payload: 64B, IMIX_v4_1 (28x64B, 16x570B, 4x1518B), 1518B, 9000B.
-- IPv6 payload: 78B, 1518B, 9000B.
+- IPv4 payload: 64B, IMIX (28x64B, 16x570B, 4x1518B), 1518B, 9000B.
+- IPv6 payload: 78B, IMIX (28x78B, 16x570B, 4x1518B), 1518B, 9000B.
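For reference, the average frame size implied by the two IMIX profiles above can be computed with a short sketch like the one below. The function name ``average_frame_size`` is an illustrative choice, not a CSIT identifier.

```python
# (frame count, frame size in bytes) per IMIX profile, from the list above.
IMIX_V4 = [(28, 64), (16, 570), (4, 1518)]
IMIX_V6 = [(28, 78), (16, 570), (4, 1518)]


def average_frame_size(profile):
    """Weighted average L2 frame size in bytes for an IMIX profile."""
    total_frames = sum(count for count, _ in profile)
    total_bytes = sum(count * size for count, size in profile)
    return total_bytes / total_frames


print(round(average_frame_size(IMIX_V4), 1))
print(round(average_frame_size(IMIX_V6), 1))
```

This works out to roughly 353.8B per frame for the IPv4 profile and 362B for the IPv6 profile.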
  
  All rates are reported from external Traffic Generator perspective.
  
  .. _mlrsearch_algorithm:
  
-MLRsearch Algorithm
--------------------
+MLRsearch Tests
+---------------
  
-Multiple Loss Rate search (MLRsearch) is a new search algorithm
+Multiple Loss Ratio search (MLRsearch) tests use a new search algorithm
  implemented in FD.io CSIT project. MLRsearch discovers multiple packet
  throughput rates in a single search, with each rate associated with a
-distinct Packet Loss Ratio (PLR) criteria.
+distinct Packet Loss Ratio (PLR) criterion. MLRsearch is being
+standardized in the IETF with `draft-vpolak-mkonstan-mlrsearch-XX
+<https://tools.ietf.org/html/draft-vpolak-mkonstan-mlrsearch-00>`_.
  
  Two throughput measurements used in FD.io CSIT are Non-Drop Rate (NDR,
  with zero packet loss, PLR=0) and Partial Drop Rate (PDR, with packet
@@ -164,7 +380,7 @@ Input Parameters
     Default (2). (Value chosen based on limited experimentation to date.
     More experimentation needed to arrive to clearer guidelines.)
  
-Initial phase
+Initial Phase
  `````````````
  
  1. First trial measures at maximum rate and discovers MRR.
@@ -190,7 +406,7 @@ Initial phase
     c. *do*: single trial.
     d. *out*: measured loss ratio.
  
-Non-initial phases
+Non-initial Phases
  ``````````````````
  
  1. Main loop:
@@ -316,15 +532,22 @@ but without detailing their mutual interaction.
     than pure binary search, the implemented tests fail themselves
     when the search takes too long (given by argument *timeout*).
  
-Maximum Receive Rate MRR
-------------------------
+(B)MRR Throughput
+-----------------
  
-MRR tests measure the packet forwarding rate under the maximum
-load offered by traffic generator over a set trial duration,
+Maximum Receive Rate (MRR) tests are complementary to MLRsearch tests,
+as they provide a maximum "raw" throughput benchmark for the
+development and testing community. MRR tests measure the packet
+forwarding rate under the maximum load offered by the traffic generator
+over a set trial duration,
  regardless of packet loss. Maximum load for specified Ethernet frame
  size is set to the bi-directional link rate.
  
-Current parameters for MRR tests:
+In |csit-release| the MRR test code has been updated with configurable
+burst MRR parameters: trial duration and number of trials in a single
+burst. This enabled a new Burst MRR (BMRR) methodology for more precise
+performance trending.
+
+Current parameters for BMRR tests:
  
  - Ethernet frame sizes: 64B (78B for IPv6), IMIX, 1518B, 9000B; all
    quoted sizes include frame CRC, but exclude per frame transmission
@@ -344,17 +567,25 @@ Current parameters for MRR tests:
      XL710. Packet rate for other tested frame sizes is limited by PCIe
      Gen3 x8 bandwidth limitation of ~50Gbps.
  
-- Trial duration: 10sec.
+- Trial duration: 1 sec.
+
+- Number of trials per burst: 10.
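A minimal sketch of how a burst of trials might be summarized is shown below. The text does not state how per-trial results are aggregated, so the use of mean and standard deviation, the function name ``summarize_burst``, and the sample packet counts are all assumptions made for illustration.

```python
from statistics import mean, stdev

TRIAL_DURATION_S = 1.0  # trial duration from the list above
TRIALS_PER_BURST = 10   # number of trials per burst from the list above


def summarize_burst(rx_packets_per_trial, trial_duration=TRIAL_DURATION_S):
    """Return (mean, stdev) of per-trial receive rates in pps.

    Assumption: mean/stdev aggregation is illustrative only; the
    aggregation actually used by CSIT is not described in this section.
    """
    if len(rx_packets_per_trial) < 2:
        raise ValueError("need at least two trials to compute stdev")
    rates = [rx / trial_duration for rx in rx_packets_per_trial]
    return mean(rates), stdev(rates)


# Made-up receive counts for one burst; not measured data.
sample_burst = [1_000_000 + 1_000 * i for i in range(TRIALS_PER_BURST)]
avg_rate, rate_stdev = summarize_burst(sample_burst)
print(f"mean={avg_rate:.0f} pps, stdev={rate_stdev:.0f} pps")
```

Running several short trials instead of one long trial yields a spread as well as a central value, which is one way such a burst could support more precise trending.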
  
  Similarly to NDR/PDR throughput tests, MRR test should be reporting
  bi-directional link rate (or NIC rate, if lower) if tested VPP
  configuration can handle the packet rate higher than bi-directional link
  rate, e.g. large packet tests and/or multi-core tests.
  
-MRR tests are used for continuous performance trending and for
-comparison between releases. Daily trending job tests subset of frame
-sizes, focusing on 64B (78B for IPv6) for all tests and IMIX for
-selected tests (vhost, memif).
+MRR tests are currently used for FD.io CSIT continuous performance
+trending and for comparison between releases. The daily trending job
+tests a subset of frame sizes, focusing on 64B (78B for IPv6) for all
+tests and IMIX for selected tests (vhost, memif).
+
+MRR-like measurements are being used to establish starting conditions
+for experimental Probabilistic Loss Ratio Search (PLRsearch) used for
+soak testing, aimed at verifying continuous system performance over an
+extended period of time (hours, days, weeks, months). PLRsearch code is
+currently in an experimental phase in the FD.io CSIT project.
  
  Packet Latency
  --------------
@@ -508,7 +739,8 @@ following environment settings:
    [cfs,cfsrr1] settings.
  - Adjusted Linux kernel :abbr:`CFS (Completely Fair Scheduler)`
    scheduler policy for data plane threads used in CSIT is documented in
-  `CSIT Performance Environment Tuning wiki <https://wiki.fd.io/view/CSIT/csit-perf-env-tuning-ubuntu1604>`_.
+  `CSIT Performance Environment Tuning wiki
+  <https://wiki.fd.io/view/CSIT/csit-perf-env-tuning-ubuntu1604>`_.
  - The purpose is to verify performance impact (MRR and NDR/PDR
    throughput) and same test measurements repeatability, by making VPP
    and VM data plane threads less susceptible to other Linux OS system
@@ -560,6 +792,20 @@ VMs as described earlier in :ref:`tested_physical_topologies`.
  Further documentation is available in
  :ref:`container_orchestration_in_csit`.
  
+VPP_Device Functional
+---------------------
+
+|csit-release| added a new VPP_Device test environment for functional
+VPP device tests integrated into LFN CI/CD infrastructure. VPP_Device
+tests run on 1-Node testbeds (1n-skx, 1n-arm) and rely on Linux SRIOV
+Virtual Function (VF), dot1q VLAN tagging and external loopback cables
+to facilitate packet passing over external physical links. The initial
+focus is on a few baseline tests. Existing CSIT VIRL tests can be moved
+to the VPP_Device framework by changing L1 and L2 KW(s). RF test
+definition code stays unchanged, except for adjustments from 3-Node to
+2-Node logical topologies. CSIT VIRL to VPP_Device migration is
+expected in the next CSIT release.
+
  IPSec on Intel QAT
  ------------------
  
@@ -629,7 +875,7 @@ created (one for each direction) with TRex flow_stats parameter set to
  STLFlowLatencyStats. In that case, returned statistics will also include
  min/avg/max latency values.
  
-HTTP/TCP with WRK tool
+HTTP/TCP with WRK Tool
  ----------------------
  
  `WRK HTTP benchmarking tool <https://github.com/wg/wrk>`_ is used for