docs/report/introduction/methodology_reconf.rst

.. _reconf_tests:

Reconfiguration Tests
---------------------

.. important::

    **DISCLAIMER**: The described reconf test methodology is experimental and
    subject to change following consultation within the csit-dev, vpp-dev
    and user communities. Current test results should be treated as indicative.

Overview
~~~~~~~~

Reconf tests are designed to measure the impact of VPP re-configuration
on data plane traffic.
While VPP takes some measures to prevent the traffic from being
stopped entirely for a prolonged time,
the instantaneous forwarding rate varies during the re-configuration,
as some configuration steps need the active dataplane worker threads
to be stopped temporarily.

The usual methods of measuring throughput need multiple trial measurements
of fairly long duration, and the re-configuration process itself can also
be long. Finding an offered load that results in zero loss
during the re-configuration process would therefore be time-consuming.

Instead, reconf tests first find a throughput value (a lower bound for NDR)
without re-configuration, and then maintain that offered load
during re-configuration. The measured loss count is then assumed to be caused
by the re-configuration process. The result published by reconf tests
is the effective blocked time, that is,
the loss count divided by the offered load.
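
The calculation above can be sketched as follows. This is a minimal
illustration only: the function and parameter names are hypothetical
(not taken from the CSIT libraries), and the numbers are made up, not measured.

.. code-block:: python

    def effective_blocked_time(loss_count: int, offered_load_pps: float) -> float:
        """Return the effective blocked time in seconds.

        loss_count: packets lost during the re-configuration trial.
        offered_load_pps: constant offered load in packets per second.
        """
        if offered_load_pps <= 0:
            raise ValueError("offered load must be positive")
        if loss_count < 0:
            raise ValueError("loss count cannot be negative")
        return loss_count / offered_load_pps


    # Illustrative values only (hypothetical, not a published result).
    blocked_s = effective_blocked_time(
        loss_count=2_500_000, offered_load_pps=10_000_000.0
    )
    print(f"effective blocked time: {blocked_s * 1e3:.1f} ms")

The value is expressed in seconds, as the loss count is a number of packets
and the offered load is in packets per second.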

Current Implementation
~~~~~~~~~~~~~~~~~~~~~~

Each reconf suite is based on a similar MLRsearch performance suite.

MLRsearch parameters are changed to speed up the throughput discovery.
For example, PDR is not searched for, and the final trial duration is shorter.

The MLRsearch suite has to contain a configuration parameter
that can be scaled up, e.g. the number of tunnels or the number of
service chains.
Currently, only increasing the scale is supported
as the re-configuration operation. In the future, scale decrease
or other operations can be implemented.

The traffic profile is not changed, so the traffic present is processed
only by the smaller scale configuration. The added tunnels / chains
are not targeted by the traffic.

For the re-configuration, the same Robot Framework and Python libraries
are used as in the initial configuration, except that
the final calls that do not interact with VPP (e.g. starting
virtual machines) are skipped to reduce the overall test duration.

Discussion
~~~~~~~~~~

Robot Framework introduces a certain overhead, which may affect timing
of individual VPP API calls, which in turn may affect
the number of packets lost.

The exact calls executed may contain unnecessary info dumps, repeated commands,
or commands which change a value that does not need to be changed (e.g. MTU).
Thus, implementation details affect the results, even if their effect
on the corresponding MLRsearch suite is negligible.

The lower bound for NDR is the only value safe to use when zero packet loss
is expected without re-configuration. But different suites show different
"jitter" in that value. For some suites, the lower bound is not tight,
allowing full NIC buffers to drain quickly between worker pauses.
For other suites, the lower bound for NDR still has quite a large probability
of non-zero packet loss even without re-configuration.
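
The effect of a lower bound that is not tight can be illustrated with
a simplified fluid model of a single worker pause. The model is an assumption
made here for illustration, not part of the CSIT implementation: the buffer
is empty when the pause starts, the forwarding capacity is constant, and all
names and numbers are hypothetical.

.. code-block:: python

    def simulate_pause(offered_pps, capacity_pps, pause_s, buffer_packets):
        """Model one worker pause that starts with an empty buffer.

        Returns a tuple (lost_packets, drain_time_s), where drain_time_s
        is the time needed to empty the buffer after the workers resume.
        """
        arrived = offered_pps * pause_s
        # Packets that do not fit into the buffer are lost.
        lost = max(0.0, arrived - buffer_packets)
        queued = arrived - lost
        # Spare capacity available for draining the queue after the pause.
        headroom_pps = capacity_pps - offered_pps
        if queued == 0:
            drain_time_s = 0.0
        elif headroom_pps > 0:
            drain_time_s = queued / headroom_pps
        else:
            drain_time_s = float("inf")
        return lost, drain_time_s


    # Hypothetical values: 1 ms pause, 4096-packet buffer, 10 Mpps capacity.
    for offered_pps in (9e6, 10e6):
        lost, drain_s = simulate_pause(offered_pps, 10e6, 1e-3, 4096)
        blocked_ms = lost / offered_pps * 1e3
        print(f"offered={offered_pps:.0f} pps: blocked={blocked_ms:.3f} ms, "
              f"drain={drain_s * 1e3:.3f} ms")

In this model, an offered load below the capacity lets the buffer drain
in finite time, so the buffer can absorb part of the next pause and
the effective blocked time comes out shorter than the actual pause.
With the offered load equal to the capacity, the buffer never drains
and later pauses lose every packet that arrives during them.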