From: Vratko Polak <vrpolak@cisco.com>
Date: Mon, 16 Sep 2019 16:34:29 +0000 (+0200)
Subject: Report: Add methodology for reconf tests
X-Git-Url: https://gerrit.fd.io/r/gitweb?p=csit.git;a=commitdiff_plain;h=34d060b123d21c0c51efa2438368d509a9411d95

Report: Add methodology for reconf tests

Change-Id: I833d63f4e305ba60c94aa16cbb3a31a52ee3b3bb
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
(cherry picked from commit abca97bfc198c4f947571462cc57732d187c3c20)
---

diff --git a/docs/report/introduction/methodology_data_plane_throughput/index.rst b/docs/report/introduction/methodology_data_plane_throughput/index.rst
index f87283810f..71af4a9add 100644
--- a/docs/report/introduction/methodology_data_plane_throughput/index.rst
+++ b/docs/report/introduction/methodology_data_plane_throughput/index.rst
@@ -7,4 +7,4 @@ Data Plane Throughput
     methodology_mlrsearch_tests
     methodology_mrr_throughput
     methodology_plrsearch
-
+    methodology_reconf
diff --git a/docs/report/introduction/methodology_data_plane_throughput/methodology_reconf.rst b/docs/report/introduction/methodology_data_plane_throughput/methodology_reconf.rst
new file mode 100644
index 0000000000..8922b32b22
--- /dev/null
+++ b/docs/report/introduction/methodology_data_plane_throughput/methodology_reconf.rst
@@ -0,0 +1,72 @@
+.. _reconf_tests:
+
+Reconf Tests
+^^^^^^^^^^^^
+
+Overview
+~~~~~~~~
+
+Reconf tests are designed to measure the impact of VPP re-configuration
+on data plane traffic.
+While VPP takes some measures against the traffic being
+entirely stopped for a prolonged time,
+the immediate forwarding rate varies during the re-configuration,
+as some configuration steps need the active data plane worker threads
+to be stopped temporarily.
+
+As the usual methods of measuring throughput need multiple trial measurements
+with somewhat long durations, and the re-configuration process can also be long,
+finding an offered load which would result in zero loss
+during the re-configuration process would be time-consuming.
+
+Instead, reconf tests find a throughput value (lower bound for NDR)
+without re-configuration, and then maintain that offered load
+during re-configuration. The measured loss count is then assumed to be caused
+by the re-configuration process. The result published by reconf tests
+is the effective blocked time, that is
+the loss count divided by the offered load.
+
+Current Implementation
+~~~~~~~~~~~~~~~~~~~~~~
+
+Each reconf suite is based on a similar MLRsearch performance suite.
+
+MLRsearch parameters are changed to speed up the throughput discovery.
+For example, PDR is not searched for, and final trial duration is shorter.
+
+The MLRsearch suite has to contain a configuration parameter
+that can be scaled up, e.g. number of routes or number of service chains.
+Currently, only increasing the scale is supported
+as the re-configuration operation. In the future, scale decrease
+or other operations can be implemented.
+
+The traffic profile is not changed, so the traffic present is processed
+only by the smaller scale configuration. The added routes / chains
+are not targeted by the traffic.
+
+The re-configuration uses the same Robot Framework and Python libraries
+as the initial configuration, except that the final calls
+that do not interact with VPP (e.g. starting virtual machines)
+are skipped to reduce the overall test duration.
+
+Discussion
+~~~~~~~~~~
+
+Robot Framework introduces a certain overhead, which may affect timing
+of individual VPP API calls, which in turn may affect
+the number of packets lost.
+
+The exact calls executed may contain unnecessary info dumps, repeated commands,
+or commands which change a value that does not need to be changed (e.g. MTU).
+Thus, implementation details affect the results, even if their effect
+on the corresponding MLRsearch suite is negligible.
+
+The lower bound for NDR is the only value safe to use when zero packet loss
+is expected without re-configuration. But different suites show different
+"jitter" in that value. For some suites, the lower bound is not tight,
+allowing full NIC buffers to drain quickly between worker pauses.
+For other suites, lower bound for NDR still has quite a large probability
+of non-zero packet loss even without re-configuration.
+
+But the results show very high effective blocked time,
+so the two objections related to NDR lower bound are negligible in comparison.
diff --git a/docs/report/vpp_performance_tests/nf_service_density/vnf_service_chains_reconf.rst b/docs/report/vpp_performance_tests/nf_service_density/vnf_service_chains_reconf.rst
index 9c52ae7d17..66b1af15d3 100644
--- a/docs/report/vpp_performance_tests/nf_service_density/vnf_service_chains_reconf.rst
+++ b/docs/report/vpp_performance_tests/nf_service_density/vnf_service_chains_reconf.rst
@@ -33,6 +33,12 @@
 Reconfiguration of VNF Service Chains
 =====================================
 
+See :ref:`reconf_tests` for methodology description of this test type.
+
+In each test, a single service chain is added. The re-configuration
+repeats all the steps used for the initial chains, except the last step
+(starting VMs), which is skipped.
+
 Additional information about graph data:
 
 #. **Graph Title**: describes tested packet path including VNF workload