X-Git-Url: https://gerrit.fd.io/r/gitweb?a=blobdiff_plain;f=docs%2Fcontent%2Fmethodology%2Fdut_state_considerations.md;fp=docs%2Fcontent%2Fmethodology%2Fdut_state_considerations.md;h=bfd7ef6977bddf6e41ba29b755ba8c81e20cb9ff;hb=22999c2df14eb455080ff0a09bf93dc795a4049f;hp=0000000000000000000000000000000000000000;hpb=2986c774cd6520cab7e7e380e1511d521e8afe04;p=csit.git diff --git a/docs/content/methodology/dut_state_considerations.md b/docs/content/methodology/dut_state_considerations.md new file mode 100644 index 0000000000..bfd7ef6977 --- /dev/null +++ b/docs/content/methodology/dut_state_considerations.md @@ -0,0 +1,149 @@ +--- +bookToc: false +title: "DUT state considerations" +weight: 6 +--- + +# DUT state considerations + +This page discusses considerations for Device Under Test (DUT) state. +DUTs such as VPP require configuration, to be provided before the aplication +starts (via config files) or just after it starts (via API or CLI access). + +During operation DUTs gather various telemetry data, depending on configuration. +This internal state handling is part of normal operation, +so any performance impact is included in the test results. +Accessing telemetry data is additional load on DUT, +so we are not doing that in main trial measurements that affect results, +but we include separate trials specifically for gathering runtime telemetry. + +But there is one kind of state that needs specific handling. +This kind of DUT state is dynamically created based on incoming traffic, +it affects how DUT handles the traffic, and (unlike telemetry counters) +it has uneven impact on CPU load. +Typical example is NAT, where detecting new sessions takes more CPU than +forwarding packet on existing (open or recently closed) sessions. +We call DUT configurations with this kind of state "stateful", +and configurations without them "stateless". +(Even though stateless configurations contain state described in previous +paragraphs, and some configuration items may have "stateful" in their name, +such as stateful ACLs.) + +# Stateful DUT configurations + +Typically, the level of CPU impact of traffic depends on DUT state. +The first packets causing DUT state to change have higher impact, +subsequent packets matching that state have lower impact. + +From performance point of view, this is similar to traffic phases +for stateful protocols, see +[NGFW draft](https://tools.ietf.org/html/draft-ietf-bmwg-ngfw-performance-05#section-4.3.4). +In CSIT we borrow the terminology (even if it does not fit perfectly, +see discussion below). Ramp-up traffic causes the state change, +sustain traffic does not change the state. + +As the performance is different, each test has to choose which traffic +it wants to test, and manipulate the DUT state to achieve the intended impact. + +## Ramp-up trial + +Tests aiming at sustain performance need to make sure DUT state is created. +We achieve this via a ramp-up trial, specific purpose of which +is to create the state. + +Subsequent trials need no specific handling, as long as the state +remains the same. But some state can time-out, so additional ramp-up +trials are inserted whenever the code detects the state can time-out. +Note that a trial with zero loss refreshes the state, +so only the time since the last non-zero loss trial is tracked. + +For the state to be set completely, it is important both DUT and TG +do not lose any packets. We achieve this by setting the profile multiplier +(TPS from now on) to low enough value. + +It is also important each state-affecting packet is sent. +For size-limited traffic profile it is guaranteed by the size limit. +For continuous traffic, we set a long enough duration (based on TPS). + +At the end of the ramp-up trial, we check DUT state to confirm +it has been created as expected. +Test fails if the state is not (completely) created. + +## State Reset + +Tests aiming at ramp-up performance do not use ramp-up trial, +and they need to reset the DUT state before each trial measurement. +The way of resetting the state depends on test, +usually an API call is used to partially de-configure +the part that holds the state, and then re-configure it back. + +In CSIT we control the DUT state behavior via a test variable "resetter". +If it is not set, DUT state is not reset. +If it is set, each search algorithm (including MRR) will invoke it +before all trial measurements (both main and telemetry ones). +Any configuration keyword enabling a feature with DUT state +will check whether a test variable for ramp-up rate is present. +If it is present, resetter is not set. +If it is not present, the keyword sets the apropriate resetter value. +This logic makes sure either ramp-up or state reset are used. + +Notes: If both ramp-up and state reset were used, the DUT behavior +would be identical to just reset, while test would take longer to execute. +If neither were used, DUT will show different performance in subsequent trials, +violating assumptions of search algorithms. + +## DUT versus protocol ramp-up + +There are at least three different causes for bandwidth possibly increasing +within a single measurement trial. + +The first is DUT switching from state modification phase to constant phase, +it is the primary focus of this document. +Using ramp-up traffic before main trials eliminates this cause +for tests wishing to measure the performance of the next phase. +Using size-limited profiles eliminates the next phase +for tests wishing to measure performance of this phase. + +The second is protocol such as TCP ramping up their throughput to utilize +the bandwidth available. This is the original meaning of "ramp up" +in the NGFW draft (see above). +In existing tests we are not using this meaning of TCP ramp-up. +Instead we use only small transactions, and large enough initial window +so TCP acts as ramped-up already. + +The third is TCP increasing offered load due to retransmissions triggered by +packet loss. In CSIT we again try to avoid this behavior +by using small enough data to transfer, so overlap of multiple transactions +(primary cause of packet loss) is unlikely. +But in MRR tests, packet loss and non-constant offered load are still expected. + +# Stateless DUT configuratons + +These are simple configurations, which do not set any resetter value +(even if ramp-up duration is not configured). +Majority of existing tests are of this type, using continuous traffic profiles. + +In order to identify limits of Trex performance, +we have added suites with stateless DUT configuration (VPP ip4base) +subjected to size-limited ASTF traffic. +The discovered rates serve as a basis of comparison +for evaluating the results for stateful DUT configurations (VPP NAT44ed) +subjected to the same traffic profiles. + +# DUT versus TG state + +Traffic Generator profiles can be stateful (ASTF) or stateless (STL). +DUT configuration can be stateful or stateless (with respect to packet traffic). + +In CSIT we currently use all four possible configurations: + +- Regular stateless VPP tests use stateless traffic profiles. + +- Stateless VPP configuration with stateful profile is used as a base for + comparison. + +- Some stateful DUT configurations (NAT44DET, NAT44ED unidirectional) + are tested using stateless traffic profiles and continuous traffic. + +- The rest of stateful DUT configurations (NAT44ED bidirectional) + are tested using stateful traffic profiles and size limited traffic.