title: Multiple Loss Ratio Search
docname: draft-ietf-bmwg-mlrsearch-05
wg: Benchmarking Working Group
pi: # can use array (if all yes) or hash here
sortrefs: # defaults to yes
ins: M. Konstantynowicz
name: Maciek Konstantynowicz
email: mkonstan@cisco.com
email: vrpolak@cisco.com
target: https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.04.01_60/gs_NFV-TST009v030401p.pdf
target: https://csit.fd.io/cdocs/methodology/measurements/data_plane_throughput/mlr_search/
title: "FD.io CSIT Test Methodology - MLRsearch"
target: https://pypi.org/project/MLRsearch/0.4.1/
title: "MLRsearch 0.4.1, Python Package Index"
This document proposes extensions to [RFC2544] throughput search by
defining a new methodology called Multiple Loss Ratio search
(MLRsearch). The main objectives of MLRsearch are to minimize the
total search duration, to support searching for multiple loss ratios,
and to improve the repeatability and comparability of results.

The main motivation behind extending [RFC2544] is the new set of challenges
and requirements posed by evaluating and testing software-based networking
systems, specifically their data planes.

MLRsearch offers several ways to address these challenges, giving users
configuration options to select their preferred way.
As we use kramdown to convert from markdown,
we use this way of marking comments so they are not visible in the rendered draft.
https://stackoverflow.com/a/42323390
If another engine is used, convert to this way:
https://stackoverflow.com/a/20885980
The purpose of this document is to describe Multiple Loss Ratio search
(MLRsearch), a data plane throughput search methodology optimized for software DUTs.
Applying vanilla [RFC2544] throughput bisection to software DUTs
results in a number of problems:

- Binary search takes too long, as most trials are done far from the
  eventually found throughput.
- The required final trial duration and pauses between trials
  prolong the overall search duration.
- Software DUTs show noisy trial results,
  leading to a big spread of possible discovered throughput values.
- Throughput requires loss of exactly zero frames, but the industry
  frequently allows for small but non-zero losses.
- The definition of throughput is not clear when trial results are inconsistent.
The MLRsearch library aims to address these problems by applying the following set
of enhancements:

- Allow multiple shorter trials instead of one big trial per load.
- Optionally, tolerate a percentage of trial results with higher loss.
- Allow searching for multiple search goals, with differing loss ratios.
- Any trial result can affect each search goal in principle.
- Insert multiple coarse targets for each search goal; earlier ones need
  to spend less time on trials.
- Earlier targets also aim for lesser precision.
- Use Forwarding Rate (FR) at maximum offered load
  [RFC2285] (section 3.6.2) to initialize the initial targets.
- Take care when dealing with inconsistent trial results.
- The reported throughput is smaller than the smallest load with high loss.
- Smaller load candidates are measured first.
- Apply several load selection heuristics to save even more time
  by trying hard to avoid unnecessarily narrow bounds.
Some of these enhancements are formalized in the MLRsearch specification;
the remaining enhancements are treated as implementation details,
thus achieving high comparability without limiting future improvements.

MLRsearch configuration options are flexible enough to
support both conservative and aggressive settings.
The conservative settings lead to results
unconditionally compliant with [RFC2544],
but longer search duration and worse repeatability.
Conversely, aggressive settings lead to shorter search duration
and better repeatability, but the results are not compliant with [RFC2544].
No part of [RFC2544] is intended to be obsoleted by this document.
# Identified Problems

This chapter describes the problems affecting the usability
of various performance testing methodologies,
mainly a binary search for [RFC2544] unconditionally compliant throughput.

The last chapter will summarize how the problems are addressed,
while the middle chapters provide explanations and definitions needed for that.
## Long Search Duration

The emergence of software DUTs, with frequent software updates and a
number of different frame processing modes and configurations,
has increased both the number of performance tests required to verify a DUT update
and the frequency of running those tests.
This makes the overall test execution time even more important than before.

In the context of characterizing a particular DUT's network performance,
this calls for improving the time efficiency of throughput search.
A vanilla bisection (at 60-second trial duration for unconditional [RFC2544]
compliance) is slow, because most trials spend time quite far from the
eventually found throughput.
[RFC2544] does not specify any stopping condition for throughput search,
so users can trade off search duration against achieved precision.
But, due to the logarithmic nature of bisection, even a small improvement
in search duration requires a relatively big sacrifice in the precision of the
discovered throughput.
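The logarithmic trade-off can be illustrated with a short calculation.
The sketch below is our illustration (function names are ours, not part of
any MLRsearch implementation): it counts the interval-halving trials needed
to reach a given precision, and the resulting duration at 60-second trials
with an assumed 4-second wait around each trial.

```python
import math

def bisection_trials(interval_width: float, precision: float) -> int:
    """Number of interval-halving trials needed to shrink an interval
    of the given width down to the requested precision."""
    return math.ceil(math.log2(interval_width / precision))

def bisection_duration(interval_width: float, precision: float,
                       trial_seconds: float = 60.0,
                       wait_seconds: float = 4.0) -> float:
    """Rough overall duration of a vanilla bisection, counting the
    trial traffic plus the assumed waits around each trial."""
    trials = bisection_trials(interval_width, precision)
    return trials * (trial_seconds + wait_seconds)
```

Halving the precision costs only one extra trial, but each tenfold
improvement in precision costs roughly 3.3 more full-length trials,
i.e. several extra minutes of search time.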
## DUT in SUT

DUT - Device Under Test:

- The network forwarding device to which stimulus is offered and
  response measured [RFC2285] (section 3.1.1).

SUT - System Under Test:

- The collective set of network devices to which stimulus is offered
  as a single entity and response measured [RFC2285] (section 3.1.2).
[RFC2544] specifies a test setup with an external tester stimulating the
networking system, treating it either as a single DUT, or as a system
of multiple DUTs.

In the case of software networking, the SUT consists not only of the DUT
as a software program processing frames, but also of
server hardware and operating system functions,
with server hardware resources shared across all programs
and the operating system running on the same server.
The DUT is effectively nested within the rest of the SUT.
Due to the shared multi-tenant nature of the SUT, the DUT is subject to
possible interference coming from the operating system and any other
software running on the same server. Some sources of such interference
can be eliminated to some degree, e.g. by pinning DUT program threads
to specific CPU cores and isolating those cores to avoid context switching.
But some level of adverse effects may remain even after
all such reasonable precautions are applied.
These effects affect the DUT's network performance negatively.
As the effects are hard to predict in general, their impact is similar to
what other engineering disciplines define as noise.
Thus, all such effects are called SUT noise.
The DUT can also exhibit fluctuating performance itself, for reasons
not related to the rest of the SUT, for example due to pauses in execution
as needed for internal stateful processing. In many cases this
may be expected per-design behavior, as it would be observable even
in a hypothetical scenario where all sources of SUT noise are eliminated.
Such behavior affects trial results in a way similar to SUT noise.
As the two phenomena are hard to distinguish,
this document uses the word noise as a shorthand covering both
these internal DUT performance fluctuations and genuine SUT noise.
A simple model of SUT performance consists of an idealized noiseless performance
and additional noise effects. The noiseless performance is assumed to be constant;
all observed performance variations are due to noise.
The impact of the noise can vary in time, sometimes wildly,
even within a single trial.
The noise can sometimes be negligible, but frequently
it lowers the SUT performance as observed in trial results.
In this model, the SUT does not have a single performance value; it has a spectrum.
One end of the spectrum is the idealized noiseless performance value,
the other end can be called a noiseful performance. In practice, trial results
close to the noiseful end of the spectrum happen only rarely.
The worse the performance value is, the more rarely it is seen in a trial.
Therefore, the extreme noiseful end of the SUT spectrum is not really observable
among trial results. Also, the extreme noiseless end of the SUT spectrum
is unlikely to be observable, this time because some small noise effects
are likely to occur multiple times during a trial.

Unless specified otherwise, this document talks about potentially observable
ends of the SUT performance spectrum, not about the extreme ones.
Focusing on the DUT, the benchmarking effort should aim
at eliminating only the SUT noise from SUT measurements.
In practice that is not really possible, as based on the authors' experience
and the available literature, there are no models realistic enough
to distinguish SUT noise from DUT fluctuations.
However, assuming that a well-constructed SUT has the DUT as its
performance bottleneck, the DUT ideal noiseless performance can be defined
as the noiseless end of the SUT performance spectrum, at least for
throughput. For other performance quantities such as latency there may be an
additional complication.
Note that by this definition, DUT noiseless performance
also minimizes the impact of DUT fluctuations, as much as realistically possible
for a given trial duration.

In this document, we reduce the DUT in SUT problem to estimating
the noiseless end of the SUT performance spectrum from a limited number of
trial results.
Any improvement to the throughput search algorithm aimed at better
dealing with software networking SUT and DUT setups should employ
strategies recognizing the presence of SUT noise, and allow discovery of
(proxies for) DUT noiseless performance
at different levels of sensitivity to SUT noise.
## Repeatability and Comparability

[RFC2544] does not suggest repeating the throughput search, and from just one
discovered throughput value, it cannot be determined how repeatable that value is.
In practice, poor repeatability is also the main cause of poor
comparability, that is, different benchmarking teams can test the same SUT
but get throughput values differing more than expected from the search precision.
The [RFC2544] throughput requirements (60-second trials and
no tolerance of a single frame loss) affect the throughput results
in the following way.
The SUT behavior close to the noiseful end of its performance spectrum
consists of rare occasions of significantly low performance,
but the long trial duration makes those occasions not so rare at the trial level.
Therefore, binary search results tend to wander away from the noiseless end
of the SUT performance spectrum, more frequently and more widely than shorter
trials would, thus resulting in poor throughput repeatability.
The repeatability problem can be addressed by defining a search procedure
which reports more stable results,
even if they can no longer be called throughput in the [RFC2544] sense.
According to the SUT performance spectrum model, better repeatability
will be at the noiseless end of the spectrum.
Therefore, solutions to the DUT in SUT problem
will also help with the repeatability problem.

Conversely, any alteration to [RFC2544] throughput search
that improves repeatability should be considered
as less dependent on the SUT noise.

An alternative option is to simply run a search multiple times, and report some
statistics (e.g. average and standard deviation). This can be used
for a subset of tests deemed more important,
but it makes the search duration problem even more pronounced.
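For illustration, such per-test statistics are simple to compute;
the helper below is a hypothetical sketch (the function name is ours),
not part of any MLRsearch implementation.

```python
import statistics

def summarize_repeated_searches(throughputs: list) -> tuple:
    """Summarize throughput values from repeated searches of the same SUT.

    Returns (average, standard deviation); a deviation large relative
    to the search precision signals poor repeatability.
    """
    average = statistics.fmean(throughputs)
    deviation = statistics.stdev(throughputs) if len(throughputs) > 1 else 0.0
    return average, deviation
```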
## Throughput with Non-Zero Loss

[RFC1242] (section 3.17) defines throughput as:

    The maximum rate at which none of the offered frames
    are dropped by the device.

and then continues:

    Since even the loss of one frame in a
    data stream can cause significant delays while
    waiting for the higher level protocols to time out,
    it is useful to know the actual maximum data
    rate that the device can support.
Contrary to that, many benchmarking teams settle for a small, non-zero
loss ratio as the goal for their load search.
Motivations are many:

- Modern protocols tolerate frame loss better,
  compared to the time when [RFC1242] and [RFC2544] were specified.

- Trials nowadays send way more frames within the same duration,
  increasing the chance that a small SUT performance fluctuation
  is enough to cause frame loss.

- Small bursts of frame loss caused by noise have an otherwise smaller impact
  on the average frame loss ratio observed in the trial,
  as during other parts of the same trial the SUT may work more closely
  to its noiseless performance, thus perhaps lowering the trial loss ratio
  below the goal loss ratio value.

- If an approximation of the SUT noise impact on the trial loss ratio is known,
  it can be set as the goal loss ratio.
Regardless of the validity of any and all similar motivations,
support for non-zero loss goals makes any search algorithm more user-friendly.
[RFC2544] throughput is not user-friendly in this regard.

Assuming users are allowed to specify the goal loss ratio value,
the usefulness is enhanced even more if users can specify multiple
loss ratio values, especially when a single search can find all relevant bounds.

Searching for multiple search goals also helps to describe the SUT performance
spectrum better than a single search goal result does.
For example, a repeated wide gap between zero-loss and non-zero-loss loads
indicates the noise has a large impact on the observed performance,
which is not evident from the result of a single-goal load search procedure.

It is easy to modify the vanilla bisection to find a lower bound
for the intended load that satisfies a non-zero goal loss ratio.
But it is not that obvious how to search for multiple goals at once,
hence the support for multiple search goals remains a problem.
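The single-goal modification mentioned above is straightforward.
The sketch below is our illustration, not prescribed by this document:
it replaces the lossless/lossy classification of vanilla bisection with
a comparison against a goal loss ratio, where `measure` is an assumed
callback that runs one trial and returns its trial loss ratio.

```python
def bisect_for_loss_ratio(measure, min_load, max_load, goal_loss_ratio, width):
    """Vanilla bisection generalized to a non-zero goal loss ratio.

    A trial with loss ratio above the goal makes the load an upper bound,
    otherwise a lower bound; the tightest lower bound is returned once the
    interval is narrower than the requested width.
    """
    lower, upper = min_load, max_load
    while upper - lower > width:
        load = (lower + upper) / 2.0
        if measure(load) > goal_loss_ratio:
            upper = load
        else:
            lower = load
    return lower
```

With `goal_loss_ratio` set to zero, this reduces back to the
[RFC2544]-style zero-loss bisection.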
## Inconsistent Trial Results

While performing throughput search by executing a sequence of
measurement trials, there is a risk of encountering inconsistencies
between trial results.

The plain bisection never encounters inconsistent trials.
But [RFC2544] hints at the possibility of inconsistent trial results,
in two places in its text.
The first place is section 24, where full trial durations are required,
presumably because they can be inconsistent with results
from shorter trial durations.
The second place is section 26.3, where two successive zero-loss trials
are recommended, presumably because after one zero-loss trial
there can be a subsequent inconsistent non-zero-loss trial.
Examples include:

- A trial at the same load (same or different trial duration) results
  in a different trial loss ratio.
- A trial at a higher load (same or different trial duration) results
  in a smaller trial loss ratio.
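As an illustration (not part of the specification), a post-hoc check
for the second kind of inconsistency can be sketched as follows.

```python
def find_inconsistencies(trials: list) -> list:
    """Return pairs of trial results where a higher load produced
    a strictly smaller trial loss ratio.

    `trials` is a list of (load, trial_loss_ratio) tuples.
    """
    pairs = []
    for index, (load_a, loss_a) in enumerate(trials):
        for load_b, loss_b in trials[index + 1:]:
            low, high = sorted([(load_a, loss_a), (load_b, loss_b)])
            # Inconsistent if the higher load has the smaller loss ratio.
            if high[0] > low[0] and high[1] < low[1]:
                pairs.append((low, high))
    return pairs
```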
Any robust throughput search algorithm needs to decide how to continue
the search in presence of such inconsistencies.
Definitions of throughput in [RFC1242] and [RFC2544] are not specific enough
to imply a unique way of handling such inconsistencies.

Ideally, there would be a definition of a new quantity which both generalizes
throughput for non-zero loss (and other possible repeatability enhancements),
while being precise enough to force a specific way to resolve trial result
inconsistencies.
But until such a definition is agreed upon, the correct way to handle
inconsistent trial results remains an open problem.
# MLRsearch Specification

This chapter focuses on technical definitions needed for evaluating
whether a particular test procedure adheres to the MLRsearch specification.

For motivations, explanations, and other comments see other chapters.

## MLRsearch Architecture

The MLRsearch architecture consists of three main components:
the manager, the controller, and the measurer.
The presence of other components (mainly the SUT) is also implied.

While the manager and the measurer can be seen as abstractions
present in any testing procedure, the behavior of the controller
is what distinguishes MLRsearch algorithms from other search procedures.
### Measurer

The measurer is the component which performs one trial
as described in [RFC2544] section 23, when requested by the controller.

Specifically, one call to the measurer accepts a trial load value
and a trial duration value, performs the trial, and returns
the measured trial loss ratio, and optionally a different duration value.

It is the responsibility of the measurer to uphold any requirements
and assumptions present in the MLRsearch specification
(e.g. trial forwarding ratio not being larger than one).
Implementers have some freedom, for example in the way they deal with
duplicated frames, or in what to return if the tester sent zero frames towards the SUT.
Implementations are RECOMMENDED to document their behavior
related to such freedoms in as detailed a way as possible.

Implementations MUST document any deviations from RFC documents,
for example if the wait time around traffic
is shorter than what [RFC2544] section 23 specifies.
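In code, the measurer contract can be sketched as a small interface.
This is only an illustration of the call described above; the names
`Measurer`, `TrialResult` and `measure` are ours, not mandated by the
specification.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class TrialResult:
    loss_ratio: float  # trial loss ratio, between 0.0 and 1.0 inclusive
    duration: float    # duration as reported back by the measurer

class Measurer(Protocol):
    def measure(self, intended_load: float,
                intended_duration: float) -> TrialResult:
        """Perform one trial as in [RFC2544] section 23 and report results."""
        ...

# A trivial stand-in measurer, useful for exercising a controller offline.
class LosslessMeasurer:
    def measure(self, intended_load: float,
                intended_duration: float) -> TrialResult:
        return TrialResult(loss_ratio=0.0, duration=intended_duration)
```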
### Controller

The controller is the component of the MLRsearch architecture
that is called by the manager (just once), calls the measurer
(usually multiple times in succession),
and returns the result of the search to the manager.

The only required argument in the call to the controller
is a list of search goals. For the structure of the search result,
see subsection Search Result.
### Manager

The manager is the component that initializes the SUT, the traffic generator
(tester in [RFC2544]), the measurer and the controller
with intended configurations. It then hands over the execution
to the controller and receives its result.

Creation of reports in an appropriate format can also be understood
as the responsibility of the manager.
The specification deals with physical quantities, so it is assumed
each numeric value is accompanied by an appropriate physical unit.

The specification does not state which unit is appropriate,
but implementations MUST make it explicit which unit is used
for each value provided to or received from the user.

For example, load quantities (including the conditional throughput)
returned by the controller are defined to be based on single-interface
(unidirectional) loads. For bidirectional traffic, users are likely
to expect bidirectional throughput quantities, so the manager is responsible
for making its report clear.
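For example, a manager reporting on symmetric bidirectional traffic might
convert the controller's unidirectional loads as below. This is a trivial
sketch; the factor of two is an assumption about the traffic profile, not
something the specification prescribes.

```python
def report_load(unidirectional_load: float, bidirectional: bool = True) -> float:
    """Convert a controller-returned single-interface load into the
    aggregate value a reader expects for symmetric bidirectional traffic."""
    return unidirectional_load * 2 if bidirectional else unidirectional_load
```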
### SUT

As defined in [RFC2285]:
The collective set of network devices to which stimulus is offered
as a single entity and response measured.

### Trial

A trial is the part of the test described in [RFC2544] section 23.

### Trial Load

Trial load is the intended constant load for a trial.

Load is the quantity implied by Constant Load of [RFC1242],
Data Rate of [RFC2544], and Intended Load of [RFC2285].
All three specify this value as applying to one (input or output) interface.
### Trial Duration

Trial duration is the intended duration of the traffic for a trial.

In general, this quantity does not include any preparation nor waiting
described in section 23 of [RFC2544].

However, the measurer MAY return a duration value different
from the intended duration. This may be useful for users
who want to control the overall search duration, not just the traffic part of it.
The manager MUST report how the measurer computes the returned duration.
### Trial Forwarding Ratio

Trial forwarding ratio is a dimensionless floating point value,
assumed to be between 0.0 and 1.0, both inclusive.
It is computed as the number of frames forwarded by the SUT, divided by
the number of frames that should have been forwarded during the trial.

Note that, contrary to load, the frame counts used to compute
the trial forwarding ratio are aggregates over all SUT ports.

Questions around what is the correct number of frames
that should have been forwarded are outside the scope of this document,
e.g. what the measurer should return when it detects
that the offered load differs significantly from the intended load.

It is RECOMMENDED that implementations return an irregular goal result
if they detect questionable (in the comparability sense) trial results
affecting their goal result.
### Trial Loss Ratio

Trial loss ratio is equal to one minus the trial forwarding ratio.

### Trial Forwarding Rate

The trial forwarding rate is the trial load multiplied by
the trial forwarding ratio.

Note that this is very similar, but not identical, to Forwarding Rate
as defined in [RFC2285] section 3.6.1, as that definition
is specific to one output interface, while the trial forwarding ratio
is based on frame counts aggregated over all SUT interfaces.
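The three trial quantities above reduce to simple arithmetic; the helpers
below are an illustrative sketch (names are ours) using frame counts
aggregated over all SUT ports.

```python
def trial_forwarding_ratio(frames_forwarded: int, frames_expected: int) -> float:
    """Frames forwarded by the SUT divided by frames that should
    have been forwarded, aggregated over all SUT ports."""
    return frames_forwarded / frames_expected

def trial_loss_ratio(frames_forwarded: int, frames_expected: int) -> float:
    """One minus the trial forwarding ratio."""
    return 1.0 - trial_forwarding_ratio(frames_forwarded, frames_expected)

def trial_forwarding_rate(trial_load: float,
                          frames_forwarded: int, frames_expected: int) -> float:
    """Trial load multiplied by the trial forwarding ratio."""
    return trial_load * trial_forwarding_ratio(frames_forwarded, frames_expected)
```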
### Traffic Profile

Any other specifics (besides trial load and trial duration)
the measurer needs to perform the trial are understood as a composite
called the traffic profile.
All its attributes are assumed to be constant during the search,
and the composite is configured on the measurer by the manager
before the search starts.

A traffic profile is REQUIRED by [RFC2544] to contain some specific quantities,
for example frame size.
Several more specific quantities may be RECOMMENDED.

Depending on the SUT configuration, e.g. when testing specific protocols,
additional values need to be included in the traffic profile
and in the test report. See other IETF documents.
## Search Goal

A search goal is one item of the list required as an argument
when the manager calls the controller.

Each search goal is a composite consisting of several attributes,
some of them required.
Implementations are free to add their own attributes.

The subsections list all required attributes and one recommended attribute.

The meaning of the attributes is formally given only by their effect
on the computation of the attributes of the goal result.
The subsections do contain a short informal description,
but see other chapters for more in-depth explanations.
### Goal Final Trial Duration

A threshold value for trial durations.
This is a REQUIRED attribute and MUST be positive.

Informally, the conditional throughput for this goal will be computed
only from results of trials at least as long as this.
### Goal Duration Sum

A threshold value for a particular sum of trial durations.
This is a REQUIRED attribute and MUST be positive.

This uses the duration values returned by the measurer,
in case it returns something other than the intended durations
for the traffic part of the search.

Informally, even when looking only at trials done at this goal's
final trial duration, MLRsearch may spend up to this time measuring
a single load value.
### Goal Loss Ratio

A threshold value for trial loss ratios.
This is a REQUIRED attribute and MUST be non-negative and smaller than one.

Informally, if a load causes too many trials with trial loss ratios larger than this,
the conditional throughput for this goal will be smaller than that load.
### Goal Exceed Ratio

A threshold value for a particular ratio of duration sums.
This is a REQUIRED attribute and MUST be non-negative and smaller than one.

This uses the duration values returned by the measurer,
in case it returns something other than the intended durations
for the traffic part of the search.

Informally, this acts as the q-value of a quantile when selecting
which trial result's forwarding rate becomes the conditional throughput.
For example, when the goal exceed ratio is 0.5 and MLRsearch
happened to use the whole goal duration sum when determining the conditional
throughput, it means the conditional throughput is the median
of trial forwarding rates.
In practice, MLRsearch may stop measuring a load before the goal duration sum
is reached, and the conditional throughput in that case frequently
is the worst trial still not exceeding the goal loss ratio.
If the goal duration sum is no larger than the goal final trial duration,
MLRsearch performs only one trial per load (unless other goals need more)
and the goal exceed ratio has no effect on the search result.
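The quantile intuition can be shown with a simplified sketch.
This is our approximation for building intuition only; the authoritative
procedure is given in Appendix B.

```python
def conditional_throughput_sketch(trial_results: list,
                                  exceed_ratio: float) -> float:
    """Duration-weighted quantile of trial forwarding rates.

    `trial_results` is a list of (forwarding_rate, duration) tuples
    from full-length trials at one load.  With exceed_ratio 0.5 this
    picks (approximately) the median forwarding rate.
    """
    ordered = sorted(trial_results)  # worst forwarding rate first
    total_duration = sum(duration for _, duration in ordered)
    threshold = exceed_ratio * total_duration
    accumulated = 0.0
    for rate, duration in ordered:
        accumulated += duration
        if accumulated > threshold:
            return rate
    return ordered[-1][0]
```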
### Goal Width

A value used as a threshold for telling when two trial load values
are close enough.
This is a RECOMMENDED attribute and MUST be positive if present.
Implementations without this attribute
MUST give the manager other ways to control the search exit condition.

Absolute load difference and relative load difference are two popular choices,
but implementations may choose a different way to specify width.

Informally, this acts as a stopping condition, controlling the precision
of the search. The search stops only after every goal has reached its precision.
## Search Result

The search result is a single composite object returned from the controller
to the manager.
It is a mapping from the search goals (see section Search Goal) into goal results
(see section Goal Result).
The mapping MUST map from all the search goals present in the controller input.

Each search goal instance is mapped to a goal result instance.
Multiple search goal instances may map to the same goal result instance.
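In code, the mapping can be modeled directly. The sketch below uses
hypothetical class names; the specification prescribes only the structure,
not any naming.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)  # frozen, so instances are hashable mapping keys
class SearchGoal:
    final_trial_duration: float
    duration_sum: float
    loss_ratio: float
    exceed_ratio: float
    width: float

@dataclass
class GoalResult:
    relevant_lower_bound: float
    relevant_upper_bound: float
    conditional_throughput: Optional[float] = None

# The search result maps every input search goal to a goal result;
# several goals may map to the same GoalResult instance.
SearchResult = Dict[SearchGoal, GoalResult]
```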
## Goal Result

A goal result is a composite object consisting of several attributes.
All attributes are related to the same search goal, the one search goal instance
that the Search Result maps to this instance of the Goal Result.

Some of the attributes are required, some are recommended,
and implementations are free to add their own.

The subsections define attributes for a regular goal result.
Implementations are free to define their own irregular goal results,
but the manager MUST report them clearly as not regular according to this section.

A typical irregular result is when all trials at the maximum offered load
have zero loss, as the relevant upper bound does not exist in that case.
### Relevant Upper Bound

The relevant upper bound is the intended load value that is classified
at the end of the search as the relevant upper bound (see Appendix A) for this goal.
This is a REQUIRED attribute.

Informally, this is the smallest intended load that failed to uphold
all the requirements of this search goal, mainly the goal loss ratio
in combination with the goal exceed ratio.
### Relevant Lower Bound

The relevant lower bound is the intended load value that got classified
(after all trials) as the relevant lower bound (see Appendix A) for this goal.
This is a REQUIRED attribute.

The distance between the relevant lower bound and the relevant upper bound
MUST NOT be larger than the goal width for a regular goal result,
if the implementation offers width as a goal attribute.

Informally, this is the largest intended load that managed to uphold
all the requirements of this search goal, mainly the goal loss ratio
in combination with the goal exceed ratio, while not being larger
than the relevant upper bound.
### Conditional Throughput

The conditional throughput (see Appendix B) as evaluated
at the relevant lower bound.
This is a RECOMMENDED attribute.

Informally, this is a typical forwarding rate expected to be seen
at the relevant lower bound. But frequently it is just a conservative estimate thereof,
as MLRsearch implementations tend to stop gathering more data
as soon as they confirm this estimate cannot get worse within
the goal duration sum.
# MLRsearch Explanations

This chapter focuses on intuitions and motivations
and skips over some important details.

Familiarity with the MLRsearch specification is not required here,
so this chapter can act as an introduction.
For example, this chapter starts talking about tightest lower bounds
before it is ready to talk about the relevant lower bound from the specification.
## MLRsearch Versions

The MLRsearch algorithm has been developed in a code-first approach:
a Python library has been created, debugged, and used in production
before the first descriptions (even informal ones) were published.
In fact, multiple versions of the library were used in production
over the past few years, and later code was usually not compatible
with earlier descriptions.

The code in (any version of) the MLRsearch library fully determines
the search process (for given configuration parameters),
leaving no space for deviations.
MLRsearch as a name for a broad class of possible algorithms
leaves plenty of space for future improvements, at the cost
of poor comparability of results of different MLRsearch implementations.

This document aspires to prescribe an MLRsearch specification
in a way that restricts the important parts related to comparability,
while leaving other parts vague enough so implementations can improve freely.
[RFC2544] prescribes that after performing one trial at a specific offered load,
the next offered load should be larger or smaller, based on frame loss.

The usual implementation uses binary search. Here a lossy trial becomes
a new upper bound, a lossless trial becomes a new lower bound.
The span of values between the tightest lower bound and the tightest upper bound
forms an interval of possible results,
and after each trial the width of that interval halves.

Usually the binary search implementation tracks only the two tightest bounds,
simply calling them bounds, but the old values still remain valid bounds,
just not as tight as the new ones.

After some number of trials, the tightest lower bound becomes the throughput.
[RFC2544] does not specify when (if ever) the search should stop.

The MLRsearch library introduces the concept of goal width. The search stops
when the distance between the tightest upper bound and the tightest lower bound
is smaller than a user-configured value, called goal width from now on.
In other words, the interval width has to be smaller than the goal width
at the end of the search.
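The stopping condition generalizes naturally to multiple goals;
a minimal sketch (our illustration, not a normative definition):

```python
def search_done(intervals: list) -> bool:
    """Check the goal-width stopping condition for every goal.

    `intervals` is a list of (goal_width, tightest_lower, tightest_upper)
    triples, one per search goal.  The search may stop only when every
    goal's interval is narrower than its goal width.
    """
    return all(upper - lower < width for width, lower, upper in intervals)
```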
This goal width value therefore determines the precision of the result.
As the MLRsearch specification requires a particular structure of the result,
the result itself contains enough information to determine its precision,
thus it is not required to report the goal width value.

This allows MLRsearch implementations to use exit conditions
different from goal width.
The MLRsearch specification only REQUIRES the search procedure
to always finish in a finite time, regardless of possible trial results.
## Load Classification

MLRsearch keeps the basic logic of binary search (tracking the tightest bounds,
measuring at the middle), perhaps with minor technical clarifications.
The algorithm chooses an intended load (as opposed to an offered load),
the interval between bounds does not need to be split into exactly two equal halves,
and the final reported structure specifies both bounds
(optionally also the conditional throughput at the lower bound, defined later).

The biggest difference is that in order to classify a load
as an upper or lower bound, MLRsearch may need more than one trial
(depending on configuration options) to be performed at the same intended load.

As a consequence, even if a load already has a few trial results,
it still may be classified as undecided, neither a lower bound nor an upper bound.

An explanation of the classification logic is given in the next chapter,
as it relies heavily on other sections of this chapter.

For repeatability and comparability reasons, it is important that,
given a set of trial results, all implementations of MLRsearch
classify the load in an equivalent way.
The next difference is in the goals of the search. [RFC2544] has a single goal,
based on classifying full-length trials as either loss-less or lossy.
As the name suggests, MLRsearch can search for multiple goals, differing in their
loss ratios. A precise definition of the goal loss ratio will be given later.
764 The [RFC2544] throughput goal then simply becomes a zero goal loss ratio.
Different goals may also have different goal widths.
767 A set of trial results for one specific intended load value
768 can classify the load as an upper bound for some goals, but a lower bound
769 for some other goals, and undecided for the rest of the goals.
Therefore, the load classification depends not only on trial results,
772 but also on the goal. The overall search procedure becomes more complicated
773 (compared to binary search with a single goal),
774 but most of the complications do not affect the final result,
775 except for one phenomenon, loss inversion.
In [RFC2544] throughput search using bisection, any load with a lossy trial
becomes a hard upper bound, meaning every subsequent trial
is performed at a smaller intended load.
783 But in MLRsearch, a load that is classified as an upper bound for one goal
784 may still be a lower bound for another goal, and due to that other goal
785 MLRsearch will probably perform trials at even higher loads.
786 What to do when all such higher load trials happen to have zero loss?
787 Does it mean the earlier upper bound was not real?
788 Does it mean the later lossless trials are not considered a lower bound?
789 Surely we do not want to have an upper bound at a load smaller than a lower bound.
791 MLRsearch is conservative in these situations.
792 The upper bound is considered real, and the lossless trials at higher loads
793 are considered to be a coincidence, at least when computing the final result.
795 This is formalized using new notions, the relevant upper bound and
796 the relevant lower bound.
797 Load classification is still based just on the set of trial results
798 at a given intended load (trials at other loads are ignored),
799 making it possible to have a lower load classified as an upper bound
800 and a higher load classified as a lower bound (for the same goal).
801 The relevant upper bound (for a goal) is the smallest load classified
802 as an upper bound. But the relevant lower bound is not simply
803 the largest among lower bounds. It is the largest load among loads
804 that are lower bounds while also being smaller than the relevant upper bound.
806 With these definitions, the relevant lower bound is always smaller
807 than the relevant upper bound (if both exist), and the two relevant bounds
808 are used analogously as the two tightest bounds in the binary search.
809 When they are less than goal width apart, the relevant bounds are used in output.
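Given per-load classifications for one goal, the relevant bounds could be
derived as in the following sketch (the dictionary representation and the
function name are illustrative assumptions, not part of the specification):

```python
def relevant_bounds(classified_loads):
    """Compute the relevant bounds for one goal.

    classified_loads maps an intended load (a float) to its
    classification: "lower", "upper" or "undecided".
    Returns (relevant_lower_bound, relevant_upper_bound);
    either may be None if no such bound exists yet.
    """
    uppers = [load for load, cls in classified_loads.items() if cls == "upper"]
    # The relevant upper bound is the smallest load classified as an upper bound.
    relevant_upper = min(uppers) if uppers else None
    # The relevant lower bound is the largest lower bound that is also
    # smaller than the relevant upper bound (guarding against loss inversion).
    lowers = [
        load for load, cls in classified_loads.items()
        if cls == "lower" and (relevant_upper is None or load < relevant_upper)
    ]
    relevant_lower = max(lowers) if lowers else None
    return relevant_lower, relevant_upper
```

With loss inversion present, e.g. loads classified as {3.0: "lower",
4.0: "upper", 5.0: "lower"}, the lossless trials at 5.0 are ignored
and the relevant bounds are (3.0, 4.0).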
811 One consequence is that every trial result can have an impact on the search result.
812 That means if your SUT (or your traffic generator) needs a warmup,
813 be sure to warm it up before starting the search.
817 The idea of performing multiple trials at the same load comes from
818 a model where some trial results (those with high loss) are affected
819 by infrequent effects, causing poor repeatability of [RFC2544] throughput results.
820 See the discussion about noiseful and noiseless ends of SUT performance spectrum.
Stable results are closer to the noiseless end of the SUT performance spectrum,
so MLRsearch may need to allow some frequency of high-loss trials
to ignore the rare but big effects near the noiseful end.
MLRsearch is able to do such trial result filtering, but it needs
a configuration option to tell it how frequent the infrequent big loss can be.
This option is called the exceed ratio. It tells MLRsearch what ratio of trials
(more exactly, what ratio of trial seconds) can have a trial loss ratio
larger than the goal loss ratio while the load is still classified as a lower bound.
830 Zero exceed ratio means all trials have to have trial loss ratio
831 equal to or smaller than the goal loss ratio.
833 For explainability reasons, the RECOMMENDED value for exceed ratio is 0.5,
834 as it simplifies some later concepts by relating them to the concept of median.
838 When more than one trial is needed to classify a load,
MLRsearch also needs something that controls the number of trials needed.
840 Therefore, each goal also has an attribute called duration sum.
The meaning of a goal duration sum is that when a load has trials
(at full trial duration, details later)
whose durations sum up to at least this value,
the load is guaranteed to be classified as an upper bound or a lower bound
for that goal.
848 As the duration sum has a big impact on the overall search duration,
and [RFC2544] prescribes wait intervals around trial traffic,
850 the MLRsearch algorithm may sum durations that are different
851 from the actual trial traffic durations.
855 Section 24 of [RFC2544] already anticipates possible time savings
856 when short trials (shorter than full length trials) are used.
858 MLRsearch requires each goal to specify its final trial duration.
859 Full-length trial is the short name for a trial whose intended trial duration
860 is equal to the goal final trial duration.
862 Any MLRsearch implementation may include its own configuration options
863 which control when and how MLRsearch chooses to use shorter trial durations.
865 For explainability reasons, when exceed ratio of 0.5 is used,
it is recommended for the goal duration sum to be an odd multiple
of the full trial duration, so the conditional throughput becomes identical to
a median of a particular set of forwarding rates.
870 Presence of shorter trial results complicates the load classification logic.
871 Full details are given later. In short, results from short trials
872 may cause a load to be classified as an upper bound.
873 This may cause loss inversion, and thus lower the relevant lower bound
874 (below what would classification say when considering full-length trials only).
876 For explainability reasons, it is RECOMMENDED users use such configurations
877 that guarantee all trials have the same length.
878 Alas, such configurations are usually not compliant with [RFC2544] requirements,
879 or not time-saving enough.
881 ## Conditional Throughput
As testing equipment takes the intended load as an input parameter
for a trial measurement, any load search algorithm needs to deal
with intended load values internally.
887 But in presence of goals with non-zero loss ratio, the intended load
888 usually does not match the user intuition of what a throughput is.
889 The forwarding rate (as defined in [RFC2285] section 3.6.1) is better,
890 but it is not obvious how to generalize it
891 for loads with multiple trial results,
892 especially with non-zero goal exceed ratio.
894 MLRsearch defines one such generalization, called the conditional throughput.
895 It is the forwarding rate from one of the trials performed at the load
896 in question. Specification of which trial exactly is quite technical.
897 More detailed explanations are given in the next chapter.
Conditional throughput is partially related to load classification.
If a load is classified as a lower bound for a goal,
the conditional throughput can be calculated,
and is guaranteed to show an effective loss ratio no larger than the goal loss ratio.
904 While the conditional throughput gives more intuitive-looking values
905 than the relevant lower bound, especially for non-zero goal loss ratio values,
906 the actual definition is more complicated than the definition of the relevant
907 lower bound. In future, other intuitive values may become popular,
908 but they are unlikely to supersede the definition of the relevant lower bound
909 as the most fitting value for comparability purposes,
910 therefore the relevant lower bound remains a required attribute
911 of the goal result structure.
913 Note that comparing best and worst case, the same relevant lower bound value
914 may result in the conditional throughput differing up to the goal loss ratio.
915 Therefore it is rarely needed to set the goal width (if expressed
916 as relative difference of loads) below the goal loss ratio.
In other words, setting the goal width below the goal loss ratio
may cause the conditional throughput for a larger loss ratio to become smaller
than the conditional throughput for a goal with a smaller goal loss ratio,
which is counter-intuitive, considering they come from the same search.
921 Therefore it is RECOMMENDED to set the goal width to a value no smaller
922 than the goal loss ratio.
926 The main motivation for MLRsearch was to have an algorithm that spends less time
927 finding a throughput, either the [RFC2544] compliant one,
928 or some generalization thereof. The art of achieving short search times
929 is mainly in smart selection of intended loads (and intended durations)
930 for the next trial to perform.
932 While there is an indirect impact of the load selection on the reported values,
933 in practice such impact tends to be small,
934 even for SUTs with quite broad performance spectrum.
A typical example of two approaches to load selection leading to different
relevant lower bounds is when the interval is split in a very uneven way.
An implementation choosing loads very close to the current relevant lower bound
is quite likely to eventually stumble upon a trial result
with poor performance (due to SUT noise).
For an implementation choosing loads very close to the current relevant upper bound
this is unlikely, as it examines more loads that can see performance
close to the noiseless end of the SUT performance spectrum.
The reason why it is unlikely to have two MLRsearch implementations showing
this kind of preference in load selection is precisely
the desire to have short searches.
Even splits are the best way to achieve the desired precision,
so the more optimized a search algorithm is for the overall search duration,
the better the repeatability and comparability
of its results will be, assuming the user configuration remains the same.
952 Therefore, this document remains quite vague on load selection
953 and other optimisation details, and configuration attributes related to them.
Assuming users prefer libraries that achieve short overall search time,
955 the definition of the relevant lower bound
956 should be strict enough to ensure result repeatability
957 and comparability between different implementations,
958 while not restricting future implementations much.
Sadly, different implementations may exhibit their sweet spot of
best repeatability for a given search duration at different goal attribute values,
especially with respect to optional goal attributes
such as the initial trial duration.
964 Thus, this document does not comment much on which configurations
965 are good for comparability between different implementations.
966 For comparability between different SUTs using the same implementation,
967 refer to configurations recommended by that particular implementation.
969 ## [RFC2544] compliance
971 The following search goal ensures unconditional compliance with
972 [RFC2544] throughput search procedure:
974 - Goal loss ratio: zero.
976 - Goal final trial duration: 60 seconds.
978 - Goal duration sum: 60 seconds.
980 - Goal exceed ratio: zero.
982 Presence of other search goals does not affect compliance of this goal result.
983 The relevant lower bound and the conditional throughput are in this case
984 equal to each other, and the value is the [RFC2544] throughput.
986 If the 60 second quantity is replaced by a smaller quantity in both attributes,
987 the conditional throughput is still conditionally compliant with
988 [RFC2544] throughput.
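For concreteness, the compliant goal above can be written down as a
configuration structure (a sketch; the attribute names follow the prose of
this document, but the dataclass itself is an assumption, not a prescribed
API, and optional attributes such as goal width are omitted):

```python
from dataclasses import dataclass

@dataclass
class SearchGoal:
    """Required goal attributes, named after this document's prose."""
    loss_ratio: float  # goal loss ratio
    final_trial_duration: float  # goal final trial duration, in seconds
    duration_sum: float  # goal duration sum, in seconds
    exceed_ratio: float  # goal exceed ratio

# The goal ensuring unconditional [RFC2544] throughput compliance:
rfc2544_goal = SearchGoal(
    loss_ratio=0.0,
    final_trial_duration=60.0,
    duration_sum=60.0,
    exceed_ratio=0.0,
)
```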
990 # Selected Functional Details
992 This chapter continues with explanations,
993 but this time more precise definitions are needed
994 for readers to follow the explanations.
The definitions here are wordy; implementers can look into the next chapter
for more concise definitions.
998 The two areas of focus in this chapter are the load classification
999 and the conditional throughput, starting with the latter.
1001 ## Performance Spectrum
1003 There are several equivalent ways to define the conditional throughput computation.
1004 One of the ways relies on an object called the performance spectrum.
1005 First, two heavy definitions are needed.
Take an intended load value and a finite set of trial results, all trials
measured at that load value. The performance spectrum is the function that maps
any non-negative real number to the sum of trial durations among all trials
in the set that have that number as their forwarding rate,
i.e., mapping to zero if no trial has that particular forwarding rate.
A related function, defined if there is at least one trial in the set,
is the performance spectrum divided by the sum of durations of all trials in the set.
That function is called the performance probability function, as it satisfies
all the requirements for the probability mass function
of a discrete probability distribution,
the one-dimensional random variable being the trial forwarding rate.
1020 These functions are related to the SUT performance spectrum,
1021 as sampled by the trials in the set.
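The two functions can be sketched as follows (an illustration; representing
a trial result as a (forwarding_rate, duration) pair is an assumption of
this sketch):

```python
def performance_spectrum(trials, rate):
    """Sum of trial durations among trials whose forwarding rate
    equals `rate`; zero if no trial has that forwarding rate.

    `trials` is an iterable of (forwarding_rate, duration) pairs,
    all measured at the same intended load.
    """
    return sum(duration for fr, duration in trials if fr == rate)

def performance_probability(trials, rate):
    """The performance spectrum normalized by the total duration,
    i.e. the probability mass of forwarding rate `rate`."""
    total_duration = sum(duration for _, duration in trials)
    return performance_spectrum(trials, rate) / total_duration
```

For example, with two one-second trials at forwarding rate 5.0 and one
two-second trial at rate 4.0, the probability mass at rate 5.0 is 0.5.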
As for any other probability function, we can talk about percentiles
of the performance probability function, and about other quantiles
such as the median. The conditional throughput will be
one such quantile value for a specifically chosen set of trials.
1028 Take a set of all full-length trials performed at the load in question.
1029 The sum of durations of those trials may be less than goal duration sum, or not.
1030 If it is less, add an imaginary trial result with zero forwarding rate
1031 such that the new sum of durations is equal to the goal duration sum.
1032 This is the set of trials to use. The q-value for the quantile
1033 is the goal exceed ratio. If the quantile touches two trials,
1034 the larger forwarding rate is used.
First example: for a zero exceed ratio, when the goal duration sum
has been reached, the conditional throughput is the smallest forwarding
rate among the trials.
Second example: for a zero exceed ratio, when the goal duration sum
has not been reached yet. Due to the missing duration sum,
the worst case may still happen, so the conditional throughput
is zero. This is not reported to the user, as this load
cannot become the relevant lower bound yet.
Third example: exceed ratio 50%, goal duration sum of two seconds,
one trial present with duration of one second and zero loss.
1048 An imaginary trial is added with duration one second and zero forwarding rate.
1049 Median would touch both trials, so the conditional throughput
1050 is the forwarding rate of the one non-imaginary trial.
1051 As that had zero loss, the value is equal to the offered load.
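The quantile selection described above can be condensed into the following
sketch (illustrative code consistent with the three examples; the
(forwarding_rate, duration) pair representation is an assumption of this
sketch):

```python
def conditional_throughput(trials, goal_duration_sum, goal_exceed_ratio):
    """Duration-weighted quantile (q = goal exceed ratio)
    of forwarding rates over full-length trials at one load.

    `trials` is a list of (forwarding_rate, duration) pairs.
    If their durations sum to less than the goal duration sum,
    an imaginary zero-rate trial pads the difference.
    When the quantile touches two trials, the larger rate is used.
    """
    measured_sum = sum(duration for _, duration in trials)
    padded = list(trials)
    if measured_sum < goal_duration_sum:
        padded.append((0.0, goal_duration_sum - measured_sum))
    whole_sum = max(measured_sum, goal_duration_sum)
    # This many trial-seconds are allowed below the quantile.
    allowed_below = whole_sum * goal_exceed_ratio
    cumulative = 0.0
    for rate, duration in sorted(padded):  # worst forwarding rate first
        cumulative += duration
        if cumulative > allowed_below:
            return rate
    # Reached only for exceed ratio >= 1; return the best rate seen.
    return max(padded)[0]
```

Re-running the third example: one lossless one-second trial at offered load
10.0, a two-second goal duration sum and exceed ratio 0.5 yield a
conditional throughput of 10.0.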
The classification does not need the whole performance spectrum,
only a few duration sums.
A trial is called bad (according to a goal) if its trial loss ratio
is larger than the goal loss ratio. A trial that is not bad is called good.
1059 ## Single Trial Duration
When goal attributes are chosen in such a way that every trial has the same
intended duration, the load classification is simpler.
1064 The following description looks technical, but it follows the motivation
1065 of goal loss ratio, goal exceed ratio and goal duration sum.
1066 If sum of durations of all trials (at given load) is less than the goal
1067 duration sum, imagine best case scenario (all subsequent trials having zero loss)
1068 and worst case scenario (all subsequent trials having 100% loss).
Here we assume there are as many subsequent trials as needed
to make the sum of durations of all trials equal to the goal duration sum.
As the exceed ratio is defined just using sums of durations
(the number of trials does not matter), it does not matter whether
the "subsequent trials" consist of an integer number of full-length trials.
1075 If even in the best case scenario the load exceed ratio would be larger
1076 than the goal exceed ratio, the load is an upper bound.
1077 If even in the worst case scenario the load exceed ratio would not be larger
1078 than the goal exceed ratio, the load is a lower bound.
1080 Even more specifically.
1081 Take all trials measured at a given load.
1082 Sum of durations of all bad full-length trials is called the bad sum.
1083 Sum of durations of all good full-length trials is called the good sum.
The bad sum plus the good sum is called the measured sum.
1085 The larger of the measured sum and the goal duration sum is called the whole sum.
1086 The whole sum minus the measured sum is called the missing sum.
1087 Optimistic exceed ratio is the bad sum divided by the whole sum.
Pessimistic exceed ratio is the bad sum plus the missing sum,
divided by the whole sum.
1090 If optimistic exceed ratio is larger than the goal exceed ratio,
1091 the load is classified as an upper bound.
1092 If pessimistic exceed ratio is not larger than the goal exceed ratio,
1093 the load is classified as a lower bound.
1094 Else, the load is classified as undecided.
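Assuming all trials are full-length, the rules above can be sketched as
follows (an illustration; the authoritative pseudocode, covering short
trials too, is in the appendix):

```python
def classify_load(good_sum, bad_sum, goal_duration_sum, goal_exceed_ratio):
    """Classify one load from the duration sums of its good and bad
    full-length trials. Returns "upper", "lower" or "undecided"."""
    measured_sum = good_sum + bad_sum
    whole_sum = max(measured_sum, goal_duration_sum)
    missing_sum = whole_sum - measured_sum
    # Best case: all missing trial-seconds turn out good.
    optimistic_exceed_ratio = bad_sum / whole_sum
    # Worst case: all missing trial-seconds turn out bad.
    pessimistic_exceed_ratio = (bad_sum + missing_sum) / whole_sum
    if optimistic_exceed_ratio > goal_exceed_ratio:
        return "upper"
    if pessimistic_exceed_ratio <= goal_exceed_ratio:
        return "lower"
    return "undecided"
```

For example, with a zero goal exceed ratio and a 60 second goal duration
sum, 30 seconds of good trials are not yet enough to classify the load
as a lower bound.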
1096 The definition of pessimistic exceed ratio matches the logic in
1097 the conditional throughput computation, so a load is a lower bound
1098 if and only if the conditional throughput effective loss ratio
1099 is not larger than the goal loss ratio.
1100 If it is larger, the load is either an upper bound or undecided.
1102 ## Short Trial Scenarios
1104 Trials with intended duration smaller than the goal final trial duration
1105 are called short trials.
The motivation for the load classification logic in the presence of short trials
is based on a counter-factual question: what would the trial result be
if a short trial had been measured as a full-length trial instead?
1110 There are three main scenarios where human intuition guides
1111 the intended behavior of load classification.
1113 Scenario one. The user had their reason for not configuring shorter goal
1114 final trial duration. Perhaps SUT has buffers that may get full at longer
1115 trial durations. Perhaps SUT shows periodic decreases of performance
1116 the user does not want to treat as noise. In any case, many good short trials
may become bad full-length trials in the counter-factual case.
In an extreme case, there are no bad short trials.
1119 In this scenario, we want the load classification NOT to classify the load
1120 as a lower bound, despite the abundance of good short trials.
1121 Effectively, we want the good short trials to be ignored, so they
1122 do not contribute to comparisons with the goal duration sum.
1124 Scenario two. When there is a frame loss in a short trial,
1125 the counter-factual full-length trial is expected to lose at least as many
frames. And in practice, bad short trials rarely turn into
good full-length trials. In an extreme case, there are no good short trials.
1128 In this scenario, we want the load classification
1129 to classify the load as an upper bound just based on abundance
1130 of short bad trials. Effectively we want the bad short trials
1131 to contribute to comparisons with the goal duration sum,
1132 so the load can be classified sooner.
1134 Scenario three. Some SUTs are quite indifferent to trial duration.
1135 Performance probability function constructed from short trial results
1136 is likely to be similar to performance probability function constructed
1137 from full-length trial results (perhaps with smaller dispersion,
1138 but overall without big impact on the median quantiles).
1139 For moderate goal exceed ratio values, this may mean there are both
1140 good short trials and bad short trials.
1141 This scenario is there just to invalidate a simple heuristic
1142 of always ignoring good short trials and never ignoring bad short trials.
That simple heuristic would be too biased. Yes, the bad short trials
are likely to turn into bad full-length trials in the counter-factual case,
but there is no information on what the good short trials would turn into.
1146 The only way to decide is to do more trials at full length,
1147 the same as in scenario one.
1149 ## Short Trial Logic
1151 MLRsearch picks a particular logic for load classification
1152 in presence of short trials, but it is still RECOMMENDED to use configurations
1153 that imply no short trials, so the possible inefficiencies in that logic
1154 do not affect the result, and the result has better explainability.
With that said, the logic differs from the single trial duration case
only in a different definition of the bad sum.
1158 Good sum is still the sum across all good full-length trials.
A few more notions are needed for defining the new bad sum.
1161 Sum of durations of all bad full-length trials is called the bad long sum.
1162 Sum of durations of all bad short trials is called the bad short sum.
1163 Sum of durations of all good short trials is called the good short sum.
1164 One minus the goal exceed ratio is called the inceed ratio.
1165 The goal exceed ratio divided by the inceed ratio is called the exceed coefficient.
1166 The good short sum multiplied by the exceed coefficient is called the balancing sum.
1167 The bad short sum minus the balancing sum is called the excess sum.
1168 If the excess sum is negative, the bad sum is equal to the bad long sum.
1169 Else, the bad sum is equal to the bad long sum plus the excess sum.
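The chain of definitions above can be condensed into a small sketch
(illustrative; it mirrors the bad sum computation in the appendix
pseudocode):

```python
def effective_bad_sum(bad_long_sum, bad_short_sum, good_short_sum,
                      goal_exceed_ratio):
    """Bad sum used for load classification when short trials are present.

    Good short trials can only balance out bad short trials;
    any excess of bad short seconds is added to the bad long sum.
    """
    inceed_ratio = 1.0 - goal_exceed_ratio
    exceed_coefficient = goal_exceed_ratio / inceed_ratio
    balancing_sum = good_short_sum * exceed_coefficient
    excess_sum = bad_short_sum - balancing_sum
    # A negative excess sum has no impact.
    return bad_long_sum + max(0.0, excess_sum)
```

For example, with an exceed ratio of 0.5 the exceed coefficient is 1.0,
so two seconds of good short trials balance out exactly two seconds of
bad short trials.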
Here is how the new definition of the bad sum fares in the three scenarios,
where the load is close to what the relevant bounds would be
if only full-length trials were used for the search.
Scenario one. If the trial duration is too short, we expect to see a higher
frequency of good short trials. This could lead to a negative excess sum,
which has no impact, hence the load classification is given just by
the full-length trials.
Thus, MLRsearch using too short trials has no detrimental effect
on result comparability in this scenario.
But using short trials also does not help with the overall search duration,
probably making it worse.
1184 Scenario two. Settings with small exceed ratio have small exceed coefficient,
1185 so the impact of good short sum is small and the bad short sum
1186 is almost wholly converted into excess sum, thus bad short trials
1187 have almost as big impact as full-length bad trials.
1188 The same conclusion applies for moderate exceed ratio values
1189 when the good short sum is small.
Thus, short trials can cause a load to get classified as an upper bound earlier,
bringing time savings (while not affecting comparability).
Scenario three. Here the excess sum is small in absolute value, as the balancing sum
is expected to be similar to the bad short sum.
1195 Once again, full-length trials are needed for final load classification,
1196 but usage of short trials probably means MLRsearch needed shorter search time
1197 before selecting this load for measurement, bringing time savings
1198 (while not affecting comparability).
1200 ## Longer Trial Durations
1202 If there are trial results with intended duration larger
1203 than the goal trial duration, the classification logic is intentionally undefined.
The implementations MAY treat such longer trials as if they were full-length.
In any case, the presence of such longer trials in either the relevant lower bound
or the relevant upper bound SHOULD be mentioned, as for some SUTs
it is likely to affect comparability.
TODO: Here will be a chapter summarizing how the MLRsearch library
addresses the problems from the Identified Problems chapter.
1216 # Problems after MLRsearch
Now that MLRsearch is clearly specified and explained,
it is possible to summarize how the MLRsearch specification helps
with the identified problems.
1221 Here, "multiple trials" is a shorthand for having the goal final trial duration
1222 significantly smaller than the goal duration sum.
1223 This results in MLRsearch performing multiple trials at the same load,
1224 which may not be the case with other configurations.
1226 ## Long Test Duration
1228 As shortening the overall search duration is the main motivation
1229 of MLRsearch library development, the library implements
1230 multiple improvements on this front, both big and small.
Most of the implementation details are not part of the MLRsearch specification,
so future implementations may keep shortening the search duration even more.
1234 TODO: The rest is about attributes.
1236 From the required goal attributes, the goal duration sum
1237 remains the best way to get even shorter searches.
1239 Usage of multiple trials can also save time,
1240 depending on wait times around trial traffic.
1242 The farther the goal exceed ratio is from 0.5 towards either side,
the less predictable the overall search duration becomes in practice.
1245 Width parameter does not change search duration much in practice
1246 (compared to other, mainly optional goal attributes).
1250 Practice shows big improvements when multiple trials
1251 and moderate exceed ratios are used. Mainly when it comes to result
1252 repeatability, as sometimes it is not easy to distinguish
1253 SUT noise from DUT instability.
1255 Conditional throughput has intuitive meaning when described
1256 using the performance spectrum, so this is an improvement,
1257 especially when compared to search procedures which use non-zero
1258 goal loss ratio but return only the intended load at a lower bound.
Multiple trials can also save time when the noisy end of
the performance spectrum needs to be examined, e.g. for [RFC9004].
Under some circumstances, testing the same DUT and SUT setup with different
DUT configurations can give some hints on what part of the noise is SUT noise
and what part is DUT performance fluctuation.
1266 In practice, both types of noise tend to be too complicated for that analysis.
MLRsearch does not offer additional tools in this regard,
apart from giving users the ability to search for more goals,
hoping to get more insight at the cost of a longer overall search time.
1271 ## Repeatability and Comparability
1273 Multiple trials improve repeatability, depending on exceed ratio.
1275 In practice, 1s goal final trial duration with exceed ratio 0.5
1276 is good enough for modern SUTs (but that usually requires
1277 smaller wait times around the traffic part of the trial,
1278 otherwise too much search time is wasted waiting).
It is not clear whether exceed ratios higher than 0.5 are better
for repeatability.
The 0.5 value is still preferred due to its explainability using the median.
1283 TODO: Stress single value is for comparability, which one is due explainability.
1285 It is possible that the conditional throughput values (with non-zero
1286 goal loss ratio) are better for repeatability than the relevant
1287 lower bound values, especially for implementations
1288 which pick load from a small set of discrete values.
1290 Implementations focusing on shortening the overall search time
1291 are automatically forced to avoid comparability issues
1292 due to load selection, as they must prefer even splits wherever possible.
1293 But this conclusion only holds when the same goals are used.
1294 Larger adoption is needed before any further claims on comparability
1295 between MLRsearch implementations can be made.
1297 ## Throughput with Non-Zero Loss
Supported by the goal loss ratio attribute.
Improves repeatability as expected.
1302 ## Inconsistent Trial Results
MLRsearch is conservative wherever possible;
this is built into the definition of the conditional throughput,
and into the treatment of short trial results for load classification.
1308 This is consistent with [RFC2544] zero loss tolerance motivation.
1310 If the very best (noiseless) part of the SUT performance spectrum
1311 is of interest, it should be enough to set small value for
1312 the goal final trial duration, and perhaps also a large value
1313 for the goal exceed ratio.
1315 Implementations may offer other (optional) configuration attributes
1316 (and optional goal result attributes)
to become less conservative, but currently it is not clear
what impact that would have on repeatability.
1322 # IANA Considerations
This document includes no request to IANA.
1326 # Security Considerations
1328 Benchmarking activities as described in this memo are limited to
1329 technology characterization of a DUT/SUT using controlled stimuli in a
1330 laboratory environment, with dedicated address space and the constraints
1331 specified in the sections above.
1333 The benchmarking network topology will be an independent test setup and
1334 MUST NOT be connected to devices that may forward the test traffic into
1335 a production network or misroute traffic to the test management network.
1337 Further, benchmarking is performed on a "black-box" basis, relying
1338 solely on measurements observable external to the DUT/SUT.
Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
benchmarking purposes. Any implications for network security arising
from the DUT/SUT SHOULD be identical in the lab and in production
networks.
1347 Many thanks to Alec Hothan of OPNFV NFVbench project for thorough
1348 review and numerous useful comments and suggestions.
1350 Special wholehearted gratitude and thanks to late Al Morton for his
1351 thorough reviews filled with very specific feedback and constructive
1352 guidelines. Thank you Al for the close collaboration over the years,
1353 for your continuous unwavering encouragements full of empathy and
1355 Al, you are dearly missed.
1359 This is a specification of load classification.
1361 The block at the end of this appendix holds pseudocode
1362 which computes two values, stored in variables named optimistic and pessimistic.
The pseudocode happens to be valid Python code.
1365 If both values are computed to be true, the load in question
1366 is classified as a lower bound according to the goal in question.
1367 If both values are false, the load is classified as an upper bound.
Otherwise, the load is classified as undecided.
The pseudocode expects the following variables to hold values as follows:
1372 - goal_duration_sum: The goal duration sum value.
1374 - goal_exceed_ratio: The goal exceed ratio value.
1376 - good_long_sum: Sum of durations across trials with trial duration
1377 at least equal to the goal final trial duration and with trial loss ratio
1378 not higher than the goal loss ratio.
1380 - bad_long_sum: Sum of durations across trials with trial duration
1381 at least equal to the goal final trial duration and with trial loss ratio
1382 higher than the goal loss ratio.
1384 - good_short_sum: Sum of durations across trials with trial duration
1385 shorter than the goal final trial duration and with trial loss ratio
1386 not higher than the goal loss ratio.
1388 - bad_short_sum: Sum of durations across trials with trial duration
1389 shorter than the goal final trial duration and with trial loss ratio
1390 higher than the goal loss ratio.
Here the implicit set of available trial results consists of all trials
measured at the given intended load at the end of the search.
The code works correctly even when there are no trial results at the given load.
    balancing_sum = good_short_sum * goal_exceed_ratio / (1.0 - goal_exceed_ratio)
    effective_bad_sum = bad_long_sum + max(0.0, bad_short_sum - balancing_sum)
    effective_whole_sum = max(good_long_sum + effective_bad_sum, goal_duration_sum)
    quantile_duration_sum = effective_whole_sum * goal_exceed_ratio
    optimistic = effective_bad_sum <= quantile_duration_sum
    pessimistic = (effective_whole_sum - good_long_sum) <= quantile_duration_sum
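For readers who want to experiment with the classification, the six lines above
can be wrapped into a small helper function. This is an illustrative sketch only:
the function name classify_load and the sample goal values below are not part of
the specification.

```python
def classify_load(goal_duration_sum, goal_exceed_ratio,
                  good_long_sum, bad_long_sum,
                  good_short_sum, bad_short_sum):
    """Return the (optimistic, pessimistic) flags for one load.

    (True, True) means lower bound, (False, False) means upper bound,
    and a mixed result means the load is still undecided.
    """
    # Good short trials can balance out some of the bad short trials.
    balancing_sum = good_short_sum * goal_exceed_ratio / (1.0 - goal_exceed_ratio)
    effective_bad_sum = bad_long_sum + max(0.0, bad_short_sum - balancing_sum)
    # Pad the measured duration up to the goal duration sum if needed.
    effective_whole_sum = max(good_long_sum + effective_bad_sum, goal_duration_sum)
    quantile_duration_sum = effective_whole_sum * goal_exceed_ratio
    optimistic = effective_bad_sum <= quantile_duration_sum
    pessimistic = (effective_whole_sum - good_long_sum) <= quantile_duration_sum
    return optimistic, pessimistic


# Example goal: 60 s duration sum, exceed ratio 0.5, long trials only.
print(classify_load(60.0, 0.5, 50.0, 5.0, 0.0, 0.0))   # (True, True): lower bound
print(classify_load(60.0, 0.5, 10.0, 40.0, 0.0, 0.0))  # (False, False): upper bound
print(classify_load(60.0, 0.5, 20.0, 10.0, 0.0, 0.0))  # (True, False): undecided
```

The third call shows the undecided case: only 30 s out of the 60 s goal duration
sum has been measured, so the missing duration keeps the two flags apart until
more trials arrive.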
This is a specification of conditional throughput.

The block at the end of this appendix holds pseudocode
which computes a value stored in the variable conditional_throughput.
The pseudocode happens to be valid Python code.

The pseudocode expects the following variables to hold values as follows:
- goal_duration_sum: The goal duration sum value.

- goal_exceed_ratio: The goal exceed ratio value.

- good_long_sum: Sum of durations across trials with trial duration
  at least equal to the goal final trial duration and with trial loss ratio
  not higher than the goal loss ratio.

- bad_long_sum: Sum of durations across trials with trial duration
  at least equal to the goal final trial duration and with trial loss ratio
  higher than the goal loss ratio.

- long_trials: An iterable of all trial results from trials with trial duration
  at least equal to the goal final trial duration,
  sorted by increasing trial loss ratio.
  A trial result is a composite with the following two attributes available:

  - trial.loss_ratio: The trial loss ratio as measured for this trial.

  - trial.duration: The trial duration of this trial.
Here the implicit set of available trial results consists of all trials
measured at the given intended load by the end of the search.

The code works correctly only if there is at least one
trial result measured at the given load.
    all_long_sum = max(goal_duration_sum, good_long_sum + bad_long_sum)
    remaining = all_long_sum * (1.0 - goal_exceed_ratio)
    quantile_loss_ratio = None
    for trial in long_trials:
        if quantile_loss_ratio is None or remaining > 0.0:
            quantile_loss_ratio = trial.loss_ratio
            remaining -= trial.duration
        else:
            break
    if quantile_loss_ratio is None or remaining > 0.0:
        quantile_loss_ratio = 1.0

    conditional_throughput = intended_load * (1.0 - quantile_loss_ratio)
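As an illustration only, the pseudocode above can be packaged into a function
and run on a toy result set. The Trial namedtuple, the function name, and the
numbers below are assumptions made for this sketch, not part of the
specification.

```python
from collections import namedtuple

# Minimal stand-in for a trial result composite; illustrative only.
Trial = namedtuple("Trial", ["loss_ratio", "duration"])


def conditional_throughput(intended_load, goal_duration_sum,
                           goal_exceed_ratio, long_trials):
    """Compute conditional throughput; long_trials must be sorted
    by increasing trial loss ratio."""
    # good_long_sum + bad_long_sum is simply the total long-trial duration.
    measured_long_sum = sum(trial.duration for trial in long_trials)
    all_long_sum = max(goal_duration_sum, measured_long_sum)
    remaining = all_long_sum * (1.0 - goal_exceed_ratio)
    quantile_loss_ratio = None
    for trial in long_trials:
        if quantile_loss_ratio is None or remaining > 0.0:
            quantile_loss_ratio = trial.loss_ratio
            remaining -= trial.duration
        else:
            break
    if quantile_loss_ratio is None or remaining > 0.0:
        # Not enough measured duration; assume full loss for the remainder.
        quantile_loss_ratio = 1.0
    return intended_load * (1.0 - quantile_loss_ratio)


# Two long trials at 1,000,000 fps intended load:
# 20 s with no loss, then 40 s at 0.5% loss; exceed ratio 0.5.
trials = [Trial(loss_ratio=0.0, duration=20.0),
          Trial(loss_ratio=0.005, duration=40.0)]
print(conditional_throughput(1_000_000.0, 60.0, 0.5, trials))  # 995000.0
```

With an exceed ratio of 0.5, the quantile lands 30 s into the sorted trials:
the first 20 s come from the lossless trial, so the 0.5% trial supplies the
quantile loss ratio, giving 1,000,000 * (1 - 0.005) = 995,000 fps.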