docs/ietf/draft-ietf-bmwg-mlrsearch-04.md

   1 ---
   2 title: Multiple Loss Ratio Search
   3 abbrev: MLRsearch
   4 docname: draft-ietf-bmwg-mlrsearch-04
   5 date: 2023-07-10
   6
   7 ipr: trust200902
   8 area: ops
   9 wg: Benchmarking Working Group
  10 kw: Internet-Draft
  11 cat: info
  12
  13 coding: us-ascii
  14 pi:    # can use array (if all yes) or hash here
  15   toc: yes
  16   sortrefs:   # defaults to yes
  17   symrefs: yes
  18
  19 author:
  20       -
  21         ins: M. Konstantynowicz
  22         name: Maciek Konstantynowicz
  23         org: Cisco Systems
  24         email: mkonstan@cisco.com
  25       -
  26         ins: V. Polak
  27         name: Vratko Polak
  28         org: Cisco Systems
  29         email: vrpolak@cisco.com
  30
  31 normative:
  32   RFC1242:
  33   RFC2285:
  34   RFC2544:
  35   RFC9004:
  36
  37 informative:
  38   TST009:
  39     target: https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.04.01_60/gs_NFV-TST009v030401p.pdf
  40     title: "TST 009"
  41   FDio-CSIT-MLRsearch:
  42     target: https://csit.fd.io/cdocs/methodology/measurements/data_plane_throughput/mlr_search/
  43     title: "FD.io CSIT Test Methodology - MLRsearch"
  44     date: 2022-11
  45   PyPI-MLRsearch:
  46     target: https://pypi.org/project/MLRsearch/0.4.0/
  47     title: "MLRsearch 0.4.0, Python Package Index"
  48     date: 2021-04
  49
  50 --- abstract
  51
  52 This document proposes improvements to [RFC2544] throughput search by
  53 defining a new methodology called Multiple Loss Ratio search
  54 (MLRsearch). The main objectives for MLRsearch are to minimize the
  55 total test duration, search for multiple loss ratios and improve
  56 results repeatibility and comparability.
  57
  58 The main motivation behind MLRsearch is the new set of challenges and
  59 requirements posed by testing Network Function Virtualization
  60 (NFV) systems and other software based network data planes.
  61
  62 MLRsearch offers several ways to address these challenges, giving user
  63 configuration options to select their preferred way.
  64
  65 --- middle
  66
  67 {::comment}
  68     As we use kramdown to convert from markdown,
  69     we use this way of marking comments not to be visible in rendered draft.
  70     https://stackoverflow.com/a/42323390
  71     If other engine is used, convert to this way:
  72     https://stackoverflow.com/a/20885980
  73 {:/comment}
  74
  75 # Purpose and Scope
  76
  77 The purpose of this document is to describe Multiple Loss Ratio search
  78 (MLRsearch), a throughput search methodology optimized for software
  79 DUTs.
  80
  81 Applying vanilla [RFC2544] throughput bisection to software DUTs
  82 results in a number of problems:
  83
  84 - Binary search takes too long as most of trials are done far from the
  85   eventually found throughput.
  86 - The required final trial duration (and pauses between trials) also
  87   prolong the overall search duration.
  88 - Software DUTs show noisy trial results (noisy neighbor problem),
  89   leading to big spread of possible discovered throughput values.
  90 - Throughput requires loss of exactly zero packets, but the industry
  91   frequently allows for small but non-zero losses.
  92 - The definition of throughput is not clear when trial results are
  93   inconsistent.
  94
  95 MLRsearch aims to address these problems by applying the following set
  96 of enhancements:
  97
  98 - Allow searching for multiple search goals, with differing goal loss ratios.
  99   - Each trial result can affect any search goal in principle
 100     (trial reuse).
 101 - Multiple preceding targets for each search goal, earlier ones need
 102   to spend less time on trials.
 103   - Earlier targets also aim at lesser precision.
 104   - Use Forwarding Rate (FR) at maximum offered load
 105     [RFC2285] (section 3.6.2) to initialize the initial targets.
 106 - Take care when dealing with inconsistent trial results.
 107   - Loss ratios goals are handled in an order that minimizes the chance
 108     of interference from later trials to earlier goals.
 109 - Apply several load selection heuristics to save even more time
 110   by trying hard to avoid unnecessarily narrow bounds.
 111
 112 MLRsearch configuration options are flexible enough to
 113 support both conservative settings (unconditionally compliant with [RFC2544],
 114 but longer search duration and worse repeatability) and aggressive
 115 settings (shorter search duration and better repeatability but not
 116 compliant with [RFC2544]).
 117
 118 No part of [RFC2544] is intended to be obsoleted by this document.
 119
 120 # Terminology
 121
 122 When a subsection is defining a term, the first paragraph
 123 acts as a definition. Other paragraphs are treated as a description,
 124 they provide additional details without being needed to define the term.
 125
 126 Definitions should form a directed acyclic graph of dependencies.
 127 If a section contains subsections, the section definition
 128 may depend on the subsection definitions.
 129 Otherwise, any definition may depend on preceding definitions.
 130 In other words, if the section definition were to come after subsections,
 131 there would be no forward dependencies for people reading just definitions
 132 from start to finish.
 133
 134 Descriptions provide motivations and explanations,
 135 they frequently reference terms defined only later.
 136 Motivations in section descriptions are the reason
 137 why section text comes before subsection text.
 138
 139 ## General notions
 140
 141 General notions are the terms defined in this section.
 142
 143 It is useful to define the following notions
 144 before delving into MLRsearch architecture,
 145 as the notions appear in multiple places
 146 with no place being special enough to host definition.
 147
 148 ### General and specific quantities
 149
 150 General quantity is a quantity that may appear multiple times
 151 in MLRsearch specification, perhaps each time in a different role.
 152 The quantity when appearing in a single role is called
 153 a specific quantity.
 154
 155 It is useful to define the general quantity,
 156 so definitions of specific quantities may refer to it.
 157 We say a specific quantity is based on a general quantity,
 158 if the specific quantity definition refers to and
 159 relies on the general quantity definition.
 160
 161 It is natural to name specific quantities by adding an adjective
 162 (or a noun) to the name of the general quantity.
 163 But existing RFCs typically explicitly define a term acting
 164 in a specific role, so the RFC name directly refers to a specific
 165 quantity, while the corresponding general quantity
 166 is defined only implicitly.
 167 Therefore this documents defines general quantities explicitly,
 168 even if the same term already appears in an RFC.
 169
 170 In practice, it is required to know which unit of measurement
 171 is used to accompany a numeric value of each quantity.
 172 The choice of a particular unit of measurement is not important
 173 for MLRsearch specification though, so specific units
 174 mentioned in this document are just examples or recommendations,
 175 not requirements.
 176
 177 When reporting, it is REQUIRED to state the units used.
 178
 179 ### Composite
 180
 181 A composite is a set of named attributes.
 182 Each attribute is either a specific quantity or a composite.
 183
 184 MLRsearch specification frequently groups multiple specific quantities
 185 into a composite. Description of such a composite brings an insight
 186 to motivations why this or other terms are defined as they are.
 187 Such insight will be harder to communicate
 188 with the specific quantities alone.
 189
 190 Also, it simplifies naming of specific quantities, as they usually can
 191 share a noun or adjective referring to their common composite.
 192 Most of relations between composites and their specific quantities
 193 can be described using plain English.
 194
 195 Perhaps the only exception involves referring to specific quantities
 196 as attributes. For example if there is a composite called 'target',
 197 and one of its specific quantities is 'target width' defined using
 198 a general quantity 'width', we can say 'width is one of target attributes'.
 199
 200 ### SUT
 201
 202 As defined in RFC 2285:
 203 The collective set of network devices to which stimulus is offered
 204 as a single entity and response measured.
 205
 206 While RFC 2544 mostly refers to DUT as a single
 207 (network interconnecting) device, section 19 makes it clear
 208 multiple DUTs can be treated as a single system,
 209 so most of RFC 2544 also applies to testing SUT.
 210
 211 MLRsearch specification only refers to SUT (not DUT),
 212 even if it consists of just a single device.
 213
 214 ### Trial
 215
 216 A trial is the part of test described in RFC 2544 section 23.
 217
 218 When traffic has been sent and SUT response has been observed,
 219 we say the trial has been performed, or the trial has been measured.
 220 Before that happens, multiple possibilities for upcoming trial
 221 may be under consideration.
 222
 223 ### Load
 224
 225 Intended, constant load for a trial, usually in frames per second.
 226
 227 Load is the general quantity implied by Constant Load of RFC 1242,
 228 Data Rate of RFC 2544 and Intended Load of RFC 2285.
 229 All three specify this value applies to one (input or output) interface,
 230 so we can talk about unidirectional load also
 231 when bidirectional or multi-port traffic is applied.
 232
 233 MLRsearch does not rely on this distinction, it works also if
 234 the load values correspond to an aggregate rate
 235 (sum over all SUT tested input or output interface unidirectional loads),
 236 as long as all loads share the same semantics.
 237
 238 Several RFCs define useful quantities based on Offered Load
 239 (instead of Intended Load), but MLRsearch specification
 240 works only with (intended) load. Those useful quantities
 241 still serve as motivations for few specific quantities used in MLRsearch
 242 specification.
 243
 244 MLRsearch assumes most load values are positive.
 245 For some (but not all) specific quantities based on load,
 246 zero may also be a valid value.
 247
 248 ### Duration
 249
 250 Intended duration of the traffic for a trial, usually in seconds.
 251
 252 This general quantity does not include any preparation nor waiting
 253 described in section 23 of RFC 2544.
 254 Section 24 of RFC 2544 places additional restrictions on duration,
 255 but those restriction apply only to some of the specific quantities based
 256 on duration.
 257
 258 Duration is always positive in MLRsearch.
 259
 260 ### Duration sum
 261
 262 For a specific set of trials, this is the sum of their durations.
 263
 264 Some of specific quantities based on duration sum are derived quantities,
 265 without a specific set of trials to sum their durations.
 266
 267 Duration sum is never negative in MLRsearch.
 268
 269 ### Width
 270
 271 General quantity defined for an ordered pair (lower and higher)
 272 of load values, which describes a distance between the two values.
 273
 274 The motivation for the name comes from binary search.
 275 The binary search tries to approximate an unknown value
 276 by repeatedly bisecting an interval of possible values,
 277 until the interval becomes narrow enough.
 278 Width of the interval is a specific quantity
 279 and the termination condition compares that
 280 to another specific quantity acting as the threshold.
 281 The threshold value does not have a specific interval associated,
 282 but corresponds to a 'size' of the compared interval.
 283 As size is a word already used in definition of frame size,
 284 a more natural word describing interval is width.
 285
 286 The MLRsearch specification does use (analogues of) upper bound
 287 and lower bound, but does not actually need to talk about intervals.
 288 Still, the intervals are implicitly there, so width is the natural name.
 289
 290 Actually, there are two popular options for defining width.
 291 Absolute width is based on load, the value is the higher load
 292 minus the lower load.
 293 Relative width is dimensionless, the value is the absolute width
 294 divided by the higher load. As intended loads for trials are positive,
 295 relative width is between 0.0 (including) and 1.0 (excluding).
 296
 297 Relative width as a threshold value may be useful for users
 298 who do not presume what is the typical performance of SUT,
 299 but absolute width may be a more familiar concept.
 300
 301 MLRsearch specification does not prescribe which width has to be used,
 302 but widths MUST be either all absolute or all relative,
 303 and it MUST be clear from report which option was used
 304 (it is implied from the unit of measurement of any width value).
 305
 306 ### Loss ratio
 307
 308 The loss ratio is a general quantity, dimensionless floating point value
 309 assumed to be between 0.0 and 1.0, both including.
 310 It is computed as the number of frames forwarded by SUT, divided by
 311 the number of frames that should have been forwarded during the trial.
 312
 313 If the number of frames that should have been forwarded is zero,
 314 the loss ratio is considered to be zero
 315 (but it is better to use high enough loads to prevent this).
 316
 317 Loss ratio is basically the same quantity as Frame Loss Rate of RFC 1242,
 318 just not expressed in percents.
 319
 320 RFC1242 Frame Loss Rate:
 321 Percentage of frames that should have been forwarded
 322 by a network device under steady state (constant)
 323 load that were not forwarded due to lack of
 324 resources.
 325
 326 (RFC2544 restricts Frame Loss Rate to a type of benchmark,
 327 for loads 100% of 'maximum rate', 90% and so on.)
 328
 329 ### Exceed ratio
 330
 331 This general quantity is a dimensionless floating point value,
 332 defined using two duration sum quantities.
 333 One duration sum is referred to as the good duration sum,
 334 the other is referred to as the bad duration sum.
 335 The exceed ratio value is computed as the bad duration sum value
 336 divided by the sum of the two sums. If both sums are zero,
 337 the exceed ratio is undefined.
 338
 339 As there are no negative duration sums in MLRsearch,
 340 exceed ratio values are between 0.0 and 1.0 (both including).
 341
 342 ## Architecture
 343
 344 MLRsearch architecture consists of three main components:
 345 the manager, the controller and the measurer.
 346
 347 The search algorithm is implemented in the controller,
 348 and it is the main focus of this document.
 349
 350 Most implementation details of the manager and the measurer are
 351 out of scope of this document, except when describing
 352 how do those components interface with the controller.
 353
 354 ### Manager
 355
 356 The manager is the component that initializes SUT, traffic generator
 357 (called tester in RFC 2544), the measurer and the controller
 358 with intended configurations. It then handles the execution
 359 to the controller and receives its result.
 360
 361 Managers can range from simple CLI utilities to complex
 362 Continuous Integration systems. From the controller point of view
 363 it is important that no additional configuration (nor warmup)
 364 is needed for SUT and the measurer to perform trials.
 365
 366 The interface between the manager and the controller
 367 is defined in the controller section.
 368
 369 One execution of the controller is called a search.
 370 Some benchmarks may execute multiple searches on the same SUT
 371 (for example when confirming the performance is stable over time),
 372 but in this document only one invocation is concerned
 373 (others may be understood as the part of SUT preparation).
 374
 375 Creation of reports of appropriate format can also be understood
 376 as the responsibility of the manager. This document places requirements
 377 on which information has to be reported.
 378
 379 ### Measurer
 380
 381 The measurer is the component which performs one trial
 382 as described in RFC 2544 section 23, when requested by the controller.
 383
 384 From the controller point of view, it is a function that accepts
 385 trial input and returns trial output.
 386
 387 This is the only way the controller can interact with SUT.
 388 In practice, the measurer has to do subtle decisions
 389 when converting the observed SUT behavior into a single
 390 trial loss ratio value. For example how to deal with
 391 out of order frames or duplicate frames.
 392
 393 On software implementation level, the measurer is a callable,
 394 injected by the manager into the controller instance.
 395
 396 The act of performing one trial (act of turning trial input
 397 to trial output) is called a measurement, or trial measurement.
 398 This way we can talk about trials that were measured already
 399 and trials that are merely planned (not measured yet).
 400
 401 #### Trial input
 402
 403 The load and duration to use in an upcoming trial.
 404
 405 This is a composite.
 406
 407 Other quantities needed by the measurer are assumed to be constant
 408 and set up by the manager before search starts (see traffic profile),
 409 so they do not count as trial input attributes.
 410
 411 ##### Trial load
 412
 413 Trial load is the intended load for the trial.
 414
 415 This is a specific quantity based on load,
 416 directly corresponding to RFC 2285 intended load.
 417
 418 ##### Trial duration
 419
 420 Trial duration is the intended duration for the trial.
 421
 422 This is a specific quantity based on duration, so it specifies
 423 only the traffic part of the trial, not the waiting parts.
 424
 425 #### Traffic profile
 426
 427 Any other configuration values needed by the measurer to perform a trial.
 428
 429 The measurer needs both trial input and traffic profile to perform the trial.
 430 As trial input contains the only values that vary during one the search,
 431 traffic profile remains constant during the search.
 432
 433 Traffic profile when understood as a composite is REQUIRED by RFC 2544
 434 to contain some specific quantities (for example frame size).
 435 Several more specific quantities may be RECOMMENDED.
 436
 437 Depending on SUT configuration (e.g. when testing specific protocols),
 438 additional values need to be included in the traffic profile
 439 and in the test report. (See other IETF documents.)
 440
 441 #### Trial ouput
 442
 443 A composite consisting of trial loss ratio
 444 and trial forwarding rate.
 445
 446 Those are the only two specific quantities (among other quantities
 447 possibly measured in the trial, for example offered load)
 448 that are important for MLRsearch.
 449
 450 ##### Trial loss ratio
 451
 452 Trial loss ratio is a specific quantity based on loss ratio.
 453 The value is related to a particular measured trial,
 454 as measured by the measurer.
 455
 456 ##### Trial forwarding rate
 457
 458 Trial forwarding rate is a derived quantity.
 459 It is computed as one minus trial loss ratio,
 460 that multiplied by trial load.
 461
 462 Despite the name, the general quantity this specific quantity
 463 corresponds to is load (not rate).
 464 The name is inspired by RFC 2285, which defines Forwarding Rate
 465 specific to one output interface.
 466
 467 As the definition of loss ratio is not neccessarily per-interface
 468 (one of details left for the measurer), using the definition above
 469 (instead of RFC 2285) makes sure trial forwarding rate
 470 is always between zero and the trial load (both including).
 471
 472 #### Trial result
 473
 474 Trial result is a composite consisting of trial input attributes
 475 and trial output attributes.
 476
 477 Those are all specific quantites related to a measured trial MLRsearch needs.
 478
 479 While distinction between trial input and output is important
 480 when defining the interface between the controller and the measurer,
 481 it is easier to talk about trial result
 482 when describing how measured trials influence the controller behavior.
 483
 484 ### Controller
 485
 486 The component of MLRsearch architecture that calls the measurer
 487 and returns conditional throughputs to the manager.
 488
 489 This component implements the search algorithm,
 490 the main content of this document.
 491
 492 Contrary to Throughput as defined in RFC 1242,
 493 the definition of conditional throughput is quite sensitive
 494 to the controller input (as provided by the manager),
 495 and its full definition needs several terms
 496 which would otherwise be hidden as internals of the controller
 497 implementation.
 498
 499 The ability of conditional throughput to be less sensitive
 500 to performance variance, and the ability of the controller
 501 to find conditional throughputs for multiple search goals
 502 within one search (and in short overall search time)
 503 are strong enough motivations for the need of increased complexity.
 504
 505 ### Controller input
 506
 507 A composite of max load, min load, and a set of search goals.
 508
 509 The search goals (as elements of the set of search goals)
 510 are usually not named and unordered.
 511
 512 It is fine if all search goals of the set have the same value
 513 of a particular attribute. In that case, the common value
 514 may be treated as a global attribute (similarly to max and min load).
 515
 516 The set of search goals MUST NOT be empty.
 517 Two search goals within the set MUST differ in at least one attribute.
 518 The manager MAY avoid both issues by presenting empty report
 519 or de-duplicating the search goals, but it is RECOMMENDED
 520 for the manager to raise an error to its caller,
 521 as the two conditions suggest the test is improperly configured.
 522
 523 #### Max load
 524
 525 Max load is a specific quantity based on load.
 526 No trial load is ever higher than this value.
 527
 528 RFC 2544 section 20 defines maximum frame rate
 529 based on theoretical maximum rate for the frame size on the media.
 530 RFC 2285 section 3.5.3 specifies Maximum offered load (MOL)
 531 which may be lower than maximum frame rate.
 532 There may be other limitations preventing high loads,
 533 for examples resources available to traffic generator.
 534
 535 The manager is expected to provide a value that is not greater
 536 than any known limitation. Alternatively, the measurer
 537 is expected to work at max load, possibly reporting as lost
 538 any frames that were not able to leave Traffic Generator.
 539
 540 From the controller point of view, this is merely a global upper limit
 541 for any trial load candidates.
 542
 543 #### Min load
 544
 545 Min load is a specific quantity based on load.
 546 No trial load is ever lower than this value.
 547
 548 The motivation of this quantity is to prevent trials
 549 with too few frames sent to SUT.
 550
 551 Also, practically if a SUT is able to reach only very small
 552 forwarding rates (min load indirectly serves as a threshold for how small),
 553 it may be considered faulty (or perhaps the test is misconfigured).
 554
 555 #### Search goal
 556
 557 A composite of 7 attributes (see subsections).
 558
 559 If not otherwise specified, 'goal' always refers to a search goal
 560 in this document.
 561
 562 The controller input may contain multiple search goals.
 563 The name Multiple Loss Ratio search was created back when
 564 goal loss ratio was the only attribute allowed to vary between goals.
 565
 566 Each goal will get its conditional throughput discovered
 567 and reported at the end of the search.
 568
 569 The definitions of the 7 attributes are not very informative by themselves.
 570 Their motivation (and naming) becomes more clear
 571 from the impact they have on conditional throughput.
 572
 573 ##### Goal loss ratio
 574
 575 A specific quantity based on loss ratio.
 576 A threshold value for trial loss ratios.
 577 MUST be lower than one.
 578
 579 Trial loss ratio values will be compared to this value,
 580 a trial will be considered bad if its loss ratio is higher than this.
 581
 582 For example, RFC 2544 throughput has goal loss ratio of zero,
 583 a trial is bad once a sigle frame is lost.
 584
 585 Loss ratio of one would classify each trial as good (regardless of loss),
 586 which is not useful.
 587
 588 ##### Goal initial trial duration
 589
 590 A specific quantity based on duration.
 591 A threshold value for trial durations.
 592 MUST be positive.
 593
 594 MLRsearch is allowed to use trials as short as this when focusing
 595 on this goal.
 596 The conditional throughput may be influenced by shorter trials,
 597 (measured when focusing on other search goals).
 598
 599 {::comment}
 600     FIXME: Should shorter trials be explicitly ignored?
 601 {:/comment}
 602
 603 ##### Goal final trial duration
 604
 605 A specific quantity based on duration.
 606 A threshold value for trial durations.
 607 MUST be no smaller than goal initial trial duration.
 608
 609 MLRsearch is allowed to use trials as long as this when focusing
 610 on this goal. If more data is needed, repeated trials
 611 at the same load and duration are requested by the controller.
 612
 613 ##### Goal min duration sum
 614
 615 A specific quantity based on duration sum.
 616 A threshold value for a particular duration sum.
 617
 618 MLRsearch requires at least this amount of (effective) trials
 619 for a particular load to become part of MLRsearch outputs.
 620
 621 It is possible (though maybe not prectical) for goal min duration sum
 622 to be smaller than goal final trial duration.
 623
 624 In practice, the sum of durations actually spent on trial measurement
 625 can be smaller (when trial results are quite one-sided) or even larger
 626 (in presence of shorter-than-final trial duration results at the same load).
 627
 628 If the sum of all (good and bad) long trials is at least this,
 629 and there are no short trials, then the load is guaranteed
 630 to be classified as either an upper or a lower bound.
 631
 632 In some cases, the classification is known sooner,
 633 when the 'missing' trials cannot change the outcome.
 634
 635 When short trials are present, the logic is more complicated.
 636
 637 ##### Goal exceed ratio
 638
 639 A specific quantity based on exceed ratio.
 640 A threshold value for particulat sets of trials.
 641
 642 An attribute used for classifying loads into upper and lower bounds.
 643
 644 If the duration sum of all (current duration) trials is at least
 645 min duration sum, and more than this percentage of the duration sum
 646 comes from bad trials, this load is an upper bound.
 647
 648 If there are shorter duration trials, the logic is more complicated.
 649
 650 ##### Goal width
 651
 652 A specific quantity based on width.
 653 A threshold value for a particular width.
 654 MUST be positive.
 655
 656 This defines the exit condition for this search goal.
 657
 658 Relevant bounds (of the final target) need to be this close
 659 before conditional throughput can be reported.
 660
 661 ##### Preceding targets
 662
 663 A non-negative integer affecting the behavior of the controller.
 664
 665 How many additional non-final targets to add.
 666 Each next preceding target has double width
 667 and min duration sum geometrically closer to initial trial duration.
 668
 669 The usage of preceding targets is an important source
 670 of MLRsearch time savings (compared to simpler search algorithms).
 671
 672 Having this value configurable lets the manager
 673 tweak the overall search duration based on presumed knowledge
 674 of SUT performance stability.
 675
 676 ### Controller internals
 677
 678 Terms not directly corresponding to the controller's input nor output,
 679 but needed indirectly as dependencies of the conditional throughput
 680 definition.
 681
 682 Following these definitions specifies virtually all of the controller
 683 (MLRsearch algorithm) logic.
 684
 685 #### Pre-initial trials
 686
 687 Up to three special trials executed at the start of the search.
 688 The first trial load is max load,
 689 subsequent trial load are computed from preceding trial
 690 forwarding rate.
 691
 692 The main loop of the controller logic needs at least one trial result,
 693 and time is saved if the trial results are close to future conditional
 694 throughput values.
 695
 696 The exact way to compute load for second and third trial
 697 (and whether even measure second or third trial)
 698 are not specified here, as the implementation details
 699 have negligible effect on the reported conditional throughput.
 700
 701 {::comment}
 702     TODO: Still, recommend something like this:
 703     Loads need to fit several different initial targets at once.
 704     Duration is the largest among initial trial durations,
 705     loads are computed from forwarding rate an smallest loss ratio goal.
 706     Also, the initial target current width is set based on these.
 707 {:/comment}
 708
 709 #### Search target
 710
 711 A composite of 5 specific quantites (see subsections).
 712 Frequently called just target.
 713
 714 Similar to (but distinct from) the search goal.
 715
 716 Each search goal prescribes a final target,
 717 probably with a chain of preceding targets.
 718
 719 More details in the Derived targets section.
 720
 721 ##### Target loss ratio
 722
 723 Same as loss ratio of the corresponding goal.
 724
 725 ##### Target exceed ratio
 726
 727 Same as exceed ratio of the corresponding goal.
 728
 729 ##### Target width
 730
 731 Similar to goal width attribute.
 732 Doubled from goal width for each level of preceding target.
 733
 734 ##### Target min duration sum
 735
 736 Similar to goal min duration sum attribute.
 737 Geometrically interpolated between
 738 initial target duration and goal min duration sum.
 739
 740 ##### Target trial duration
 741
 742 When MLRsearch focuses on this target, it measures trials
 743 with this duration.
 744 The value is equal to the minimum of goal final trial duration
 745 and target min duration sum.
 746
 747 Also, this value is used to classify trial results
 748 as short (if trial duration is shorter than this) or long.
 749
 750 #### Derived targets
 751
 752 After receiving the set of search goals,
 753 MLRsearch internally derives a set of search targets.
 754
 755 The derived targets can be seen as forming a chain,
 756 from initial target to final target.
 757 The chain is linked by a reference from a target to its preceding
 758 (towarsds initial) target.
 759
 760 The reference may be implemented as 6th attribute od target.
 761
 762 ##### Final target
 763
 764 The final target is the target where the most of attribute values
 765 are directly copied from the coresponding search goal.
 766 Final target width is the same as goal width,
 767 final target trial duration is the same as goal final trial duration,
 768 and final target min duration sum is the same
 769 as the goal min duration sum.
 770
 771 The conditional throughput is found when focusing on the final target.
 772 All non-final targets do not directly affect the conditional throughput,
 773 they are there just as an optimization.
 774
 775 ##### Preceding target
 776
 777 Each target may have a preceding target.
 778 Goal attribute Preceding targets governs how many targets are created
 779 in addition to the final target corresponding to the search goal.
 780
 781 Any preceding target has double width, meaning one balanced bisection
 782 is needed to reduce preceding target width to the next target width.
 783
 784 Preceding target min duration sum is exponentially smaller,
 785 aiming for prescribed initial target min duration sum.
 786
 787 Preceding target trial duration is either its min duration sum,
 788 or the corresponding goal's final trial duration, whichever is smaller.
 789
 790 As the preceding min duration sum is shorter than the next duration sum,
 791 MLRsearch is able to achieve the preceding target width
 792 sooner (than with the next target min duration sum).
 793
 794 This way an approximation of the conditional throughput is found,
 795 with the next target needing not as much time to improve the approximation
 796 (compared to not starting with the approximation).
 797
 798 ##### Initial target
 799
 800 Initial target is a target without any other target preceding it.
 801 Initial target min duration sum is equal to the corresponding goal's
 802 initial trial duration.
 803
 804 As a consequence, initial target trial duration is equal to its min duration sum.
 805
 806 #### Trial classification
 807
 808 Any trial result can be classified according to any target along two axes.
 809
 810 The two classifications are independent.
 811
 812 This classification is important for defining the conditional throughput.
 813
 814 ##### Short trial
 815
 816 If the (measured) trial duration is shorter than
 817 the target trial duration, the trial is called long.
 818
 819 ##### Long trial
 820
 821 If the (measured) trial duration is not shorter than
 822 the target trial duration, the trial is called long.
 823
 824 ##### Bad trial
 825
 826 If the (measured) trial loss ratio is larger than the target loss ratio,
 827 the trial is called bad.
 828
 829 For example, if the target loss ratio is zero,
 830 a trial is bad as soon as one frame was lost.
 831
 832 ##### Good trial
 833
 834 If the (measured) trial loss ratio is not larger than the target loss ratio,
 835 the trial is called good.
 836
 837 For example, if the target loss ratio is zero,
 838 a trial is good only when there were no frames lost.
 839
 840 #### Load stat
 841
 842 A composite of 8 quantities (see subsections)
 843 The quantites depend on a target and a load,
 844 and are computed from all trials measured at that load so far.
 845
 846 The MLRsearch output is the conditional througput,
 847 which is a specific quantity based on load.
 848 As MLRsearch may measure multiple trials at the same load,
 849 and those trials may not have the same duration,
 850 we need a way to classify a set of trial results at the same load.
 851
 852 As the logic is not as straightforward as in other parts
 853 of MLRsearch algorithm, it is best defined using the following
 854 derived quantities.
 855
 856 Load stat is the composite for one load and one target.
 857 Set of load stats for one load an all targets is commonly called load stats.
 858
 859 ##### Long good duration sum
 860
 861 Sum of durations of all long good trials
 862 (at this load, according to this target).
 863
 864 ##### Long bad duration sum
 865
 866 Sum of durations of all long bad trials
 867 (at this load, according to this target).
 868
 869 ##### Short good duration sum
 870
 871 Sum of durations of all short good trials
 872 (at this load, according to this target).
 873
 874 ##### Short bad duration sum
 875
 876 Sum of durations of all short bad trials
 877 (at this load, according to this target).
 878
 879 ##### Effective bad duration sum
 880
 881 One divided by tagret exceed ratio, that plus one.
 882 Short good duration sum divided by that.
 883 Short bad duration sum minus that, or zero if that would be negative.
 884 Long bad duration sum plus that is the effective bad duration sum.
 885
 886 Effective bad duration sum is the long bad duration sum
 887 plus some fraction of short bad duration sum.
 888 The fraction is between zero and one (both possibly including).
 889
 890 If there are no short good trials, effective bad duration sum
 891 becomes the duration sum of all bad trials (long or short).
 892
 893 If an exceed ratio computed from short good duration sum
 894 and short bad duration sum is equal or smaller than the target exceed ratio,
 895 effective bad duration sum is equal to just long bad duration sum.
 896
 897 Basically, short good trials can only lessen the impact
 898 of short bad trials, while short bad trials directly contribute
 899 (unless lessened).
 900
 901 A typical example of why a goal needs higher final trial duration
 902 than initial trial duration is when SUT is expected to have large buffers,
 903 so a trial may be too short to see frame losses due to
 904 a buffer becoming full. So a short good trial does not give strong information.
 905 On the other hand, short bad trial is a strong hint SUT would lose many frames
 906 at that load and long duration.
 907 But if there is a mix of short bad and short good trials,
 908 MLRsearch should not cherry-pick only the short bad ones.
 909
 910 The presented way of computing the effective bad duration sum
 911 aims to be a fair treatment of short good trials.
 912
 913 If the target exceed ratio is zero, the given definition contains
 914 positive infinty as an intermediate value, but still simplifies
 915 to a finite result (long bad duration sum plus short bad duration sum).
 916
 917 ##### Missing duration sum
 918
 919 The target min duration sum minus effective bad duration sum
 920 and minus long good duration sum, or zero if that would be negative.
 921
 922 MLRsearch may need up to this duration sum of additional long trials
 923 before classifing the load.
 924
 925 ##### Optimistic exceed ratio
 926
 927 The specific quantity based on exceed ratio, where bad duration sum is
 928 the effective bad duration sum, and good duration sum is
 929 the long good duration sum plus the missing duration sum.
 930
 931 This is the value MLRsearch would compare to target exceed ratio
 932 assuming all of the missing duration sum ends up consisting of long good trials.
 933
 934 If there was a bad long trial, optimistic exceed ratio
 935 becomes larger than zero.
 936 Additionally, if the target exceed ratio is zero, optimistic exceed ratio
 937 becomes larger than zero even on one short bad trial.
 938
 939 ##### Pessimistic exceed ratio
 940
 941 The specific quantity based on exceed ratio, where bad duration sum is
 942 the effective bad duration sum plus the missing duration sum,
 943 and good duration sum is the long good duration sum.
 944
 945 This is the value MLRsearch would compare to target exceed ratio
 946 assuming all of the missing duration sum ends up consisting of bad good trials.
 947
 948 Note that if the missing duration sum is zero,
 949 optimistic exceed ratio becomes equal to pessimistic exceed ratio.
 950
 951 This is the role target min duration sum has,
 952 it guarantees the two load exceed ratios eventually become the same.
 953 Otherwise, pessimistic exceed ratio
 954 is always bigger than the optimistic exceed ratio.
 955
 956 Depending on trial results, the missing duration sum may not be large enough
 957 to change optimistic (or pessimistic) exceed ratio
 958 to move to the other side compared to target exceed ratio.
 959 In that case, MLRsearch does not need to measure more trials
 960 at this load when focusing on this target.
 961
 962 #### Target bounds
 963
 964 With respect to a target, some loads may be classified as upper or lower bound,
 965 and some of the bounds are treated as relevant.
 966
 967 The subsequent parts of MLRsearch rely only on relevant bounds,
 968 without the need to classify other loads.
 969
 970 ##### Upper bound
 971
 972 A load is classified as an upper bound for a target,
 973 if and only if both optimistic exceed ratio
 974 and pessimstic load exceed ratio are larger than the target exceed ratio.
 975
 976 During the search, it is possible there is no upper bound,
 977 for example because every measured load still has too high
 978 missing duration sum.
 979
 980 If the target exceed ratio is zero, and the load has at least one bad trial
 981 (short or long), the load becomes an upper bound.
 982
 983 ##### Lower bound
 984
 985 A load is classified as a lower bound for a target,
 986 if and only if both optimistic exceed ratio
 987 and pessimstic load exceed ratio are no larger than the target exceed ratio.
 988
 989 During the search, it is possible there is no lower bound,
 990 for example because every measured load still has too high
 991 missing duration sum.
 992
 993 If the target exceed ratio is zero, all trials at the load of
 994 a lower bound must be good trials (short or long).
 995
 996 Note that so far it is possible for a lower bound to be higher
 997 than an upper bound.
 998
 999 ##### Relevant upper bound
1000
1001 For a target, a load is the relevant upper bound,
1002 if and only if it is an upper bound, and all other upper bounds
1003 are larger (as loads).
1004
1005 In some cases, the max load when classified as a lower bound
1006 is also effectively treated as the relevant upper bound.
1007 (In that case both relevant bounds are equal.)
1008
1009 If that happens for a final target at the end of the search,
1010 the controller output may contain max load as the relevant upper bound
1011 (even if the goal exceed ratio was not exceeded),
1012 signalling SUT performs well even at max load.
1013
1014 If the target exceed ratio is zero, the relevant upper bound
1015 is the smallest load where a bad trial (short or long) has been measured.
1016
1017 ##### Relevant lower bound
1018
1019 For a target, a load is the relevant lower bound if two conditions hold.
1020 Both optimistic exceed ratio and pessimstic load exceed ratio
1021 are no larger than the target exceed ratio,
1022 and there is no smaller load classified as an upper bound.
1023
1024 This is a second place where MLRsearch is not symmetric
1025 (the first place was effective bad duration sum).
1026
1027 While it is not likely for a MLRsearch to find a smaller upper bound
1028 and a larger load satisfying first condition for the lower bound,
1029 it still may happen and MLRsearch has to deal with it.
1030 The second condition makes sure the relevant lower bound
1031 is smaller than the relevant upper bound.
1032
1033 In some cases, the min load when classified as an upper bound
1034 is also effectively treated as the relevant lower bound.
1035 (In that case both relevant bounds are equal.)
1036
1037 If that happens for a final target at the end of the search,
1038 the controller output may contain min load as the relevant lower bound
1039 even if the exceed ratio was 'overstepped',
1040 signalizing the SUT does not even reach the minimal required performance.
1041
1042 The manager has to make sure this is distingushed in report
1043 from cases where min rate is a legitimate conditional throughput
1044 (e.g. the exceed ratio was not overstepped at the min load).
1045
1046 ##### Relevant bounds
1047
1048 The pair of the relevant lower bound and the relevant upper bound.
1049
1050 Useful for determining the width of the relevant bounds.
1051 Any of the bounds may be the effective one (max load or min load).
1052
1053 A goal is achieved (at the end of the search) when the final target's
1054 relevant bounds have width no larger than the goal width.
1055
1056 #### Candidate selector
1057
1058 A stateful object (a finite state machine)
1059 focusing on a single target, used to determine next trial input.
1060
1061 Initialized for a pair of targets:
1062 the current target and its preceding target (if any).
1063
1064 Private state (not shared with other selectors) consists of mode and flags.
1065 Public state (shared with all selectors) is the actual relevant bounds
1066 for both targets (current and precedinig).
1067
1068 After accepting a trial result, each selector can nominate
1069 one candidate (or no candidate) for the next trial measurement.
1070
1071 ##### Current target
1072
1073 This is the target this selector tries to achieve.
1074
1075 ##### Preceding target
1076
1077 The target (if any) preceding to the current target.
1078
1079 While this selector does not focus on the preceding target,
1080 the relevant bounds for the preceding target are used as hints
1081 when the current bound does not have enough of its relevant bounds.
1082
1083 ##### Candidate
1084
1085 The trial input (if any) this selecor nominates.
1086
1087 The trial duration attribute is always the current target trial duration.
1088 The trial load attribute depends on the selector state.
1089
1090 Candidates have defined ordering, to simplify finding the winner.
1091 If load differs, the candidate with lower load is preferred.
1092 If load is the same but duration differs, the candidate
1093 with larger duration is preferred.
1094
1095 ##### Selector mode
1096
1097 During its lifetime, selector proceeds through the following modes.
1098 In order, but some modes may be skipped or revisited.
1099
1100 Each mode has its own strategy of determining the candidate load (if any).
1101
1102 ###### Waiting
1103
1104 Not enough relevant bounds (even for the preceding target).
1105 In this mode, the selector abstains from nominating a candidate.
1106
1107 This selector leaves this mode when preceding target's selector is done.
1108
1109 ###### Halving
1110
1111 Candidate is in the middle of the relevant bounds of the preceding target.
1112
1113 If the relevant bounds are narrow enough already, this mode is skipped.
1114 As the preceding target had double width, just one halving load
1115 needs to be measured.
1116
1117 Selector uses a flag to avoid re-entering this mode
1118 once it finished measuring the halved load.
1119
1120 ###### Upgrading
1121
1122 This mode activates when one relevant bound for the current target is present
1123 and there is a matching relevant bound of the preceding target
1124 within the current target width.
1125 Candidate is the load of the matching bound from the preceding target.
1126
1127 At most one bound load is measured, depending on halving outcome.
1128 Private flags are used to avoid upgrading at later times
1129 once selector finished measuring the upgraded load.
1130
1131 ###### Extending
1132
1133 Refined already but the other relevant bound for the current target
1134 is still missing.
1135 Nominate new candidate according to external search.
1136 Initial target selectors skip all previous modes.
1137
1138 A private value is used to track the width to be used in next load extension
1139 (increasing geometrically).
1140 For initial target selectors, the starting width may be chosen
1141 based on pre-initial trial results.
1142
1143 If both relevant bounds are present at the current load,
1144 but the lower bound is far away (compared to tracked width),
1145 the candidate from this mode is preferred (as long as the load
1146 is larger than the candidate load of bisecting mode).
1147
1148 ###### Bisecting
1149
1150 Both relevant bounds for the current target are available, but they are too far
1151 from each other. Candidate is in the middle.
1152
1153 Contrary to halving, the candidate load does not need to be at the exact middle.
1154 For example if the width of the current relevant bounds
1155 is three times as large as the target width,
1156 it is advantageous to split the interval in 1:2 ratio
1157 (choosing the lower candidate load), as it can save one bisect.
1158
1159 ###### Done
1160
1161 Both relevant bounds for the current target are available,
1162 the width is no larger than the target width.
1163 No candidate.
1164
1165 If a selector reaches the done state,
1166 it is still possible later trials invalidate its relevant lower bound
1167 (by proving a lower load is in fact a new uper bound),
1168 making the selector transition into extending or bisecting mode.
1169
1170 ##### Active selector
1171
1172 Derived from a common goal, the earliest selector which nominates a candidate
1173 is considered to be the active selector for this goal.
1174 Candidates from other selectors of the same goal are ignored.
1175
1176 It is quite possible selectors focusing on other goals
1177 have already found a lower bound relevant to multiple targets in a chain.
1178 In that case, we want the most-initial of the target selectors
1179 (not already in done mode) to have the nomination.
1180
1181 Otherwise (when in extending mode and missun relevant upper bound)
1182 the closer-to-final selectors would nominate candidates
1183 at lower load but at too high duration sum,
1184 preventing some of the time savings.
1185
1186 ##### Winner
1187
1188 If the candidate previously nominated by a selector was the one
1189 that got measured, the candidate is called a winner.
1190
1191 A selector observing its previous candidate was a winer
1192 can use simplified logic when determining the mode,
1193 as it knows no other selectors may have changed the relevant loads unexpectedly.
1194
1195 ### Controller output
1196
1197 The output object the controller returns to the manager
1198 is a mapping assigning each search goal its conditional output (if it exists).
1199
1200 The controller MAY include more information (if manager accepts it),
1201 for example load stat at relevant bounds.
1202
1203 There MAY be several ways how to communicate the fact a conditional output
1204 does not exist (e.g. min load is classified as an upper bound).
1205 The manager MUST NOT present min load as a conditional output in that case.
1206
1207 If max load is a lower bound, it leads to a valid conditional output value.
1208
1209 #### Conditional throughput
1210
1211 The conditional throughput is the average of trial forwarding rates
1212 across long good trials measured at the (offered load classified as)
1213 relevant lower bound (for the goal, at the end of the search).
1214 The average is the weighted arithmetic mean, weighted by trial duration.
1215
1216 If the goal exceed ratio is zero, the definition of the relevant bounds
1217 simplifies significantly.
1218 If additionally the goal loss ratio is zero,
1219 and the goal min duration sum is equal to goal final trial duration,
1220 conditional throughput becomes conditionally compliant with RFC 2544 throughput.
1221 If the goal final trial duration is at least 60 seconds,
1222 the conditional througput becomes unconditionally compliant
1223 with RFC 2544 throughput.
1224
1225 # Problems
1226
1227 ## Long Test Duration
1228
1229 Emergence of software DUTs, with frequent software updates and a
1230 number of different packet processing modes and configurations, drives
1231 the requirement of continuous test execution and bringing down the test
1232 execution time.
1233
1234 In the context of characterising particular DUT's network performance, this
1235 calls for improving the time efficiency of throughput search.
1236 A vanilla bisection (at 60sec trial duration for unconditional [RFC2544]
1237 compliance) is slow, because most trials spend time quite far from the
1238 eventual throughput.
1239
1240 [RFC2544] does not specify any stopping condition for throughput search,
1241 so users can trade-off between search duration and achieved precision.
1242 But, due to exponential behavior of bisection, small improvement
1243 in search duration needs relatively big sacrifice in the throughput precision.
1244
1245 ## DUT within SUT
1246
1247 [RFC2285] defines:
1248 - *DUT* as
1249   - The network forwarding device to which stimulus is offered and
1250     response measured [RFC2285] (section 3.1.1).
1251 - *SUT* as
1252   - The collective set of network devices to which stimulus is offered
1253     as a single entity and response measured [RFC2285] (section 3.1.2).
1254
1255 [RFC2544] specifies a test setup with an external tester stimulating the
1256 networking system, treating it either as a single DUT, or as a system
1257 of devices, an SUT.
1258
1259 In case of software networking, the SUT consists of a software program
1260 processing packets (device of interest, the DUT),
1261 running on a server hardware and using operating system functions as appropriate,
1262 with server hardware resources shared across all programs
1263 and the operating system.
1264
1265 DUT is effectively "nested" within SUT.
1266
1267 Due to a shared multi-tenant nature of SUT, DUT is subject to
1268 interference (noise) coming from the operating system and any other
1269 software running on the same server. Some sources of noise can be
1270 eliminated (e.g. by pinning DUT program threads to specific CPU cores
1271 and isolating those cores to avoid context switching). But some
1272 noise remains after all such reasonable precautions are applied. This
1273 noise does negatively affect DUT's network performance. We refer to it
1274 as an *SUT noise*.
1275
1276 DUT can also exhibit fluctuating performance itself, e.g. while performing
1277 some "stop the world" internal stateful processing. In many cases this
1278 may be an expected per-design behavior, as it would be observable even
1279 in a hypothetical scenario where all sources of SUT noise are
1280 eliminated. Such behavior affects trial results in a way similar to SUT
1281 noise. We use *noise* as a shorthand covering both *DUT fluctuations* and
1282 genuine SUT noise.
1283
1284 A simple model of SUT performance consists of a baseline *noiseless performance*,
1285 and an additional noise. The baseline is assumed to be constant (enough).
1286 The noise varies in time, sometimes wildly. The noise can sometimes be negligible,
1287 but frequently it lowers the observed SUT performance in a trial.
1288
1289 In this model, SUT does not have a single performance value, it has a spectrum.
1290 One end of the spectrum is the noiseless baseline,
1291 the other end is a *noiseful performance*. In practice, trial results
1292 close to the noiseful end of the spectrum happen only rarely.
1293 The worse performance, the more rarely it is seen in a trial.
1294
1295 Focusing on DUT, the benchmarking effort should aim
1296 at eliminating only the SUT noise from SUT measurement.
1297 But that is not really possible, as there are no realistic enough models
1298 able to distinguish SUT noise from DUT fluctuations.
1299
1300 However, assuming that a well-constructed SUT has the DUT as its
1301 performance bottleneck, the "DUT noiseless performance" can be defined
1302 as the noiseless end of SUT performance spectrum. (At least for
1303 throughput. For other quantities such as latency there will be an
1304 additive difference.) By this definition, DUT noiseless performance
1305 also minimizes the impact of DUT fluctuations.
1306
1307 In this document, we reduce the "DUT within SUT" problem to estimating
1308 the noiseless end of SUT performance spectrum from a limited number of
1309 trial results.
1310
1311 Any improvements to throughput search algorithm, aimed for better
1312 dealing with software networking SUT and DUT setup, should employ
1313 strategies recognizing the presence of SUT noise, and allow discovery of
1314 (proxies for) DUT noiseless performance
1315 at different levels of sensitivity to SUT noise.
1316
1317 ## Repeatability and Comparability
1318
1319 [RFC2544] does not suggest to repeat throughput search. And from just one
1320 throughput value, it cannot be determined how repeatable that value is.
1321 In practice, poor repeatability is also the main cause of poor
1322 comparability, e.g. different benchmarking teams can test the same SUT
1323 but get different throughput values.
1324
1325 [RFC2544] throughput requirements (60s trial, no tolerance to single frame loss)
1326 force the search to fluctuate close the noiseful end of SUT performance
1327 spectrum. As that end is affected by rare trials of significantly low
1328 performance, the resulting throughput repeatability is poor.
1329
1330 The repeatability problem is the problem of defining a search procedure
1331 which reports more stable results
1332 (even if they can no longer be called "throughput" in [RFC2544] sense).
1333 According to baseline (noiseless) and noiseful model, better repeatability
1334 will be at the noiseless end of the spectrum.
1335 Therefore, solutions to the "DUT within SUT" problem
1336 will help also with the repeatability problem.
1337
1338 Conversely, any alteration to [RFC2544] throughput search
1339 that improves repeatability should be considered
1340 as less dependent on the SUT noise.
1341
1342 An alternative option is to simply run a search multiple times, and report some
1343 statistics (e.g. average and standard deviation). This can be used
1344 for "important" tests, but it makes the search duration problem even
1345 more pronounced.
1346
1347 ## Throughput with Non-Zero Loss
1348
1349 [RFC1242] (section 3.17) defines throughput as:
1350     The maximum rate at which none of the offered frames
1351     are dropped by the device.
1352
1353 and then it says:
1354     Since even the loss of one frame in a
1355     data stream can cause significant delays while
1356     waiting for the higher level protocols to time out,
1357     it is useful to know the actual maximum data
1358     rate that the device can support.
1359
1360 Contrary to that, many benchmarking teams settle with non-zero
1361 (small) loss ratio as the goal for a "throughput rate".
1362
1363 Motivations are many: modern protocols tolerate frame loss better;
1364 trials nowadays send way more frames within the same duration;
1365 impact of rare noise bursts is smaller as the baseline performance
1366 can compensate somewhat by keeping the loss ratio below the goal;
1367 if SUT noise with "ideal DUT" is known, it can be set as the loss ratio goal.
1368
1369 Regardless of validity of any and all similar motivations,
1370 support for non-zero loss goals makes any search algorithm more user-friendly.
1371 [RFC2544] throughput is not friendly in this regard.
1372
1373 Searching for multiple goal loss ratios also helps to describe the SUT
1374 performance better than a single goal result. Repeated wide gap between
1375 zero and non-zero loss conditional throughputs indicates
1376 the noise has a large impact on the overall SUT performance.
1377
1378 It is easy to modify the vanilla bisection to find a lower bound
1379 for intended load that satisfies a non-zero-loss goal,
1380 but it is not that obvious how to search for multiple goals at once,
1381 hence the support for multiple loss goals remains a problem.
1382
1383 ## Inconsistent Trial Results
1384
1385 While performing throughput search by executing a sequence of
1386 measurement trials, there is a risk of encountering inconsistencies
1387 between trial results.
1388
1389 The plain bisection never encounters inconsistent trials.
1390 But [RFC2544] hints about possibility if inconsistent trial results in two places.
1391 The first place is section 24 where full trial durations are required, presumably
1392 because they can be inconsistent with results from shorter trial durations.
1393 The second place is section 26.3 where two successive zero-loss trials
1394 are recommended, presumably because after one zero-loss trial
1395 there can be subsequent inconsistent non-zero-loss trial.
1396
1397 Examples include:
1398
1399 - a trial at the same load (same or different trial duration) results
1400   in a different packet loss ratio.
1401 - a trial at higher load (same or different trial duration) results
1402   in a smaller packet loss ratio.
1403
1404 Any robust throughput search algorithm needs to decide how to continue
1405 the search in presence of such inconsistencies.
1406 Definitions of throughput in [RFC1242] and [RFC2544] are not specific enough
1407 to imply a unique way of handling such inconsistencies.
1408
1409 Ideally, there will be a definition of a quantity which both generalizes
1410 throughput for non-zero-loss (and other possible repeatibility enhancements),
1411 while being precise enough to force a specific way to resolve trial
1412 inconsistencies.
1413 But until such definition is agreed upon, the correct way to handle
1414 inconsistent trial results remains an open problem.
1415
1416 # How the problems are addressed
1417
1418 Configurable loss ratio in MLRsearch search goals are there
1419 in direct support for non-zero-loss conditional throughput.
1420 In practice the conditional throughput results' stability
1421 increases with higher loss ratio goals.
1422
1423 Multiple trials with noise tolerance enhancement,
1424 as implemented in MLRsearch using non-zero goal exceed ratio value,
1425 also indirectly increases the result stability.
1426 That allows MLRsearch to achieve all the benefits
1427 of Binary Search with Loss Verification,
1428 as recommended in [RFC9004] (section 6.2)
1429 and specified in [TST009] (section 12.3.3).
1430
1431 The main factor improving the overall search time is the introduction
1432 of preceding targets. Less impactful time savings
1433 are achieved by pre-initial trials, halving mode
1434 and smart splitting in bisecting mode.
1435
1436 In several places, MLRsearch is "conservative" when handling
1437 (potentially) inconsistent results. This includes the requirement
1438 for the relevant lower bound to be smaller than any upper bound,
1439 the unequal handling of good and bad short trials,
1440 and preference to lower load when choosing the winner among candidates.
1441
1442 While this does no guarantee good search stability
1443 (goals focusing on higher loads may still invalidate existing bounds
1444 simply by requiring larger min duration sums),
1445 it lowers the change of SUT having an area of poorer performance
1446 below the reported conditional througput loads.
1447 In any case, the definition of conditional throughput
1448 is precise enough to dictate "conservative" handling
1449 of trial inconsistencies.
1450
1451 # IANA Considerations
1452
1453 No requests of IANA.
1454
1455 # Security Considerations
1456
1457 Benchmarking activities as described in this memo are limited to
1458 technology characterization of a DUT/SUT using controlled stimuli in a
1459 laboratory environment, with dedicated address space and the constraints
1460 specified in the sections above.
1461
1462 The benchmarking network topology will be an independent test setup and
1463 MUST NOT be connected to devices that may forward the test traffic into
1464 a production network or misroute traffic to the test management network.
1465
1466 Further, benchmarking is performed on a "black-box" basis, relying
1467 solely on measurements observable external to the DUT/SUT.
1468
1469 Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
1470 benchmarking purposes. Any implications for network security arising
1471 from the DUT/SUT SHOULD be identical in the lab and in production
1472 networks.
1473
1474 # Acknowledgements
1475
1476 Many thanks to Alec Hothan of OPNFV NFVbench project for thorough
1477 review and numerous useful comments and suggestions.
1478
1479 --- back