docs/ietf/draft-ietf-bmwg-mlrsearch-03.md

   1 ---
   2 title: Multiple Loss Ratio Search
   3 abbrev: MLRsearch
   4 docname: draft-ietf-bmwg-mlrsearch-03
   5 date: 2022-11-09
   6
   7 ipr: trust200902
   8 area: ops
   9 wg: Benchmarking Working Group
  10 kw: Internet-Draft
  11 cat: info
  12
  13 coding: us-ascii
  14 pi:    # can use array (if all yes) or hash here
  15   toc: yes
  16   sortrefs:   # defaults to yes
  17   symrefs: yes
  18
  19 author:
  20       -
  21         ins: M. Konstantynowicz
  22         name: Maciek Konstantynowicz
  23         org: Cisco Systems
  24         email: mkonstan@cisco.com
  25       -
  26         ins: V. Polak
  27         name: Vratko Polak
  28         org: Cisco Systems
  29         email: vrpolak@cisco.com
  30
  31 normative:
  32   RFC1242:
  33   RFC2285:
  34   RFC2544:
  35   RFC9004:
  36
  37 informative:
  38   TST009:
  39     target: https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.04.01_60/gs_NFV-TST009v030401p.pdf
  40     title: "TST 009"
  41   FDio-CSIT-MLRsearch:
  42     target: https://s3-docs.fd.io/csit/rls2110/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.html#mlrsearch-tests
  43     title: "FD.io CSIT Test Methodology - MLRsearch"
  44     date: 2021-11
  45   PyPI-MLRsearch:
  46     target: https://pypi.org/project/MLRsearch/0.3.0/
  47     title: "MLRsearch 0.3.0, Python Package Index"
  48     date: 2021-04
  49
  50 --- abstract
  51
  52 This document proposes improvements to [RFC2544] throughput search by
  53 defining a new methodology called Multiple Loss Ratio search
  54 (MLRsearch). The main objectives for MLRsearch are to minimize the
  55 total test duration, search for multiple loss ratios and improve
  56 results repeatibility and comparability.
  57
  58 The main motivation behind MLRsearch is the new set of challenges and
  59 requirements posed by testing Network Function Virtualization
  60 (NFV) systems and other software based network data planes.
  61
  62 MLRsearch offers several ways to address these challenges, giving user
  63 configuration options to select their way.
  64
  65 --- middle
  66
  67 {::comment}
  68     As we use kramdown to convert from markdown,
  69     we use this way of marking comments not to be visible in rendered draft.
  70     https://stackoverflow.com/a/42323390
  71     If other engine is used, convert to this way:
  72     https://stackoverflow.com/a/20885980
  73 {:/comment}
  74
  75 # Purpose and Scope
  76
  77 The purpose of this document is to describe Multiple Loss Ratio search
  78 (MLRsearch), a throughput search methodology optimized for software
  79 DUTs.
  80
  81 Applying vanilla [RFC2544] throughput bisection to software DUTs
  82 results in a number of problems:
  83
  84 - Binary search takes too long as most of trials are done far from the
  85   eventually found throughput.
  86 - The required final trial duration and pauses between trials also
  87   prolong the overall search duration.
  88 - Software DUTs show noisy trial results (noisy neighbor problem),
  89   leading to big spread of possible discovered throughput values.
  90 - Throughput requires loss of exactly zero packets, but the industry
  91   frequently allows for small but non-zero losses.
  92 - The definition of throughput is not clear when trial results are
  93   inconsistent.
  94
  95 MLRsearch aims to address these problems by applying the following set
  96 of enhancements:
  97
  98 - Allow searching with multiple loss ratio goals.
  99   - Each trial result can affect any search goal in principle
 100     (trial reuse).
 101 - Multiple phases within one loss ratio goal search, middle ones need
 102   to spend less time on trials.
 103   - Middle phases also aim at lesser precision.
 104   - Use Forwarding Rate (FR) at maximum offered load
 105     [RFC2285] (section 3.6.2) to initialize the first middle phase.
 106 - Take care when dealing with inconsistent trial results.
 107   - Loss ratios goals are handled in an order that precludes any
 108     interference from later trials to earlier goals.
 109 - Apply several load selection heuristics to save even more time
 110   by trying hard to avoid unnecessarily narrow intervals.
 111
 112 MLRsearch configuration options are flexible enough to
 113 support both conservative settings (unconditionally compliant with [RFC2544],
 114 but longer search duration and worse repeatability) and aggressive
 115 settings (shorter search duration and better repeatability but not
 116 compliant with [RFC2544]).
 117
 118 No part of [RFC2544] is intended to be obsoleted by this document.
 119
 120 # Problems
 121
 122 ## Long Test Duration
 123
 124 Emergence of software DUTs, with frequent software updates and a
 125 number of different packet processing modes and configurations, drives
 126 the requirement of continuous test execution and bringing down the test
 127 execution time.
 128
 129 In the context of characterising particular DUT's network performance, this
 130 calls for improving the time efficiency of throughput search.
 131 A vanilla bisection (at 60sec trial duration for unconditional [RFC2544]
 132 compliance) is slow, because most trials spend time quite far from the
 133 eventual throughput.
 134
 135 [RFC2544] does not specify any stopping condition for throughput search,
 136 so users can trade-off between search duration and precision goal.
 137 But, due to exponential behavior of bisection, small improvement
 138 in search duration needs relatively big sacrifice in the result precision.
 139
 140 ## DUT within SUT
 141
 142 [RFC2285] defines:
 143 - *DUT* as
 144   - The network forwarding device to which stimulus is offered and
 145     response measured [RFC2285] (section 3.1.1).
 146 - *SUT* as
 147   - The collective set of network devices to which stimulus is offered
 148     as a single entity and response measured [RFC2285] (section 3.1.2).
 149
 150 [RFC2544] specifies a test setup with an external tester stimulating the
 151 networking system, treating it either as a single DUT, or as a system
 152 of devices, an SUT.
 153
 154 In case of software networking, the SUT consists of a software program
 155 processing packets (device of interest, the DUT),
 156 running on a server hardware and using operating system functions as appropriate,
 157 with server hardware resources shared across all programs
 158 and the operating system.
 159
 160 DUT is effectively "nested" within SUT.
 161
 162 Due to a shared multi-tenant nature of SUT, DUT is subject to
 163 interference (noise) coming from the operating system and any other
 164 software running on the same server. Some sources of noise can be
 165 eliminated (e.g. by pinning DUT program threads to specific CPU cores
 166 and isolating those cores to avoid context switching). But some
 167 noise remains after all such reasonable precautions are applied. This
 168 noise does negatively affect DUT's network performance. We refer to it
 169 as an *SUT noise*.
 170
 171 DUT can also exhibit fluctuating performance itself, e.g. while performing
 172 some "stop the world" internal stateful processing. In many cases this
 173 may be an expected per-design behavior, as it would be observable even
 174 in a hypothetical scenario where all sources of SUT noise are
 175 eliminated. Such behavior affects trial results in a way similar to SUT
 176 noise. We use *noise* as a shorthand covering both *DUT fluctuations* and
 177 genuine SUT noise.
 178
 179 A simple model of SUT performance consists of a baseline *noiseless performance*,
 180 and an additional noise. The baseline is assumed to be constant (enough).
 181 The noise varies in time, sometimes wildly. The noise can sometimes be negligible,
 182 but frequently it lowers the observed SUT performance in a trial.
 183
 184 In this model, SUT does not have a single performance value, it has a spectrum.
 185 One end of the spectrum is the noiseless baseline,
 186 the other end is a *noiseful performance*. In practice, trial results
 187 close to the noiseful end of the spectrum happen only rarely.
 188 The worse performance, the more rarely it is seen.
 189
 190 Focusing on DUT, the benchmarking effort should aim
 191 at eliminating only the SUT noise from SUT measurement.
 192 But that is not really possible, as there are no realistic enough models
 193 able to distinguish SUT noise from DUT fluctuations.
 194
 195 However, assuming that a well-constructed SUT has the DUT as its
 196 performance bottleneck, the "DUT noiseless performance" can be defined
 197 as the noiseless end of SUT performance spectrum. (At least for
 198 throughput. For other quantities such as latency there will be an
 199 additive difference.) By this definition, DUT noiseless performance
 200 also minimizes the impact of DUT fluctuations.
 201
 202 In this document, we reduce the "DUT within SUT" problem to estimating
 203 the noiseless end of SUT performance spectrum from a limited number of
 204 trial results.
 205
 206 Any improvements to throughput search algorithm, aimed for better
 207 dealing with software networking SUT and DUT setup, should employ
 208 strategies recognizing the presence of SUT noise, and allow discovery of
 209 (proxies for) DUT noiseless performance
 210 at different levels of sensitivity to SUT noise.
 211
 212 ## Repeatability and Comparability
 213
 214 [RFC2544] does not suggest to repeat throughput search, and from just one
 215 throughput value, it cannot be determined how repeatable that value is.
 216 In practice, poor repeatability is also the main cause of poor
 217 comparability, e.g. different benchmarking teams can test the same DUT
 218 but get different throughput values.
 219
 220 [RFC2544] throughput requirements (60s trial, no tolerance to single frame loss)
 221 force the search to converge around the noiseful end of SUT performance
 222 spectrum. As that end is affected by rare trials of significantly low
 223 performance, the resulting throughput repeatability is poor.
 224
 225 The repeatability problem is the problem of defining a search procedure
 226 which reports more stable results
 227 (even if they can no longer be called "throughput" in [RFC2544] sense).
 228 According to baseline (noiseless) and noiseful model, better repeatability
 229 will be at the noiseless end of the spectrum.
 230 Therefore, solutions to the "DUT within SUT" problem
 231 will help also with the repeatability problem.
 232
 233 Conversely, any alteration to [RFC2544] throughput search
 234 that improves repeatability should be considered
 235 as less dependent on the SUT noise.
 236
 237 An alternative option is to simply run a search multiple times, and report some
 238 statistics (e.g. average and standard deviation). This can be used
 239 for "important" tests, but it makes the search duration problem even
 240 bigger.
 241
 242 ## Throughput with Non-Zero Loss
 243
 244 [RFC1242] (section 3.17) defines throughput as:
 245     The maximum rate at which none of the offered frames
 246     are dropped by the device.
 247
 248 and then it says:
 249     Since even the loss of one frame in a
 250     data stream can cause significant delays while
 251     waiting for the higher level protocols to time out,
 252     it is useful to know the actual maximum data
 253     rate that the device can support.
 254
 255 Contrary to that, many benchmarking teams settle with non-zero
 256 (small) loss ratio as the goal for a "throughput rate".
 257
 258 Motivations are many: modern protocols tolerate frame loss better;
 259 trials nowadays send way more frames within the same duration;
 260 impact of rare noise bursts is smaller as the baseline performance
 261 can compensate somewhat by keeping the loss ratio below the goal;
 262 if SUT noise with "ideal DUT" is known, it can be set as the loss ratio goal.
 263
 264 Regardless of validity of any and all similar motivations,
 265 support for non-zero loss goals makes any search algorithm more user-friendly.
 266 [RFC2544] throughput is not friendly in this regard.
 267
 268 Searching for multiple loss ratio goals also helps to describe the SUT
 269 performance better than a single goal result. Repeated wide gap between
 270 zero and non-zero loss loads indicates the noise has a large impact on
 271 the overall SUT performance.
 272
 273 It is easy to modify the vanilla bisection to find a lower bound
 274 for intended load that satisfies a non-zero-loss goal,
 275 but it is not that obvious how to search for multiple goals at once,
 276 hence the support for multiple loss goals remains a problem.
 277
 278 ## Inconsistent Trial Results
 279
 280 While performing throughput search by executing a sequence of
 281 measurement trials, there is a risk of encountering inconsistencies
 282 between trial results.
 283
 284 The plain bisection never encounters inconsistent trials.
 285 But [RFC2544] hints about possibility if inconsistent trial results in two places.
 286 The first place is section 24 where full trial durations are required, presumably
 287 because they can be inconsistent with results from shorter trial durations.
 288 The second place is section 26.3 where two successive zero-loss trials
 289 are recommended, presumably because after one zero-loss trial
 290 there can be subsequent inconsistent non-zero-loss trial.
 291
 292 Examples include:
 293
 294 - a trial at the same load (same or different trial duration) results
 295   in a different packet loss ratio.
 296 - a trial at higher load (same or different trial duration) results
 297   in a smaller packet loss ratio.
 298
 299 Any robust throughput search algorithm needs to decide how to continue
 300 the search in presence of such inconsistencies.
 301 Definitions of throughput in [RFC1242] and [RFC2544] are not specific enough
 302 to imply a unique way of handling such inconsistencies.
 303
 304 Ideally, there will be a definition of a quantity which both generalizes
 305 throughput for non-zero-loss (and other possible repeatibility enhancements),
 306 while being precise enough to force a specific way to resolve trial
 307 inconsistencies.
 308 But until such definition is agreed upon, the correct way to handle
 309 inconsistent trial results remains an open problem.
 310
 311 # MLRsearch Approach
 312
 313 The following description intentionally leaves out some important implementation
 314 details. This is both to hide complexity that is not important for overall
 315 understanding, and to allow future improvements in the implementation.
 316
 317 ## Terminology
 318
 319 - *trial duration*: Amount of time over which frames are transmitted
 320   towards SUT and DUT in a single measurement step.
 321   - **MLRsearch input parameter** for final MLRsearch measurements.
 322 - *loss ratio*: Ratio of the count of frames lost to the count of frames
 323   transmitted over a trial duration, a.k.a. packet loss ratio. Related
 324   to packet loss rate [RFC1242] (section 3.6).
 325   In MLRsearch loss ratio can mean either a trial result or a goal:
 326   - *trial loss ratio*: Loss ratio measured during a trial.
 327   - *loss ratio goal*: **MLRsearch input parameter**.
 328     - If *trial loss ratio* is smaller or equal to this,
 329       the trial **satisfies** the loss ratio goal.
 330 - *load*: Constant offered load stimulating the SUT and DUT. Consistent
 331   with offered load [RFC2285] (section 3.5.2).
 332   - MLRsearch works with intended load instead, as it cannot deal with
 333     situations where the offered load is considerably different than
 334     intended load.
 335 - *throughput*: The maximum load at which none of the offered frames are
 336   dropped by the SUT and DUT. Consistent with [RFC1242] (section 3.17).
 337 - *conditional throughput*: The forwarding rate measured at the maximum
 338   load at which a list of specified conditions are met i.e. loss ratio
 339   goal and trial duration.
 340   - Throughput is then a special case of conditional throughput
 341     for zero loss ratio goal and long enough trial duration.
 342   - Conditional throughput is aligned with forwarding rate (FR)
 343     [RFC2285] (section 3.6.1), adding trial duration to offered load
 344     required when reporting FR.
 345 - *lower bound*: One of values tracked by MLRsearch during the search runtime.
 346   It is specific to the current trial duration and current loss ratio goal.
 347   It represents a load value with at least one trial result available.
 348   If the trial satisfies the current loss ratio goal,
 349   it is a *valid* bound (else *invalid*).
 350 - *upper bound*: One of values tracked by MLRsearch during the search runtime.
 351   It is specific to the current trial duration and current loss ratio goal.
 352   It represents a load value with at least one trial result available.
 353   If the trial satisfies the current loss ratio goal,
 354   it is an *invalid* bound (else *valid*).
 355 - *interval*: The span between lower and upper bound loads.
 356 - *precision goal*: **MLRsearch input parameter**, acting as a search
 357   stop condition, given as either absolute or relative width goal. An
 358   interval meets precision goal if:
 359   - The difference of upper and lower bound loads (in pps)
 360     is not more than the absolute width goal.
 361   - The difference as above, divided by upper bound load (in pps)
 362     is not more than the relative width goal.
 363
 364 ## Description
 365
 366 The MLRsearch approach to address the identified problems is based
 367 on the following main strategies:
 368
 369 - MLRsearch main inputs include the following search goals and parameters:
 370   - One or more **loss ratio goals**.
 371     - e.g. a zero-loss goal and one (or more) non-zero-loss goals.
 372   - **Target trial duration** condition governing required trial duration
 373     for final measurements.
 374   - **Target precision** condition governing how close final lower and
 375     upper bound load values must be to each other for final
 376     measurements.
 377 - Search is executed as a sequence of phases:
 378   - *Initial phase* initializes bounds for the first middle phase.
 379   - *Middle phase*s narrow down the bounds, using shorter trial
 380     durations and lower precision goals. Several middle phases can
 381     precede each final phase.
 382   - *Final phase* (one per loss ratio goal) finds bounds matching input
 383     goals and parameters to serve as the overal search output.
 384 - Each search phase produces its *ending* upper bound and lower bound:
 385   - Initial phase may produce invalid bounds.
 386   - Middle and final phases produce valid bounds.
 387   - Middle or final phases needs at least two values to act as
 388     *starting* bounds (may be invalid).
 389   - Each phase may perform several trial measurements, until phase's
 390     ending conditions are all met.
 391   - Trial results from previous phases may be re-used.
 392 - Initial phase establishes the starting values for bounds, using
 393   forwarding rates (FR) [RFC2285] (section 3.6.1)
 394   from a few trials of minimal duration, as follows:
 395   - 1st trial is done at *maximum offered load (MOL)* [RFC2285] (section 3.5.3),
 396     resulting in Forwarding rate at maximum offered load (FRMOL)
 397     [RFC2285] (section 3.6.2).
 398   - 2nd trial is done at *FRMOL*, resulting in forwarding rate at FRMOL (FRFRMOL),
 399     newly defined here.
 400   - 3rd trial is done at *FRFRMOL*, so its results are available for the next phase.
 401   - By default, FRMOL is used as an upper bound, FRFRMOL as a lower bound.
 402     - Adjustments may apply here for some cases e.g. when 2nd trial got
 403       zero loss or if FRFRMOL is too close to FRMOL.
 404 - Middle phases are producing ending bounds by improving upon starting bounds:
 405   - Each middle phase uses the same loss ratio goal as the final phase it precedes.
 406     - Called *current loss ratio goal* for upper and lower bound purposes.
 407   - Each middle phase has its own *current trial duration*
 408     and *current precision goal* parameters, computed from
 409     MLRsearch input parameters.
 410     As phases progress, these parameters approach MLRsearch main input values.
 411     - Current trial duration starts from a configurable minimum (e.g. 1 sec)
 412       and increases in a geometric sequence.
 413     - Current precision goal always allows twice as wide intervals
 414       as the following phase.
 415   - The starting bounds are usually the ending bounds from the preceding phase.
 416     - Unless there are many previous trial results that are more promising.
 417   - Each middle phase operates in a sequence of four actions:
 418     1. Perform trial at the load between the starting bounds.
 419       - Depending on the trial result this becomes the first
 420         new valid upper or lower bound for current phase.
 421     2. Re-measure at the remaining starting lower or upper (respectively) bound.
 422     3. If that did not result in a valid bound, start an *external search*.
 423       - That is a variant of exponential search.
 424         - The "growth" is given by input parameter *expansion_coefficient*.
 425       - This action ends when a new valid bound is found.
 426         - Or if an already existing valid bound becomes close enough.
 427     4. Repeatedly bisect the current interval until the bounds are close enough.
 428 - Final search phase operates in exactly the same way as middle phases.
 429   There are two reasons why it is named differently:
 430   - The current trial duration and current precision goal within the phase
 431     are equal to the target trial duration and target precision input parameters.
 432   - The forwarding rates of the ending bounds become the output of MLRsearch.
 433     - Specifically, the forwarding rates of the final lower bounds
 434       are the conditional throughput values per given loss ratio goals.
 435
 436 ## Enhancement: Multiple trials per load
 437
 438 An enhancement of MLRsearch is to introduce a *noise tolerance* input parameter.
 439 The idea is to perform several medium-length trials (instead of a single long trial)
 440 and tolerate a configurable fraction of them to not-satisfy the loss ratio goal.
 441
 442 MLRsearch implementation with this enhancement exists in FD.io CSIT project
 443 and test results of VPP and DPDK (testpmd, l3fwd) DUTs look promising.
 444
 445 This enhancement would make the description of MLRsearch approach
 446 considerably more complicated, so this document version only describes
 447 MLRsearch without this enhancement.
 448
 449 # How the problems are addressed
 450
 451 Configurable loss ratio goals are in direct support for non-zero-loss conditional througput.
 452 In practice the conditional throughput results' stability
 453 increases with higher loss ratio goals.
 454
 455 Multiple trials with noise tolerance enhancement will also indirectly
 456 increase result stability and it will allow MLRsearch
 457 to add all the benefits of Binary Search with Loss Verification,
 458 as recommended in [RFC9004] (section 6.2)
 459 and specified in [TST009] (section 12.3.3).
 460
 461 The main factor improving the overall search time is the introduction
 462 of middle phases. The full implementation can bring a large number of
 463 heuristics related to how exactly should the next trial load be chosen,
 464 but the impact of those is not as big.
 465
 466 The Description subsection lacks any details on how to handle inconsistent
 467 trial results. In practice, there tend to be a three-way trade-off
 468 between i) short overall search time, ii) result stability
 469 and iii) how simple the definition of the returned conditional throughput can be.
 470 The third one is important for comparability between different MLRsearch
 471 implementations.
 472
 473 # IANA Considerations
 474
 475 No requests of IANA.
 476
 477 # Security Considerations
 478
 479 Benchmarking activities as described in this memo are limited to
 480 technology characterization of a DUT/SUT using controlled stimuli in a
 481 laboratory environment, with dedicated address space and the constraints
 482 specified in the sections above.
 483
 484 The benchmarking network topology will be an independent test setup and
 485 MUST NOT be connected to devices that may forward the test traffic into
 486 a production network or misroute traffic to the test management network.
 487
 488 Further, benchmarking is performed on a "black-box" basis, relying
 489 solely on measurements observable external to the DUT/SUT.
 490
 491 Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
 492 benchmarking purposes. Any implications for network security arising
 493 from the DUT/SUT SHOULD be identical in the lab and in production
 494 networks.
 495
 496 # Acknowledgements
 497
 498 Many thanks to Alec Hothan of OPNFV NFVbench project for thorough
 499 review and numerous useful comments and suggestions.
 500
 501 --- back