docs/ietf/draft-vpolak-bmwg-plrsearch-02.md

   1 ---
   2 title: Probabilistic Loss Ratio Search for Packet Throughput (PLRsearch)
   3 # abbrev: PLRsearch
   4 docname: draft-vpolak-bmwg-plrsearch-02
   5 date: 2019-07-08
   6
   7 ipr: trust200902
   8 area: ops
   9 wg: Benchmarking Working Group
  10 kw: Internet-Draft
  11 cat: info
  12
  13 coding: us-ascii
  14 pi:    # can use array (if all yes) or hash here
  15 #  - toc
  16 #  - sortrefs
  17 #  - symrefs
  18   toc: yes
  19   sortrefs:   # defaults to yes
  20   symrefs: yes
  21
  22 author:
  23       -
  24         ins: M. Konstantynowicz
  25         name: Maciek Konstantynowicz
  26         org: Cisco Systems
  27         role: editor
  28         email: mkonstan@cisco.com
  29       -
  30         ins: V. Polak
  31         name: Vratko Polak
  32         org: Cisco Systems
  33         role: editor
  34         email: vrpolak@cisco.com
  35
  36 normative:
  37   RFC2544:
  38   RFC8174:
  39
  40 informative:
  41   draft-vpolak-mkonstan-bmwg-mlrsearch:
  42     target: https://tools.ietf.org/html/draft-vpolak-mkonstan-bmwg-mlrsearch
  43     title: "Multiple Loss Ratio Search for Packet Throughput (MLRsearch)"
  44     date: 2019-07
  45
  46 --- abstract
  47
  48 This document addresses challenges while applying methodologies
  49 described in [RFC2544] to benchmarking software based NFV (Network
  50 Function Virtualization) data planes over an extended period of time,
  51 sometimes referred to as "soak testing". Packet throughput search
  52 approach proposed by this document assumes that system under test is
  53 probabilistic in nature, and not deterministic.
  54
  55 --- middle
  56
  57 # Motivation
  58
  59 Network providers are interested in throughput a system can sustain.
  60
  61 [RFC2544] assumes loss ratio is given by a deterministic function of
  62 offered load. But NFV software systems are not deterministic enough.
  63 This makes deterministic algorithms (such as Binary Search per [RFC2544]
  64 and [draft-vpolak-mkonstan-bmwg-mlrsearch] with single trial) to return
  65 results, which when repeated show relatively high standard deviation,
  66 thus making it harder to tell what "the throughput" actually is.
  67
  68 We need another algorithm, which takes this indeterminism into account.
  69
  70 # Relation To RFC2544
  71
  72 The aim of this document is to become an extension of [RFC2544] suitable
  73 for benchmarking networking setups such as software based NFV systems.
  74
  75 # Terms And Assumptions
  76
  77 ## Device Under Test
  78
  79 In software networking, "device" denotes a specific piece of software
  80 tasked with packet processing. Such device is surrounded with other
  81 software components (such as operating system kernel). It is not
  82 possible to run devices without also running the other components, and
  83 hardware resources are shared between both.
  84
  85 For purposes of testing, the whole set of hardware and software
  86 components is called "system under test" (SUT). As SUT is the part of
  87 the whole test setup performance of which can be measured by [RFC2544]
  88 methods, this document uses SUT instead of [RFC2544] DUT.
  89
  90 Device under test (DUT) can be re-introduced when analysing test results
  91 using whitebox techniques, but that is outside the scope of this document.
  92
  93 ## System Under Test
  94
  95 System under test (SUT) is a part of the whole test setup whose
  96 performance is to be benchmarked. The complete methodology contains
  97 other parts, whose performance is either already established, or not
  98 affecting the benchmarking result.
  99
 100 ## SUT Configuration
 101
 102 Usually, system under test allows different configurations, affecting
 103 its performance. The rest of this document assumes a single
 104 configuration has been chosen.
 105
 106 ## SUT Setup
 107
 108 Similarly to [RFC2544], it is assumed that the system under test has
 109 been updated with all the packet forwarding information it needs, before
 110 the trial measurements (see below) start.
 111
 112 ## Network Traffic
 113
 114 Network traffic is a type of interaction between system under test and
 115 the rest of the system (traffic generator), used to gather information
 116 about the system under test performance. PLRsearch is applicable only to
 117 areas where network traffic consists of packets.
 118
 119 ## Packet
 120
 121 Unit of interaction between traffic generator and the system under test.
 122 Term "packet" is used also as an abstraction of Ethernet frames.
 123
 124 ### Packet Offered
 125
 126 Packet can be offered, which means it is sent from traffic generator
 127 to the system under test.
 128
 129 Each offered packet is assumed to become received or lost in a short
 130 time.
 131
 132 ### Packet Received
 133
 134 Packet can be received, which means the traffic generator verifies it
 135 has been processed. Typically, when it is succesfully sent from the
 136 system under test to traffic generator.
 137
 138 It is assumed that each received packet has been caused by an offered
 139 packet, so the number of packets received cannot be larger than the
 140 number of packets offered.
 141
 142 ### Packet Lost
 143
 144 Packet can be lost, which means sent but not received in a timely
 145 manner.
 146
 147 It is assumed that each lost packet has been caused by an offered
 148 packet, so the number of packets lost cannot be larger than the number
 149 of packets offered.
 150
 151 Usually, the number of packets lost is computed as the number of packets
 152 offered, minus the number of packets received.
 153
 154 ###  Other Packets
 155
 156 PLRsearch is not considering other packet behaviors known from
 157 networking (duplicated, reordered, greatly delayed), assuming the test
 158 specification reclassifies those behaviors to fit into the first three
 159 categories.
 160
 161 ## Traffic Profile
 162
 163 Usually, the performance of the system under test depends on a "type" of
 164 a particular packet (for example size), and "composition" if the network
 165 traffic consists of a mixture of different packet types.
 166
 167 Also, some systems under test contain multiple "ports" packets can be
 168 offered to and received from.
 169
 170 All such qualities together (but not including properties of trial
 171 measurements) are called traffic profile.
 172
 173 Similarly to system under test configuration, this document assumes only
 174 one traffic profile has been chosen for a particular test.
 175
 176 ##  Traffic Generator
 177
 178 Traffic generator is the part of the whole test setup, distinct from the
 179 system under test, responsible both for offering packets in a highly
 180 predictable manner (so the number of packets offered is known), and for
 181 counting received packets in a precise enough way (to distinguish lost
 182 packets from tolerably delayed packets).
 183
 184 Traffic generator must offer only packets compatible with the traffic
 185 profile, and only count similarly compatible packets as received.
 186
 187 Criteria defining which received packets are compatible are left
 188 for test specification to decide.
 189
 190 ## Offered Load
 191
 192 Offered load is an aggregate rate (measured in packets per second) of
 193 network traffic offered to the system under test, the rate is kept
 194 constant for the duration of trial measurement.
 195
 196 ## Trial Measurement
 197
 198 Trial measurement is a process of stressing (previously setup) system
 199 under test by offering traffic of a particular offered load, for a
 200 particular duration.
 201
 202 After that, the system has a short time to become idle, while the
 203 traffic generator decides how many packets were lost.
 204
 205 After that, another trial measurement (possibly with different offered
 206 load and duration) can be immediately performed. Traffic generator
 207 should ignore received packets caused by packets offered in previous
 208 trial measurements.
 209
 210 ## Trial Duration
 211
 212 Duration for which the traffic generator was offering packets at
 213 constant offered load.
 214
 215 In theory, care has to be taken to ensure the offered load and trial
 216 duration predict integer number of packets to offer, and that the
 217 traffic generator really sends appropriate number of packets within
 218 precisely enough timed duration. In practice, such consideration do not
 219 change PLRsearch result in any significant way.
 220
 221 ## Packet Loss
 222
 223 Packet loss is any quantity describing a result of trial measurement.
 224
 225 It can be loss count, loss rate or loss ratio. Packet loss is zero (or
 226 non-zero) if either of the three quantities are zero (or non-zero,
 227 respecively).
 228
 229 ### Loss Count
 230
 231 Number of packets lost (or delayed too much) at a trial measurement by
 232 the system under test as determined by packet generator. Measured in
 233 packets.
 234
 235 ### Loss Rate
 236
 237 Loss rate is computed as loss count divided by trial duration. Measured
 238 in packets per second.
 239
 240 ### Loss Ratio
 241
 242 Loss ratio is computed as loss count divided by number of packets
 243 offered. Measured as a real (in practice rational) number between zero
 244 or one (including).
 245
 246 ## Trial Order Independent System
 247
 248 Trial order independent system is a system under test, proven (or just
 249 assumed) to produce trial measurement results that display trial order
 250 independence.
 251
 252 That means when a pair of consequent trial measurements are performed,
 253 the probability to observe a pair of specific results is the same, as
 254 the probability to observe the reversed pair of results whe performing
 255 the reversed pair of consequent measurements.
 256
 257 PLRsearch assumes the system under test is trial order independent.
 258
 259 In practice, most system under test are not entirely trial order
 260 independent, but it is not easy to devise an algorithm taking that into
 261 account.
 262
 263 ## Trial Measurement Result Distribution
 264
 265 When a trial order independent system is subjected to repeated trial
 266 measurements of constant duration and offered load, Law of Large Numbers
 267 implies the observed loss count frequencies will converge to a specific
 268 probability distribution over possible loss counts.
 269
 270 This probability distribution is called trial measurement result
 271 distribution, and it depends on all properties fixed when defining it.
 272 That includes the system under test, its chosen configuration, the
 273 chosen traffic profile, the offered load and the trial duration.
 274
 275 As the system is trial order independent, trial measurement result
 276 distribution does not depend on results of few initial trial
 277 measurements, of any offered load or (finite) duration.
 278
 279 ## Average Loss Ratio
 280
 281 Probability distribution over some (finite) set of states enables
 282 computation of probability-weighted average of any quantity evaluated on
 283 the states (called the expected value of the quantity).
 284
 285 Average loss ratio is simply the expected value of loss ratio for a
 286 given trial measurement result distribution.
 287
 288 ## Duration Independent System
 289
 290 Duration independent system is a trial order independent system, whose
 291 trial measurement result distribution is proven (or just assumed) to
 292 display practical independence from trial duration. See definition of
 293 trial duration for discussion on practical versus theoretical.
 294
 295 The only requirement is for average loss ratio to be independent of
 296 trial duration.
 297
 298 In theory, that would necessitate each trial measurement result
 299 distribution to be a binomial distribution. In practice, more
 300 distributions are allowed.
 301
 302 PLRsearch assumes the system under test is duration independent, at
 303 least for trial durations typically chosen for trial measurements
 304 initiated by PLRsearch.
 305
 306 ## Load Regions
 307
 308 For a duration independent system, trial measurement result distribution
 309 depends only on offered load.
 310
 311 It is convenient to name some areas of offered load space by possible
 312 trial results.
 313
 314 ### Zero Loss Region
 315
 316 A particular offered load value is said to belong to zero loss region,
 317 if the probability of seeing non-zero loss trial measurement result is
 318 exactly zero, or at least practically indistinguishable from zero.
 319
 320 ### Guaranteed Loss Region
 321
 322 A particular offered load value is said to belong to guaranteed loss
 323 region, if the probability of seeing zero loss trial measurement result
 324 (for non-negligible count of packets offered) is exactly zero, or at
 325 least practically indistinguishable from zero.
 326
 327 ### Non-Deterministic Region
 328
 329 A particular offered load value is said to belong to non-deterministic
 330 region, if the probability of seeing zero loss trial measurement result
 331 (for non-negligible count of packets offered) is practically
 332 distinguishable from both zero and one.
 333
 334 ### Normal Region Ordering
 335
 336 Although theoretically the three regions can be arbitrary sets, this
 337 document assumes they are intervals, where zero loss region contains
 338 values smaller than non-deterministic region, which in turn contains
 339 values smaller than guaranteed loss region.
 340
 341 ## Deterministic System
 342
 343 A hypothetical duration independent system with normal region ordering,
 344 whose non-deterministic region is extremely narrow (only present due to
 345 "practical distinguishibility" and cases when the expected number of
 346 packets offered is not and integer).
 347
 348 A duration independent system which is not deterministic is called non-
 349 deterministic system.
 350
 351 ## Througphput
 352
 353 Throughput is the highest offered load provably causing zero packet loss
 354 for trial measurements of duration at least 60 seconds.
 355
 356 For duration independent systems with normal region ordering, the
 357 throughput is the highest value within the zero loss region.
 358
 359 ## Deterministic Search
 360
 361 Any algorithm that assumes each measurement is a proof of the offered
 362 load belonging to zero loss region (or not) is called deterministic
 363 search.
 364
 365 This definition includes algorithms based on "composite measurements"
 366 which perform multiple trial measurements, somehow re-classifying
 367 results pointing at non-deterministic region.
 368
 369 Binary Search is an example of deterministic search.
 370
 371 Single run of a deterministic search launched against a deterministic
 372 system is guaranteed to find the throughput with any prescribed
 373 precision (not better than non-deterministic region width).
 374
 375 Multiple runs of a deterministic search launched against a non-
 376 deterministic system can return varied results within non-deterministic
 377 region. The exact distribution of deterministic search results depends
 378 on the algorithm used.
 379
 380 ## Probabilistic Search
 381
 382 Any algorithm which performs probabilistic computations based on
 383 observed results of trial measurements, and which does not assume that
 384 non-deterministic region is practically absent, is called probabilistic
 385 search.
 386
 387 A probabilistic search algorithm, which would assume that non-
 388 deterministic region is practically absent, does not really need to
 389 perform probabilistic computations, so it would become a deterministic
 390 search.
 391
 392 While probabilistic search for estimating throughput is possible, it
 393 would need a careful model for boundary between zero loss region and
 394 non-deterministic region, and it would need a lot of measurements of
 395 almost surely zero loss to reach good precision.
 396
 397 ## Loss Ratio Function
 398
 399 For any duration independent system, the average loss ratio depends only
 400 on offered load (for a particular test setup).
 401
 402 Loss ratio function is the name used for the function mapping offered
 403 load to average loss ratio.
 404
 405 This function is initially unknown.
 406
 407 ## Target Loss Ratio
 408
 409 Input parameter of PLRsearch. The average loss ratio the output of
 410 PLRsearch aims to achieve.
 411
 412 ## Critical Load
 413
 414 Aggregate rate of network traffic, which would lead to average loss
 415 ratio exactly matching target loss ratio, if used as the offered load
 416 for infinite many trial measurement.
 417
 418 ## Critical Load Estimate
 419
 420 Any quantitative description of the possible critical load PLRsearch is
 421 able to give after observing finite amount of trial measurements.
 422
 423 ## Fitting Function
 424
 425 Any function PLRsearch uses internally instead of the unknown loss ratio
 426 function. Typically chosen from small set of formulas (shapes) with few
 427 parameters to tweak.
 428
 429 ## Shape of Fitting Function
 430
 431 Any formula with few undetermined parameters.
 432
 433 ## Parameter Space
 434
 435 A subset of Real Coordinate Space. A point of parameter space is a
 436 vector of real numbers. Fitting function is defined by shape (a formula
 437 with parameters) and point of parameter space (specifying values for the
 438 parameters).
 439
 440 # Abstract Algorithm
 441
 442 ## High level description
 443
 444 PLRsearch accepts some input arguments, then iteratively performs trial
 445 measurements at varying offered loads (and durations), and returns some
 446 estimates of critical load.
 447
 448 PLRsearch input arguments form three groups.
 449
 450 First group has a single argument: measurer. This is a callback
 451 (function) accepting offered load and duration, and returning the
 452 measured loss count.
 453
 454 Second group consists of load related arguments required for measurer to
 455 work correctly, typically minimal and maximal load to offer. Also,
 456 target loss ratio (if not hardcoded) is a required argument.
 457
 458 Third group consists of time related arguments. Typically the duration
 459 for the first trial measurement, duration increment per subsequent trial
 460 measurement, and total time for search. Some PLRsearch implementation may
 461 use estimation accuracy parameters as an exit condition instead of total
 462 search time.
 463
 464 The returned quantities should describe the final (or best) estimate of
 465 critical load. Implementers can chose any description that suits their
 466 users, typically it is average and standard deviation, or lower and
 467 upper boundary.
 468
 469 ## Main Ideas
 470
 471 The search tries to perform measurements at offered load close to the
 472 critical load, because measurement results at offered loads far from the
 473 critical load give less information on precise location of the critical
 474 load. As virtually every trial measurement result alters the estimate of
 475 the critical load, offered loads vary as they approach the critical
 476 load.
 477
 478 The only quantity of trial measurement result affecting the computation
 479 is loss count. No latency (or other information) is taken into account.
 480
 481 PLRsearch uses Bayesian Inference, computed using numerical integration,
 482 which takes long time to get reliable enough results. Therefore it takes
 483 some time before the most recent measurement result starts affecting
 484 subsequent offered loads and critical rate estimates.
 485
 486 During the search, PLRsearch spawns few processes that perform numerical
 487 computations, the main process is calling the measurer to perform trial
 488 measurements, without any significant delays between them. The durations
 489 of the trial measurements are increasing linearly, as higher number of
 490 trial measurement results take longer to process.
 491
 492 ### Trial Durations
 493
 494 [RFC2544] motivates the usage of at least 60 second duration by the idea
 495 of the system under test slowly running out of resources (such as memory
 496 buffers).
 497
 498 Practical results when measuring NFV software systems show that relative
 499 change of trial duration has negligible effects on average loss ratio,
 500 compared to relative change in offered load.
 501
 502 While the standard deviation of loss ratio usually shows some effects of
 503 trial duration, they are hard to model. So PLRsearch assumes SUT
 504 is duration independent, and chooses trial durations only based on
 505 numeric integration requirements.
 506
 507 ### Target Loss Ratio
 508
 509 (TODO: Link to why we think 1e-7 is acceptable loss ratio.)
 510
 511 ## PLRsearch Building Blocks
 512
 513 Here we define notions used by PLRsearch which are not applicable to
 514 other search methods, nor probabilistic systems under test in general.
 515
 516 ### Bayesian Inference
 517
 518 PLRsearch uses a fixed set of fitting function shapes,
 519 and uses Bayesian inference to track posterior distribution
 520 on each fitting function parameter space.
 521
 522 Specifically, the few parameters describing a fitting function
 523 become the model space. Given a prior over the model space, and trial
 524 duration results, a posterior distribution is computed, together
 525 with quantities describing the critical load estimate.
 526
 527 Likelihood of a particular loss count is computed using Poisson distribution
 528 of average loss rate given by the fitting function (at specific point of
 529 parameter space).
 530
 531 Side note: Binomial Distribution is a better fit compared to Poisson
 532 distribution (acknowledging that the number of packets lost cannot be
 533 higher than the number of packets offered), but the difference tends to
 534 be relevant only in high loss region. Using Poisson distribution lowers
 535 the impact of measurements in high loss region, thus helping the
 536 algorithm to converge towards critical load faster.
 537
 538 ### Iterative Search
 539
 540 The idea PLRsearch is to iterate trial measurements, using Bayesian
 541 inference to compute both the current estimate of the critical load and
 542 the next offered load to measure at.
 543
 544 The required numerical computations are done in parallel with the trial
 545 measurements.
 546
 547 This means the result of measurement "n" comes as an (additional) input
 548 to the computation running in parallel with measurement "n+1", and the
 549 outputs of the computation are used for determining the offered load for
 550 measurement "n+2".
 551
 552 Other schemes are possible, aimed to increase the number of measurements
 553 (by decreasing their duration), which would have even higher number of
 554 measurements run before a result of a measurement affects offered load.
 555
 556 ### Fitting Functions
 557
 558 To make the space of possible loss ratio functions more tractable
 559 the algorithm uses only few fitting function shapes for its predicitons.
 560 As the search algorithm needs to evaluate the function also far away
 561 from the critical load, the fitting function have to be reasonably behaved
 562 for every positive offered load, specifically cannot cannot predict
 563 non-positive packet loss ratio.
 564
 565 ### Measurement Impact
 566
 567 Results from trials far from the critical load are likely to affect
 568 the critical load estimate negatively, as the fitting functions do not
 569 need to be good approximations there. This is true mainly for guaranteed loss
 570 region, as in zero loss region even badly behaved fitting function
 571 predicts loss count to be "almost zero", so seeing a measurement
 572 confirming the loss has been zero indeed has small impact.
 573
 574 Discarding some results, or "suppressing" their impact with ad-hoc
 575 methods (other than using Poisson distribution instead of binomial) is
 576 not used, as such methods tend to make the overall search unstable. We
 577 rely on most of measurements being done (eventually) near the critical load,
 578 and overweighting far-off measurements (eventually) for well-behaved
 579 fitting functions.
 580
 581 ### Fitting Function Coefficients Distribution
 582
 583 To accomodate systems with different behaviours, a fitting function is
 584 expected to have few numeric parameters affecting its shape (mainly
 585 affecting the linear approximation in the critical region).
 586
 587 The general search algorithm can use whatever increasing fitting
 588 function, some specific functions are described later.
 589
 590 It is up to implementer to chose a fitting function and prior
 591 distribution of its parameters. The rest of this document assumes each
 592 parameter is independently and uniformly distributed over a common
 593 interval. Implementers are to add non-linear transformations into their
 594 fitting functions if their prior is different.
 595
 596 ### Exit Condition
 597
 598 Exit condition for the search is either the standard deviation of the
 599 critical load estimate becoming small enough (or similar), or overal
 600 search time becoming long enough.
 601
 602 The algorithm should report both average and standard deviation for its
 603 critical load posterior.
 604
 605 ### Integration
 606
 607 The posterior distributions for fitting function parameters are not be
 608 integrable in general.
 609
 610 The search algorithm utilises the fact that trial measurement takes some
 611 time, so this time can be used for numeric integration (using suitable
 612 method, such as Monte Carlo) to achieve sufficient precision.
 613
 614 ### Optimizations
 615
 616 After enough trials, the posterior distribution will be concentrated in
 617 a narrow area of the parameter space. The integration method should take
 618 advantage of that.
 619
 620 Even in the concentrated area, the likelihood can be quite small, so the
 621 integration algorithm should avoid underflow errors by some means,
 622 for example by tracking the logarithm of the likelihood.
 623
 624 ### Offered Load Selection
 625
 626 The simplest rule is to set offered load for next trial measurememnt
 627 equal to the current average (both over posterio and over
 628 fitting function shapes) of the critical load estimate.
 629
 630 Contrary to critical load estimate computation, heuristic algorithms
 631 affecting offered load selection do not introduce instability,
 632 and can help with convergence speed.
 633
 634 ### Trend Analysis
 635
 636 If the reported averages follow a trend (maybe without reaching equilibrium),
 637 average and standard deviation COULD refer to the equilibrium estimates
 638 based on the trend, not to immediate posterior values.
 639
 640 But such post-processing is discouraged, unless a clear reason
 641 for the trend is known. Frequently, presence of such a trend is a sign
 642 of some of PLRsearch assumption being violated (usually trial order
 643 independence or duration independence).
 644
 645 It is RECOMMENDED to report any trend quantification together with
 646 direct critical load estimate, so users can draw their own conclusion.
 647 Alternatively, trend analysis may be a part of exit conditions,
 648 requiring longer searches for systems displaying trends.
 649
 650 # Sample Implementation Specifics: FD.io CSIT
 651
 652 The search receives min_rate and max_rate values, to avoid measurements
 653 at offered loads not supporeted by the traffic generator.
 654
 655 The implemented tests cases use bidirectional traffic. The algorithm
 656 stores each rate as bidirectional rate (internally, the algorithm is
 657 agnostic to flows and directions, it only cares about overall counts of
 658 packets sent and packets lost), but debug output from traffic generator
 659 lists unidirectional values.
 660
 661 ## Measurement Delay
 662
 663 In a sample implemenation in FD.io CSIT project, there is roughly 0.5
 664 second delay between trials due to restrictons imposed by packet traffic
 665 generator in use (T-Rex).
 666
 667 As measurements results come in, posterior distribution computation
 668 takes more time (per sample), although there is a considerable constant
 669 part (mostly for inverting the fitting functions).
 670
 671 Also, the integrator needs a fair amount of samples to reach the region
 672 the posterior distribution is concentrated at.
 673
 674 And of course, speed of the integrator depends on computing power of the
 675 CPUs the algorithm is able to use.
 676
 677 All those timing related effects are addressed by arithmetically
 678 increasing trial durations with configurable coefficients (currently 5.1
 679 seconds for the first trial, each subsequent trial being 0.1 second
 680 longer).
 681
 682 ## Rounding Errors and Underflows
 683
 684 In order to avoid them, the current implementation tracks natural
 685 logarithm (instead of the original quantity) for any quantity which is
 686 never negative. Logarithm of zero is minus infinity (not supported by
 687 Python), so special value "None" is used instead. Specific functions for
 688 frequent operations (such as "logarithm of sum of exponentials") are
 689 defined to handle None correctly.
 690
 691 ## Fitting Functions
 692
 693 Current implementation uses two fitting functions. In general, their
 694 estimates for critical rate differ, which adds a simple source of
 695 systematic error, on top of posterior dispersion reported by integrator.
 696 Otherwise the reported stdev of critical rate estimate is
 697 unrealistically low.
 698
 699 Both functions are not only increasing, but also convex (meaning the
 700 rate of increase is also increasing).
 701
 702 As Primitive Function to any positive function is an increasing
 703 function, and Primitive Function to any increasing function is convex
 704 function; both fitting functions were constructed as double Primitive
 705 Function to a positive function (even though the intermediate increasing
 706 function is easier to describe).
 707
 708 As not any function is integrable, some more realistic functions
 709 (especially with respect to behavior at very small offered loads) are
 710 not easily available.
 711
 712 Both fitting functions have a "central point" and a "spread", varied by
 713 simply shifting and scaling (in x-axis, the offered load direction) the
 714 function to be doubly integrated. Scaling in y-axis (the loss rate
 715 direction) is fixed by the requirement of transfer rate staying nearly
 716 constant in very high offered loads.
 717
 718 In both fitting functions (as they are a double Primitive Function to a
 719 symmetric function), the "central point" turns out to be equal to the
 720 aforementioned limiting transfer rate, so the fitting function parameter
 721 is named "mrr", the same quantity CSIT Maximum Receive Rate tests are
 722 designed to measure.
 723
 724 Both fitting functions return logarithm of loss rate, to avoid rounding
 725 errors and underflows. Parameters and offered load are not given as
 726 logarithms, as they are not expected to be extreme, and the formulas are
 727 simpler that way.
 728
 729 Both fitting functions have several mathematically equivalent formulas,
 730 each can lead to an overflow or underflow in different places. Overflows
 731 can be eliminated by using different exact formulas for different
 732 argument ranges. Underflows can be avoided by using approximate formulas
 733 in affected argument ranges, such ranges have their own formulas to
 734 compute. At the end, both fitting function implementations contain
 735 multiple "if" branches, discontinuities are a possibility at range
 736 boundaries.
 737
 738 ### Stretch Function
 739
 740 The original function (before applying logarithm) is Primitive Function
 741 to Logistic Function. The name "stretch" is used for related a function
 742 in context of neural networks with sigmoid activation function.
 743
 744 Formula for stretch fitting function: average loss rate (r) computed from
 745 offered load (b), mrr parameter (m) and spread parameter (a),
 746 given as InputForm of Wolfram language:
 747
 748     r = (a*(1 + E^(m/a))*Log[(E^(b/a) + E^(m/a))/(1 + E^(m/a))])/E^(m/a)
 749
 750 ### Erf Function
 751
 752 The original function is double Primitive Function to Gaussian Function.
 753 The name "erf" comes from error function, the first primitive to
 754 Gaussian.
 755
 756 Formula for erf fitting function: average loss rate (r) computed from
 757 offered load (b), mrr parameter (m) and spread parameter (a),
 758 given as InputForm of Wolfram language:
 759
 760     r = ((a*(E^(-((b - m)^2/a^2)) - E^(-(m^2/a^2))))/Sqrt[Pi] + m*Erfc[m/a]
 761         + (b - m)*Erfc[(-b + m)/a])/(1 + Erf[m/a])
 762
 763 ## Prior Distributions
 764
 765 The numeric integrator expects all the parameters to be distributed
 766 (independently and) uniformly on an interval (-1, 1).
 767
 768 As both "mrr" and "spread" parameters are positive and not
 769 dimensionless, a transformation is needed. Dimentionality is inherited
 770 from max_rate value.
 771
 772 The "mrr" parameter follows a Lomax Distribution with alpha equal to
 773 one, but shifted so that mrr is always greater than 1 packet per second.
 774
 775 The "stretch" parameter is generated simply as the "mrr" value raised to
 776 a random power between zero and one; thus it follows a Reciprocal
 777 Distribution.
 778
 779 ## Integrator
 780
 781 After few measurements, the posterior distribution of fitting function
 782 arguments gets quite concentrated into a small area. The integrator is
 783 using Monte Carlo with Importance Sampling where the biased distribution
 784 is Bivariate Gaussian distribution, with deliberately larger variance.
 785 If the generated sample falls outside (-1, 1) interval, another sample
 786 is generated.
 787
 788 The the center and the covariance matrix for the biased distribution is
 789 based on the first and second moments of samples seen so far (within the
 790 computation), with the following additional features designed to avoid
 791 hyper-focused distributions.
 792
 793 Each computation starts with the biased distribution inherited from the
 794 previous computation (zero point and unit covariance matrix is used in
 795 the first computation), but the overal weight of the data is set to the
 796 weight of the first sample of the computation. Also, the center is set
 797 to the first sample point. When additional samples come, their weight
 798 (including the importance correction) is compared to the weight of data
 799 seen so far (within the computation). If the new sample is more than one
 800 e-fold more impactful, both weight values (for data so far and for the
 801 new sample) are set to (geometric) average if the two weights. Finally,
 802 the actual sample generator uses covariance matrix scaled up by a
 803 configurable factor (8.0 by default).
 804
 805 This combination showed the best behavior, as the integrator usually
 806 follows two phases. First phase (where inherited biased distribution or
 807 single big sasmples are dominating) is mainly important for locating the
 808 new area the posterior distribution is concentrated at. The second phase
 809 (dominated by whole sample population) is actually relevant for the
 810 critical rate estimation.
 811
 812 ## Offered Load Selection
 813
 814 First two measurements are hardcoded to happen at the middle of rate interval
 815 and at max_rate. Next two measurements follow MRR-like logic,
 816 offered load is decreased so that it would reach target loss ratio
 817 if offered load decrease lead to equal decrease of loss rate.
 818
 819 Basis for offered load for next trial measurements is the integrated average
 820 of current critical rate estimate, averaged over fitting function.
 821
 822 There is one workaround implemented, aimed at reducing the number of consequent
 823 zero loss measurements. The workaround first stores every measurement
 824 result which loss ratio was the targed loss ratio or higher.
 825 Sorted list (called lossy loads) of such results is maintained.
 826
 827 When a sequence of one or more zero loss measurement results is encountered,
 828 a smallest of lossy loads is drained from the list.
 829 If the estimate average is smaller than the drained value,
 830 a weighted average of this estimate and the drained value is used
 831 as the next offered load. The weight of the drained value doubles
 832 with each additional consecutive zero loss results.
 833
 834 This behavior helps the algorithm with convergence speed,
 835 as it does not need so many zero loss result to get near critical load.
 836 Using the smallest (not drained yet) of lossy loads makes it sure
 837 the new offered load is unlikely to result in big loss region.
 838 Draining even if the estimate is large enough helps to discard
 839 early measurements when loss hapened at too low offered load.
 840 Current implementation adds 4 copies of lossy loads and drains 3 of them,
 841 which leads to fairly stable behavior even for somewhat inconsistent SUTs.
 842
 843 # IANA Considerations
 844
 845 No requests of IANA.
 846
 847 # Security Considerations
 848
 849 Benchmarking activities as described in this memo are limited to
 850 technology characterization of a DUT/SUT using controlled stimuli in a
 851 laboratory environment, with dedicated address space and the constraints
 852 specified in the sections above.
 853
 854 The benchmarking network topology will be an independent test setup and
 855 MUST NOT be connected to devices that may forward the test traffic into
 856 a production network or misroute traffic to the test management network.
 857
 858 Further, benchmarking is performed on a "black-box" basis, relying
 859 solely on measurements observable external to the DUT/SUT.
 860
 861 Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
 862 benchmarking purposes.  Any implications for network security arising
 863 from the DUT/SUT SHOULD be identical in the lab and in production
 864 networks.
 865
 866 # Acknowledgements
 867
 868 ..
 869
 870 --- back