From ee28e8ae476c6b0c098cd3895c586316feb4bdb9 Mon Sep 17 00:00:00 2001
From: Vratko Polak <vrpolak@cisco.com>
Date: Tue, 24 Oct 2023 11:48:29 +0200
Subject: [PATCH] feat(MLRsearch): version 05 of the IETF draft

Editing not finished, but we ran out of time.

Change-Id: I169c672eec731c5bce083af590be01222f99fb4c
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
---
 docs/ietf/draft-ietf-bmwg-mlrsearch-04.md | 1479 -----------------------------
 docs/ietf/draft-ietf-bmwg-mlrsearch-05.md | 1460 ++++++++++++++++++++++++++++
 docs/ietf/process.txt                     |    2 +-
 3 files changed, 1461 insertions(+), 1480 deletions(-)
 delete mode 100644 docs/ietf/draft-ietf-bmwg-mlrsearch-04.md
 create mode 100644 docs/ietf/draft-ietf-bmwg-mlrsearch-05.md

diff --git a/docs/ietf/draft-ietf-bmwg-mlrsearch-04.md b/docs/ietf/draft-ietf-bmwg-mlrsearch-04.md
deleted file mode 100644
index 4db8506131..0000000000
--- a/docs/ietf/draft-ietf-bmwg-mlrsearch-04.md
+++ /dev/null
@@ -1,1479 +0,0 @@
----
-title: Multiple Loss Ratio Search
-abbrev: MLRsearch
-docname: draft-ietf-bmwg-mlrsearch-04
-date: 2023-07-10
-
-ipr: trust200902
-area: ops
-wg: Benchmarking Working Group
-kw: Internet-Draft
-cat: info
-
-coding: us-ascii
-pi:    # can use array (if all yes) or hash here
-  toc: yes
-  sortrefs:   # defaults to yes
-  symrefs: yes
-
-author:
-      -
-        ins: M. Konstantynowicz
-        name: Maciek Konstantynowicz
-        org: Cisco Systems
-        email: mkonstan@cisco.com
-      -
-        ins: V. Polak
-        name: Vratko Polak
-        org: Cisco Systems
-        email: vrpolak@cisco.com
-
-normative:
-  RFC1242:
-  RFC2285:
-  RFC2544:
-  RFC9004:
-
-informative:
-  TST009:
-    target: https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.04.01_60/gs_NFV-TST009v030401p.pdf
-    title: "TST 009"
-  FDio-CSIT-MLRsearch:
-    target: https://csit.fd.io/cdocs/methodology/measurements/data_plane_throughput/mlr_search/
-    title: "FD.io CSIT Test Methodology - MLRsearch"
-    date: 2022-11
-  PyPI-MLRsearch:
-    target: https://pypi.org/project/MLRsearch/0.4.0/
-    title: "MLRsearch 0.4.0, Python Package Index"
-    date: 2021-04
-
---- abstract
-
-This document proposes improvements to [RFC2544] throughput search by
-defining a new methodology called Multiple Loss Ratio search
-(MLRsearch). The main objectives for MLRsearch are to minimize the
-total test duration, search for multiple loss ratios and improve
-results repeatibility and comparability.
-
-The main motivation behind MLRsearch is the new set of challenges and
-requirements posed by testing Network Function Virtualization
-(NFV) systems and other software based network data planes.
-
-MLRsearch offers several ways to address these challenges, giving user
-configuration options to select their preferred way.
-
---- middle
-
-{::comment}
-    As we use kramdown to convert from markdown,
-    we use this way of marking comments not to be visible in rendered draft.
-    https://stackoverflow.com/a/42323390
-    If other engine is used, convert to this way:
-    https://stackoverflow.com/a/20885980
-{:/comment}
-
-# Purpose and Scope
-
-The purpose of this document is to describe Multiple Loss Ratio search
-(MLRsearch), a throughput search methodology optimized for software
-DUTs.
-
-Applying vanilla [RFC2544] throughput bisection to software DUTs
-results in a number of problems:
-
-- Binary search takes too long as most of trials are done far from the
-  eventually found throughput.
-- The required final trial duration (and pauses between trials) also
-  prolong the overall search duration.
-- Software DUTs show noisy trial results (noisy neighbor problem),
-  leading to big spread of possible discovered throughput values.
-- Throughput requires loss of exactly zero packets, but the industry
-  frequently allows for small but non-zero losses.
-- The definition of throughput is not clear when trial results are
-  inconsistent.
-
-MLRsearch aims to address these problems by applying the following set
-of enhancements:
-
-- Allow searching for multiple search goals, with differing goal loss ratios.
-  - Each trial result can affect any search goal in principle
-    (trial reuse).
-- Multiple preceding targets for each search goal, earlier ones need
-  to spend less time on trials.
-  - Earlier targets also aim at lesser precision.
-  - Use Forwarding Rate (FR) at maximum offered load
-    [RFC2285] (section 3.6.2) to initialize the initial targets.
-- Take care when dealing with inconsistent trial results.
-  - Loss ratios goals are handled in an order that minimizes the chance
-    of interference from later trials to earlier goals.
-- Apply several load selection heuristics to save even more time
-  by trying hard to avoid unnecessarily narrow bounds.
-
-MLRsearch configuration options are flexible enough to
-support both conservative settings (unconditionally compliant with [RFC2544],
-but longer search duration and worse repeatability) and aggressive
-settings (shorter search duration and better repeatability but not
-compliant with [RFC2544]).
-
-No part of [RFC2544] is intended to be obsoleted by this document.
-
-# Terminology
-
-When a subsection is defining a term, the first paragraph
-acts as a definition. Other paragraphs are treated as a description,
-they provide additional details without being needed to define the term.
-
-Definitions should form a directed acyclic graph of dependencies.
-If a section contains subsections, the section definition
-may depend on the subsection definitions.
-Otherwise, any definition may depend on preceding definitions.
-In other words, if the section definition were to come after subsections,
-there would be no forward dependencies for people reading just definitions
-from start to finish.
-
-Descriptions provide motivations and explanations,
-they frequently reference terms defined only later.
-Motivations in section descriptions are the reason
-why section text comes before subsection text.
-
-## General notions
-
-General notions are the terms defined in this section.
-
-It is useful to define the following notions
-before delving into MLRsearch architecture,
-as the notions appear in multiple places
-with no place being special enough to host definition.
-
-### General and specific quantities
-
-General quantity is a quantity that may appear multiple times
-in MLRsearch specification, perhaps each time in a different role.
-The quantity when appearing in a single role is called
-a specific quantity.
-
-It is useful to define the general quantity,
-so definitions of specific quantities may refer to it.
-We say a specific quantity is based on a general quantity,
-if the specific quantity definition refers to and
-relies on the general quantity definition.
-
-It is natural to name specific quantities by adding an adjective
-(or a noun) to the name of the general quantity.
-But existing RFCs typically explicitly define a term acting
-in a specific role, so the RFC name directly refers to a specific
-quantity, while the corresponding general quantity
-is defined only implicitly.
-Therefore this documents defines general quantities explicitly,
-even if the same term already appears in an RFC.
-
-In practice, it is required to know which unit of measurement
-is used to accompany a numeric value of each quantity.
-The choice of a particular unit of measurement is not important
-for MLRsearch specification though, so specific units
-mentioned in this document are just examples or recommendations,
-not requirements.
-
-When reporting, it is REQUIRED to state the units used.
-
-### Composite
-
-A composite is a set of named attributes.
-Each attribute is either a specific quantity or a composite.
-
-MLRsearch specification frequently groups multiple specific quantities
-into a composite. Description of such a composite brings an insight
-to motivations why this or other terms are defined as they are.
-Such insight will be harder to communicate
-with the specific quantities alone.
-
-Also, it simplifies naming of specific quantities, as they usually can
-share a noun or adjective referring to their common composite.
-Most of relations between composites and their specific quantities
-can be described using plain English.
-
-Perhaps the only exception involves referring to specific quantities
-as attributes. For example if there is a composite called 'target',
-and one of its specific quantities is 'target width' defined using
-a general quantity 'width', we can say 'width is one of target attributes'.
-
-### SUT
-
-As defined in RFC 2285:
-The collective set of network devices to which stimulus is offered
-as a single entity and response measured.
-
-While RFC 2544 mostly refers to DUT as a single
-(network interconnecting) device, section 19 makes it clear
-multiple DUTs can be treated as a single system,
-so most of RFC 2544 also applies to testing SUT.
-
-MLRsearch specification only refers to SUT (not DUT),
-even if it consists of just a single device.
-
-### Trial
-
-A trial is the part of test described in RFC 2544 section 23.
-
-When traffic has been sent and SUT response has been observed,
-we say the trial has been performed, or the trial has been measured.
-Before that happens, multiple possibilities for upcoming trial
-may be under consideration.
-
-### Load
-
-Intended, constant load for a trial, usually in frames per second.
-
-Load is the general quantity implied by Constant Load of RFC 1242,
-Data Rate of RFC 2544 and Intended Load of RFC 2285.
-All three specify this value applies to one (input or output) interface,
-so we can talk about unidirectional load also
-when bidirectional or multi-port traffic is applied.
-
-MLRsearch does not rely on this distinction, it works also if
-the load values correspond to an aggregate rate
-(sum over all SUT tested input or output interface unidirectional loads),
-as long as all loads share the same semantics.
-
-Several RFCs define useful quantities based on Offered Load
-(instead of Intended Load), but MLRsearch specification
-works only with (intended) load. Those useful quantities
-still serve as motivations for few specific quantities used in MLRsearch
-specification.
-
-MLRsearch assumes most load values are positive.
-For some (but not all) specific quantities based on load,
-zero may also be a valid value.
-
-### Duration
-
-Intended duration of the traffic for a trial, usually in seconds.
-
-This general quantity does not include any preparation nor waiting
-described in section 23 of RFC 2544.
-Section 24 of RFC 2544 places additional restrictions on duration,
-but those restriction apply only to some of the specific quantities based
-on duration.
-
-Duration is always positive in MLRsearch.
-
-### Duration sum
-
-For a specific set of trials, this is the sum of their durations.
-
-Some of specific quantities based on duration sum are derived quantities,
-without a specific set of trials to sum their durations.
-
-Duration sum is never negative in MLRsearch.
-
-### Width
-
-General quantity defined for an ordered pair (lower and higher)
-of load values, which describes a distance between the two values.
-
-The motivation for the name comes from binary search.
-The binary search tries to approximate an unknown value
-by repeatedly bisecting an interval of possible values,
-until the interval becomes narrow enough.
-Width of the interval is a specific quantity
-and the termination condition compares that
-to another specific quantity acting as the threshold.
-The threshold value does not have a specific interval associated,
-but corresponds to a 'size' of the compared interval.
-As size is a word already used in definition of frame size,
-a more natural word describing interval is width.
-
-The MLRsearch specification does use (analogues of) upper bound
-and lower bound, but does not actually need to talk about intervals.
-Still, the intervals are implicitly there, so width is the natural name.
-
-Actually, there are two popular options for defining width.
-Absolute width is based on load, the value is the higher load
-minus the lower load.
-Relative width is dimensionless, the value is the absolute width
-divided by the higher load. As intended loads for trials are positive,
-relative width is between 0.0 (including) and 1.0 (excluding).
-
-Relative width as a threshold value may be useful for users
-who do not presume what is the typical performance of SUT,
-but absolute width may be a more familiar concept.
-
-MLRsearch specification does not prescribe which width has to be used,
-but widths MUST be either all absolute or all relative,
-and it MUST be clear from report which option was used
-(it is implied from the unit of measurement of any width value).
-
-### Loss ratio
-
-The loss ratio is a general quantity, dimensionless floating point value
-assumed to be between 0.0 and 1.0, both including.
-It is computed as the number of frames forwarded by SUT, divided by
-the number of frames that should have been forwarded during the trial.
-
-If the number of frames that should have been forwarded is zero,
-the loss ratio is considered to be zero
-(but it is better to use high enough loads to prevent this).
-
-Loss ratio is basically the same quantity as Frame Loss Rate of RFC 1242,
-just not expressed in percents.
-
-RFC1242 Frame Loss Rate:
-Percentage of frames that should have been forwarded
-by a network device under steady state (constant)
-load that were not forwarded due to lack of
-resources.
-
-(RFC2544 restricts Frame Loss Rate to a type of benchmark,
-for loads 100% of 'maximum rate', 90% and so on.)
-
-### Exceed ratio
-
-This general quantity is a dimensionless floating point value,
-defined using two duration sum quantities.
-One duration sum is referred to as the good duration sum,
-the other is referred to as the bad duration sum.
-The exceed ratio value is computed as the bad duration sum value
-divided by the sum of the two sums. If both sums are zero,
-the exceed ratio is undefined.
-
-As there are no negative duration sums in MLRsearch,
-exceed ratio values are between 0.0 and 1.0 (both including).
-
-## Architecture
-
-MLRsearch architecture consists of three main components:
-the manager, the controller and the measurer.
-
-The search algorithm is implemented in the controller,
-and it is the main focus of this document.
-
-Most implementation details of the manager and the measurer are
-out of scope of this document, except when describing
-how do those components interface with the controller.
-
-### Manager
-
-The manager is the component that initializes SUT, traffic generator
-(called tester in RFC 2544), the measurer and the controller
-with intended configurations. It then handles the execution
-to the controller and receives its result.
-
-Managers can range from simple CLI utilities to complex
-Continuous Integration systems. From the controller point of view
-it is important that no additional configuration (nor warmup)
-is needed for SUT and the measurer to perform trials.
-
-The interface between the manager and the controller
-is defined in the controller section.
-
-One execution of the controller is called a search.
-Some benchmarks may execute multiple searches on the same SUT
-(for example when confirming the performance is stable over time),
-but in this document only one invocation is concerned
-(others may be understood as the part of SUT preparation).
-
-Creation of reports of appropriate format can also be understood
-as the responsibility of the manager. This document places requirements
-on which information has to be reported.
-
-### Measurer
-
-The measurer is the component which performs one trial
-as described in RFC 2544 section 23, when requested by the controller.
-
-From the controller point of view, it is a function that accepts
-trial input and returns trial output.
-
-This is the only way the controller can interact with SUT.
-In practice, the measurer has to do subtle decisions
-when converting the observed SUT behavior into a single
-trial loss ratio value. For example how to deal with
-out of order frames or duplicate frames.
-
-On software implementation level, the measurer is a callable,
-injected by the manager into the controller instance.
-
-The act of performing one trial (act of turning trial input
-to trial output) is called a measurement, or trial measurement.
-This way we can talk about trials that were measured already
-and trials that are merely planned (not measured yet).
-
-#### Trial input
-
-The load and duration to use in an upcoming trial.
-
-This is a composite.
-
-Other quantities needed by the measurer are assumed to be constant
-and set up by the manager before search starts (see traffic profile),
-so they do not count as trial input attributes.
-
-##### Trial load
-
-Trial load is the intended load for the trial.
-
-This is a specific quantity based on load,
-directly corresponding to RFC 2285 intended load.
-
-##### Trial duration
-
-Trial duration is the intended duration for the trial.
-
-This is a specific quantity based on duration, so it specifies
-only the traffic part of the trial, not the waiting parts.
-
-#### Traffic profile
-
-Any other configuration values needed by the measurer to perform a trial.
-
-The measurer needs both trial input and traffic profile to perform the trial.
-As trial input contains the only values that vary during one the search,
-traffic profile remains constant during the search.
-
-Traffic profile when understood as a composite is REQUIRED by RFC 2544
-to contain some specific quantities (for example frame size).
-Several more specific quantities may be RECOMMENDED.
-
-Depending on SUT configuration (e.g. when testing specific protocols),
-additional values need to be included in the traffic profile
-and in the test report. (See other IETF documents.)
-
-#### Trial ouput
-
-A composite consisting of trial loss ratio
-and trial forwarding rate.
-
-Those are the only two specific quantities (among other quantities
-possibly measured in the trial, for example offered load)
-that are important for MLRsearch.
-
-##### Trial loss ratio
-
-Trial loss ratio is a specific quantity based on loss ratio.
-The value is related to a particular measured trial,
-as measured by the measurer.
-
-##### Trial forwarding rate
-
-Trial forwarding rate is a derived quantity.
-It is computed as one minus trial loss ratio,
-that multiplied by trial load.
-
-Despite the name, the general quantity this specific quantity
-corresponds to is load (not rate).
-The name is inspired by RFC 2285, which defines Forwarding Rate
-specific to one output interface.
-
-As the definition of loss ratio is not neccessarily per-interface
-(one of details left for the measurer), using the definition above
-(instead of RFC 2285) makes sure trial forwarding rate
-is always between zero and the trial load (both including).
-
-#### Trial result
-
-Trial result is a composite consisting of trial input attributes
-and trial output attributes.
-
-Those are all specific quantites related to a measured trial MLRsearch needs.
-
-While distinction between trial input and output is important
-when defining the interface between the controller and the measurer,
-it is easier to talk about trial result
-when describing how measured trials influence the controller behavior.
-
-### Controller
-
-The component of MLRsearch architecture that calls the measurer
-and returns conditional throughputs to the manager.
-
-This component implements the search algorithm,
-the main content of this document.
-
-Contrary to Throughput as defined in RFC 1242,
-the definition of conditional throughput is quite sensitive
-to the controller input (as provided by the manager),
-and its full definition needs several terms
-which would otherwise be hidden as internals of the controller
-implementation.
-
-The ability of conditional throughput to be less sensitive
-to performance variance, and the ability of the controller
-to find conditional throughputs for multiple search goals
-within one search (and in short overall search time)
-are strong enough motivations for the need of increased complexity.
-
-### Controller input
-
-A composite of max load, min load, and a set of search goals.
-
-The search goals (as elements of the set of search goals)
-are usually not named and unordered.
-
-It is fine if all search goals of the set have the same value
-of a particular attribute. In that case, the common value
-may be treated as a global attribute (similarly to max and min load).
-
-The set of search goals MUST NOT be empty.
-Two search goals within the set MUST differ in at least one attribute.
-The manager MAY avoid both issues by presenting empty report
-or de-duplicating the search goals, but it is RECOMMENDED
-for the manager to raise an error to its caller,
-as the two conditions suggest the test is improperly configured.
-
-#### Max load
-
-Max load is a specific quantity based on load.
-No trial load is ever higher than this value.
-
-RFC 2544 section 20 defines maximum frame rate
-based on theoretical maximum rate for the frame size on the media.
-RFC 2285 section 3.5.3 specifies Maximum offered load (MOL)
-which may be lower than maximum frame rate.
-There may be other limitations preventing high loads,
-for examples resources available to traffic generator.
-
-The manager is expected to provide a value that is not greater
-than any known limitation. Alternatively, the measurer
-is expected to work at max load, possibly reporting as lost
-any frames that were not able to leave Traffic Generator.
-
-From the controller point of view, this is merely a global upper limit
-for any trial load candidates.
-
-#### Min load
-
-Min load is a specific quantity based on load.
-No trial load is ever lower than this value.
-
-The motivation of this quantity is to prevent trials
-with too few frames sent to SUT.
-
-Also, practically if a SUT is able to reach only very small
-forwarding rates (min load indirectly serves as a threshold for how small),
-it may be considered faulty (or perhaps the test is misconfigured).
-
-#### Search goal
-
-A composite of 7 attributes (see subsections).
-
-If not otherwise specified, 'goal' always refers to a search goal
-in this document.
-
-The controller input may contain multiple search goals.
-The name Multiple Loss Ratio search was created back when
-goal loss ratio was the only attribute allowed to vary between goals.
-
-Each goal will get its conditional throughput discovered
-and reported at the end of the search.
-
-The definitions of the 7 attributes are not very informative by themselves.
-Their motivation (and naming) becomes more clear
-from the impact they have on conditional throughput.
-
-##### Goal loss ratio
-
-A specific quantity based on loss ratio.
-A threshold value for trial loss ratios.
-MUST be lower than one.
-
-Trial loss ratio values will be compared to this value,
-a trial will be considered bad if its loss ratio is higher than this.
-
-For example, RFC 2544 throughput has goal loss ratio of zero,
-a trial is bad once a sigle frame is lost.
-
-Loss ratio of one would classify each trial as good (regardless of loss),
-which is not useful.
-
-##### Goal initial trial duration
-
-A specific quantity based on duration.
-A threshold value for trial durations.
-MUST be positive.
-
-MLRsearch is allowed to use trials as short as this when focusing
-on this goal.
-The conditional throughput may be influenced by shorter trials,
-(measured when focusing on other search goals).
-
-{::comment}
-    FIXME: Should shorter trials be explicitly ignored?
-{:/comment}
-
-##### Goal final trial duration
-
-A specific quantity based on duration.
-A threshold value for trial durations.
-MUST be no smaller than goal initial trial duration.
-
-MLRsearch is allowed to use trials as long as this when focusing
-on this goal. If more data is needed, repeated trials
-at the same load and duration are requested by the controller.
-
-##### Goal min duration sum
-
-A specific quantity based on duration sum.
-A threshold value for a particular duration sum.
-
-MLRsearch requires at least this amount of (effective) trials
-for a particular load to become part of MLRsearch outputs.
-
-It is possible (though maybe not prectical) for goal min duration sum
-to be smaller than goal final trial duration.
-
-In practice, the sum of durations actually spent on trial measurement
-can be smaller (when trial results are quite one-sided) or even larger
-(in presence of shorter-than-final trial duration results at the same load).
-
-If the sum of all (good and bad) long trials is at least this,
-and there are no short trials, then the load is guaranteed
-to be classified as either an upper or a lower bound.
-
-In some cases, the classification is known sooner,
-when the 'missing' trials cannot change the outcome.
-
-When short trials are present, the logic is more complicated.
-
-##### Goal exceed ratio
-
-A specific quantity based on exceed ratio.
-A threshold value for particulat sets of trials.
-
-An attribute used for classifying loads into upper and lower bounds.
-
-If the duration sum of all (current duration) trials is at least
-min duration sum, and more than this percentage of the duration sum
-comes from bad trials, this load is an upper bound.
-
-If there are shorter duration trials, the logic is more complicated.
-
-##### Goal width
-
-A specific quantity based on width.
-A threshold value for a particular width.
-MUST be positive.
-
-This defines the exit condition for this search goal.
-
-Relevant bounds (of the final target) need to be this close
-before conditional throughput can be reported.
-
-##### Preceding targets
-
-A non-negative integer affecting the behavior of the controller.
-
-How many additional non-final targets to add.
-Each next preceding target has double width
-and min duration sum geometrically closer to initial trial duration.
-
-The usage of preceding targets is an important source
-of MLRsearch time savings (compared to simpler search algorithms).
-
-Having this value configurable lets the manager
-tweak the overall search duration based on presumed knowledge
-of SUT performance stability.
-
-### Controller internals
-
-Terms not directly corresponding to the controller's input nor output,
-but needed indirectly as dependencies of the conditional throughput
-definition.
-
-Following these definitions specifies virtually all of the controller
-(MLRsearch algorithm) logic.
-
-#### Pre-initial trials
-
-Up to three special trials executed at the start of the search.
-The first trial load is max load,
-subsequent trial load are computed from preceding trial
-forwarding rate.
-
-The main loop of the controller logic needs at least one trial result,
-and time is saved if the trial results are close to future conditional
-throughput values.
-
-The exact way to compute load for second and third trial
-(and whether even measure second or third trial)
-are not specified here, as the implementation details
-have negligible effect on the reported conditional throughput.
-
-{::comment}
-    TODO: Still, recommend something like this:
-    Loads need to fit several different initial targets at once.
-    Duration is the largest among initial trial durations,
-    loads are computed from forwarding rate an smallest loss ratio goal.
-    Also, the initial target current width is set based on these.
-{:/comment}
-
-#### Search target
-
-A composite of 5 specific quantites (see subsections).
-Frequently called just target.
-
-Similar to (but distinct from) the search goal.
-
-Each search goal prescribes a final target,
-probably with a chain of preceding targets.
-
-More details in the Derived targets section.
-
-##### Target loss ratio
-
-Same as loss ratio of the corresponding goal.
-
-##### Target exceed ratio
-
-Same as exceed ratio of the corresponding goal.
-
-##### Target width
-
-Similar to goal width attribute.
-Doubled from goal width for each level of preceding target.
-
-##### Target min duration sum
-
-Similar to goal min duration sum attribute.
-Geometrically interpolated between
-initial target duration and goal min duration sum.
-
-##### Target trial duration
-
-When MLRsearch focuses on this target, it measures trials
-with this duration.
-The value is equal to the minimum of goal final trial duration
-and target min duration sum.
-
-Also, this value is used to classify trial results
-as short (if trial duration is shorter than this) or long.
-
-#### Derived targets
-
-After receiving the set of search goals,
-MLRsearch internally derives a set of search targets.
-
-The derived targets can be seen as forming a chain,
-from initial target to final target.
-The chain is linked by a reference from a target to its preceding
-(towarsds initial) target.
-
-The reference may be implemented as 6th attribute od target.
-
-##### Final target
-
-The final target is the target where the most of attribute values
-are directly copied from the coresponding search goal.
-Final target width is the same as goal width,
-final target trial duration is the same as goal final trial duration,
-and final target min duration sum is the same
-as the goal min duration sum.
-
-The conditional throughput is found when focusing on the final target.
-All non-final targets do not directly affect the conditional throughput,
-they are there just as an optimization.
-
-##### Preceding target
-
-Each target may have a preceding target.
-Goal attribute Preceding targets governs how many targets are created
-in addition to the final target corresponding to the search goal.
-
-Any preceding target has double width, meaning one balanced bisection
-is needed to reduce preceding target width to the next target width.
-
-Preceding target min duration sum is exponentially smaller,
-aiming for prescribed initial target min duration sum.
-
-Preceding target trial duration is either its min duration sum,
-or the corresponding goal's final trial duration, whichever is smaller.
-
-As the preceding min duration sum is shorter than the next duration sum,
-MLRsearch is able to achieve the preceding target width
-sooner (than with the next target min duration sum).
-
-This way an approximation of the conditional throughput is found,
-with the next target needing not as much time to improve the approximation
-(compared to not starting with the approximation).
-
-##### Initial target
-
-Initial target is a target without any other target preceding it.
-Initial target min duration sum is equal to the corresponding goal's
-initial trial duration.
-
-As a consequence, initial target trial duration is equal to its min duration sum.
-
-#### Trial classification
-
-Any trial result can be classified according to any target along two axes.
-
-The two classifications are independent.
-
-This classification is important for defining the conditional throughput.
-
-##### Short trial
-
-If the (measured) trial duration is shorter than
-the target trial duration, the trial is called long.
-
-##### Long trial
-
-If the (measured) trial duration is not shorter than
-the target trial duration, the trial is called long.
-
-##### Bad trial
-
-If the (measured) trial loss ratio is larger than the target loss ratio,
-the trial is called bad.
-
-For example, if the target loss ratio is zero,
-a trial is bad as soon as one frame was lost.
-
-##### Good trial
-
-If the (measured) trial loss ratio is not larger than the target loss ratio,
-the trial is called good.
-
-For example, if the target loss ratio is zero,
-a trial is good only when there were no frames lost.
-
-#### Load stat
-
-A composite of 8 quantities (see subsections)
-The quantites depend on a target and a load,
-and are computed from all trials measured at that load so far.
-
-The MLRsearch output is the conditional througput,
-which is a specific quantity based on load.
-As MLRsearch may measure multiple trials at the same load,
-and those trials may not have the same duration,
-we need a way to classify a set of trial results at the same load.
-
-As the logic is not as straightforward as in other parts
-of MLRsearch algorithm, it is best defined using the following
-derived quantities.
-
-Load stat is the composite for one load and one target.
-Set of load stats for one load an all targets is commonly called load stats.
-
-##### Long good duration sum
-
-Sum of durations of all long good trials
-(at this load, according to this target).
-
-##### Long bad duration sum
-
-Sum of durations of all long bad trials
-(at this load, according to this target).
-
-##### Short good duration sum
-
-Sum of durations of all short good trials
-(at this load, according to this target).
-
-##### Short bad duration sum
-
-Sum of durations of all short bad trials
-(at this load, according to this target).
-
-##### Effective bad duration sum
-
-One divided by tagret exceed ratio, that plus one.
-Short good duration sum divided by that.
-Short bad duration sum minus that, or zero if that would be negative.
-Long bad duration sum plus that is the effective bad duration sum.
-
-Effective bad duration sum is the long bad duration sum
-plus some fraction of short bad duration sum.
-The fraction is between zero and one (both possibly including).
-
-If there are no short good trials, effective bad duration sum
-becomes the duration sum of all bad trials (long or short).
-
-If an exceed ratio computed from short good duration sum
-and short bad duration sum is equal or smaller than the target exceed ratio,
-effective bad duration sum is equal to just long bad duration sum.
-
-Basically, short good trials can only lessen the impact
-of short bad trials, while short bad trials directly contribute
-(unless lessened).
-
-A typical example of why a goal needs higher final trial duration
-than initial trial duration is when SUT is expected to have large buffers,
-so a trial may be too short to see frame losses due to
-a buffer becoming full. So a short good trial does not give strong information.
-On the other hand, short bad trial is a strong hint SUT would lose many frames
-at that load and long duration.
-But if there is a mix of short bad and short good trials,
-MLRsearch should not cherry-pick only the short bad ones.
-
-The presented way of computing the effective bad duration sum
-aims to be a fair treatment of short good trials.
-
-If the target exceed ratio is zero, the given definition contains
-positive infinty as an intermediate value, but still simplifies
-to a finite result (long bad duration sum plus short bad duration sum).
-
-##### Missing duration sum
-
-The target min duration sum minus effective bad duration sum
-and minus long good duration sum, or zero if that would be negative.
-
-MLRsearch may need up to this duration sum of additional long trials
-before classifing the load.
-
-##### Optimistic exceed ratio
-
-The specific quantity based on exceed ratio, where bad duration sum is
-the effective bad duration sum, and good duration sum is
-the long good duration sum plus the missing duration sum.
-
-This is the value MLRsearch would compare to target exceed ratio
-assuming all of the missing duration sum ends up consisting of long good trials.
-
-If there was a bad long trial, optimistic exceed ratio
-becomes larger than zero.
-Additionally, if the target exceed ratio is zero, optimistic exceed ratio
-becomes larger than zero even on one short bad trial.
-
-##### Pessimistic exceed ratio
-
-The specific quantity based on exceed ratio, where bad duration sum is
-the effective bad duration sum plus the missing duration sum,
-and good duration sum is the long good duration sum.
-
-This is the value MLRsearch would compare to target exceed ratio
-assuming all of the missing duration sum ends up consisting of bad good trials.
-
-Note that if the missing duration sum is zero,
-optimistic exceed ratio becomes equal to pessimistic exceed ratio.
-
-This is the role target min duration sum has,
-it guarantees the two load exceed ratios eventually become the same.
-Otherwise, pessimistic exceed ratio
-is always bigger than the optimistic exceed ratio.
-
-Depending on trial results, the missing duration sum may not be large enough
-to change optimistic (or pessimistic) exceed ratio
-to move to the other side compared to target exceed ratio.
-In that case, MLRsearch does not need to measure more trials
-at this load when focusing on this target.
-
-#### Target bounds
-
-With respect to a target, some loads may be classified as upper or lower bound,
-and some of the bounds are treated as relevant.
-
-The subsequent parts of MLRsearch rely only on relevant bounds,
-without the need to classify other loads.
-
-##### Upper bound
-
-A load is classified as an upper bound for a target,
-if and only if both optimistic exceed ratio
-and pessimstic load exceed ratio are larger than the target exceed ratio.
-
-During the search, it is possible there is no upper bound,
-for example because every measured load still has too high
-missing duration sum.
-
-If the target exceed ratio is zero, and the load has at least one bad trial
-(short or long), the load becomes an upper bound.
-
-##### Lower bound
-
-A load is classified as a lower bound for a target,
-if and only if both optimistic exceed ratio
-and pessimstic load exceed ratio are no larger than the target exceed ratio.
-
-During the search, it is possible there is no lower bound,
-for example because every measured load still has too high
-missing duration sum.
-
-If the target exceed ratio is zero, all trials at the load of
-a lower bound must be good trials (short or long).
-
-Note that so far it is possible for a lower bound to be higher
-than an upper bound.
-
-##### Relevant upper bound
-
-For a target, a load is the relevant upper bound,
-if and only if it is an upper bound, and all other upper bounds
-are larger (as loads).
-
-In some cases, the max load when classified as a lower bound
-is also effectively treated as the relevant upper bound.
-(In that case both relevant bounds are equal.)
-
-If that happens for a final target at the end of the search,
-the controller output may contain max load as the relevant upper bound
-(even if the goal exceed ratio was not exceeded),
-signalling SUT performs well even at max load.
-
-If the target exceed ratio is zero, the relevant upper bound
-is the smallest load where a bad trial (short or long) has been measured.
-
-##### Relevant lower bound
-
-For a target, a load is the relevant lower bound if two conditions hold.
-Both optimistic exceed ratio and pessimstic load exceed ratio
-are no larger than the target exceed ratio,
-and there is no smaller load classified as an upper bound.
-
-This is a second place where MLRsearch is not symmetric
-(the first place was effective bad duration sum).
-
-While it is not likely for a MLRsearch to find a smaller upper bound
-and a larger load satisfying first condition for the lower bound,
-it still may happen and MLRsearch has to deal with it.
-The second condition makes sure the relevant lower bound
-is smaller than the relevant upper bound.
-
-In some cases, the min load when classified as an upper bound
-is also effectively treated as the relevant lower bound.
-(In that case both relevant bounds are equal.)
-
-If that happens for a final target at the end of the search,
-the controller output may contain min load as the relevant lower bound
-even if the exceed ratio was 'overstepped',
-signalizing the SUT does not even reach the minimal required performance.
-
-The manager has to make sure this is distingushed in report
-from cases where min rate is a legitimate conditional throughput
-(e.g. the exceed ratio was not overstepped at the min load).
-
-##### Relevant bounds
-
-The pair of the relevant lower bound and the relevant upper bound.
-
-Useful for determining the width of the relevant bounds.
-Any of the bounds may be the effective one (max load or min load).
-
-A goal is achieved (at the end of the search) when the final target's
-relevant bounds have width no larger than the goal width.
-
-#### Candidate selector
-
-A stateful object (a finite state machine)
-focusing on a single target, used to determine next trial input.
-
-Initialized for a pair of targets:
-the current target and its preceding target (if any).
-
-Private state (not shared with other selectors) consists of mode and flags.
-Public state (shared with all selectors) is the actual relevant bounds
-for both targets (current and precedinig).
-
-After accepting a trial result, each selector can nominate
-one candidate (or no candidate) for the next trial measurement.
-
-##### Current target
-
-This is the target this selector tries to achieve.
-
-##### Preceding target
-
-The target (if any) preceding to the current target.
-
-While this selector does not focus on the preceding target,
-the relevant bounds for the preceding target are used as hints
-when the current bound does not have enough of its relevant bounds.
-
-##### Candidate
-
-The trial input (if any) this selecor nominates.
-
-The trial duration attribute is always the current target trial duration.
-The trial load attribute depends on the selector state.
-
-Candidates have defined ordering, to simplify finding the winner.
-If load differs, the candidate with lower load is preferred.
-If load is the same but duration differs, the candidate
-with larger duration is preferred.
-
-##### Selector mode
-
-During its lifetime, selector proceeds through the following modes.
-In order, but some modes may be skipped or revisited.
-
-Each mode has its own strategy of determining the candidate load (if any).
-
-###### Waiting
-
-Not enough relevant bounds (even for the preceding target).
-In this mode, the selector abstains from nominating a candidate.
-
-This selector leaves this mode when preceding target's selector is done.
-
-###### Halving
-
-Candidate is in the middle of the relevant bounds of the preceding target.
-
-If the relevant bounds are narrow enough already, this mode is skipped.
-As the preceding target had double width, just one halving load
-needs to be measured.
-
-Selector uses a flag to avoid re-entering this mode
-once it finished measuring the halved load.
-
-###### Upgrading
-
-This mode activates when one relevant bound for the current target is present
-and there is a matching relevant bound of the preceding target
-within the current target width.
-Candidate is the load of the matching bound from the preceding target.
-
-At most one bound load is measured, depending on halving outcome.
-Private flags are used to avoid upgrading at later times
-once selector finished measuring the upgraded load.
-
-###### Extending
-
-Refined already but the other relevant bound for the current target
-is still missing.
-Nominate new candidate according to external search.
-Initial target selectors skip all previous modes.
-
-A private value is used to track the width to be used in next load extension
-(increasing geometrically).
-For initial target selectors, the starting width may be chosen
-based on pre-initial trial results.
-
-If both relevant bounds are present at the current load,
-but the lower bound is far away (compared to tracked width),
-the candidate from this mode is preferred (as long as the load
-is larger than the candidate load of bisecting mode).
-
-###### Bisecting
-
-Both relevant bounds for the current target are available, but they are too far
-from each other. Candidate is in the middle.
-
-Contrary to halving, the candidate load does not need to be at the exact middle.
-For example if the width of the current relevant bounds
-is three times as large as the target width,
-it is advantageous to split the interval in 1:2 ratio
-(choosing the lower candidate load), as it can save one bisect.
-
-###### Done
-
-Both relevant bounds for the current target are available,
-the width is no larger than the target width.
-No candidate.
-
-If a selector reaches the done state,
-it is still possible later trials invalidate its relevant lower bound
-(by proving a lower load is in fact a new uper bound),
-making the selector transition into extending or bisecting mode.
-
-##### Active selector
-
-Derived from a common goal, the earliest selector which nominates a candidate
-is considered to be the active selector for this goal.
-Candidates from other selectors of the same goal are ignored.
-
-It is quite possible selectors focusing on other goals
-have already found a lower bound relevant to multiple targets in a chain.
-In that case, we want the most-initial of the target selectors
-(not already in done mode) to have the nomination.
-
-Otherwise (when in extending mode and missun relevant upper bound)
-the closer-to-final selectors would nominate candidates
-at lower load but at too high duration sum,
-preventing some of the time savings.
-
-##### Winner
-
-If the candidate previously nominated by a selector was the one
-that got measured, the candidate is called a winner.
-
-A selector observing its previous candidate was a winer
-can use simplified logic when determining the mode,
-as it knows no other selectors may have changed the relevant loads unexpectedly.
-
-### Controller output
-
-The output object the controller returns to the manager
-is a mapping assigning each search goal its conditional output (if it exists).
-
-The controller MAY include more information (if manager accepts it),
-for example load stat at relevant bounds.
-
-There MAY be several ways how to communicate the fact a conditional output
-does not exist (e.g. min load is classified as an upper bound).
-The manager MUST NOT present min load as a conditional output in that case.
-
-If max load is a lower bound, it leads to a valid conditional output value.
-
-#### Conditional throughput
-
-The conditional throughput is the average of trial forwarding rates
-across long good trials measured at the (offered load classified as)
-relevant lower bound (for the goal, at the end of the search).
-The average is the weighted arithmetic mean, weighted by trial duration.
-
-If the goal exceed ratio is zero, the definition of the relevant bounds
-simplifies significantly.
-If additionally the goal loss ratio is zero,
-and the goal min duration sum is equal to goal final trial duration,
-conditional throughput becomes conditionally compliant with RFC 2544 throughput.
-If the goal final trial duration is at least 60 seconds,
-the conditional througput becomes unconditionally compliant
-with RFC 2544 throughput.
-
-# Problems
-
-## Long Test Duration
-
-Emergence of software DUTs, with frequent software updates and a
-number of different packet processing modes and configurations, drives
-the requirement of continuous test execution and bringing down the test
-execution time.
-
-In the context of characterising particular DUT's network performance, this
-calls for improving the time efficiency of throughput search.
-A vanilla bisection (at 60sec trial duration for unconditional [RFC2544]
-compliance) is slow, because most trials spend time quite far from the
-eventual throughput.
-
-[RFC2544] does not specify any stopping condition for throughput search,
-so users can trade-off between search duration and achieved precision.
-But, due to exponential behavior of bisection, small improvement
-in search duration needs relatively big sacrifice in the throughput precision.
-
-## DUT within SUT
-
-[RFC2285] defines:
-- *DUT* as
-  - The network forwarding device to which stimulus is offered and
-    response measured [RFC2285] (section 3.1.1).
-- *SUT* as
-  - The collective set of network devices to which stimulus is offered
-    as a single entity and response measured [RFC2285] (section 3.1.2).
-
-[RFC2544] specifies a test setup with an external tester stimulating the
-networking system, treating it either as a single DUT, or as a system
-of devices, an SUT.
-
-In case of software networking, the SUT consists of a software program
-processing packets (device of interest, the DUT),
-running on a server hardware and using operating system functions as appropriate,
-with server hardware resources shared across all programs
-and the operating system.
-
-DUT is effectively "nested" within SUT.
-
-Due to a shared multi-tenant nature of SUT, DUT is subject to
-interference (noise) coming from the operating system and any other
-software running on the same server. Some sources of noise can be
-eliminated (e.g. by pinning DUT program threads to specific CPU cores
-and isolating those cores to avoid context switching). But some
-noise remains after all such reasonable precautions are applied. This
-noise does negatively affect DUT's network performance. We refer to it
-as an *SUT noise*.
-
-DUT can also exhibit fluctuating performance itself, e.g. while performing
-some "stop the world" internal stateful processing. In many cases this
-may be an expected per-design behavior, as it would be observable even
-in a hypothetical scenario where all sources of SUT noise are
-eliminated. Such behavior affects trial results in a way similar to SUT
-noise. We use *noise* as a shorthand covering both *DUT fluctuations* and
-genuine SUT noise.
-
-A simple model of SUT performance consists of a baseline *noiseless performance*,
-and an additional noise. The baseline is assumed to be constant (enough).
-The noise varies in time, sometimes wildly. The noise can sometimes be negligible,
-but frequently it lowers the observed SUT performance in a trial.
-
-In this model, SUT does not have a single performance value, it has a spectrum.
-One end of the spectrum is the noiseless baseline,
-the other end is a *noiseful performance*. In practice, trial results
-close to the noiseful end of the spectrum happen only rarely.
-The worse performance, the more rarely it is seen in a trial.
-
-Focusing on DUT, the benchmarking effort should aim
-at eliminating only the SUT noise from SUT measurement.
-But that is not really possible, as there are no realistic enough models
-able to distinguish SUT noise from DUT fluctuations.
-
-However, assuming that a well-constructed SUT has the DUT as its
-performance bottleneck, the "DUT noiseless performance" can be defined
-as the noiseless end of SUT performance spectrum. (At least for
-throughput. For other quantities such as latency there will be an
-additive difference.) By this definition, DUT noiseless performance
-also minimizes the impact of DUT fluctuations.
-
-In this document, we reduce the "DUT within SUT" problem to estimating
-the noiseless end of SUT performance spectrum from a limited number of
-trial results.
-
-Any improvements to throughput search algorithm, aimed for better
-dealing with software networking SUT and DUT setup, should employ
-strategies recognizing the presence of SUT noise, and allow discovery of
-(proxies for) DUT noiseless performance
-at different levels of sensitivity to SUT noise.
-
-## Repeatability and Comparability
-
-[RFC2544] does not suggest to repeat throughput search. And from just one
-throughput value, it cannot be determined how repeatable that value is.
-In practice, poor repeatability is also the main cause of poor
-comparability, e.g. different benchmarking teams can test the same SUT
-but get different throughput values.
-
-[RFC2544] throughput requirements (60s trial, no tolerance to single frame loss)
-force the search to fluctuate close the noiseful end of SUT performance
-spectrum. As that end is affected by rare trials of significantly low
-performance, the resulting throughput repeatability is poor.
-
-The repeatability problem is the problem of defining a search procedure
-which reports more stable results
-(even if they can no longer be called "throughput" in [RFC2544] sense).
-According to baseline (noiseless) and noiseful model, better repeatability
-will be at the noiseless end of the spectrum.
-Therefore, solutions to the "DUT within SUT" problem
-will help also with the repeatability problem.
-
-Conversely, any alteration to [RFC2544] throughput search
-that improves repeatability should be considered
-as less dependent on the SUT noise.
-
-An alternative option is to simply run a search multiple times, and report some
-statistics (e.g. average and standard deviation). This can be used
-for "important" tests, but it makes the search duration problem even
-more pronounced.
-
-## Throughput with Non-Zero Loss
-
-[RFC1242] (section 3.17) defines throughput as:
-    The maximum rate at which none of the offered frames
-    are dropped by the device.
-
-and then it says:
-    Since even the loss of one frame in a
-    data stream can cause significant delays while
-    waiting for the higher level protocols to time out,
-    it is useful to know the actual maximum data
-    rate that the device can support.
-
-Contrary to that, many benchmarking teams settle with non-zero
-(small) loss ratio as the goal for a "throughput rate".
-
-Motivations are many: modern protocols tolerate frame loss better;
-trials nowadays send way more frames within the same duration;
-impact of rare noise bursts is smaller as the baseline performance
-can compensate somewhat by keeping the loss ratio below the goal;
-if SUT noise with "ideal DUT" is known, it can be set as the loss ratio goal.
-
-Regardless of validity of any and all similar motivations,
-support for non-zero loss goals makes any search algorithm more user-friendly.
-[RFC2544] throughput is not friendly in this regard.
-
-Searching for multiple goal loss ratios also helps to describe the SUT
-performance better than a single goal result. Repeated wide gap between
-zero and non-zero loss conditional throughputs indicates
-the noise has a large impact on the overall SUT performance.
-
-It is easy to modify the vanilla bisection to find a lower bound
-for intended load that satisfies a non-zero-loss goal,
-but it is not that obvious how to search for multiple goals at once,
-hence the support for multiple loss goals remains a problem.
-
-## Inconsistent Trial Results
-
-While performing throughput search by executing a sequence of
-measurement trials, there is a risk of encountering inconsistencies
-between trial results.
-
-The plain bisection never encounters inconsistent trials.
-But [RFC2544] hints about possibility if inconsistent trial results in two places.
-The first place is section 24 where full trial durations are required, presumably
-because they can be inconsistent with results from shorter trial durations.
-The second place is section 26.3 where two successive zero-loss trials
-are recommended, presumably because after one zero-loss trial
-there can be subsequent inconsistent non-zero-loss trial.
-
-Examples include:
-
-- a trial at the same load (same or different trial duration) results
-  in a different packet loss ratio.
-- a trial at higher load (same or different trial duration) results
-  in a smaller packet loss ratio.
-
-Any robust throughput search algorithm needs to decide how to continue
-the search in presence of such inconsistencies.
-Definitions of throughput in [RFC1242] and [RFC2544] are not specific enough
-to imply a unique way of handling such inconsistencies.
-
-Ideally, there will be a definition of a quantity which both generalizes
-throughput for non-zero-loss (and other possible repeatibility enhancements),
-while being precise enough to force a specific way to resolve trial
-inconsistencies.
-But until such definition is agreed upon, the correct way to handle
-inconsistent trial results remains an open problem.
-
-# How the problems are addressed
-
-Configurable loss ratio in MLRsearch search goals are there
-in direct support for non-zero-loss conditional throughput.
-In practice the conditional throughput results' stability
-increases with higher loss ratio goals.
-
-Multiple trials with noise tolerance enhancement,
-as implemented in MLRsearch using non-zero goal exceed ratio value,
-also indirectly increases the result stability.
-That allows MLRsearch to achieve all the benefits
-of Binary Search with Loss Verification,
-as recommended in [RFC9004] (section 6.2)
-and specified in [TST009] (section 12.3.3).
-
-The main factor improving the overall search time is the introduction
-of preceding targets. Less impactful time savings
-are achieved by pre-initial trials, halving mode
-and smart splitting in bisecting mode.
-
-In several places, MLRsearch is "conservative" when handling
-(potentially) inconsistent results. This includes the requirement
-for the relevant lower bound to be smaller than any upper bound,
-the unequal handling of good and bad short trials,
-and preference to lower load when choosing the winner among candidates.
-
-While this does no guarantee good search stability
-(goals focusing on higher loads may still invalidate existing bounds
-simply by requiring larger min duration sums),
-it lowers the change of SUT having an area of poorer performance
-below the reported conditional througput loads.
-In any case, the definition of conditional throughput
-is precise enough to dictate "conservative" handling
-of trial inconsistencies.
-
-# IANA Considerations
-
-No requests of IANA.
-
-# Security Considerations
-
-Benchmarking activities as described in this memo are limited to
-technology characterization of a DUT/SUT using controlled stimuli in a
-laboratory environment, with dedicated address space and the constraints
-specified in the sections above.
-
-The benchmarking network topology will be an independent test setup and
-MUST NOT be connected to devices that may forward the test traffic into
-a production network or misroute traffic to the test management network.
-
-Further, benchmarking is performed on a "black-box" basis, relying
-solely on measurements observable external to the DUT/SUT.
-
-Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
-benchmarking purposes. Any implications for network security arising
-from the DUT/SUT SHOULD be identical in the lab and in production
-networks.
-
-# Acknowledgements
-
-Many thanks to Alec Hothan of OPNFV NFVbench project for thorough
-review and numerous useful comments and suggestions.
-
---- back
diff --git a/docs/ietf/draft-ietf-bmwg-mlrsearch-05.md b/docs/ietf/draft-ietf-bmwg-mlrsearch-05.md
new file mode 100644
index 0000000000..937e632413
--- /dev/null
+++ b/docs/ietf/draft-ietf-bmwg-mlrsearch-05.md
@@ -0,0 +1,1460 @@
+---
+title: Multiple Loss Ratio Search
+abbrev: MLRsearch
+docname: draft-ietf-bmwg-mlrsearch-05
+date: 2023-10-23
+
+ipr: trust200902
+area: ops
+wg: Benchmarking Working Group
+kw: Internet-Draft
+cat: info
+
+coding: us-ascii
+pi:    # can use array (if all yes) or hash here
+  toc: yes
+  sortrefs:   # defaults to yes
+  symrefs: yes
+
+author:
+      -
+        ins: M. Konstantynowicz
+        name: Maciek Konstantynowicz
+        org: Cisco Systems
+        email: mkonstan@cisco.com
+      -
+        ins: V. Polak
+        name: Vratko Polak
+        org: Cisco Systems
+        email: vrpolak@cisco.com
+
+normative:
+  RFC1242:
+  RFC2285:
+  RFC2544:
+  RFC9004:
+
+informative:
+  TST009:
+    target: https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.04.01_60/gs_NFV-TST009v030401p.pdf
+    title: "TST 009"
+  FDio-CSIT-MLRsearch:
+    target: https://csit.fd.io/cdocs/methodology/measurements/data_plane_throughput/mlr_search/
+    title: "FD.io CSIT Test Methodology - MLRsearch"
+    date: 2023-10
+  PyPI-MLRsearch:
+    target: https://pypi.org/project/MLRsearch/0.4.1/
+    title: "MLRsearch 0.4.1, Python Package Index"
+    date: 2021-07
+
+--- abstract
+
+This document proposes extensions to [RFC2544] throughput search by
+defining a new methodology called Multiple Loss Ratio search
+(MLRsearch). The main objectives of MLRsearch are to minimize the
+total search duration, to support searching for multiple loss ratios
+and to improve results repeatability and comparability.
+
+The main motivation behind extending [RFC2544] is the new set of challenges
+and requirements posed by evaluating and testing software based networking
+systems, specifically their data planes.
+
+MLRsearch offers several ways to address these challenges, giving user
+configuration options to select their preferred way.
+
+--- middle
+
+{::comment}
+    As we use kramdown to convert from markdown,
+    we use this way of marking comments not to be visible in rendered draft.
+    https://stackoverflow.com/a/42323390
+    If other engine is used, convert to this way:
+    https://stackoverflow.com/a/20885980
+{:/comment}
+
+# Purpose and Scope
+
+The purpose of this document is to describe Multiple Loss Ratio search
+(MLRsearch), a data plane throughput search methodology optimized for software
+networking DUTs.
+
+Applying vanilla [RFC2544] throughput bisection to software DUTs
+results in a number of problems:
+
+- Binary search takes too long as most of trials are done far from the
+  eventually found throughput.
+- The required final trial duration and pauses between trials
+  prolong the overall search duration.
+- Software DUTs show noisy trial results,
+  leading to a big spread of possible discovered throughput values.
+- Throughput requires loss of exactly zero frames, but the industry
+  frequently allows for small but non-zero losses.
+- The definition of throughput is not clear when trial results are inconsistent.
+
+MLRsearch library aims to address these problems by applying the following set
+of enhancements:
+
+- Allow multiple shorter trials instead of one big trial per load.
+  - Optionally, tolerate a percentage of trial result with higher loss.
+- Allow searching for multiple search goals, with differing loss ratios.
+  - Any trial result can affect each search goal in principle.
+- Insert multiple coarse targets for each search goal, earlier ones need
+  to spend less time on trials.
+  - Earlier targets also aim for lesser precision.
+  - Use Forwarding Rate (FR) at maximum offered load
+    [RFC2285] (section 3.6.2) to initialize the initial targets.
+- Take care when dealing with inconsistent trial results.
+  - Reported throughput is smaller than smallest load with high loss.
+  - Smaller load candidates are measured first.
+- Apply several load selection heuristics to save even more time
+  by trying hard to avoid unnecessarily narrow bounds.
+
+Some of these enhancements are formalized as MLRsearch specification,
+the remaining enhancements are treated as implementation details,
+thus achieving high comparability without limiting future improvements.
+
+MLRsearch configuration options are flexible enough to
+support both conservative settings and aggressive settings.
+Where the conservative settings lead to results
+unconditionally compliant with [RFC2544],
+but longer search duration and worse repeatability.
+Conversely, aggressive settings lead to shorter search duration
+and better repeatability, but the results are not compliant with [RFC2544].
+
+No part of [RFC2544] is intended to be obsoleted by this document.
+
+# Identified Problems
+
+This chapter describes the problems affecting usability
+of various preformance testing methodologies,
+mainly a binary search for [RFC2544] unconditionally compliant throughput.
+
+The last chapter will summarize how the problems are addressed,
+the middle chapters provide explanations and definitions needed for that.
+
+## Long Search Duration
+
+Emergence of software DUTs, with frequent software updates and a
+number of different frame processing modes and configurations,
+has increased both the number of performance tests requred to verify DUT update
+and the frequency of running those tests.
+This makes the overall test execution time even more important than before.
+
+In the context of characterising particular DUT's network performance,
+this calls for improving the time efficiency of throughput search.
+A vanilla bisection (at 60sec trial duration for unconditional [RFC2544]
+compliance) is slow, because most trials spend time quite far from the
+eventual throughput.
+
+[RFC2544] does not specify any stopping condition for throughput search,
+so users can trade-off between search duration and achieved precision.
+But, due to logarithmic nature of bisection, even small improvement
+in search duration needs relatively big sacrifice in the precision of the
+discovered throughput.
+
+## DUT in SUT
+
+[RFC2285] defines:
+- DUT as
+  - The network forwarding device to which stimulus is offered and
+    response measured [RFC2285] (section 3.1.1).
+- SUT as
+  - The collective set of network devices to which stimulus is offered
+    as a single entity and response measured [RFC2285] (section 3.1.2).
+
+[RFC2544] specifies a test setup with an external tester stimulating the
+networking system, treating it either as a single DUT, or as a system
+of devices, an SUT.
+
+In case of software networking, the SUT consists nt only of the DUT
+as a software program processing frames, but also of
+a server hardware and operating system functions,
+with server hardware resources shared across all programs
+and the operating system running on the same server.
+
+The DUT is effectively nested within the rest of the SUT.
+
+Due to a shared multi-tenant nature of SUT, DUT is subject to
+possible interference coming from the operating system and any other
+software running on the same server. Some sources of such interference
+can be to some degree eliminated, e.g. by pinning DUT program threads
+to specific CPU cores and isolating those cores to avoid context switching.
+But some level of adverse effects may remain even after
+all such reasonable precautions are applied.
+These effects affect DUT's network performance negatively.
+As the effects are hard to predict in general, they have impact similar to
+what other engineering disciplines define as a noise.
+Thus, all such effects are called an SUT noise.
+
+DUT can also exhibit fluctuating performance itself, for reasons
+not related to the rest of SUT, for example due to pauses in execution
+as needed for internal stateful processing. In many cases this
+may be an expected per-design behavior, as it would be observable even
+in a hypothetical scenario where all sources of SUT noise are eliminated.
+Such behavior affects trial results in a way similar to SUT noise.
+As the two phenomenons are hard to destinguish,
+this document uses the word noise as a shorthand covering both
+this internal DUT performance fluctuations and genuine SUT noise.
+
+A simple model of SUT performance consists of an idealized noiseless performance,
+and additional noise effects. The noiseless performance is assumed to be constant,
+all observed performance variations are due to noise.
+The impact of the noise can vary in time, sometimes wildly,
+even within a single trial.
+The noise can sometimes be negligible, but frequently
+it lowers the observed SUT performance as observed in trial results.
+
+In this model, SUT does not have a single performance value, it has a spectrum.
+One end of the spectrum is the idealized noiseless performance value,
+the other end can be called a noiseful performance. In practice, trial results
+close to the noiseful end of the spectrum happen only rarely.
+The worse the performance value is, the more rarely it is seen in a trial.
+Therefore, the extreme noiseful end of SUT spectrum is not really observable
+among trial results. Also, the extreme noiseless end of SUT spectrum
+is unlikely to be observable, this time because some small noise effects
+are likely to occur multiple times during a trial.
+
+Unless specified otherwise, this document talks about potentially observable
+ends of the SUT performance spectrum, not about the extreme ones.
+
+Focusing on DUT, the benchmarking effort should aim
+at eliminating only the SUT noise from SUT measurements.
+In practice that is not really possible, as based on authors experience
+and available literature, there are no realistic enough models
+able to distinguish SUT noise from DUT fluctuations.
+
+However, assuming that a well-constructed SUT has the DUT as its
+performance bottleneck, the DUT ideal noiseless performance can be defined
+as the noiseless end of SUT performance spectrum. At least for
+throughput. For other performance quantities such as latency there may be an
+additive difference.
+
+Note that by this definition, DUT noiseless performance
+also minimizes the impact of DUT fluctuations, as much as realistically possible
+for a given trial duration.
+
+In this document, we reduce the DUT in SUT problem to estimating
+the noiseless end of SUT performance spectrum from a limited number of
+trial results.
+
+Any improvements to throughput search algorithm, aimed for better
+dealing with software networking SUT and DUT setup, should employ
+strategies recognizing the presence of SUT noise, and allow discovery of
+(proxies for) DUT noiseless performance
+at different levels of sensitivity to SUT noise.
+
+## Repeatability and Comparability
+
+[RFC2544] does not suggest to repeat throughput search. And from just one
+discovered throughput value, it cannot be determined how repeatable that value is.
+In practice, poor repeatability is also the main cause of poor
+comparability, that is different benchmarking teams can test the same SUT
+but get throughput values differing more than expected from search precision.
+
+[RFC2544] throughput requirements (60 seconds trial and
+no tolerance of a single frame loss) affect the throughput results
+in the following way.
+The SUT behavior close to the noiseful end of its performance spectrum
+consists of rare occasions of significantly low performance,
+but the long trial duration makes those occasions not so rare on the trial level.
+Therefore, the binary search results tend to wander away from the noiseless end
+of SUT performance spectrum, more frequently and more widely than shorter
+trials would, thus resulting in poor throughput repeatability.
+
+The repeatability problem can be addressed by defining a search procedure
+which reports more stable results,
+even if they can no longer be called throughput in [RFC2544] sense.
+According to the SUT performance spectrum model, better repeatability
+will be at the noiseless end of the spectrum.
+Therefore, solutions to the DUT in SUT problem
+will help also with the repeatability problem.
+
+Conversely, any alteration to [RFC2544] throughput search
+that improves repeatability should be considered
+as less dependent on the SUT noise.
+
+An alternative option is to simply run a search multiple times, and report some
+statistics (e.g. average and standard deviation). This can be used
+for a subset of tests deemed more important,
+but it makes the search duration problem even more pronounced.
+
+## Throughput with Non-Zero Loss
+
+[RFC1242] (section 3.17) defines throughput as:
+    The maximum rate at which none of the offered frames
+    are dropped by the device.
+
+Then, it says:
+    Since even the loss of one frame in a
+    data stream can cause significant delays while
+    waiting for the higher level protocols to time out,
+    it is useful to know the actual maximum data
+    rate that the device can support.
+
+Contrary to that, many benchmarking teams settle with small, non-zero
+loss ratio as the goal for a their load search.
+
+Motivations are many:
+
+- Modern protocols tolerate frame loss better,
+  compared to the time when [RFC1242] and [RFC2544] were specified.
+
+- Trials nowadays send way more frames within the same duration,
+  increasing the chance small SUT performance fluctuatios
+  is enough to cause frame loss.
+
+- Small bursts of frame loss caused by noise have otherwise smaller impact
+  on the average frame loss ratio ovserved in the trial,
+  as during other parts of the same trial the SUT may work mroe closely
+  to its noiseless performance, thus perhaps lowering the trial loss ratio
+  below the goal loss ratio value.
+
+- If an approximation of the SUT noise impact on the trial loss ratio is known,
+  it can be set as the goal loss ratio.
+
+Regardless of validity of any and all similar motivations,
+support for non-zero loss goals makes any search algorithm more user friendly.
+[RFC2544] throughput is not user friendly in this regard.
+
+Assuming users are allowed to specify the goal loss ratio value,
+the usefulness is enhanced even more if users can specify multiple
+loss ratio values, especially when a single search can find all relevant bounds.
+
+Searching for multiple search goals also helps to describe the SUT performance
+spectrum better than a single search goal result.
+For example, repeated wide gap between zero and non-zero loss loads
+indicates the noise has a large impact on the observed performance,
+which is not evident from a single goal load search procedure result.
+
+It is easy to modify the vanilla bisection to find a lower bound
+for intended load that satisfies a non-zero goal loss ratio.
+But it is not that obvious how to search for multiple goals at once,
+hence the support for multiple search goals remains a problem.
+
+## Inconsistent Trial Results
+
+While performing throughput search by executing a sequence of
+measurement trials, there is a risk of encountering inconsistencies
+between trial results.
+
+The plain bisection never encounters inconsistent trials.
+But [RFC2544] hints about possibility of inconsistent trial results,
+in two places in its text.
+The first place is section 24, where full trial durations are required,
+presumably because they can be inconsistent with results
+from shorter trial durations.
+The second place is section 26.3, where two successive zero-loss trials
+are recommended, presumably because after one zero-loss trial
+there can be subsequent inconsistent non-zero-loss trial.
+
+Examples include:
+
+- A trial at the same load (same or different trial duration) results
+  in a different trial loss ratio.
+- A trial at higher load (same or different trial duration) results
+  in a smaller trial loss ratio.
+
+Any robust throughput search algorithm needs to decide how to continue
+the search in presence of such inconsistencies.
+Definitions of throughput in [RFC1242] and [RFC2544] are not specific enough
+to imply a unique way of handling such inconsistencies.
+
+Ideally, there will be a definition of a new quantity which both generalizes
+throughput for non-zero-loss (and other possible repeatibility enhancements),
+while being precise enough to force a specific way to resolve trial result
+inconsistencies.
+But until such definition is agreed upon, the correct way to handle
+inconsistent trial results remains an open problem.
+
+# MLRsearch Specification
+
+This chapter focuses on technical definitions needed for evaluating
+whether a particular test procedure adheres to MLRsearch specification.
+
+For motivations, explanations, and other comments see other chapters.
+
+## MLRsearch Architecture
+
+MLRsearch architecture consists of three main components:
+the manager, the controller and the measurer.
+Presence of other components (mainly the SUT) is also implied.
+
+While the manager and the measurer can be seen a abstractions
+present in any testing procedure, the behavior of the controller
+is what distinguishes MLRsearch algorithms from other search procedures.
+
+### Measurer
+
+The measurer is the component which performs one trial
+as described in [RFC2544] section 23, when requested by the controller.
+
+Specifically, one call to the measurer accepts a trial load value
+and trial duration value, performs the trial, and returns
+the measured trial loss ratio, and optionally a different duration value.
+
+It is responsibility of the measurer to uphold any requirements
+and assumptions present in MLRsearch specification
+(e.g. trial forwarding ratio not being larger than one).
+Implementers have some freedom, for example in the way they deal with
+duplicated frames, or what to return if tester sent zero frames towards SUT.
+Implementations are RECOMMENDED to document their behavior
+related to such freedoms in as detailed way as possible
+
+Implemenations MUST document any deviations from RFC documents,
+for example if the wait time around traffic
+is shorter than what [RFC2544] section 23 specifies.
+
+### Controller
+
+The controller is the component of MLRsearch architecture
+that is called by the manager (just once), calls the measurer
+(usually multiple times in succession),
+and returns the result of the search to the manager.
+
+The only required argument in the call to the controller
+is a list of search goals. For the structure of the search result,
+see subsection Search Result.
+
+### Manager
+
+The manager is the component that initializes SUT, traffic generator
+(tester in [RFC2544]), the measurer and the controller
+with intended configurations. It then hands over the execution
+to the controller and receives its result.
+
+Creation of reports of appropriate format can also be understood
+as the responsibility of the manager.
+
+## Units
+
+The specification deals with physical quantities, so it is assumed
+each numeric value is accompanied by an appropriate physical unit.
+
+The specification does not state which unit is appropriate,
+but implementations MUST make it explicit which unit is used
+for each value provided or received by the user.
+
+For example, load quantities (including the conditional throughput)
+returned by the controller are defined to be based on single-interface
+(unidirectional) loads. For bidirectional traffic, users are likely
+to expect bidirectional throughput quantities, so the manager is responsible
+for making its report clear.
+
+## SUT
+
+As defined in [RFC2285]:
+The collective set of network devices to which stimulus is offered
+as a single entity and response measured.
+
+## Trial
+
+A trial is the part of test described in [RFC2544] section 23.
+
+### Trial Load
+
+Trial load is the intended constant load for a trial.
+
+Load is the quantity implied by Constant Load of [RFC1242],
+Data Rate of [RFC2544] and Intended Load of [RFC2285].
+All three specify this value applies to one (input or output) interface.
+
+### Trial Duration
+
+Trial duration is the intended duration of the traffic for a trial.
+
+In general, this general quantity does not include any preparation nor waiting
+described in section 23 of [RFC2544].
+
+However, the measurer MAY return a duration value different
+from the intended duration. This may be useful for users
+who want to control the overal search duration, not just the traffic part of it.
+The manager MUST report how does the measurer computes the returned duration
+values in that case.
+
+### Trial Forwarding Ratio
+
+Trial forwarding ratio is dimensionless floating point value,
+assumed to be between 0.0 and 1.0, both including.
+It is computed as the number of frames forwarded by SUT, divided by
+the number of frames that should have been forwarded during the trial.
+
+Note that, contrary to load, frame counts used to compute
+trial forwarding ratio are aggregates over all SUT ports.
+
+Questions around what is the correct number of frames
+that should have been forwarded it outside of the scope of this document.
+E.g. what should the measurer return when it detects
+that the offered load differs significantly from the intended load.
+
+It is RECOMMENDED implementations return an irregular goal result
+if they detect questionable (in comparability sense) trial results
+affecting their goal result.
+
+### Trial Loss Ratio
+
+Trial loss ratio is equal to one minus the trial forwarding ratio.
+
+### Trial Forwarding Rate
+
+The trial forwarding rate is the trial load multiplied by
+the trial forwarding ratio.
+
+Note that this is very similar, but not identical to Forwarding Rate
+as defined in [RFC2285] section 3.6.1, as that definition
+is specific to one output interface, while trial forwarding ratio
+is based on frame counts aggregated over all SUT interfaces.
+
+## Traffic profile
+
+Any other specifics (besides trial load and trial duration)
+the measurer needs to perform the trial are understood as a composite
+called the traffic profile.
+All its attributes are assumed to be constant during the search,
+and the composite is configured on the measurer by the manager
+before the search starts.
+
+Traffic profile is REQUIRED by [RFC2544] to contain some specific quantities,
+for example frame size.
+Several more specific quantities may be RECOMMENDED.
+
+Depending on SUT configuration, e.g. when testing specific protocols,
+additional values need to be included in the traffic profile
+and in the test report. See other IETF documents.
+
+## Search Goal
+
+A search goal is one item of the list required as an argument
+when the manager calls the controller.
+
+Each search goal is composite consisting of several attributes,
+some of them are required.
+Implementations are free to add their own attributes.
+
+Subsections list all required attributes and one recommended attribute.
+
+The meaning of the attributes is formally given only by their effect
+on the computation of the attributes of the goal result.
+The subsections do contain a short informal description,
+but see other chapters for more in-depth explanations.
+
+### Goal Final Trial Duration
+
+A threshold value for trial durations.
+REQUIRED attribute, MUST be positive.
+
+Informally, the conditional throughput for this goal will be computed
+only from trial results from trials as long as this.
+
+### Goal Duration Sum
+
+A threshold value for a particular sum of trial durations.
+REQUIRED attribute, MUST be positive.
+
+This uses the duration values returned by the measurer,
+in case it returns something else than the intended durations
+for the traffic part of the search.
+
+Informally, even at looking only at trials done at this goal's
+final trial duration, MLRsearch may spend up to this time measuring
+the same load value.
+
+### Goal Loss Ratio
+
+A threshold value for trial loss ratios.
+REQUIRED attribute, MUST be non-negative and smaller than one.
+
+Informally, if a load causes too many trials with trial results larger than this,
+the conditional throughput for this goal will be smaller than that load.
+
+### Goal Exceed Ratio
+
+A threshold value for particular ratio of duration sums.
+REQUIRED attribute, MUST be non-negative and smaller than one.
+
+This uses the duration values returned by the measurer,
+in case it returns something else than the intended durations
+for the traffic part of the search.
+
+Informally, this acts as the q-value for a quantile when selecting
+the forwarding rate of which trial result becomes the conditional throughput.
+For example, when the goal exceed ratio is 0.5 and MLRsearch
+happened to use the whole goal duration sum when determining conditional
+throughput, it means the conditional throughput is the median
+of trial forwarding rates.
+In practice, MLRsearch may stop measuring a load before the goal duration sum
+is reached, and the conditional throughput in that case frequently
+is the worst trial still not exceeding the goal loss ratio.
+If the goal duration sum is no larger than the goal fina trial duration,
+MLRsearch performs only one trial per load (unless other goals need more)
+and the goal exceed ratio has no effect on the search result.
+
+### Goal Width
+
+A value used as a threshold for telling when two trial load values
+are close enough.
+
+RECOMMENDED attribute, positive. Implementations without this attribute
+MUST give the manager other ways to control the search exit condition.
+
+Absolute load difference and relative load difference are two popular choices,
+but implementations may choose a different way to specify width.
+
+Informally, this acts as a stopping condition, controling the precision
+of the search. The search stops if every goal has reached its precision.
+
+## Search Result
+
+Search result is a single composite object returned from the controller
+to the manager.
+It is a mapping from the search goals (see section Search Goal) into goal results
+(see section Goal Result).
+The mapping MUST map from all the search goals present in the controller input.
+
+Each search goal instance is mapped to a goal result instance.
+Multiple search goal instances may map to the same goal result instance.
+
+## Goal Result
+
+Goal result is a composite object consisting of several attributes.
+All attributes are related to the same search goal, the one search goal instance
+the Search Result is mapping into this instance of the Goal Result.
+
+Some of the attributes are required, some are recommended,
+implementations are free to add their own.
+
+The subsections define attributes for regular goal result.
+Implementations are free to define their own irregular goal results,
+but the manager MUST report them clearly as not regular according to this section.
+
+A typical irregular result is when all trials at maximum offered load
+have zero loss, as the relevant upper bound does not exist in that case.
+
+### Relevant Upper Bound
+
+Relevant upper bound is the intended load value that is classified
+at the end of the search as the relevant upper bound (see Appendix A)
+for this goal.
+This is a REQUIRED attribute.
+
+Informally, this is the smallest intended load that failed to uphold
+all the requirements of this search goal, mainly the goal loss ratio
+in combination with the goal exceed ratio.
+
+### Relevant Lower Bound
+
+Relevant lower bound is the intended load value that got classified
+(after all trials) as the relevant lower bound (see Appendix A) for this goal.
+This is a REQUIRED attribute.
+
+The distance between the relevant lower bound and the relevant upper bound
+MUST NOT be larger than the goal width, for a regular goal result,
+if the implementation offers width as a goal attribute.
+
+Informally, this is the smallest intended load that managed to uphold
+all the requirements of this search goal, mainly the goal loss ratio
+in combination with the goal exceed ratio, while not being larger
+than the relevant upper bound.
+
+### Conditional Throughput
+
+The conditional throughput (see Appendix B) as evaluated
+at the relevant lower bound.
+This is a RECOMMENDED attribute.
+
+Informally, a typical forwarding rate expected to be seen
+at the relevant lower bound. But frequently just a conservative estimate thereof,
+as MLRsearch implementations tend to stop gathering more data
+as soon as they confirm this estimate cannot get worse within
+the goal duration sum.
+
+# MLRsearch Explanations
+
+This chapter focuses on intuitions and motivations
+and skips over some important details.
+
+Familiarity with the MLRsearch specification is not required here,
+so this chapter can act as an introduction.
+For example, this chapter start talking about tightest lower bounds
+before it is ready to talk about the relevant lower bound from the specification.
+
+## MLRsearch Versions
+
+The MLRsearch algorithm has been developed in a code-first approach,
+a Python library has been created, debugged and used in production
+before first descriptions (even informal) were published.
+In fact, multiple versions of the library were used in production
+over past few years, and later code was usually not compatible
+with earlier descriptions.
+
+The code in (any version of) MLRsearch library fully determines
+the search process (for given configuration parameters),
+leaving no space for deviations.
+MLRsearch as a name for a broad class of possible algorithms
+leaves plenty of space for future improvements, at the cost
+of poor comparability of results of different MLRsearch implementations.
+
+This document aspires to prescribe a MLRsearch specification
+in a way that restricts the important parts related to comparability,
+while leaving other parts vague enough so implementations can improve freely.
+
+## Exit Condition
+
+[RFC2544] prescribes that after performing one trial at a specific offered load,
+the next offered load should be larger or smaller, based on frame loss.
+
+The usual implementation uses binary search. Here a lossy trial becomes
+a new upper bound, a lossless trial becomes a new lower bound.
+The span of values between the tightest lower bound and the tightest upper bound
+forms an interval of possible results,
+and after each trial the width of that interval halves.
+
+Usually the binary search implementation tracks only the two tightest bounds,
+simply calling them bounds, but the old values still remain valid bounds,
+just not as tight as the new ones.
+
+After some number of trials, the tightest lower bound becomes the throughput.
+[RFC2544] does not specify when (if ever) should the search stop.
+
+MLRsearch library introduces a concept of goal width. The search stops
+when the distance between the tightest upper bound and the tightest lower bound
+is smaller than a user-configured value called goal width from now on.
+In other words, interval width has to be smaller than goal width
+at the end of the search.
+
+This goal width value therefore determines the precision of the result.
+As MLRsearch specification requires a particular structure of the result,
+the result itself does contain enough information to determine its precision,
+thus it is not required to report the goal width value.
+
+This allows MLRsearch implementations to use exit conditions
+different from goal width.
+The MLRsearch specification only REQUIRES the search procedure
+to always finish in a finite time, regardless of possible trial results.
+
+## Load Classification
+
+MLRsearch keeps the basic logic of binary search (tracking tightest bounds,
+measuring at the middle), perhaps with minor technical clarifications.
+The algorithm chooses an intended load (as opposed to offered load),
+the interval between bounds does not need to be split exactly in two equal halves,
+and the final reported structure specifies both bounds
+(optionally also the conditional throughput at the lower bound, defined later).
+
+The biggest difference is that in order to classify a load
+as an upper or lower bound, MLRsearch may need more than one trial
+(depending on configuration options) to be performed at the same intended load.
+
+As a consequence, even if a load already does have few trial results,
+it still may be classified as undecided, neither a lower bound nor an upper bound.
+
+Explanation of the classification logic is given in the next chatper,
+as it relies heavily on other sections of this chapter.
+
+For repeatability and comparability reasons, it is important that
+given a set of trial results, all implementations of MLRsearch
+classify the load in an equivalent way.
+
+## Loss Ratios
+
+Next difference is in goals of the search. [RFC2544] has a single goal,
+based on classifying full-length trials as either loss-less or lossy.
+
+As the name suggests, MLRsearch can seach for multiple goals, differing in their
+loss ratios. Precise definition of goal loss ratio will be given later.
+The [RFC2544] throughput goal then simply becomes a zero goal loss ratio.
+Different goals also may have different goal width.
+
+A set of trial results for one specific intended load value
+can classify the load as an upper bound for some goals, but a lower bound
+for some other goals, and undecided for the rest of the goals.
+
+Therefore, the load classification depends not only on ttrial results,
+but also on the goal. The overall search procedure becomes more complicated
+(compared to binary search with a single goal),
+but most of the complications do not affect the final result,
+except for one phenomenon, loss inversion.
+
+## Loss Inversion
+
+In [RFC2544] throuhput search using bisection, any load with lossy trial
+becomes a hard upper bound, meaning every subsequent trial has smaller
+intended load.
+
+But in MLRsearch, a load that is classified as an upper bound for one goal
+may still be a lower bound for another goal, and due to that other goal
+MLRsearch will probably perform trials at even higher loads.
+What to do when all such higher load trials happen to have zero loss?
+Does it mean the earlier upper bound was not real?
+Does it mean the later lossless trials are not considered a lower bound?
+Surely we do not want to have an upper bound at a load smaller than a lower bound.
+
+MLRsearch is conservative in these situations.
+The upper bound is considered real, and the lossless trials at higher loads
+are considered to be a coincidence, at least when computing the final result.
+
+This is formalized using new notions, the relevant upper bound and
+the relevant lower bound.
+Load classification is still based just on the set of trial results
+at a given intended load (trials at other loads are ignored),
+making it possible to have a lower load classified as an upper bound
+and a higher load classified as a lower bound (for the same goal).
+The relevant upper bound (for a goal) is the smallest load classified
+as an upper bound. But the relevant lower bound is not simply
+the largest among lower bounds. It is the largest load among loads
+that are lower bounds while also being smaller than the relevant upper bound.
+
+With these definitions, the relevant lower bound is always smaller
+than the relevant upper bound (if both exist), and the two relevant bounds
+are used analogously as the two tightest bounds in the binary search.
+When they are less than goal width apart, the relevant bounds are used in output.
+
+One consequence is that every trial result can have an impact on the search result.
+That means if your SUT (or your traffic generator) needs a warmup,
+be sure to warm it up before starting the search.
+
+## Exceed Ratio
+
+The idea of performing multiple trials at the same load comes from
+a model where some trial results (those with high loss) are affected
+by infrequent effects, causing poor repeatability of [RFC2544] throughput results.
+See the discussion about noiseful and noiseless ends of SUT performance spectrum.
+Stable results are closer to the noiseless end of SUT preformance spectrum,
+so MLRsearch may need to allow some frequency of high-loss trials
+to ignore the reare but big effects near the noisefull end.
+
+MLRsearch is able to do such trial result filtering, but it needs
+a configuration option to tell it how much frequent can the infrequent big loss be.
+This option is called exceed ratio. It tells MLRsearch what ratio of trials
+(more exactly what ratio of trial seconds) can have trial loss ratio
+larger than goal loss ratio and still be classified as a lower bound.
+Zero exceed ratio means all trials have to have trial loss ratio
+equal to or smaller than the goal loss ratio.
+
+For explainability reasons, the RECOMMENDED value for exceed ratio is 0.5,
+as it simplifies some later concepts by relating them to the concept of median.
+
+## Duration Sum
+
+When more than one trial is needed to classify a load,
+MLRsearch also needs something that controlls the number of trials needed.
+Therefore, each goal also has an attribute called duration sum.
+
+The meaning of a goal duration sum is that when a load has trials
+(at full trial duration, details later)
+whose trial durations when summed up give a value at least this,
+the load is guaranteed to be classified as an upper bound or a lower bound
+for the goal.
+
+As the duration sum has a big impact on the overall search duration,
+and [RFC2544] prescibes wait intervals around trial traffic,
+the MLRsearch algorithm may sum durations that are different
+from the actual trial traffic durations.
+
+## Short Trials
+
+Section 24 of [RFC2544] already anticipates possible time savings
+when short trials (shorter than full length trials) are used.
+
+MLRsearch requires each goal to specify its final trial duration.
+Full-length trial is the short name for a trial whose intended trial duration
+is equal to the goal final trial duration.
+
+Any MLRsearch implementation may include its own configuration options
+which control when and how MLRsearch chooses to use shorter trial durations.
+
+For explainability reasons, when exceed ratio of 0.5 is used,
+it is recommended for the goal duration sum to be an odd multiple
+of the full trial durations, so conditional throughput becomes identical to
+a median of a particular set of forwarding rates.
+
+Presence of shorter trial results complicates the load classification logic.
+Full details are given later. In short, results from short trials
+may cause a load to be classified as an upper bound.
+This may cause loss inversion, and thus lower the relevant lower bound
+(below what would classification say when considering full-length trials only).
+
+For explainability reasons, it is RECOMMENDED users use such configurations
+that guarantee all trials have the same length.
+Alas, such configurations are usually not compliant with [RFC2544] requirements,
+or not time-saving enough.
+
+## Conditional Throughput
+
+As testing equipment takes intended load as input parameter
+for a trial measurement, any load search algorithm needs to deal
+with intended load values internally.
+
+But in presence of goals with non-zero loss ratio, the intended load
+usually does not match the user intuition of what a throughput is.
+The forwarding rate (as defined in [RFC2285] section 3.6.1) is better,
+but it is not obvious how to generalize it
+for loads with multiple trial results,
+especially with non-zero goal exceed ratio.
+
+MLRsearch defines one such generalization, called the conditional throughput.
+It is the forwarding rate from one of the trials performed at the load
+in question. Specification of which trial exactly is quite technical.
+More detailed explanations are given in the next chapter.
+
+Conditional throughput is partially related to load classification.
+If a load is classified as a lower bound for a goal,
+the conditional throughpt can be calculated,
+and guaranteed to show effective loss ratio no larger than goal loss ratio.
+
+While the conditional throughput gives more intuitive-looking values
+than the relevant lower bound, especially for non-zero goal loss ratio values,
+the actual definition is more complicated than the definition of the relevant
+lower bound. In future, other intuitive values may become popular,
+but they are unlikely to supersede the definition of the relevant lower bound
+as the most fitting value for comparability purposes,
+therefore the relevant lower bound remains a required attribute
+of the goal result structure.
+
+Note that comparing best and worst case, the same relevant lower bound value
+may result in the conditional throughput differing up to the goal loss ratio.
+Therefore it is rarely needed to set the goal width (if expressed
+as relative difference of loads) below the goal loss ratio.
+In other words, setting the goal width below the goal loss ratio
+may cause the conditional throughput for a larger loss ratio to became smaller
+than a conditional throughput for a goal with a smaller goal loss ratio,
+which is counter-intuitive, considering they come from the same search.
+Therefore it is RECOMMENDED to set the goal width to a value no smaller
+than the goal loss ratio.
+
+## Search Time
+
+The main motivation for MLRsearch was to have an algorithm that spends less time
+finding a throughput, either the [RFC2544] compliant one,
+or some generalization thereof. The art of achieving short search times
+is mainly in smart selection of intended loads (and intended durations)
+for the next trial to perform.
+
+While there is an indirect impact of the load selection on the reported values,
+in practice such impact tends to be small,
+even for SUTs with quite broad performance spectrum.
+
+A typical example of two approaches to load selection leading to different
+relevant lower bounds is when the interval is split in a very uneven way.
+An implementation chosing loads very close to the current relevant lower bound
+are quite likely to eventually stumble upon a trial result
+with poor performance (due to SUT noise).
+For an implementation chosing load very close to the current relevant upper bound
+this is unlikely, as it examines more loads that can see a performance
+close to the noiseless end of the SUT performance spectrum.
+The reason why it is unlikely to have two MLRsearch implementation showing
+this kind of preference in load selection is precisely
+in the desire to have short searches.
+Even splits are the best way to achive the desired precision,
+so the more optimized a search algorithm is for the overall search duration,
+the better the repeatability and comparability
+of its results will be, assuming the user configuration remains the same.
+
+Therefore, this document remains quite vague on load selection
+and other optimisation details, and configuration attributes related to them.
+Assuming users prefer libreries that achieve short overall search time,
+the definition of the relevant lower bound
+should be strict enough to ensure result repeatability
+and comparability between different implementations,
+while not restricting future implementations much.
+
+Sadly, different implementations may exhibit their sweet spot of
+best repeatability at given search duration at different goals attribute values,
+especially with respect to optional goal attributes
+such as initial trial duration.
+Thus, this document does not comment much on which configurations
+are good for comparability between different implementations.
+For comparability between different SUTs using the same implementation,
+refer to configurations recommended by that particular implementation.
+
+## [RFC2544] compliance
+
+The following search goal ensures unconditional compliance with
+[RFC2544] throughput search procedure:
+
+- Goal loss ratio: zero.
+
+- Goal final trial duration: 60 seconds.
+
+- Goal duration sum: 60 seconds.
+
+- Goal exceed ratio: zero.
+
+Presence of other search goals does not affect compliance of this goal result.
+The relevant lower bound and the conditional throughput are in this case
+equal to each other, and the value is the [RFC2544] throughput.
+
+If the 60 second quantity is replaced by a smaller quantity in both attributes,
+the conditional throughput is still conditionally compliant with
+[RFC2544] throughput.
+
+# Selected Functional Details
+
+This chapter continues with explanations,
+but this time more precise definitions are needed
+for readers to follow the explanations.
+The definitions here are wordy, implementers can look into the next chapter
+for more concise definitions.
+
+The two areas of focus in this chapter are the load classification
+and the conditional throughput, starting with the latter.
+
+## Performance Spectrum
+
+There are several equivalent ways to define the conditional throughput computation.
+One of the ways relies on an object called the performance spectrum.
+First, two heavy definitions are needed.
+
+Take an intended load value, and a finite set of trial results, all trials
+measured at that load value. The performance spectrum is the function that maps
+any non-negative real number into a sum of trial durations among all trials
+in the set that have that number as their forwarding rate,
+e.g. map to zero if no trial has that particular forwarding rate.
+
+A related function, defined if there is at least one trial in the set,
+is the performance spectrum divided by sum of durations of all trials in the set.
+That function is called the performance probability function, as it satisfies
+all the requirements for probability mass function function
+of a discrete probability distribution,
+the one-dimensional random variable being the trial forwarding rate.
+
+These functions are related to the SUT performance spectrum,
+as sampled by the trials in the set.
+
+As for any other probability function, we can talk about percentiles,
+of the performance probability function, and bout other quantiles
+such as the median. The conditional throughput will be
+one such quantile value for a specifically chosen set of trials.
+
+Take a set of all full-length trials performed at the load in question.
+The sum of durations of those trials may be less than goal duration sum, or not.
+If it is less, add an imaginary trial result with zero forwarding rate
+such that the new sum of durations is equal to the goal duration sum.
+This is the set of trials to use. The q-value for the quantile
+is the goal exceed ratio. If the quantile touches two trials,
+the larger forwarding rate is used.
+
+First example. For zero exceed ratio when goal duration sum
+has been reached. The conditional throughput is the smallest forwarding
+rate among the trials.
+
+Second example. For zero exceed ratio when goal duration sum
+has not been reached yet. Due to the missing duration sum,
+the worst case may still happen, so the conditional througput
+is zero. This is not reported to the user, as this load
+cannot become the relevant lower bound yet.
+
+Third example. Exceed ratio 50%, goal duration sum two seconds,
+one trial present with duration one second and zero loss.
+An imaginary trial is added with duration one second and zero forwarding rate.
+Median would touch both trials, so the conditional throughput
+is the forwarding rate of the one non-imaginary trial.
+As that had zero loss, the value is equal to the offered load.
+
+The classification does not need the whole performance spectrum,
+only few duration sums.
+
+A trial is called bad (according to a goal) if its trial loss ratio
+is larger than the goal loss ratio. Trial that is not bad is called good.
+
+## Single Trial Duration
+
+When goal attributes are chosen in such a way that every trial has the same
+intended duration, the load classification is sipler.
+
+The following description looks technical, but it follows the motivation
+of goal loss ratio, goal exceed ratio and goal duration sum.
+If sum of durations of all trials (at given load) is less than the goal
+duration sum, imagine best case scenario (all subsequent trials having zero loss)
+and worst case scenario (all subsequent trials having 100% loss).
+Here we assume there is as many subsequent trials as needed
+to make the sum of all trials to become equal to the goal duration sum.
+As the exceed ratio is defined just using sums of durations
+(number of trials does not matter), it does not matter whether
+the "subsequent trials" can consist of integer number of full-length trials.
+
+If even in the best case scenario the load exceed ratio would be larger
+than the goal exceed ratio, the load is an upper bound.
+If even in the worst case scenario the load exceed ratio would not be larger
+than the goal exceed ratio, the load is a lower bound.
+
+Even more specifically.
+Take all trials measured at a given load.
+Sum of durations of all bad full-length trials is called the bad sum.
+Sum of durations of all good full-length trials is called the good sum.
+The result of adding bad sum plus the good sum is called the measured sum.
+The larger of the measured sum and the goal duration sum is called the whole sum.
+The whole sum minus the measured sum is called the missing sum.
+Optimistic exceed ratio is the bad sum divided by the whole sum.
+Pessimistic exceed ratio is the bad sum plus the missing sum, that divided by
+the whole sum.
+If optimistic exceed ratio is larger than the goal exceed ratio,
+the load is classified as an upper bound.
+If pessimistic exceed ratio is not larger than the goal exceed ratio,
+the load is classified as a lower bound.
+Else, the load is classified as undecided.
+
+The definition of pessimistic exceed ratio matches the logic in
+the conditional throughput computation, so a load is a lower bound
+if and only if the conditional throughput effective loss ratio
+is not larger than the goal loss ratio.
+If it is larger, the load is either an upper bound or undecided.
+
+## Short Trial Scenarios
+
+Trials with intended duration smaller than the goal final trial duration
+are called short trials.
+The motivation for load classification logic in presence of short trials
+is based around a counter-factual case: What would the trial result be
+if a short trial has been measured as a full-length trial instead?
+
+There are three main scenarios where human intuition guides
+the intended behavior of load classification.
+
+Scenario one. The user had their reason for not configuring shorter goal
+final trial duration. Perhaps SUT has buffers that may get full at longer
+trial durations. Perhaps SUT shows periodic decreases of performance
+the user does not want to treat as noise. In any case, many good short trials
+may became bad full-length trial in the counter-factual case.
+In extreme case, there are no bad short trials.
+In this scenario, we want the load classification NOT to classify the load
+as a lower bound, despite the abundance of good short trials.
+Effectively, we want the good short trials to be ignored, so they
+do not contribute to comparisons with the goal duration sum.
+
+Scenario two. When there is a frame loss in a short trial,
+the counter-factual full-length trial is expected to lose at least as many
+frames. And in practice, bad short trials are rarely turning into
+good full-length trials. In extreme case, there are no good short trials.
+In this scenario, we want the load classification
+to classify the load as an upper bound just based on abundance
+of short bad trials. Effectively we want the bad short trials
+to contribute to comparisons with the goal duration sum,
+so the load can be classified sooner.
+
+Scenario three. Some SUTs are quite indifferent to trial duration.
+Performance probability function constructed from short trial results
+is likely to be similar to performance probability function constructed
+from full-length trial results (perhaps with smaller dispersion,
+but overall without big impact on the median quantiles).
+For moderate goal exceed ratio values, this may mean there are both
+good short trials and bad short trials.
+This scenario is there just to invalidate a simple heuristic
+of always ignoring good short trials and never ignoring bad short trials.
+That simple heuristic would be too biased. Yes, the short bad trials
+are likely to turn into full-length bad trials in the counter-factual case,
+but there is no information on what would the good short trials turn into.
+The only way to decide is to do more trials at full length,
+the same as in scenario one.
+
+## Short Trial Logic
+
+MLRsearch picks a particular logic for load classification
+in presence of short trials, but it is still RECOMMENDED to use configurations
+that imply no short trials, so the possible inefficiencies in that logic
+do not affect the result, and the result has better explainability.
+
+With thas said, the logic differs from the single trial duration case
+only in different definition of bad sum.
+Good sum is still the sum across all good full-length trials.
+
+Few more notions are needed for definig the new bad sum.
+Sum of durations of all bad full-length trials is called the bad long sum.
+Sum of durations of all bad short trials is called the bad short sum.
+Sum of durations of all good short trials is called the good short sum.
+One minus the goal exceed ratio is called the inceed ratio.
+The goal exceed ratio divided by the inceed ratio is called the exceed coefficient.
+The good short sum multiplied by the exceed coefficient is called the balancing sum.
+The bad short sum minus the balancing sum is called the excess sum.
+If the excess sum is negative, the bad sum is equal to the bad long sum.
+Else, the bad sum is equal to the bad long sum plus the excess sum.
+
+Here is how the new definition of the bad sum fares in the three scenarios,
+where the load is close to what would relevant bounds be
+if only full-length trials were used for the search.
+
+Scenario one. If duration is too short, we expect to see higher frequency
+of good short trials. This could lead to negative excess sum,
+which has no impact, hence the load classification is given just by
+full-length trials.
+Thus, MLRsearch using too short trials has no detrimental effect
+on result comparability in this scenario.
+But also using short trials does not help with overall search duration,
+proably making it worse.
+
+Scenario two. Settings with small exceed ratio have small exceed coefficient,
+so the impact of good short sum is small and the bad short sum
+is almost wholly converted into excess sum, thus bad short trials
+have almost as big impact as full-length bad trials.
+The same conclusion applies for moderate exceed ratio values
+when the good short sum is small.
+Thus, short trials can cause a load to get classified as an upper bound earlier
+bringing time savings (while not affecting comparability).
+
+Scenario three. Here excess sum is small in absolute value, as balancing sum
+is expected to be be similar to the bad short sum.
+Once again, full-length trials are needed for final load classification,
+but usage of short trials probably means MLRsearch needed shorter search time
+before selecting this load for measurement, bringing time savings
+(while not affecting comparability).
+
+## Longer Trial Durations
+
+If there are trial results with intended duration larger
+than the goal trial duration, the classification logic is intentionally undefined.
+
+The implementations MAY treat such longer trials as if they were full-length.
+In any case, presence of such longer trials in either the relevant lower bound
+or the relevant upper bound SHOULD be mentioned, as for sume SUTs
+it is likely to affect comparability.
+
+
+TODO: Here will be a chapter summarizing how MLRsearch library
+adresses the problems from the Identified Problems chapter.
+
+{::comment}
+
+    # Problems after MLRsearch
+
+    Now when MLRsearch is clearly specified and explained,
+    it is possible to summarize how does MLRsearch specification help with problems.
+
+    Here, "multiple trials" is a shorthand for having the goal final trial duration
+    significantly smaller than the goal duration sum.
+    This results in MLRsearch performing multiple trials at the same load,
+    which may not be the case with other configurations.
+
+    ## Long Test Duration
+
+    As shortening the overall search duration is the main motivation
+    of MLRsearch library development, the library implements
+    multiple improvements on this front, both big and small.
+    Most of implementation details are not part of the MLRsearch specification,
+    so that future implementations may keep shortening the search duration even more.
+
+    TODO: The rest is about attributes.
+
+    From the required goal attributes, the goal duration sum
+    remains the best way to get even shorter searches.
+
+    Usage of multiple trials can also save time,
+    depending on wait times around trial traffic.
+
+    The farther the goal exceed ratio is from 0.5 towards either side,
+    the less predictable the overal search duration becomes in practice.
+
+    Width parameter does not change search duration much in practice
+    (compared to other, mainly optional goal attributes).
+
+    ## DUT in SUT
+
+    Practice shows big improvements when multiple trials
+    and moderate exceed ratios are used. Mainly when it comes to result
+    repeatability, as sometimes it is not easy to distinguish
+    SUT noise from DUT instability.
+
+    Conditional throughput has intuitive meaning when described
+    using the performance spectrum, so this is an improvement,
+    especially when compared to search procedures which use non-zero
+    goal loss ratio but return only the intended load at a lower bound.
+
+    Multiple trials can save time also when the noisy end of
+    the preformance spectrum needs to be examined, e.g. for [RFC9004].
+
+    Under some circumstances, testing the same DUT and SUT setup with different
+    DUT configurations can give some hints on what part of noise us SUT noise
+    and what part is DUT performance fluctuations.
+    In practice, both types of noise tend to be too complicated for that analysis.
+    MLRsearch does not offer additional tools in this regard,
+    apart of giving users ability to search for more goals,
+    hoping to get more insight at the cost of longer overall search time.
+
+    ## Repeatability and Comparability
+
+    Multiple trials improve repeatability, depending on exceed ratio.
+
+    In practice, 1s goal final trial duration with exceed ratio 0.5
+    is good enough for modern SUTs (but that usually requires
+    smaller wait times around the traffic part of the trial,
+    otherwise too much search time is wasted waiting).
+
+    It is not clear whether exceed ratios higher than 0.5 are better
+    for repeatability.
+    The 0.5 value is still preferred due to explainability using median.
+    TODO: Stress single value is for comparability, which one is due explainability.
+
+    It is possible that the conditional throughput values (with non-zero
+    goal loss ratio) are better for repeatability than the relevant
+    lower bound values, especially for implementations
+    which pick load from a small set of discrete values.
+
+    Implementations focusing on shortening the overall search time
+    are automatically forced to avoid comparability issues
+    due to load selection, as they must prefer even splits wherever possible.
+    But this conclusion only holds when the same goals are used.
+    Larger adoption is needed before any further claims on comparability
+    between MLRsearch implementations can be made.
+
+    ## Throughput with Non-Zero Loss
+
+    Suported by the goal loss ratio attribute.
+    Improves repeatability as expected.
+
+    ## Inconsistent Trial Results
+
+    MLRsearch is conservative wherever possible,
+    this is built into the definition of conditional throughput,
+    and into the treatment of short trial results for load classification.
+
+    This is consistent with [RFC2544] zero loss tolerance motivation.
+
+    If the very best (noiseless) part of the SUT performance spectrum
+    is of interest, it should be enough to set small value for
+    the goal final trial duration, and perhaps also a large value
+    for the goal exceed ratio.
+
+    Implementations may offer other (optional) configuration attributes
+    (and optional goal result attributes)
+    to become less conservative, but currently it is not clear
+    what impact would that have on repeatability.
+
+{:/comment}
+
+# IANA Considerations
+
+No requests of IANA.
+
+# Security Considerations
+
+Benchmarking activities as described in this memo are limited to
+technology characterization of a DUT/SUT using controlled stimuli in a
+laboratory environment, with dedicated address space and the constraints
+specified in the sections above.
+
+The benchmarking network topology will be an independent test setup and
+MUST NOT be connected to devices that may forward the test traffic into
+a production network or misroute traffic to the test management network.
+
+Further, benchmarking is performed on a "black-box" basis, relying
+solely on measurements observable external to the DUT/SUT.
+
+Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
+benchmarking purposes. Any implications for network security arising
+from the DUT/SUT SHOULD be identical in the lab and in production
+networks.
+
+# Acknowledgements
+
+Many thanks to Alec Hothan of OPNFV NFVbench project for thorough
+review and numerous useful comments and suggestions.
+
+Special wholehearted gratitude and thanks to late Al Morton for his
+thorough reviews filled with very specific feedback and constructive
+guidelines. Thank you Al for the close collaboration over the years,
+for your continuous unwavering encouragements full of empathy and
+positive attitude.
+Al, you are dearly missed.
+
+# Appendix A
+
+This is a specification of load classification.
+
+The block at the end of this appendix holds pseudocode
+which computes two values, stored in variables named optimistic and pessimistic.
+The pseudocode happens to be a valid Python code.
+
+If both values are computed to be true, the load in question
+is classified as a lower bound according to the goal in question.
+If both values are false, the load is classified as an upper bound.
+Otherwise, the load is classifies as undecided.
+
+The pseudocole expects the following variables hold values as follows:
+
+- goal_duration_sum: The goal duration sum value.
+
+- goal_exceed_ratio: The goal exceed ratio value.
+
+- good_long_sum: Sum of durations across trials with trial duration
+  at least equal to the goal final trial duration and with trial loss ratio
+  not higher than the goal loss ratio.
+
+- bad_long_sum: Sum of durations across trials with trial duration
+  at least equal to the goal final trial duration and with trial loss ratio
+  higher than the goal loss ratio.
+
+- good_short_sum: Sum of durations across trials with trial duration
+  shorter than the goal final trial duration and with trial loss ratio
+  not higher than the goal loss ratio.
+
+- bad_short_sum: Sum of durations across trials with trial duration
+  shorter than the goal final trial duration and with trial loss ratio
+  higher than the goal loss ratio.
+
+Here the implicit set of available trial results consists of all trials
+measured at given intended load at the end of search.
+
+The code works correctly also when there are no trial results at given load.
+
+```
+balancing_sum = good_short_sum * goal_exceed_ratio / (1.0 - goal_exceed_ratio)
+effective_bad_sum = bad_long_sum + max(0.0, bad_short_sum - balancing_sum)
+effective_whole_sum = max(good_long_sum + effective_bad_sum, goal_duration_sum)
+quantile_duration_sum = effective_whole_sum * goal_exceed_ratio
+optimistic = effective_bad_sum <= quantile_duration_sum
+pessimistic = (effective_whole_sum - good_long_sum) <= quantile_duration_sum
+```
+
+# Appendix B
+
+This is a specification of conditional throughput.
+
+The block at the end of this appendix holds pseudocode
+which computes a value stored as variable conditional_throughput.
+The pseudocode happens to be a valid Python code.
+
+The pseudocole expects the following variables hold values as follows:
+
+- goal_duration_sum: The goal duration sum value.
+
+- goal_exceed_ratio: The goal exceed ratio value.
+
+- good_long_sum: Sum of durations across trials with trial duration
+  at least equal to the goal final trial duration and with trial loss ratio
+  not higher than the goal loss ratio.
+
+- bad_long_sum: Sum of durations across trials with trial duration
+  at least equal to the goal final trial duration and with trial loss ratio
+  higher than the goal loss ratio.
+
+- long_trials: An iterable of all trial results from trials with trial duration
+  at least equal to the goal final trial duration,
+  sorted by increasing trial loss ratio.
+  A trial result is a composite with the following two attributes available:
+
+  - trial.loss_ratio: The trial loss ratio as measured for this trial.
+
+  - trial.duration: The trial duration of this trial.
+
+Here the implicit set of available trial results consists of all trials
+measured at given intended load at the end of search.
+
+The code works correctly only when there if there is at leas one
+trial result measured at given load.
+
+```
+all_long_sum = max(goal_duration_sum, good_long_sum + bad_long_sum)
+remaining = all_long_sum * (1.0 - goal_exceed_ratio)
+quantile_loss_ratio = None
+for trial in long_trials:
+    if quantile_loss_ratio is None or remaining > 0.0:
+        quantile_loss_ratio = trial.loss_ratio
+        remaining -= trial.duration
+    else:
+        break
+else:
+    if remaining > 0.0:
+        quantile_loss_ratio = 1.0
+
+conditional_throughput = intended_load * (1.0 - quantile_loss_ratio)
+```
+
+--- back
diff --git a/docs/ietf/process.txt b/docs/ietf/process.txt
index 78f7d7f34f..f83e6bbc99 100644
--- a/docs/ietf/process.txt
+++ b/docs/ietf/process.txt
@@ -20,7 +20,7 @@ $ sudo gem install kramdown-rfc2629
 $ kdrfc --version
 
 Main:
-$ kdrfc draft-ietf-bmwg-mlrsearch-04.md
+$ kdrfc draft-ietf-bmwg-mlrsearch-05.md
 
 If that complains, do it manually at https://author-tools.ietf.org/
 
-- 
2.16.6