From: Vratko Polak
Date: Tue, 8 Mar 2022 09:06:56 +0000 (+0100)
Subject: chore(ietf): Partially update MLRsearch content
X-Git-Url: https://gerrit.fd.io/r/gitweb?p=csit.git;a=commitdiff_plain;h=b89d8e07bd48f615b2e5955d754ee5381deb0012

chore(ietf): Partially update MLRsearch content

- Multiple comments and TODOs.
- Still contains old sections with duplicated content.
- Many commented-out sections waiting for deletion.
- Several nits reported by the IETF nitpicking tool.

Anyway, the time is up, so no time to fix it all.

Change-Id: I0c56f63bc9ae8173b6f566e862490123091ead79
Signed-off-by: Vratko Polak
---

diff --git a/docs/ietf/draft-ietf-bmwg-mlrsearch-02.md b/docs/ietf/draft-ietf-bmwg-mlrsearch-02.md
index 48db64e46a..fef146618c 100644
--- a/docs/ietf/draft-ietf-bmwg-mlrsearch-02.md
+++ b/docs/ietf/draft-ietf-bmwg-mlrsearch-02.md
@@ -2,7 +2,7 @@
 title: Multiple Loss Ratio Search for Packet Throughput (MLRsearch)
 abbrev: Multiple Loss Ratio Search
 docname: draft-ietf-bmwg-mlrsearch-02
-date: 2021-11-02
+date: 2022-03-07

 ipr: trust200902
 area: ops
@@ -34,9 +34,9 @@ normative:

 informative:
   FDio-CSIT-MLRsearch:
-    target: https://docs.fd.io/csit/rls2101/report/introduction/methodology_data_plane_throughput/methodology_mlrsearch_tests.html
+    target: https://s3-docs.fd.io/csit/rls2110/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.html#mlrsearch-tests
     title: "FD.io CSIT Test Methodology - MLRsearch"
-    date: 2021-02
+    date: 2021-11
   PyPI-MLRsearch:
     target: https://pypi.org/project/MLRsearch/0.4.0/
     title: "MLRsearch 0.4.0, Python Package Index"
@@ -44,6 +44,8 @@ informative:

 --- abstract

+TODO: Update after all sections are ready.
+
 This document proposes changes to [RFC2544], specifically to packet
 throughput search methodology, by defining a new search algorithm
 referred to as Multiple Loss Ratio search (MLRsearch for short). Instead

can be done to describe the replicability.

--- middle

{::comment}
As we use kramdown to convert from markdown,
we use this way of marking comments not to be visible in the rendered draft.
https://stackoverflow.com/a/42323390
If another engine is used, convert to this way:
https://stackoverflow.com/a/20885980
{:/comment}

# Terminology

TODO: Update after most other sections are updated.

{::comment}
The following is probably not needed (or defined elsewhere).

* Frame size: size of an Ethernet Layer-2 frame on the wire, including
  any VLAN tags (dot1q, dot1ad) and Ethernet FCS, but excluding Ethernet
  preamble and inter-frame gap. Measured in bytes (octets).
* Packet size: same as frame size, both terms used interchangeably.
* Device Under Test (DUT): in software networking, "device" denotes a
  specific piece of software tasked with packet processing. Such a device
  is surrounded by other software components (such as the operating system
  kernel). It is not possible to run devices without also running the
  other components, and hardware resources are shared between both. For
  the purposes of testing, the whole set of hardware and software components
  is called the "system under test" (SUT). As the SUT is the part of the
  whole test setup whose performance can be measured by [RFC2544] methods,
  this document uses SUT instead of [RFC2544] DUT. The device under test
  (DUT) can be re-introduced when analysing test results using whitebox
  techniques, but this document sticks to blackbox testing.
* System Under Test (SUT): the part of the whole test setup
  whose performance is to be benchmarked. The complete
  test setup contains other parts, whose performance is either already
  established, or not affecting the benchmarking result.
* Bi-directional throughput tests: involve packets/frames flowing in
  both transmit and receive directions over every tested interface of
  SUT/DUT. Packet flow metrics are measured per direction, and can be
  reported as aggregate for both directions and/or separately
  for each measured direction.
  In most cases bi-directional tests
  use the same (symmetric) load in both directions.
* Uni-directional throughput tests: involve packets/frames flowing in
  only one direction, i.e. either transmit or receive direction, over
  every tested interface of SUT/DUT. Packet flow metrics are measured
  and reported for the measured direction.
* Packet Throughput Rate: maximum packet offered load DUT/SUT forwards
  within the specified Packet Loss Ratio (PLR). In many cases the rate
  depends on the frame size processed by DUT/SUT. Hence packet
  throughput rate MUST be quoted with the specific frame size as received
  by DUT/SUT during the measurement. For bi-directional tests, packet
  throughput rate should be reported as aggregate for both directions.
  Measured in packets-per-second (pps) or frames-per-second (fps),
  equivalent metrics.
* Bandwidth Throughput Rate: a secondary metric calculated from packet
  throughput rate using the formula: bw_rate = pkt_rate * (frame_size +
  L1_overhead) * 8, where L1_overhead for Ethernet includes preamble (8
  octets) and inter-frame gap (12 octets). For bi-directional tests,
  bandwidth throughput rate should be reported as aggregate for both
  directions. Expressed in bits-per-second (bps).
* TODO: Do we need this, as it is identical to RFC2544 Throughput?
  Non Drop Rate (NDR): maximum packet/bandwidth throughput rate sustained
  by DUT/SUT at PLR equal zero (zero packet loss) specific to the tested
  frame size(s). MUST be quoted with the specific packet size as received
  by DUT/SUT during the measurement. Packet NDR measured in
  packets-per-second (or fps), bandwidth NDR expressed in
  bits-per-second (bps).
* TODO: If needed, reformulate to make it clear there can be multiple rates
  for multiple (non-zero) loss ratios.
  Partial Drop Rate (PDR): maximum packet/bandwidth throughput rate
  sustained by DUT/SUT at PLR greater than zero (non-zero packet loss)
  specific to the tested frame size(s). MUST be quoted with the specific
  packet size as received by DUT/SUT during the measurement. Packet PDR
  measured in packets-per-second (or fps), bandwidth PDR expressed in
  bits-per-second (bps).
* TODO: Refer to FRMOL instead.
  Maximum Receive Rate (MRR): packet/bandwidth rate regardless of PLR
  sustained by DUT/SUT under the specified Maximum Transmit Rate (MTR)
  packet load offered by the traffic generator. MUST be quoted with both
  the specific packet size and MTR as received by DUT/SUT during the
  measurement. Packet MRR measured in packets-per-second (or fps),
  bandwidth MRR expressed in bits-per-second (bps).
* TODO: Just keep using "trial measurement"?
  Trial: a single measurement step. See [RFC2544] section 23.
* TODO: Already defined in RFC2544:
  Trial duration: amount of time over which packets are transmitted
  in a single measurement step.
{:/comment}

* TODO: The current text uses Throughput for the zero loss ratio load.
  Is the capital T needed/useful?
* DUT and SUT: see the definitions in https://gerrit.fd.io/r/c/csit/+/35545
* Traffic Generator (TG) and Traffic Analyzer (TA): see
  https://datatracker.ietf.org/doc/html/rfc6894#section-4
  TODO: Maybe there is an earlier RFC?
* Overall search time: the time it takes to find all required loads within
  their precision goals, starting from zero trials measured at a given
  DUT configuration and traffic profile.
* TODO: traffic profile?
* Intended load: https://datatracker.ietf.org/doc/html/rfc2285#section-3.5.1
* Offered load: https://datatracker.ietf.org/doc/html/rfc2285#section-3.5.2
* Maximum offered load (MOL): see
  https://datatracker.ietf.org/doc/html/rfc2285#section-3.5.3
* Forwarding rate at maximum offered load (FRMOL): see
  https://datatracker.ietf.org/doc/html/rfc2285#section-3.6.2
* Trial Loss Count: the number of frames transmitted
  minus the number of frames received. A negative count is possible,
  e.g. when the SUT duplicates some frames.
* Trial Loss Ratio: the ratio of frames lost relative to frames
  transmitted over the trial duration, i.e. the trial loss count
  divided by the number of frames transmitted.
  For bi-directional throughput tests, the aggregate ratio is calculated,
  based on the aggregate number of frames transmitted and received.
  If the trial loss count is negative, its absolute value MUST be used
  to keep compliance with RFC2544.
* Safe load: any value such that a trial measurement at this (or a lower)
  intended load is correctly handled by both TG and TA,
  regardless of SUT behavior.
  Frequently, it is not known what the safe load is.
* Max load (TODO: rename?): the maximal intended load to be used during
  the search. The benchmarking team decides which value is low enough
  to guarantee that the values reported by TG and TA are reliable.
  It has to be a safe load, but it can be lower than a safe load estimate
  for added safety.
  See the Reliability of Test Equipment subsection below.
  This value MUST NOT be higher than MOL, which itself MUST NOT
  be higher than the Maximum Frame Rate
  https://datatracker.ietf.org/doc/html/rfc2544#section-20
* Min load: the minimal intended load to be used during the search.
  The benchmarking team decides which value is high enough
  to guarantee the trial measurement results are valid.
  E.g. considerable overall search time can be saved by declaring the SUT
  faulty if the min load trial shows too high a loss ratio.
  Zero frames per second is a valid min load value.
* Effective loss ratio: a corrected value of the trial loss ratio,
  chosen to avoid difficulties if the SUT exhibits a decreasing loss ratio
  with increasing load. It is the maximum of the trial loss ratios
  measured at the same duration on all loads smaller than (and including)
  the current one.
* Target loss ratio: a loss ratio value acting as an input for the search.
  The search is finding tight enough lower and upper bounds in intended
  load, so that the measurement at the lower bound has a trial loss ratio
  smaller than or equal to the target, and the measurement at the upper
  bound has a strictly larger trial loss ratio.
  For the tightest upper bound, the effective loss ratio is the same as
  the trial loss ratio at that upper bound load.
  For the tightest lower bound, the effective loss ratio can be higher
  than the trial loss ratio at that lower bound, but still not larger
  than the target loss ratio.
* TODO: Search algorithm.
* TODO: Precision goal.
* TODO: Define a "benchmarking group".
* TODO: Upper and lower bound.
* TODO: Valid and invalid bound?
* TODO: Interval and interval width?

TODO: Mention that NIC/PCI bandwidth/pps limits can be lower
than the bandwidth of the medium.

# Intentions of this document

{::comment}
Instead of talking about DUTs being non-deterministic
and vendors "gaming" in order to get better Throughput results,
Maciek and Vratko currently prefer to talk about result repeatability.
{:/comment}

The intention of this document is to provide recommendations for:

* optimizing the search for multiple target loss ratios at once,
* speeding up the overall search time,
* improving search result repeatability and comparability.

No part of RFC2544 is intended to be obsoleted by this document.

{::comment}
This document may contain examples which contradict RFC2544 requirements
and suggestions.
That is not an encouragement for benchmarking groups
to stop being compliant with RFC2544.
{:/comment}

# RFC2544

## Throughput search

It is useful to restate the key requirements of RFC2544
using the new terminology (see the Terminology section).

The following sections of RFC2544 are of interest for this document.

* https://datatracker.ietf.org/doc/html/rfc2544#section-20
  mentions that the max load SHOULD NOT be larger than the theoretical
  maximum rate for the frame size on the media.

* https://datatracker.ietf.org/doc/html/rfc2544#section-23
  lists the actions to be done for each trial measurement;
  it also mentions loss rate as an example of trial measurement results.
  This document uses loss count instead, as that is the quantity
  that is easier for the current test equipment to measure,
  e.g. it is not affected by the real traffic duration.
  TODO: Time uncertainty again.
+ +* https://datatracker.ietf.org/doc/html/rfc2544#section-24 + Mentions "full length trials" leading to the Throughput found, + as opposed to shorter trial durations, allowed in an attempt + to "minimize the length of search procedure". + This document talks about "final trial duration" and aims to + "optimize overal search time". + +* https://datatracker.ietf.org/doc/html/rfc2544#section-26.1 + with https://www.rfc-editor.org/errata/eid422 + finaly states requirements for the search procedure. + It boils down to "increase intended load upon zero trial loss + and decrease intended load upon non-zero trial loss". + +No additional constraints are placed on the load selection, +and there is no mention of an exit condition, e.g. when there is enough +trial measurements to proclaim the largest load with zero trial loss +(and final trial duration) to be the Throughput found. + +{::comment} + The following section is probably not useful enough. + + ## Generalized search + + Note that the Throughput search can be restated as a "conditional + load search" with a specific condition. + + "increase intended load upon trial result satisfying the condition + and decrease intended load upon trial result not satisfying the condition" + where the Throughput condition is "trial loss count is zero". + + This works for any condition that can be evaluated from a single + trial measurement result, and is likely to be true at low loads + and false at high loads. + + MLRsearch can incorporate multiple different conditions, + as long as there is total ligical ordering between them + (e.g. if a condition for a target loss ratio is not satisfied, + it is also not satisfied for any other codition which uses + larger target loss ratio). + + TODO: How to call a "load associated with this particular condition"? +{:/comment} + +{::comment} + + TODO: Not sure if this subsection is needed an where. + + ## Simple bisection + + There is one obvious and simple search algorithm which conforms + to throughput search requirements: simple bijection. + + Input: target precision, in frames per second. + + Procedure: + + 1. Chose min load to be zero. + 1. No need to measure, loss count has to be zero. + 2. Use the zero load as the current lower bound. + 2. Chose max load to be the max value allowed by bandwidth of the medium. + 1. Perform a trial measurement (at the full length duration) at max load. + 2. If there is zero trial loss count, return max load as Throughput. + 3. Use max load as the current upper bound. + 3. Repeat until the difference between lower bound and upper bound is + smaller or equal to the precision goal. + 1. If it is not larget, return the current lower bound as Throughput. + 2. Else: Chose new load as the arithmetic average of lower and upper bound. + 3. Perform a trial measurement (at the full length duration) at this load. + 4. If the trial loss rate is zero, consider the load as new lower bound. + 5. Else consider the load as the new upper bound. + 6. Jump back to the repeat at 3. + + Another possible stop condition is the overal search time so far, + but that is not really a different condition, as the time for search to reach + the precision goal is just a function of precision goal, trial duration + and the difference between max and min load. + + While this algorithm can be accomodated to search for multiple + target loss ratios "at the same time (see somewhere below), + it is still missing multiple improvement which give MLRsearch + considerably better overal search time in practice. 
{:/comment}

# Problems

## Repeatability and Comparability

RFC2544 does not suggest repeating the Throughput search,
{::comment}probably because the full set of tests already takes long{:/comment}
and from just one Throughput value, it cannot be determined
how repeatable that value is (how likely it is for a repeated Throughput
search to end up with a value less than the precision goal away
from the first value).

Depending on SUT behavior, different benchmark groups
can report significantly different Throughput values,
even when using an identical SUT and test equipment,
just because of minor differences in their search algorithms
(e.g. a different max load value).

While repeatability can be addressed by repeating the search several times,
the differences in the comparability scenario may be systematic,
e.g. appearing as a bias in one or both benchmark groups.

The MLRsearch algorithm does not really help with the repeatability
problem. This document RECOMMENDS repeating a selection of "important"
tests ten times, so users can ascertain the repeatability of the results.

TODO: How to report? Average and standard deviation?

Following the MLRsearch algorithm leaves less freedom for the benchmark
groups to encounter the comparability problem,
although more research is needed to determine the effect
of MLRsearch's tweakable parameters.

{::comment}
Possibly, the old DUTs were quite sharply consistent in their performance,
and/or the precision goals were quite large in order to save overall
search time.

With software DUTs and with time-efficient search algorithms,
nowadays the repeatability of Throughput can be quite low;
that is, the standard deviation of repeated Throughput results
is considerably higher than the precision goal.
{:/comment}

{::comment}
TODO: Unify with the PLRsearch draft.
TODO: No-loss region, random region, lossy region.
TODO: Tweaks with respect to a non-zero loss ratio goal.
TODO: Duration dependence?

Both RFC2544 and MLRsearch return a Throughput somewhere inside
the random region, or at most the precision goal below it.
{:/comment}

{::comment}
TODO: Make sure this is covered elsewhere, then delete.

## Search repeatability

The goal of RFC1242 and RFC2544 is to limit how vendors benchmark
their DUTs, in order to force them to report values that have
a higher chance to be confirmed by independent benchmarking groups
following the same RFCs.

This works well for deterministic DUTs.

But for non-deterministic DUTs, the RFC2544 Throughput value
is only guaranteed to fall somewhere below the lossy region (TODO: define).
It is possible to arrive at a value positioned likely high in the random
region at the cost of an increased overall search duration,
simply by lowering the load by very small amounts (instead of exact
halving) upon a lossy trial and increasing it by large amounts
upon a lossless trial.

Prescribing an exact search algorithm (bisection or MLRsearch or other)
will force vendors to report less "gamey" Throughput values.
{:/comment}

{::comment}
## Extensions

The following two sections are probably out of scope,
as they do not affect MLRsearch design choices.

### Direct and inverse measurements

TODO expand: A direct measurement is a single trial measurement,
with prescribed inputs and outputs turned directly into the quantity
of interest.
Examples:
Latency https://datatracker.ietf.org/doc/html/rfc2544#section-26.2
is a single direct measurement.
Frame loss rate https://datatracker.ietf.org/doc/html/rfc2544#section-26.3
is a sequence of direct measurements.

TODO expand: An indirect measurement aims to solve an "inverse function
problem", meaning (a part of) the trial measurement output is prescribed,
and the quantity of interest is (derived from) the input parameters
of the trial measurement that achieves the prescribed output.
In general this is a hard problem, but if the unknown input parameter
is just a one-dimensional quantity, algorithms such as bisection
do converge regardless of the outputs seen.
We call any such algorithm examining a one-dimensional input a "search".
Of course, some exit condition is needed for the search to end.
In the case of Throughput, the bisection algorithm tracks both an upper
bound and a lower bound, and the lower bound at the end of the search
is the quantity satisfying the definition of Throughput.

### Metrics other than frames

TODO expand: A small TCP transaction can succeed even if some frames
are lost.

TODO expand: It is possible for the loss ratio to use a different metric
than the load. E.g. a pps loss ratio when the traffic profile uses
higher level transactions per second.

### TODO: Stateful DUT

### TODO: Stateful traffic
{:/comment}

## Non-Zero Target Loss Ratios

https://datatracker.ietf.org/doc/html/rfc1242#section-3.17
defines Throughput as:

> The maximum rate at which none of the offered frames
> are dropped by the device.

and then it says:

> Since even the loss of one frame in a
> data stream can cause significant delays while
> waiting for the higher level protocols to time out,
> it is useful to know the actual maximum data
> rate that the device can support.

{::comment}
While this may still be true for some protocols,
research has been performed...

TODO: Add this link properly: https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-Y.1541-201112-I!!PDF-E&type=items
TODO: List values from that document, from 10^-3 to 4*10^-6.

...on other protocols and use cases,
resulting in some small but non-zero loss ratios being considered
acceptable. Unfortunately, the acceptable value depends on the use case
and on properties such as TCP window size and round trip time,
so no single value of target loss ratio (other than zero)
is considered to be universally applicable.
{:/comment}

New "software DUTs" (traffic forwarding programs running on
commercial-off-the-shelf compute server hardware) frequently exhibit quite
low repeatability of Throughput results per the above definition.

This is because, in general, the throughput rates of software DUTs
(programs) are sensitive to server resource allocation by the OS
during runtime, as well as to any interrupts or blocking of the software
threads involved in packet processing.

To deal with this, this document recommends discovery of multiple
throughput rates of interest for software DUTs that run on general purpose
COTS servers (with x86, AArch64 Instruction Set Architectures):

* the throughput rate with a target of zero packet loss ratio,
* at least one throughput rate with a target of non-zero packet loss ratio.

In our experience, the higher the target loss ratio is,
the better is the repeatability of the corresponding load found.

TODO: Define a good name for a load corresponding to a specific non-zero
target loss ratio, while keeping Throughput for the load corresponding
to the zero target loss ratio.
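
Searching with non-zero target loss ratios relies on loss ratios
not decreasing as the load increases, which is exactly what the
effective loss ratio correction (see Terminology) enforces.
A minimal Python sketch of that correction follows; the function name
and the data layout are hypothetical, not prescribed by this document.

```python
def effective_loss_ratios(trial_results):
    """Map each load to its effective loss ratio.

    trial_results: iterable of (intended_load, trial_loss_ratio) pairs,
    all measured at the same trial duration.
    """
    effective = {}
    running_max = 0.0
    for load, loss_ratio in sorted(trial_results):
        # Maximum of trial loss ratios over this and all smaller loads,
        # so the corrected ratio never decreases with increasing load.
        running_max = max(running_max, loss_ratio)
        effective[load] = running_max
    return effective
```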
This document RECOMMENDS that benchmark groups search for the loads
corresponding to at least one non-zero target loss ratio.
This document does not suggest any particular non-zero target loss ratio
value to search the corresponding load for.

{::comment}
What is worse, some benchmark groups (which groups?; citation needed)
started reporting loads that achieved only "approximate zero loss",
while still calling that a Throughput (and thus becoming non-compliant
with RFC2544).
{:/comment}

# Solution ideas

This document gives several independent ideas on how to lower the (average)
overall search time, while remaining unconditionally compliant with RFC2544
(and adding some extensions).

This document also specifies one particular way to combine all the ideas
into a single search algorithm class (a single logic with a few tweakable
parameters).

Little to no research has been done into the question of which combination
of ideas achieves the best compromise with respect to overall search time,
high repeatability and high comparability.

TODO: How important is it to discuss particular implementation choices,
especially when motivated by non-deterministic SUT behavior?

## Short duration trials

https://datatracker.ietf.org/doc/html/rfc2544#section-24
already mentions the possibility of using a shorter duration
for trials that are not part of the "final determination".

Obviously, the upper and lower bounds from a shorter duration trial
can be used as the initial upper and lower bounds for the final
determination.

MLRsearch makes it clear that a re-measurement is always needed
(a new trial measurement with the same load but a longer duration).
It also specifies what to do if the longer trial is no longer a valid bound
(TODO: define?), e.g. start an external search.
Additionally, one halving can be saved during the shorter duration search.

## FRMOL as reasonable start

TODO expand: The overall search ends with the "final determination" search,
preceded by the "shorter duration search", preceded by "bound
initialization", where the bounds can be considerably different
from the min and max load.

For SUTs with high repeatability, the FRMOL is usually a good approximation
of the Throughput. But for less repeatable SUTs, the forwarding rate
(TODO: define) is frequently a bad approximation of the Throughput,
therefore halving and other robust-to-worst-case approaches have to be
used. Still, the forwarding rate at the FRMOL load can be a good
initial bound.

## Non-zero loss ratios

See the "Non-Zero Target Loss Ratios" section above.

TODO: Define "trial measurement result classification criteria",
or keep reusing long phrases without definitions?

A search for a load corresponding to a non-zero target loss ratio
is very similar to a search for the Throughput;
just the criterion for when to increase or decrease the intended load
for the next trial measurement uses a comparison of the trial loss ratio
to the target loss ratio (instead of comparing the loss count to zero).
Any search algorithm that works for the Throughput can be easily used also
for non-zero target loss ratios, perhaps with small modifications
in places where the measured forwarding rate is used.

Note that it is possible to search for multiple loss ratio goals if needed.

## Concurrent ratio search

A single trial measurement result can act as an upper bound for a lower
target loss ratio, and as a lower bound for a higher target loss ratio
at the same time (see the sketch below).
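
A minimal Python sketch of this observation follows; the helper name
and its signature are hypothetical, but the classification rule matches
the bound definitions from the Terminology section.

```python
def classify_trial(trial_loss_ratio, target_ratios):
    """Split targets into those this trial bounds from below and above."""
    lower_bound_for = [t for t in target_ratios if trial_loss_ratio <= t]
    upper_bound_for = [t for t in target_ratios if trial_loss_ratio > t]
    return lower_bound_for, upper_bound_for

# One trial with a 0.3% trial loss ratio is an upper bound for the 0.0
# target and a lower bound for the 0.005 target at the same time.
print(classify_trial(0.003, [0.0, 0.005]))  # ([0.005], [0.0])
```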
This ability is an example of how it can be advantageous
to search for all loss ratio goals "at once",
or at least to "reuse" the trial measurement results obtained so far.

Even when a search algorithm is fully deterministic in its load selection
while focusing on a single loss ratio and trial duration,
the choice of iteration order between target loss ratios and trial
durations can affect the obtained results in subtle ways.
MLRsearch offers one particular ordering.

{::comment}
It is not clear if the current ordering is "best";
it is not even clear how to measure how good an ordering is.
We would need several models of bad SUT behaviors,
bug-free implementations of different orderings,
a simulator to show the distribution of rates found,
the distribution of overall durations,
and a criterion for which rate distribution is "bad"
and whether it is worth the time saved.
{:/comment}

## Load selection heuristics and shortcuts

Aside from the two heuristics already mentioned (FRMOL based initial bounds
and saving one halving when increasing the trial duration),
there are other tricks that can save some overall search time,
at the cost of keeping the difference between the final lower and upper
bounds intentionally large (but still within the precision goal).

TODO: Refer to the implementation subsections on:

* Uneven splits.
* Rounding the interval width up.
* Using old invalid bounds for interval width guessing.

The impact on the overall duration is probably small,
and the effect on the result distribution maybe even smaller.
TODO: Is the two-liner above useful at all?

# Non-compliance with RFC2544

It is possible to achieve even faster search times by abandoning
some requirements and suggestions of RFC2544,
mainly by reducing the wait times at the start and end of a trial.

Such results are therefore no longer compliant with RFC2544
(or at least not unconditionally),
but they may still be useful for internal usage, or for comparing
the results of different DUTs achieved with an identical non-compliant
algorithm.

TODO: Refer to the subsection with CSIT customizations.

# Additional Requirements

RFC2544 can be understood as having a number of implicit requirements.
They are made explicit in this section
(as requirements for this document, not for RFC2544).

Recommendations on how to properly address the implicit requirements
are out of scope of this document.

{::comment}
Although some (insufficient) ideas are proposed.
{:/comment}

## TODO: Search Stop Criteria

TODO: Mention the timeout parameter?

{::comment}
TODO: Highlight the importance of result consistency
for SUT performance trending and anomaly detection.
{:/comment}

## Reliability of Test Equipment

Both TG and TA MUST be able to handle correctly
every intended load used during the search.

On the TG side, the difference between the Intended Load and the Offered
Load MUST be small.

TODO: How small? A difference of one packet may not be measurable
due to time uncertainties.

{::comment}
Maciek: 1 packet out of 10M, that's 10**-7 accuracy.

Vratko: For example, TRex uses several "worker" threads, each doing its own
rounding on how many packets to send, separately per each traffic stream.
For high loads and durations, the observed number of frames transmitted
can differ from the expected (fractional) value by tens of frames.
{:/comment}

TODO expand: time uncertainty.

To ensure that, the max load (see Terminology) has to be set
to a low enough value.
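
As one possible sanity check for the intended versus offered load
difference mentioned above, consider the Python sketch below.
The relative tolerance value is a placeholder assumption,
as this document leaves "how small" as an open question.

```python
def offered_load_reliable(intended_load, offered_load, rel_tolerance=1e-4):
    """Flag trials where the TG could not sustain the intended load."""
    if intended_load <= 0.0:
        return True  # Nothing to sustain at zero load.
    deficit = (intended_load - offered_load) / intended_load
    # A large deficit suggests the intended load is above the safe load.
    return deficit <= rel_tolerance
```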
Benchmark groups MAY list the max load value used,
especially if the Throughput value is equal (or close) to the max load.

{::comment}
The following is probably out of scope of this document,
but can be useful when put into a separate document.

TODO expand: If it results in a smaller Throughput being reported,
it is not a big issue. Treat it similarly to the bandwidth and PPS limits
of NICs.

TODO expand: The TA dropping packets when loaded only lowers
the Throughput, so it is not an issue.

TODO expand: The TG sending fewer packets but stopping at the target
duration is also fine, as long as the forwarding rate is used
as the Throughput value, not the higher intended load.

TODO expand: Duration stretching is not fine.
Neither "check for actual duration" nor "start+sleep+stop"
are reliable solutions, due to time overheads and the uncertainty
of the TG starting/stopping traffic (and the TA stopping counting
packets).
{:/comment}

Solutions (even problem formulations) for the following open problems
are outside of the scope of this document:

* Detecting when the test equipment operates above its safe load.
* Finding a large but safe load value.
* Correcting any result affected by the max load value not being
  a safe load.

{::comment}
TODO: Mention 90% of self-test as an idea:
https://datatracker.ietf.org/doc/html/rfc8219#section-9.2.1

This is pointing to DNS testing, nothing to do with throughput,
so how is it relevant here?
{:/comment}

{::comment}
Part of a discussion on the BMWG mailing list (with small edits):

This is a hard issue.
The algorithm as described has no way of knowing
which part of the whole system is limiting the performance.

It could be the SUT only (no problem, testing the SUT as expected),
it could be the TG only (can be mitigated by a TG self-test
and by using small enough loads).

But it could also be an interaction between the DUT and the TG.
Imagine a TG (the Traffic Analyzer part) which is only able
to handle incoming traffic up to some rate,
but passes the self-test as the Generator part has a maximal rate
no larger than that. But what if the SUT turns that steady rate
into long enough bursts of a higher rate (with the delays between bursts
large enough, so the average forwarding rate matches the load)?
This way the TA will see some packets as missing (when its buffers
fill up), even though the SUT has processed them correctly.
{:/comment}

### Very late frames

{::comment}
In CSIT we are aggressive at skipping all the wait times around a trial,
but few of the DUTs have large enough buffers.
Or there is another reason why we are seeing negative loss counts.
{:/comment}

RFC2544 requires quite conservative time delays
(see https://datatracker.ietf.org/doc/html/rfc2544#section-23)
to prevent frames buffered in one trial measurement
from being counted as received in a subsequent trial measurement.

However, for some SUTs it may still be possible to buffer enough frames,
so that they are still being sent (perhaps in bursts)
when the next trial measurement starts.
Sometimes, this can be detected as a negative trial loss count, i.e. the TA
receiving more frames than the TG has sent during this trial measurement.
Frame duplication is another way of causing a negative trial loss count.

https://datatracker.ietf.org/doc/html/rfc2544#section-10
recommends using sequence numbers in frame payloads,
but generating and verifying them requires test equipment resources,
which may not be plentiful enough to support that at high loads.
(Using a low enough max load would work, but frequently that would be
smaller than the SUT's actual Throughput.)

RFC2544 does not offer any solution to the negative loss problem,
except implicitly treating negative trial loss counts
the same way as positive trial loss counts.

This document also does not offer any practical solution.

Instead, this document SUGGESTS that the search algorithm take any
precautions necessary to avoid very late frames.

This document also REQUIRES any detected duplicate frames to be counted
as additional lost frames.
This document also REQUIRES any negative trial loss ratio
to be treated as a positive trial loss ratio of the same absolute value.

{::comment}
!!! Make sure this is covered elsewhere, at least in better comments. !!!

## TODO: Bad behavior of SUT

(The highest load with always zero loss can be quite far from the lowest
load with always non-zero loss.)
(Non-determinism: warm up, periodic "stalls", performance decrease
over time, ...)

Big buffers:
http://www.hit.bme.hu/~lencse/publications/ECC-2017-B-M-DNS64-revised.pdf
See page 8 and search for the word "gaming".
{:/comment}

!!! Nothing below is up-to-date with draft v02. !!!

# MLRsearch Background

TODO: Old section, probably obsoleted by preceding section(s).

Multiple Loss Ratio search (MLRsearch) is a packet throughput search
algorithm suitable for deterministic systems (as opposed to probabilistic
systems). MLRsearch discovers multiple packet throughput