From: Vratko Polak Date: Mon, 7 Jul 2025 14:11:32 +0000 (+0200) Subject: feat(ietf): Finalize MLRsearch draft 11 X-Git-Url: https://gerrit.fd.io/r/gitweb?a=commitdiff_plain;h=refs%2Fchanges%2F67%2F43367%2F28;p=csit.git feat(ietf): Finalize MLRsearch draft 11 Change-Id: Ie7d0a00d309b1db48b0524286bf19b0d33ba0116 Signed-off-by: Maciek Konstantynowicz Signed-off-by: Vratko Polak --- diff --git a/docs/ietf/draft-ietf-bmwg-mlrsearch-10.md b/docs/ietf/draft-ietf-bmwg-mlrsearch-11.md similarity index 60% rename from docs/ietf/draft-ietf-bmwg-mlrsearch-10.md rename to docs/ietf/draft-ietf-bmwg-mlrsearch-11.md index b323a98533..4981c3185a 100644 --- a/docs/ietf/draft-ietf-bmwg-mlrsearch-10.md +++ b/docs/ietf/draft-ietf-bmwg-mlrsearch-11.md @@ -2,9 +2,10 @@ title: Multiple Loss Ratio Search abbrev: MLRsearch -docname: draft-ietf-bmwg-mlrsearch-10 -date: 2025-03-16 +docname: draft-ietf-bmwg-mlrsearch-11 +date: 2025-07-07 +submissionType: IETF ipr: trust200902 area: ops wg: Benchmarking Working Group @@ -38,27 +39,34 @@ normative: informative: RFC5180: -{::comment} - - [MB116]: Please move to information, as this was provided only as an example. - - [VP]: Ok. - - [MK]: Moved. -{:/comment} +# Comment: This part before first --- is not markdown but YAML, so comments need different escape. +#{::comment} +# +# MB116: Please move to information, as this was provided only as an example. +# +# VP: Ok. +# +# MK: Moved. +# +#{:/comment} RFC6349: + RFC6985: RFC8219: -{::comment} - - [MB117]: Idem as the other entry. - - [VP]: Ok. - - [MK]: Moved. -{:/comment} +#{::comment} +# +# MB117: Idem as the other entry. +# +# VP: Ok. +# +# MK: Moved. 
+# +#{:/comment} TST009: target: https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.04.01_60/gs_NFV-TST009v030401p.pdf title: "TST 009" + Y.1564: + target: https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-Y.1564-201602-I!!PDF-E&type=items + title: "Y.1564" FDio-CSIT-MLRsearch: target: https://csit.fd.io/cdocs/methodology/measurements/data_plane_throughput/mlr_search/ title: "FD.io CSIT Test Methodology - MLRsearch" @@ -69,72 +77,84 @@ informative: date: 2023-10 Lencze-Shima: target: https://datatracker.ietf.org/doc/html/draft-lencse-bmwg-rfc2544-bis-00 - title: "An Upgrade to Benchmarking Methodology for Network Interconnect Devices" -{::comment} - - [MB118]: This was expired since 2020. Please remove. Idem for all similar entries - - [VP]: Hmm, ok. - - [MK]: Disagree. It is still a useful reference. Marking as expired, - but keeping it here. Can we add following entry: - - [Lencze-Shima] Lencse, G., "Benchmarking Methodology for IP - Forwarding Devices – RFC 2544bis", Work in Progress, - Internet-Draftdraft-lencse-bmwg-rfc2544-bis-009 March 2015. - (Expired.) - -{:/comment} + title: "An Upgrade to Benchmarking Methodology for Network Interconnect Devices - expired" +#{::comment} +# +# MB118: This was expired since 2020. Please remove. Idem for all similar entries +# +# VP: Hmm, ok. +# +# MK: Disagree. It is still a useful reference. Marking as expired, +# but keeping it here. Can we add following entry: +# +# [Lencze-Shima] Lencse, G., "Benchmarking Methodology for IP +# Forwarding Devices - RFC 2544bis", Work in Progress, +# Internet-Draftdraft-lencse-bmwg-rfc2544-bis-009 March 2015. +# (Expired.) +# +# VP: DONE201: Delete or update or replace the active item. 
+# +#{:/comment} Lencze-Kovacs-Shima: target: http://dx.doi.org/10.11601/ijates.v9i2.288 title: "Gaming with the Throughput and the Latency Benchmarking Measurement Procedures of RFC 2544" Ott-Mathis-Semke-Mahdavi: target: https://www.cs.cornell.edu/people/egs/cornellonly/syslunch/fall02/ott.pdf title: "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm" + Vassilev: + target: https://datatracker.ietf.org/doc/draft-ietf-bmwg-network-tester-cfg/06 + title: "A YANG Data Model for Network Tester Management" --- abstract -This document specifies extensions to “Benchmarking Methodology for -Network Interconnect Devices” (RFC 2544) throughput search by +This document specifies extensions to "Benchmarking Methodology for +Network Interconnect Devices" (RFC 2544) throughput search by +defining a new methodology called Multiple Loss Ratio search +(MLRsearch). MLRsearch aims to minimize search duration, +support multiple loss ratio searches, and improve result repeatability +and comparability. + {::comment} - [MB2]: The abstract should self-contained. Hence the need to expand the RFC title. + MB2: The abstract should self-contained. Hence the need to expand the RFC title. - [VP]: Ok. + VP: Ok. - [MK]: Ok. Edited. + MK: Ok. Edited. {:/comment} -defining a new methodology called Multiple Loss Ratio search + {::comment} - [MB1]: This may trigger automatically a comment whether we change (update or amend) any of RFC2544 text. + MB1: This may trigger automatically a comment whether we change (update or amend) any of RFC2544 text. Do we? - [VP]: Pending BMWG decision. - [VP]: For draft11: Officially independent. + VP: Pending BMWG decision. + VP: For draft11: Officially independent. - [MK]: MLRsearch extends RFC2544. Does not change it, nor does it amend it. + MK: MLRsearch extends RFC2544. Does not change it, nor does it amend it. + + VP: The idea was to extend in sense of adding one new benchmark. 
+ But as we added more requirements and possible deviations around trials, + the new methodology is independent from (while possible to combine with) RFC2544. + + MK: LGTM. {:/comment} -(MLRsearch). MLRsearch aims to minimize search duration, -support multiple loss ratio searches, and improve result repeatability -and comparability. MLRsearch is motivated by the pressing need to address the challenges of evaluating and testing the various data plane solutions, especially in software- based networking systems based on Commercial Off-the-Shelf (COTS) CPU hardware vs purpose-built ASIC / NPU / FPGA hardware. - {::comment} - [MB3]: What is meant here? What is specific to these systems? + MB3: What is meant here? What is specific to these systems? Do we need to have this mention at this stage? - [VP]: Do not distinguish in abstract + VP: Do not distinguish in abstract - [MK]: Updated text to focus on COTS hardware vs purpose-built + MK: Updated text to focus on COTS hardware vs purpose-built hardware. Let us know if this requires further text in abstract. (We should keep it concise.) @@ -142,12 +162,12 @@ software- based networking systems based on Commercial Off-the-Shelf {::comment} - [MB4]: Too detailed for an abstract. Can be mentioned in an overview/introduction section + MB4: Too detailed for an abstract. Can be mentioned in an overview/introduction section - [VP]: Agreed, we no not need to list the options here. + VP: Agreed, we no not need to list the options here. - [MK]: OK. - [MK]: Removed. + MK: OK. + MK: Removed. {:/comment} @@ -169,202 +189,242 @@ software- based networking systems based on Commercial Off-the-Shelf This document describes the Multiple Loss Ratio search (MLRsearch) methodology, optimized for determining data plane -throughput in software-based networking functions running on commodity +throughput in software-based networking functions running on commodity systems with x86/ARM CPUs (vs purpose-built ASIC / NPU / FPGA). 
Such network functions can be deployed on dedicated physical appliance (e.g., a standalone hardware device) or as virtual appliance (e.g., Virtual Network Function running on shared servers in the compute cloud). -{::comment} +## Purpose - [MB6]: Should be defined. - Not sure what is specific as any networking device is a software-based device. Even hardware, it is not more than frozen software ;) +{::comment} - [VP]: We can mention “noisiness” here, not sure how detailed + MK: Suggest to change title to Purpose, as it does not provide + brief overview of the document's structure and key content areas. - [MK]: Good point. Added text clarifying the difference. See if this - is good enough, or does this need any more explanation. - [MK]: Edited. + VP: Done. {:/comment} -Applying the vanilla throughput binary search, as specified in -[RFC2544] and [TST-009] +The purpose of this document is to describe the Multiple Loss Ratio search +(MLRsearch) methodology, optimized for determining +data plane throughput in software-based networking devices and functions. {::comment} - [MB7]: Can we have an explicit reference for the method? + MB6: Should be defined. + Not sure what is specific as any networking device is a software-based device. Even hardware, it is not more than frozen software ;) - [VP]: Need to search but should be doable + VP: We can mention "noisiness" here, not sure how detailed - [MK]: RFC2544 mentions binary-search style procedure without fully - specifying the algorithm. The only other standard that defines is - ETSI GS NFV-TST 009 - adding it here. - [MK]: Edited. + MK: Good point. Added text clarifying the difference. See if this + is good enough, or does this need any more explanation. + MK: Edited. 
{:/comment} -to software devices under test (DUTs) results in several problems + +Applying the vanilla throughput binary search, +as specified for example in [TST009] and [RFC2544] +to software devices under test (DUTs) results in several problems: + {::comment} - [MB8]: Expand + MB7: Can we have an explicit reference for the method? + + VP: Need to search but should be doable + + MK: RFC2544 mentions binary-search style procedure without fully + specifying the algorithm. The only other standard that defines is + ETSI GS NFV-TST 009 - adding it here. + MK: Edited. - [VP]: Ok (point to DUT). + VP: Removed RFC 2544 as I understand MB wants reference to specifics. - [MK]: Edited. + MK: But section 24 of RFC 2544 does list "binary search", and it is + this that has been adopted as a defacto standard. Re-adding it back + in :) {:/comment} -- Binary search takes long as most trials are done far from the {::comment} - [MB9]: Can we have a public reference to share here? + MB8: Expand - [VP]: Need to search but should be doable). + VP: Ok (point to DUT). - [MK]: Removed "too". Explanation and public references are provided - in the Identified Problems section. - [MK]: Edited. + MK: Edited. {:/comment} + +- Binary search takes long as most trials are done far from the eventually found throughput. - The required final trial duration and pauses between trials prolong the overall search duration. - Software DUTs show noisy trial results, leading to a big spread of possible discovered throughput values. - Throughput requires a loss of exactly zero frames, but the industry best practices + frequently allow for low but non-zero losses tolerance ([Y.1564], test-equipment manuals). +- The definition of throughput is not clear when trial results are inconsistent. + (e.g., when successive trials at the same - or even a higher - offered + load yield different loss ratios, the classical [RFC1242] / [RFC2544] + throughput metric can no longer be pinned to a single, unambiguous + value.) 
+ {::comment} - [MB10]: What is meant there? + MB9: Can we have a public reference to share here? - [VP]: Expand (industry). + VP: Need to search but should be doable). - [MK]: Improved clarity, by referring to loss tolerance. Added references. - [MK]: Edited. + MK: Removed "too". Explanation and public references are provided + in the Identified Problems section. + MK: Edited. {:/comment} - frequently allow for low but non-zero losses tolerance ([Y.1564], test-equipment manuals). -- The definition of throughput is not clear when trial results are inconsistent. - (e.g., When successive trials at the same - or even a higher - offered - load yield different loss ratios, the classical RFC 1242/RFC 2544 - throughput metric can no longer be pinned to a single, unambiguous - value.) {::comment} - [MB11]: Can we expand on this one? + MB10: What is meant there? - [VP]: Some soft intro to incunsistent trials may be needed here. + VP: Expand (industry). - [MK]: Added text in brackets. See if it is sufficient. - [MK]: Edited. + MK: Improved clarity, by referring to loss tolerance. Added references. + MK: Edited. {:/comment} -To address these problems, -the MLRsearch test methodology employs the following enhancements: {::comment} - [VP]: We should reformulate to make clear which improvements - are not covered by the specification. + MB11: Can we expand on this one? + + VP: Some soft intro to inconsistent trials may be needed here. + + MK: Added text in brackets. See if it is sufficient. + MK: Edited. {:/comment} -- Allow multiple short trials instead of one big trial per load. - - Optionally, tolerate a percentage of trial results with higher loss. -- Allow searching for multiple Search Goals, with differing loss ratios. - - Any trial result can affect each Search Goal in principle. -- Insert multiple coarse targets for each Search Goal, earlier ones need - to spend less time on trials. - - Earlier targets also aim for lesser precision. 
- - Use Forwarding Rate (FR) at Maximum Offered Load (FRMOL), as defined - in Section 3.6.2 of [RFC2285], to initialize bounds. +To address these problems, +early MLRsearch implementations employed the following enhancements: + +1. Allow multiple short trials instead of one big trial per load. + - Optionally, tolerate a percentage of trial results with higher loss. +2. Allow searching for multiple Search Goals, with differing loss ratios. + - Any trial result can affect each Search Goal in principle. +3. Insert multiple coarse targets for each Search Goal, earlier ones need + to spend less time on trials. + - Earlier targets also aim for lesser precision. + - Use Forwarding Rate (FR) at Maximum Offered Load (FRMOL), as defined + in Section 3.6.2 of [RFC2285], to initialize bounds. +4. Be careful when dealing with inconsistent trial results. + - Reported throughput is smaller than the smallest load with high loss. + - Smaller load candidates are measured first. +5. Apply several time-saving load selection heuristics that deliberately + prevent the bounds from narrowing unnecessarily. {::comment} - [MB12]: There is no such section in the document. + MB12: There is no such section in the document. Do you meant Section 3.6.2 of [RFC2285]? If so, please update accordingly. Idem for all similar occurrences in the document. Thanks. - [VP]: Clarify. Check for every external section referenced. + VP: Clarify. Check for every external section referenced. - [MK]: Yes Section 3.6.2 of [RFC2285] defining FRMOL. - [MK]: Edited. + MK: Yes Section 3.6.2 of [RFC2285] defining FRMOL. + MK: Edited. {:/comment} -- Be careful when dealing with inconsistent trial results. - - Reported throughput is smaller than the smallest load with high loss. - - Smaller load candidates are measured first. -- Apply several time-saving load selection heuristics that deliberately - prevent the bounds from narrowing unnecessarily. - {::comment} - [MB13]: Maximizing means? + MB13: Maximizing means? 
+ + VP: Reformulate. + + MK: Edited. + +{:/comment} + +{::comment} - [VP]: Reformulate. + VP: Item 3 is also mostly out of scope, + if we do not count Goal Initial Trial Duration + (it is an example of an optional attribute, not a recommendation). - [MK]: Edited. + TODO202: Either say the list talks about CSIT implementation, + or downgrade item 3 to level of item 5 (example optimization + that is ultimately out of scope of MLRsearch Specification). {:/comment} -The first four enhancements +Enhancements 1, 2 and partly 4 are formalized as MLRsearch Specification +within this document; other implementation details are out of scope. + {::comment} - [MB14]: Which ones? + MB14: Which ones? - [VP]: Describe the lists better so "some" is not needed here. + VP: Describe the lists better so "some" is not needed here. - [MK]: Edited. + MK: Edited. {:/comment} -are formalized as MLRsearch Specification within this document. + {::comment} - [MB15]: Where? In this document? + MB15: Where? In this document? - [VP]: Yes. + VP: Yes. - [MK]: Edited. + MK: Edited. {:/comment} + The remaining enhancements are treated as implementation details, thus achieving high comparability without limiting future improvements. -MLRsearch configuration options +MLRsearch configuration +supports both conservative settings and aggressive settings. +Conservative enough settings lead to results +unconditionally compliant with [RFC2544], +but without much improvement on search duration and repeatability - see +[MLRsearch Compliant with RFC 2544](#mlrsearch-compliant-with-rfc-2544). +Conversely, aggressive settings lead to shorter search durations +and better repeatability, but the results are not compliant with [RFC2544]. +Exact settings are not specified, but see the discussion in +[Overview of RFC 2544 Problems](#overview-of-rfc-2544-problems) +for the impact of different settings on result quality. + {::comment} - [MB16]: Where are those defined? Please add a pointer to the appropriate section. 
+ MB16: Where are those defined? Please add a pointer to the appropriate section. - [VP]: Add pointer. + VP: Add pointer. - [MK]: TODO. + MK: DONE203. We do not have any section in this document covering + implementation details, as these are out of scope. Shall we add a + note to that regard? {:/comment} -are flexible enough to + {::comment} - [MB17]: "flexibe" is ambiguous. Simply, state what we do. + MB17: "flexibe" is ambiguous. Simply, state what we do. - [VP]: Reformulate. + VP: Reformulate. - [MK]: TODO. + MK: Edited. {:/comment} -support both conservative settings and aggressive settings. -Conservative enough settings lead to results -unconditionally compliant with [RFC2544], -but without much improvement on search duration and repeatability. -Conversely, aggressive settings lead to shorter search durations -and better repeatability, but the results are not compliant with [RFC2544]. + {::comment} - [MB18]: Add pointers where this is further elaborated. + MB18: Add pointers where this is further elaborated. - [VP]: Point to specific subsection. + VP: Point to specific subsection. - [MK]: TODO. + MK: Added. {:/comment} @@ -372,12 +432,12 @@ This document does not change or obsolete any part of [RFC2544]. {::comment} - [MB19]: List the set of terms/definitions used in this document. + MB19: List the set of terms/definitions used in this document. I guess we should at least leverage terms defined in 2544/1242. - [VP]: Move list of terms here? + VP: Move list of terms here? - [MK]: Relevant existing terms, including the ones from rfcs 1242, + MK: Relevant existing terms, including the ones from rfcs 1242, 2285 and 2544, are captured in section 4.3 Existing Terms, followed by the new terms that form the MLRsearch Specification. We went through quite a few iterations of getting it right, including a @@ -385,51 +445,75 @@ This document does not change or obsolete any part of [RFC2544]. 
following BMWG comments and reviews ended up with the current document structure. Reworking it back is substantial work - [MK]: TODO Instead I propose we list one liners explaining the term in + MK: Instead I propose we list one liners explaining the term in the context of the benchmarking domain. + VP: See the comment in first Specification paragraph. + For specific MB comment, I propose to say no edit needed, + but ask on bmwg mailer to confirm. + {:/comment} {::comment} - [MB20]: Also, please add a statement that the convention used in bmwg + MB20: Also, please add a statement that the convention used in bmwg are followed here as well (def, discussion, etc.) - [VP]: Ok + VP: Ok - [MK] The Requirements Language text is the standard one we use in + MK: The Requirements Language text is the standard one we use in BMWG. There are no any strict BMWG conventions that are followed in this document. Rather, the convention used for terms that are specific to this document, is described in the Section 4 of this document, and forms part of the MLRsearch Specification. -{:/comment} - -# Requirements Language - -{::comment} - - [MB5]: Move after the intro - - [VP]: Ok. - - [MK]: OK. - [MK]: Moved. + VP: I think this is done, covered by edits elsewhere. {:/comment} -The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", -"SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" -in this document are to be interpreted as described in BCP 14 [RFC2119] {::comment} - The two references have to come one after another to avoid boilerplate nit, - but the xml2rfc processing (web service) is buggy and strips rfc2119 brackets. - Luckily having this comment here avoids the bug and creates correct .xml file. + WONTFIX204: Update the subsection above when the subsections below are complete enough. + Too late. {:/comment} -[RFC8174] when, and only when, they appear in all capitals, as shown here. 
+## Positioning within BMWG Methodologies + +The Benchmarking Methodology Working Group (BMWG) produces recommendations (RFCs) +that describe various benchmarking methodologies for use in a controlled laboratory environment. +A large number of these benchmarks are based on the terminology from [RFC1242] +and the foundational methodology from [RFC2544]. +A common pattern has emerged where BMWG documents reference the methodology of [RFC2544] +and augment it with specific requirements for testing particular network systems or protocols, +without modifying the core benchmark definitions. + +While BMWG documents are formally recommendations, +they are widely treated as industry norms to ensure the comparability of results between different labs. +The set of benchmarks defined in [RFC2544], in particular, +became a de facto standard for performance testing. +In this context, the MLRsearch Specification formally defines a new +class of benchmarks that fits within the wider [RFC2544] framework +(see [Scope](#scope)). + +A primary consideration in the design of MLRsearch is the trade-off +between configurability and comparability. The methodology's flexibility, +especially the ability to define various sets of Search Goals, +supporting both single-goal and multiple-goal benchmarks in a unified way, +is powerful for detailed characterization and internal testing. +However, this same flexibility is detrimental to inter-lab comparability +unless a specific, common set of Search Goals is agreed upon. + +Therefore, MLRsearch should be seen neither as a direct extension of +nor as a replacement for the [RFC2544] Throughput benchmark. +Instead, this document provides a foundational methodology +that future BMWG documents can use to define new, specific, and comparable benchmarks +by mandating particular Search Goal configurations. 
+For operators of existing test procedures, it is worth noting +that many test setups measuring [RFC2544] Throughput +can be adapted to produce results compliant with the MLRsearch Specification, +often without affecting Trials, +merely by augmenting the content of the final test report. # Overview of RFC 2544 Problems @@ -440,72 +524,76 @@ mainly a binary search for [RFC2544] unconditionally compliant throughput. ## Long Search Duration The proliferation of software DUTs, with frequent software updates and a + +number of different frame processing modes and configurations, +has increased both the number of performance tests +required to verify the DUT update and the frequency of running those tests. +This makes the overall test execution time even more important than before. + {::comment} - [MB21]: Is this really new? + MB21: Is this really new? - [VP]: Not sure, ask Maciek + VP: Not sure, ask Maciek - [MK]: Changed “emergence” to “proliferation”. And yes, the + MK: Changed "emergence" to "proliferation". And yes, the proliferation and their importance is new. - [MK]: Edited. + MK: Edited. {:/comment} -number of different frame processing modes and configurations, -has increased both the number of performance tests -required to verify the DUT update and the frequency of running those tests. -This makes the overall test execution time even more important than before. The throughput definition per [RFC2544] restricts the potential +for time-efficiency improvements. +The bisection method, when used in a manner unconditionally compliant +with [RFC2544], is excessively slow due to two main factors. + +Firstly, a significant amount of time is spent on trials +with loads that, in retrospect, are far from the final determined throughput. + {::comment} - [MB22]: Won’t age well + MB22: Won't age well - [VP]: I agree, should be reformulated, not sure how. + VP: I agree, should be reformulated, not sure how. - [MK]: Accepted proposed text change. - [MK]: Edited. 
+ MK: Accepted proposed text change. + MK: Edited. {:/comment} -for time-efficiency improvements. {::comment} - [MB23]: Concretely, be affirmative if we provide an elaborated def, + MB23: Concretely, be affirmative if we provide an elaborated def, otherwise this statement can be removed. - [VP]: Reformulate to affirm and point. + VP: Reformulate to affirm and point. - [MK]: Agree. This is problem statement, not solution description, so + MK: Agree. This is problem statement, not solution description, so removed this paragraph. - [MK]: Removed. + MK: Removed. {:/comment} -The bisection method, when used in a manner unconditionally compliant -with [RFC2544], is excessively slow due to two main factors. {::comment} - [MB24]: Can we have a reference? + MB24: Can we have a reference? - [VP]: Find references. + VP: Find references. - [MK]: Added wording connecting to the following paragraphs with + MK: Added wording connecting to the following paragraphs with explanations. - [MK]: Edited. + MK: Edited. {:/comment} -Firstly, a significant amount of time is spent on trials -with loads that, in retrospect, are far from the final determined throughput. {::comment} - [MB25]: Define "users". + MB25: Define "users". - [VP]: Yes, we should be more careful around role names. + VP: Yes, we should be more careful around role names. - [MK]: Added text. - [MK]: Edited. + MK: Added text. + MK: Edited. {:/comment} @@ -514,26 +602,28 @@ throughput search, so users of testing equipment implementing the procedure already have access to a limited trade-off between search duration and achieved precision. However, each of the full 60-second trials doubles the precision. + {::comment} - [MB26]: Can we include a reminder of the 2544 search basics? (no need to be verbose, though)? + MB26: Can we include a reminder of the 2544 search basics? (no need to be verbose, though)? - [VP]: Maybe, not sure how feasible. + VP: Maybe, not sure how feasible. - [MK]: Added. - [MK]: Edited. + MK: Added. 
+ MK: Edited. {:/comment} + As such, not many trials can be removed without a substantial loss of precision. For reference, here is a brief [RFC2544] throughput binary -(bisection) reminder, based on Sections 24 and 26 of [RFC2544]: +(bisection) reminder, based on Sections 24 and 26 of [RFC2544]: -* Set Max ≈ line-rate and Min = a proven loss-free load. +* Set Max = line-rate and Min = a proven loss-free load. * Run a single 60-s trial at the midpoint. -* Zero-loss ⇒ midpoint becomes new Min; any loss ⇒ new Max. -* Repeat until the Max–Min gap meets the desired precision, then report - the highest zero-loss rate for every mandatory frame size. +* Zero-loss -> midpoint becomes new Min; any loss -> new Max. +* Repeat until the Max-Min gap meets the desired precision, then report + the highest zero-loss rate for every mandatory frame size. ## DUT in SUT @@ -543,14 +633,15 @@ DUT as: - The network frame forwarding device to which stimulus is offered and response measured Section 3.1.1 of [RFC2285]. + {::comment} - [MB27]: Double check + MB27: Double check - [VP]: Ok. + VP: Ok. - [MK]: Checked. OK. - [MK]: Edited. + MK: Checked. OK. + MK: Edited. {:/comment} @@ -564,7 +655,6 @@ stimulating the networking system, treating it either as a single Device Under Test (DUT), or as a system of devices, a System Under Test (SUT). - For software-based data-plane forwarding running on commodity x86/ARM CPUs, the SUT comprises not only the forwarding application itself, the DUT, but the entire execution environment: host hardware, firmware and @@ -573,48 +663,48 @@ that share the same CPUs, memory and I/O resources. {::comment} - [MB28]: This makes assumptions on the software architecture. We need to make sure this is generic enough. + MB28: This makes assumptions on the software architecture. We need to make sure this is generic enough. For example, what is a server? Etc. Does it applies to container, microservice, SF a la RFC7665, VNF a la ETSI, etc.? - [VP]: Ask Maciek. 
- - [MK]: Rewritten it a bit to make it more generic. See if this helps. - [MK]: Edited. + VP: Ask Maciek. + MK: Rewritten it a bit to make it more generic. See if this helps. + MK: Edited. {:/comment} Given that a SUT is a shared multi-tenant environment, +the DUT might inadvertently +experience interference from the operating system +or other software operating on the same server. + {::comment} - [MB29]: Such as? + MB29: Such as? - [VP]: We should reformulate. Other components may differ (give few examples) but interference is general. + VP: We should reformulate. Other components may differ (give few examples) but interference is general. - [MK]: Removed surplus text, as it is now explained in preceding paragraph. - [MK]: Edited. + MK: Removed surplus text, as it is now explained in preceding paragraph. + MK: Edited. {:/comment} -the DUT might inadvertently -experience interference from the operating system -or other software operating on the same server. Some of this interference can be mitigated. For instance, in multi-core CPU systems, pinning DUT program threads to specific CPU cores +and isolating those cores can prevent context switching. {::comment} - [MB30]: If many? Or do we assume there are always many? + MB30: If many? Or do we assume there are always many? - [VP]: Reformulate. + VP: Reformulate. - [MK]: Made it explicit for this paragraph. - [MK]: Edited. + MK: Made it explicit for this paragraph. + MK: Edited. {:/comment} -and isolating those cores can prevent context switching. Despite taking all feasible precautions, some adverse effects may still impact the DUT's network performance. @@ -659,12 +749,12 @@ below the theoretical maximum. {::comment} - [MB31]: I don't parse this one. Please reword. + MB31: I don't parse this one. Please reword. - [VP]: Ok. + VP: Ok. - [MK]: Rephrased. Hope it reads better now. - [MK]: Edited. + MK: Rephrased. Hope it reads better now. + MK: Edited. 
{:/comment} @@ -681,50 +771,53 @@ to distinguish SUT noise from DUT fluctuations {::comment} - [MB32]: As we need to reflect the view of the WG/IETF, not only authors + MB32: As we need to reflect the view of the WG/IETF, not only authors - [VP]: Ask Maciek. + VP: Ask Maciek. - [MK]: Proposed text looks good. OK. - [MK]: Edited. + MK: Proposed text looks good. OK. + MK: Edited. {:/comment} Provided SUT execution environment and any co-resident workloads place only negligible demands on SUT shared resources, so that +the DUT remains the principal performance limiter, +the DUT's ideal noiseless performance is defined +as the noiseless end of the SUT performance spectrum. {::comment} - [MB33]: That is? + MB33: That is? - [VP]: Reformulate. + VP: Reformulate. - [MK]: Clarified. - [MK]: Edited. + MK: Clarified. + MK: Edited. {:/comment} -the DUT remains the principal performance limiter, -the DUT's ideal noiseless performance is defined + {::comment} - [MB34]: Please avoid "we" constructs. + MB34: Please avoid "we" constructs. + + VP: Ok. Search and replace all into passive voice. - [VP]: Ok. Search and replace all into passive voice. + MK: for the whole document. - [MK]: OK. TODO for the whole document. + VP: Done here, created separate comments elsewhere. {:/comment} -as the noiseless end of the SUT performance spectrum + {::comment} - [MB35]: Can we cite an example? + MB35: Can we cite an example? - [VP]: Yes for latency + VP: Yes for latency - [MK]: Focus of mlrsearch is finding throughput. On 2nd thought, + MK: Focus of mlrsearch is finding throughput. On 2nd thought, removing reference to latency as it is not applicable. - [MK]: Edited. - + MK: Edited. {:/comment} @@ -744,11 +837,11 @@ across a range of SUT noise-tolerance levels. {::comment} - [MB36]: ? + MB36: ? - [VP]: Reformulate. + VP: Reformulate. - [MK]: Edited. + MK: Edited. {:/comment} @@ -790,31 +883,33 @@ as less dependent on the SUT noise. 
An alternative option is to simply run a search multiple times, and report some statistics (e.g., average and standard deviation, and/or percentiles like p95). + {::comment} - [MB37]: What about at some other representative percentiles? + MB37: What about at some other representative percentiles? - [VP]: Ok. + VP: Ok. - [MK]: Added percentiles. - [MK]: Edited. + MK: Added percentiles. + MK: Edited. {:/comment} + This can be used for a subset of tests deemed more important, but it makes the search duration problem even more pronounced. ## Throughput with Non-Zero Loss Section 3.17 of [RFC1242] defines throughput as: - The maximum rate at which none of the offered frames - are dropped by the device. +: The maximum rate at which none of the offered frames +are dropped by the device. Then, it says: - Since even the loss of one frame in a - data stream can cause significant delays while - waiting for the higher-level protocols to time out, - it is useful to know the actual maximum data - rate that the device can support. +: Since even the loss of one frame in a +data stream can cause significant delays while +waiting for the higher-level protocols to time out, +it is useful to know the actual maximum data +rate that the device can support. However, many benchmarking teams accept a low, non-zero loss ratio as the goal for their load search. @@ -822,30 +917,32 @@ non-zero loss ratio as the goal for their load search. Motivations are many: - Networking protocols tolerate frame loss better, + compared to the time when [RFC1242] and [RFC2544] were specified. + {::comment} - [MB38]: 1242 was also modern at the time they were published ;) - This can be easily stale. Let’s avoid that + MB38: 1242 was also modern at the time they were published ;) + This can be easily stale. Let's avoid that - [VP]: Ok. + VP: Ok. - [MK]: OK. + MK: OK. {:/comment} - compared to the time when [RFC1242] and [RFC2544] were specified. 
- Increased link speeds require trials sending way more frames within the same duration, + increasing the chance of a small SUT performance fluctuation + being enough to cause frame loss. + {::comment} - [MB39]: Won’t age well. + MB39: Won't age well. - [VP]: Ok, but some things did change over time (in focus if not in existence). Ask Maciek. + VP: Ok, but some things did change over time (in focus if not in existence). Ask Maciek. - [MK]: Edited. + MK: Edited. {:/comment} - increasing the chance of a small SUT performance fluctuation - being enough to cause frame loss. - Because noise-related drops usually arrive in small bursts, their impact on the trial's overall frame loss ratio is diluted by the @@ -855,40 +952,42 @@ Motivations are many: {::comment} - [MB40]: Please split. Too long + MB40: Please split. Too long - [VP]: At this point we probably should add a subsection somewhere, + VP: At this point we probably should add a subsection somewhere, discussing how short-time performance may fluctuate within reasonable-duration trial (even as short as 1s). - [MK]: Split with some rewording. - [MK]: Edited. + MK: Split with some rewording. + MK: Edited. {:/comment} - If an approximation of the SUT noise impact on the Trial Loss Ratio is known, + it can be set as the Goal Loss Ratio (see definitions of + Trial and Goal terms in [Trial Terms](#trial-terms) and [Goal Terms](#goal-terms)). + {::comment} - [MB41]: Help readers find where to look for an authoritative definition. + MB41: Help readers find where to look for an authoritative definition. - [VP]: The original paragraph maybe describes periodic processes eating CPU or even impact + VP: The original paragraph maybe describes periodic processes eating CPU or even impact of reconfiguration during traffic, but both may be too exotic for this specification. I recommend to delete this paragraph. Otherwise, add link. - [MK]: Added. - [MK]: Edited. + MK: Added. + MK: Edited. 
{:/comment} - it can be set as the Goal Loss Ratio (see definitions of - Trial and Goal terms in Sections 4.4 and 4.5 of this document). + {::comment} - [MB42]: Help readers find where to look for an authoritative definition. + MB42: Help readers find where to look for an authoritative definition. - [VP]: Add link if not deleted? + VP: Add link if not deleted? - [MK]: Added. - [MK]: Edited. + MK: Added. + MK: Edited. {:/comment} @@ -897,19 +996,20 @@ Motivations are many: Regardless of the validity of all similar motivations, support for non-zero loss goals makes a +search algorithm more user-friendly. +[RFC2544] throughput is not user-friendly in this regard. + {::comment} - [MB43]: We cant claim that + MB43: We cant claim that - [VP]: Ok, but also current sentence has circular dependency between non-zero rates + VP: Ok, but also current sentence has circular dependency between non-zero rates and specific user-friendliness. Reformulate. - [MK]: done. - [MK]: Edited. + MK: done. + MK: Edited. {:/comment} -search algorithm more user-friendly. -[RFC2544] throughput is not user-friendly in this regard. Furthermore, allowing users to specify multiple loss ratio values, and enabling a single search to find all relevant bounds, @@ -927,17 +1027,6 @@ But it is not that obvious how to search for multiple goals at once, hence the support for multiple Search Goals remains a problem. At the time of writing there does not seem to be a consensus in the industry -{::comment} - - [MB44]: Among? - Also, indicate "at the time of writing". - - [VP]: Ok. - - [MK]: done. - [MK]: Edited. - -{:/comment} on which ratio value is the best. 
For users, performance of higher protocol layers is important, for example, goodput of TCP connection (TCP throughput, [RFC6349]), but relationship @@ -947,6 +1036,18 @@ Section 3 of [RFC6349] for loss ratios acceptable for an accurate measurement of TCP throughput, and [Ott-Mathis-Semke-Mahdavi] for models and calculations of TCP performance in presence of packet loss. +{::comment} + + MB44: Among? + Also, indicate "at the time of writing". + + VP: Ok. + + MK: done. + MK: Edited. + +{:/comment} + ## Inconsistent Trial Results While performing throughput search by executing a sequence of @@ -964,33 +1065,35 @@ The plain bisection never encounters inconsistent trials. But [RFC2544] hints about the possibility of inconsistent trial results, in two places in its text. The first place is Section 24 of [RFC2544], +where full trial durations are required, +presumably because they can be inconsistent with the results +from short trial durations. +The second place is Section 26.3 of [RFC2544], +where two successive zero-loss trials +are recommended, presumably because after one zero-loss trial +there can be a subsequent inconsistent non-zero-loss trial. + {::comment} - [MB45]: ?? + MB45: ?? - [VP]: Full reference is needed. + VP: Full reference is needed. - [MK]: done. - [MK]: Edited. + MK: done. + MK: Edited. {:/comment} -where full trial durations are required, -presumably because they can be inconsistent with the results -from short trial durations. -The second place is Section 26.3 of [RFC2544], + {::comment} - [MB46]: ?? + MB46: ?? - [VP]: Also full reference. + VP: Also full reference. - [MK]: done. - [MK]: Edited. + MK: done. + MK: Edited. {:/comment} -where two successive zero-loss trials -are recommended, presumably because after one zero-loss trial -there can be a subsequent inconsistent non-zero-loss trial. A robust throughput search algorithm needs to decide how to continue the search in the presence of such inconsistencies. 
@@ -1006,25 +1109,95 @@ inconsistent trial results remains an open problem. Relevant Lower Bound is the MLRsearch term that addresses this problem. +# Requirements Language + +{::comment} + + MB5: Move after the intro + + VP: Ok. + + MK: OK. + MK: Moved. + + VP: Currently the "intro" is quite long, so moved after "problems" now + so this is situated closer to Specification. + +{:/comment} + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", +"SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" +in this document are to be interpreted as described in BCP 14, [RFC2119] +and [RFC8174] when, and only when, they appear in all capitals, as shown here. + +{::comment} + + The two references have to come one after another to avoid boilerplate nit, + but the xml2rfc processing (web service) is buggy and strips rfc2119 brackets. + Luckily, having this comment here avoids the bug and creates correct .xml file. + + VP: DONE: Verify the .txt render is still ok. + +{:/comment} + +This document is categorized as an Informational RFC. +While it does not mandate the adoption of the MLRsearch methodology, +it uses the normative language of BCP 14 to provide an unambiguous specification. +This ensures that if a test procedure or test report claims compliance with the MLRsearch Specification, +it MUST adhere to all the absolute requirements defined herein. +The use of normative language is intended to promote repeatable and comparable results +among those who choose to implement this methodology. + +{::comment} + + VP: WONTFIX205: Mention conditional requirements if not clear from usage. + For example, RFC 2544 Trial requirements must be either honored or deviations must be reported. + Only mention in e-mail. 
+ +{:/comment} + # MLRsearch Specification -MLRsearch Specification provides the technical -definitions needed for evaluating whether a particular test procedure +This chapter provides all technical definitions +needed for evaluating whether a particular test procedure complies with MLRsearch Specification. Some terms used in the specification are capitalized. It is just a stylistic choice for this document, reminding the reader this term is introduced, defined or explained -elsewhere in the document. See [Index ](#index) for list of such terms. -Lowercase variants are equally valid. +elsewhere in the document. Lowercase variants are equally valid. + +This document does not separate terminology from methodology. Terms are +fully specified and discussed in their own subsections, under sections +titled "Terms". This way, the list of terms is visible in table of +contents. + +{::comment} + + VP: DONE: Explain there is no separate list for terminology, + subsections under "Terms" sections are terminology items, + packaged together with related requirements and discussions. + + MK: Edited. Reworded to avoid repetition of subsections and such. + Should read better now. + +{:/comment} + +{::comment} + + VP: DONE206: Should we explicitly mention that + the table of contents also acts as a terminology list? + +{:/comment} + {::comment} - [MB47]: Please move this to the terminology section + MB47: Please move this to the terminology section where we can group all conventions used in the document. - [VP]: Ok. + VP: Ok. - [MK]: There is no terminology section per se in this + MK: There is no terminology section per se in this document. See my note to your comments in the Requirements Language section. @@ -1033,150 +1206,311 @@ Lowercase variants are equally valid. 
Each per term subsection contains a short *Definition* paragraph containing a minimal definition and all strict requirements, followed by *Discussion* paragraphs focusing on important consequences and -recommendations. Requirements about how other components can use the +recommendations. Requirements about how other components can use the defined quantity are also included in the discussion. - {::comment} - [MB48]: Not sure this brings much + MB48: Not sure this brings much - [VP]: Ok, delete. + VP: Ok, delete. - [MK]: done. - [MK]: Edited. + MK: done. + MK: Edited. {:/comment} -## Overview +{::comment} -MLRsearch Specification describes a set of abstract system components, -acting as functions with specified inputs and outputs. + Terminology structure + Second proposal: -A test procedure is said to comply with MLRsearch Specification -if it can be conceptually divided into analogous components, -each satisfying requirements for the corresponding MLRsearch component. -Any such compliant test procedure is called a MLRsearch Implementation. + The normative part of the MLRsearch specification can be decomposed + into a directed acyclic graph, where each node is a "term" + with its definition and requirements. The links in the graph are + dependencies, "later" term can only be fully defined + when all its "earlier" terms are already defined. -The Measurer component is tasked to perform Trials, -the Controller component is tasked to select Trial Durations and Loads, -the Manager component is tasked to pre-configure involved entities -and to produce the test report. -The test report explicitly states Search Goals (as Controller inputs) -and corresponding Goal Results (Controller outputs). + Some terms define composite quantities, subsections could be used + to hold definitions of all the attributes. 
-The Manager invokes -{::comment} + For readability, informative "discussion" text could be added, + but frequently it is convenient to use a later term + when discussing an earlier term. - [MB49]: Invoke? - Maybe better to clarify what is actually meant by "calls". + The current structure of sections is a compromise between these motivations. + + VP: Describe this informal principle in official text? + VP: DONE: Not in draft, maybe in e-mail. - [VP]: Mention function calls in first sentence of this subsection. + VP: Distinguish requirements from other discussions? + VP: DONE: Not in draft11, maybe in draft12? - [MK]: done. - [MK]: Edited. + MK: I suggest we refrain from introducing and using DAG or other + analogies from graph theory to explain how terms and terminology + are managed in this document. {:/comment} -a Controller once, -{::comment} - [MB50]: Is there only one? Always? - Include a provision to have many +## Scope - [VP]: Add a sentence about one search. - Complete test suite may perform multiple searches, using maybe different controllers +This document specifies the Multiple Loss Ratio search (MLRsearch) methodology. +The MLRsearch Specification details a new class of benchmarks +by listing all terminology definitions and methodology requirements. +The definitions support "multi-goal" benchmarks, with "single-goal" as a subset. - [MK]: Not sure what you mean. It already says “stopping conditions”, - implying there are many. +The normative scope of this specification includes: -{:/comment} -and the Controller then invokes the Measurer repeatedly -until every stopping condition is satisfied. +* The terminology for all required quantities and their attributes. -The part during which the Controller invokes the Measurer is termed the -Search. Any work the Manager performs either before invoking the -Controller or after Controller returns, falls outside the scope of the -Search.
+* An abstract architecture consisting of functional components + (Manager, Controller, Measurer) and the requirements for their inputs and outputs. -MLRsearch Specification prescribes regular search results and recommends -their stopping conditions. -{::comment} +* The required structure and attributes of the Controller Input, + including one or more Search Goal instances. - [MB51]: Does this also cover "abort" (before completion) to handle some error conditions? - Or this is more a "stop execution"? +* The required logic for Load Classification, which determines whether a given Trial Load + qualifies as a Lower Bound or an Upper Bound for a Search Goal. - [VP]: Add sentences about regular exits, irregular errors and user aborts? +* The required structure and attributes of the Controller Output, + including a Goal Result for each Search Goal. - [MK]: Stop execution of the search. +### Relationship to RFC 2544 -{:/comment} -Irregular search results are also allowed, -they may have different requirements and stopping conditions. +MLRsearch Specification is an independent methodology +and does not change or obsolete any part of [RFC2544]. -Search results are based on Load Classification. When measured enough, -a chosen Load can either achieve or fail each Search Goal -(separately), thus becoming a Lower Bound or an Upper Bound for that -Search Goal. +This specification permits deviations from the Trial procedure +as described in [RFC2544]. Any deviation from the [RFC2544] procedure +must be documented explicitly in the Test Report, +and such variations remain outside the scope of the original [RFC2544] benchmarks. +A specific single-goal MLRsearch benchmark can be configured +to be compliant with [RFC2544] Throughput, +and most procedures reporting [RFC2544] Throughput +can be adapted to satisfy also MLRsearch requirements for specific search goal. 
-For repeatability and comparability purposes, it is important that -all implementations of MLRsearch classify the Load equivalently, -{::comment} +### Applicability of Other Specifications - [MB52]: Do we have taxonomoy/means to make that equivalence easy to put in place? +Methodology extensions from other BMWG documents that specify details +for testing particular DUTs, configurations, or protocols +(e.g., by defining a particular Traffic Profile) are considered orthogonal +to MLRsearch and are applicable to a benchmark conducted using MLRsearch methodology. - [VP]: Add links to Goal Result or Load Classification. - But maybe this sentence is not needed in this subsection? +### Out of Scope - [MK]: It is covered in Sections 4.6.2 Load Classification and 6.1 - Load Classification Logic and 6.4.3 Load Classification - Computations. +The following aspects are explicitly out of the normative scope of this document: -{:/comment} -based on all Trials measured at that Load. +* This specification does not mandate or recommend any single, + universal Search Goal configuration for all use cases. + The selection of Search Goal parameters is left + to the operator of the test procedure or may be defined by future specifications. -When the Relevant Lower Bound is close enough to Relevant Upper Bound -according to Goal Width, the Regular Goal Result is found. +* The internal heuristics or algorithms used by the Controller to select Trial Input values + (e.g., the load selection strategy) are considered implementation details. + +* The potential for, and the effects of, interference between different Search Goal instances + within a multiple-goal search are considered outside the normative scope of this specification. + +## Architecture Overview + +{::comment} + + DONE: Calls/invocations, interfaces. + Low priority but useful for cleaning up the language. + Maybe already fixed by MB49? 
+ +{:/comment} + +Although the normative text references only terminology that has already +been introduced, explanatory passages beside it sometimes profit from +terms that are defined later in the document. To keep the initial +read-through clear, this informative section offers a concise, top-down +sketch of the complete MLRsearch architecture. + +The architecture is modelled as a set of abstract, interacting +components. Information exchange between components is expressed in an +imperative-programming style: one component “calls” another, supplying +inputs (arguments) and receiving outputs (return values). This notation +is purely conceptual; actual implementations need not exchange explicit +messages. When the text contrasts alternative behaviours, it refers to +the different implementations of the same component. + +{::comment} + + VP: DONE207: Re-check usage of Implementation. + Mention in e-mail. + +{:/comment} + +A test procedure is considered compliant with the MLRsearch +Specification if it can be conceptually decomposed into the abstract +components defined herein, and each component satisfies the +requirements defined for its corresponding MLRsearch element. + +The Measurer component is tasked to perform Trials, +the Controller component is tasked to select Trial Durations and Loads, +the Manager component is tasked to pre-configure involved entities +and to produce the Test Report. +The Test Report explicitly states Search Goals (as Controller Input) +and corresponding Goal Results (Controller Output). + +This constitutes one benchmark (single-goal or multi-goal). +Repeated or slightly differing benchmarks are realized +by calling Controller once for each benchmark. + +The Manager calls a Controller once, +and the Controller then invokes the Measurer repeatedly +until the Controller decides it has enough information to return outputs. + +{::comment} + + MB49: Invoke? + Maybe better to clarify what is actually meant by "calls".
+ + VP: Mention function calls in first sentence of this subsection. + + MK: done. + MK: Edited. + + VP: Imperative programming introduced, "calls" is not correct. + + MK: OK. + +{:/comment} + +{::comment} + + MB50: Is there only one? Always? + Include a provision to have many + + VP: Add a sentence about one search. + Complete test suite may perform multiple searches, using maybe different controllers + + MK: Not sure what you mean. It already says "stopping conditions", + implying there are many. + MK: New text LGTM. + +{:/comment} + +The part during which the Controller invokes the Measurer is termed the +Search. Any work the Manager performs either before invoking the +Controller or after Controller returns, falls outside the scope of the +Search. + +MLRsearch Specification prescribes Regular Search Results and recommends +corresponding search completion conditions. + +{::comment} + + MB51: Does this also cover "abort" (before completion) to handle some error conditions? + Or this is more a "stop execution"? + + VP: Add sentences about regular exits, irregular errors and user aborts? + + MK: Stop execution of the search. + +{:/comment} + +Irregular Search Results are also allowed, +they have different requirements and their corresponding stopping conditions are out of scope. + +Search Results are based on Load Classification. When measured enough, +a chosen Load can either achieve or fail each Search Goal +(separately), thus becoming a Lower Bound or an Upper Bound for that +Search Goal. + +When the Relevant Lower Bound is close enough to Relevant Upper Bound +according to Goal Width, the Regular Goal Result is found. Search stops when all Regular Goal Results are found, or when some Search Goals are proven to have only Irregular Goal Results. +{::comment} + + Note: + This comment was about load classifications being equivalent among implementations. + We deleted that sentence, keeping this block just for tracking purposes.
+ + MB52: Do we have taxonomoy/means to make that equivalence easy to put in place? + + VP: Add links to Goal Result or Load Classification. + But maybe this sentence is not needed in this subsection? + + MK: It is covered in Sections 4.6.2 Load Classification and 6.1 + Load Classification Logic and 6.4.3 Load Classification + Computations. + +{:/comment} + +{::comment} + + Repeating the same search is possible, this is about single search. + DONE by mentioning repeated benchmarks. + + +{:/comment} + +### Test Report + +A primary responsibility of the Manager is to produce a Test Report, +which serves as the final and formal output of the test procedure. + +This document does not provide a single, complete, normative definition +for the structure of the Test Report. For example, Test Report may contain +results for a single benchmark, or it could aggregate results of many benchmarks. + +Instead, normative requirements for the content of the Test Report +are specified throughout this document in conjunction +with the definitions of the quantities and procedures to which they apply. +Readers should note that any clause requiring a value to be "reported" +or "stated in the test report" constitutes a normative requirement +on the content of this final artifact. + +Even where not stated explicitly, the "Reporting format" +paragraphs in [RFC2544] sections are still requirements on Test Report +if they apply to a MLRsearch benchmark. + ### Behavior Correctness MLRsearch Specification by itself does not guarantee that the Search ends in finite time, as the freedom the Controller has for Load selection also allows for clearly deficient choices. + {::comment} - [MB53]: I suggest we be factual and avoid use of «believe» and so on. + MB53: I suggest we be factual and avoid use of "believe" and so on. - [VP]: Ok. + VP: Ok. - [MK]: Ok. - [MK]: Removed. + MK: Ok. + MK: Removed. + + VP: Verified, currently fixed everywhere. 
{:/comment} For deeper insights on these matters, refer to [FDio-CSIT-MLRsearch]. -The primary MLRsearch Implementation, used as the prototype +The primary MLRsearch implementation, used as the prototype for this specification, is [PyPI-MLRsearch]. ## Quantities MLRsearch Specification +uses a number of specific quantities, +some of them can be expressed in several different units. + {::comment} - [MB54]: "S" is used in the previous section, + MB54: "S" is used in the previous section, Please pick one form and be consistent through the document. - [VP]: S + VP: S - [MK]: MLRsearch Specification. Done all. - [MK]: Edited. + MK: MLRsearch Specification. Done all. + MK: Edited. {:/comment} -uses a number of specific quantities, -some of them can be expressed in several different units. In general, MLRsearch Specification does not require particular units to be used, but it is REQUIRED for the test report to state all the units. @@ -1184,28 +1518,28 @@ For example, ratio quantities can be dimensionless numbers between zero and one, but may be expressed as percentages instead. For convenience, a group of quantities can be treated as a composite quantity. -One constituent +One constituent of a composite quantity is called an attribute. +A group of attribute values is called an instance of that composite quantity. + {::comment} - [MB55]: Please check + MB55: Please check - [VP]: Reformulate. + VP: Reformulate. - [MK]: Fixed punctuation and broken sentence. - [MK]: Edited. + MK: Fixed punctuation and broken sentence. + MK: Edited. {:/comment} -of a composite quantity is called an attribute. -A group of attribute values is called an instance of that composite quantity. Some attributes may depend on others and can be calculated from other attributes. Such quantities are called derived quantities. ### Current and Final Values -Some quantities are defined so that it is possible to compute their -values in the middle of a Search. 
Other quantities are specified so -that their values can be computed only after a Search ends. Some +Some quantities are defined in a way that makes it possible to compute their +values in the middle of a Search. Other quantities are specified so +that their values can be computed only after a Search ends. Some quantities are important only after a Search ended, but their values are computable also before a Search ends. @@ -1217,22 +1551,22 @@ is used to denote the value of that quantity at the end of the Search. If a time evolution of such a dynamic quantity is guided by configuration quantities, those adjectives can be used to distinguish -quantities. For example, if the current value of "duration" +quantities. For example, if the current value of "duration" (dynamic quantity) increases from "initial duration" to "final -duration"(configuration quantities), all the quoted names denote -separate but related quantities. As the naming suggests, the final +duration" (configuration quantities), all the quoted names denote +separate but related quantities. As the naming suggests, the final value of "duration" is expected to be equal to "final duration" value. ## Existing Terms {::comment} - [MB56]: I would delete. + MB56: I would delete. - [VP]: Not sure yet. + VP: Not sure yet. - [MK]: Edited, instead of deleting. - [MK]: Edited. + MK: Edited, instead of deleting. + MK: Edited. {:/comment} @@ -1250,304 +1584,447 @@ be consulted before attempting to make use of this document: [RFC2544] contains discussions about terms and additional methodology requirements. - Definitions of some central terms from above documents are copied and discussed in the following subsections. + {::comment} - [MB57]: Please move this to a terminology section suggested above + MB57: Please move this to a terminology section suggested above - [VP]: Ok for paragraph text... + VP: Ok for paragraph text... 
- [MK]: See my note re your comment to the Requirements Language + MK: See my note re your comment to the Requirements Language section. We ended up keeping the Existing Terms section just before the MLRsearch specific terms for clarity and easier reading, based on feedback from BMWG. {:/comment} +{::comment} + + Some discussed aspects are important and specific to MLRsearch Specification, + that is why these terms get full subsections, instead of just external references. + + VP: DONE208: Downgrade the last sentence from draft to e-mail? + +{:/comment} + ### SUT Defined in Section 3.1.2 of [RFC2285] as follows. Definition: -The collective set of network devices to which stimulus is offered +  +: The collective set of network devices to which stimulus is offered as a single entity and response measured. Discussion: -An SUT consisting of a single network device is also allowed. +  +: An SUT consisting of a single network device is allowed by this definition. + {::comment} - [MB58]: Do we need to include this? - I would only introduce deviation from bas specs. + MB58: Do we need to include this? + I would only introduce deviation from base specs. - [VP]: Ok on deviation, not sure on base definition. + VP: Ok on deviation, not sure on base definition. - [MK]: We do need to include this, as the SUT and DUT terms are used - repeatedly and are fundamental to understanding this - specification. + MK: We do need to include this, as the SUT and DUT terms are used + repeatedly and are fundamental to understanding this + specification. + + VP: Edited. {:/comment} -In software-based networking SUT may comprise multitude of + +  +: In software-based networking SUT may comprise multitude of networking applications and the entire host hardware and software execution environment. +  +: SUT is the only entity that can be benchmarked directly, +even though only the performance of some sub-components is of interest. + ### DUT Defined in Section 3.1.1 of [RFC2285] as follows.
Definition: -The network forwarding device +  +: The network forwarding device +to which stimulus is offered and response measured. + {::comment} - [MB59]: This reasons about "device", should we say that we extends this to "function"? + MB59: This reasons about "device", should we say that we extends this to "function"? - [VP]: Yes. Extend discussion. If device requires medium/cables, + VP: Yes. Extend discussion. If device requires medium/cables, function can be working with something software-like (packet vectors, shared memory regions). - [MK]: added text covering this. - [MK]: Edited. + MK: added text covering this. + MK: Edited. {:/comment} -to which stimulus is offered and response measured. Discussion: -DUT, as a sub-component of SUT, is only indirectly mentioned in +  +: Contrary to SUT, the DUT stimulus and response are frequently +initiated and observed only indirectly, on different parts of SUT. + +  +: DUT, as a sub-component of SUT, is only indirectly mentioned in MLRsearch Specification, but is of key relevance for its motivation. The device can represent a software-based networking functions running on commodity x86/ARM CPUs (vs purpose-built ASIC / NPU / FPGA). + {::comment} - [MB60]: Idem as SUT + MB60: Idem as SUT - [VP]: Yes. + VP: Yes. - [MK]: See my note re SUT. + MK: See my note re SUT. {:/comment} +  +: A well-designed SUT should have the primary DUT as its performance bottleneck. +The ways to achieve that are outside of MLRsearch Specification scope. + ### Trial A trial is the part of the test described in Section 23 of [RFC2544]. Definition: - A particular test consists of multiple trials. Each trial returns - one piece of information, for example the loss rate at a particular - input frame rate. Each trial consists of a number of phases: +  +: A particular test consists of multiple trials. Each trial returns +one piece of information, for example the loss rate at a particular +input frame rate.
Each trial consists of a number of phases: - a) If the DUT is a router, send the routing update to the "input" - port and pause two seconds to be sure that the routing has settled. +  +: a) If the DUT is a router, send the routing update to the "input" +port and pause two seconds to be sure that the routing has settled. - b) Send the "learning frames" to the "output" port and wait 2 - seconds to be sure that the learning has settled. Bridge learning - frames are frames with source addresses that are the same as the - destination addresses used by the test frames. Learning frames for - other protocols are used to prime the address resolution tables in - the DUT. The formats of the learning frame that should be used are - shown in the Test Frame Formats document. +  +: b) Send the "learning frames" to the "output" port and wait 2 +seconds to be sure that the learning has settled. Bridge learning +frames are frames with source addresses that are the same as the +destination addresses used by the test frames. Learning frames for +other protocols are used to prime the address resolution tables in +the DUT. The formats of the learning frame that should be used are +shown in the Test Frame Formats document. - c) Run the test trial. +  +: c) Run the test trial. - d) Wait for two seconds for any residual frames to be received. +  +: d) Wait for two seconds for any residual frames to be received. - e) Wait for at least five seconds for the DUT to restabilize. +  +: e) Wait for at least five seconds for the DUT to restabilize. Discussion: -The traffic is sent only in phase c) and received in phases c) and d). +  +: The traffic is sent only in phase c) and received in phases c) and d). + +  +: Trials are the only stimuli the SUT is expected to experience during the Search. -The definition describes some traits, and it is not clear whether all of them -are required, or some of them are only recommended. 
+  +: In some discussion paragraphs, it is useful to consider the traffic +as sent and received by a tester, as implicitly defined +in Section 6 of [RFC2544]. -Trials are the only stimuli the SUT is expected to experience during the Search. {::comment} - [MB61]: Is there any aspect new to MLRS? + MB61: Is there any aspect new to MLRS? - [VP]: No, make clear. + VP: No, make clear. - [MK]: Yes, it is covered in detail in the following sections. The + MK: Yes, it is covered in detail in the following sections. The important part in this section, apart from quoting the original definition, is the discussion part, that sets the convention of how deviations from the original definition are captured in this - document.. + document. {:/comment} +  +: The definition describes some traits, not using capitalized verbs +to signify strength of the requirements. For the purposes of the MLRsearch Specification, -it is allowed +the test procedure MAY deviate from the [RFC2544] description, +but any such deviation MUST be described explicitly in the Test Report. +It is still RECOMMENDED to not deviate from the description, +as any deviation weakens comparability. + {::comment} - [MB62]: Not a normative language + MB62: Not a normative language + + VP: Reformulate. - [VP]: Reformulate. + MK: ok. changed from ALLOWED to allowed. is anything else needed? + MK: Edited. - [MK]: ok. changed from ALLOWED to allowed. is anything else needed? - [MK]: Edited. + VP: I feel this is important. Not only as a notable deviation from RFC 2544, + but also as an example of normative language usage. + Where RFC 2544 says you MUST do A or you CANNOT do B, + MLRsearch may say there are specific conditions where you do not have to do A or can de B. + Med had few comments like "since there is exception, the requirement is not universal", + and I say "there are clear conditions, the requirement is universal if the conditions are satisfied". + + VP: Contruct appropriate "conditional requirement" sentence. 
+ DONE: Sentence improved. {:/comment} -for the test procedure to deviate from the [RFC2544] description, -but any such deviation MUST be described explicitly in the test report. +{::comment} + + VP: DONE: No time to mention "allowed if worse" principle in draft11 cycle. -In some discussion paragraphs, it is useful to consider the traffic -as sent and received by a tester, as implicitly defined -in Section 6 of [RFC2544]. +{:/comment} -An example of deviation from [RFC2544] is using shorter wait times, +  +: An example of deviation from [RFC2544] is using shorter wait times, compared to those described in phases a), b), d) and e). -The [RFC2544] document itself seems to be treating phase b) +  +: The [RFC2544] document itself seems to be treating phase b) as any type of configuration that cannot be configured only once (by Manager, before Search starts), as some crucial SUT state could time out during the Search. -It is RECOMMENDED +It is RECOMMENDED to interpret the "learning frames" to be +any such time-sensitive per-trial configuration method, +with bridge MAC learning being only one possible example. +Appendix C.2.4.1 of [RFC2544] lists another example: ARP with wait time of 5 seconds. + {::comment} - [MB63]: Not a normative term + MB63: Not a normative term + + VP: Ok. - [VP]: Ok. - - [MK]: ok. MB and MK edits applied. - [MK]: Edited. + MK: ok. MB and MK edits applied. + MK: Edited. {:/comment} -to interpret the "learning frames" to be -any such time-sensitive per-trial configuration method, -with bridge MAC learning being only one possibe example. -Appendix C.2.4.1 of [RFC2544] lists another example: ARP with wait time of 5 seconds. + +{::comment} + + VP: DONE: Emphasize that this is a single trial. + Any recurring tests count as separate trials, + because they give different results. + +{:/comment} + +  +: Some methodologies describe recurring tests. +If those are based on Trials, they are treated as multiple independent Trials.
## Trial Terms +{::comment} + + WONTFIX209: Separate short descriptions from further discussions. Everywhere. + + Too late for draft-11. + +{:/comment} + This section defines new and redefine existing terms for quantities relevant as inputs or outputs of a Trial, as used by the Measurer component. -This includes also any derived quantities related to one trial result. +This includes also any derived quantities related to results of one Trial. ### Trial Duration Definition: -Trial Duration is the intended duration of the phase c) of a Trial. -The value MUST be positive. +  +: Trial Duration is the intended duration of the phase c) of a Trial. + {::comment} - [MB64]: Does this cover also recurrences? + MB64: Does this cover also recurrences? See, e.g., draft-ietf-netmod-schedule-yang-05 - A Common YANG Data Model for Scheduling or draft-ietf-opsawg-scheduling-oam-tests-00? - [VP]: No, mention that probably already in trial definition. + VP: No, mention that probably already in trial definition. - [MK]: No, it does not cover recurrences as specified in above two + MK: No, it does not cover recurrences as specified in above two drafts, as it does involve scheduled events. + VP: Created comment block at appropriate subsections. + {:/comment} Discussion: -While any positive real value may be provided, some Measurer +  +: The value MUST be positive. + +  +: While any positive real value may be provided, some Measurer implementations MAY limit possible values, e.g., by rounding down to -nearest integer in seconds. In that case, it is RECOMMENDED to give -such inputs to the Controller so that the Controller only uses +nearest integer in seconds. In that case, it is RECOMMENDED to give +such inputs to the Controller so that the Controller +only uses the accepted values. {::comment} - [MB65]: To? + MB65: To? - [VP]: Reformulate. + VP: Reformulate. - [MK]: Edited “proposes” => “uses”. - [MK]: Edited. + MK: Edited "proposes" => "uses". + MK: Edited. 
{:/comment} -the accepted values. ### Trial Load Definition: -Trial Load is the per-interface Intended Load for a Trial. +  +: Trial Load is the per-interface Intended Load for a Trial. Discussion: -For test report purposes, it is assumed that this is a constant load by default, +  +: Trial Load is equivalent to the quantities defined +as constant load (Section 3.4 of [RFC1242]), +data rate (Section 14 of [RFC2544]), +and Intended Load (Section 3.5.1 of [RFC2285]), +in the sense that all three definitions specify that this value +applies to one (input or output) interface. + +  +: For specification purposes, it is assumed that this is a constant load by default, as specified in Section 3.4 of [RFC1242]. +Informally, Traffic Load is a single number that can "scale" any traffic pattern +as long as the intuition of load intended against a single interface can be applied. + +{::comment} + + MB66: Please fix all similar ones in the doc + + VP: Ok. + + MK: ok. fixed only here for now. + MK: DONE fix everywhere. + MK: Edited. + +{:/comment} + +  +: It MAY be possible to use a Trial Load value to describe non-constant traffic +(using average load when the traffic consists of repeated bursts of frames +e.g., as suggested in Section 21 of [RFC2544]). +In the case of a non-constant load, the Test Report +MUST explicitly mention how exactly non-constant the traffic is +and how it reacts to Traffic Load value. +But the rest of the MLRsearch Specification assumes that is not the case, +to avoid discussing corner cases (e.g., which values are possible within medium limitations). + +  +: Similarly, traffic patterns where different interfaces are subject to different loads +MAY be described by a single Trial Load value (e.g., using the largest load among interfaces), +but again the Test Report MUST explicitly describe how the traffic pattern +reacts to Traffic Load value, +and this specification does not discuss all the implications of that approach.
+ +  +: In the common case of bidirectional traffic, as described in +Section 14. Bidirectional Traffic of [RFC2544], +Trial Load is the data rate per direction, half of aggregate data rate. + +  +: Traffic patterns where a single Trial Load does not describe their scaling +cannot be used for MLRsearch benchmarks. + +{::comment} + + VP: DONE: Put bursty and other non-constant loads outside of the scope? + +{:/comment} + {::comment} - [MB66]: Please fix all similar ones in the doc + VP: DONE: What about multiple-interface loads if not equal among interfaces? - [VP]: Ok. - - [MK]: ok. fixed only here for now. - [MK]: TODO fix everywhere. - [MK]: Edited. + VP: DONE: What about interfaces with different medium capacity (bandwidth or pps)? {:/comment} -Trial Load MAY be an average load performed with steady state traffic or -with repeated bursts of frames {::comment} - [MB67]: Example of an example. :) Please reword. + MB67: Example of an example. :) Please reword. - [VP]: Ok. + VP: Ok. - [MK]: Edited. + MK: Edited. {:/comment} -e.g., as suggested in Section 21 of [RFC2544]. -In the case of a non-constant load, the test report -MUST explicitly mention how exactly non-constant the traffic is. + {::comment} - [MB68]: Can we also cover load percentiles? + MB68: Can we also cover load percentiles? The avg may not be representative to stress functions with anti-ddos guards, for example. - [VP]: Not here. The average woks with aggregate counters used in loss definition. + VP: Not here. The average woks with aggregate counters used in loss definition. Maybe discuss anti-ddos in Traffic Profile subsection. - [MK]: Definition of burst traffic profiles is out of scope. + MK: Definition of burst traffic profiles is out of scope. + + VP: DONE: Re-check the current text. 
{:/comment} -Trial Load is equivalent to the quantities defined -as constant load (Section 3.4 of [RFC1242]), -data rate (Section 14 of [RFC2544]), -and Intended Load (Section 3.5.1 of [RFC2285]), -in the sense that all three definitions specify that this value -applies to one (input or output) interface. +  +: Similarly to Trial Duration, some Measurers MAY limit the possible values +of Trial Load. Contrary to Trial Duration, +documenting such behavior in the test report is OPTIONAL. +This is because the load differences are negligible (and frequently +undocumented) in practice. -Similarly to Trial Duration, some Measurers MAY limit the possible values -of trial load. Contrary to trial duration, {::comment} - [MB69]: Inappropriate use of normative language + MB69: Inappropriate use of normative language - [VP]: Maybe disagree? + VP: Maybe disagree? Reformulate other parts to stress test report is subject to requirements. - - [MK]: Edited. + + MK: Edited. {:/comment} -documenting such behavior in the test report is OPTIONAL. -This is because the load differences are negligible (and frequently -undocumented) in practice. -It is allowed to combine Trial Load and Trial Duration values in a way +  +: The Controller MAY select Trial Load and Trial Duration values in a way that would not be possible to achieve using any integer number of data frames. -If a particular Trial Load value is not tied to a single Trial, +{::comment} + + VP: DONE: Use normative MAY somewhere. + +{:/comment} + +  +: If a particular Trial Load value is not tied to a single Trial, e.g., if there are no Trials yet or if there are multiple Trials, this document uses a shorthand **Load**. -The test report MAY present the aggregate load across multiple +  +: The test report MAY present the aggregate load across multiple interfaces, treating it as the same quantity expressed using different -units. Each reported Trial Load value MUST state unambiguously whether +units. 
Each reported Trial Load value MUST state unambiguously whether it refers to (i) a single interface, (ii) a specified subset of interfaces (e.g., such as all logical interfaces mapped to one physical port), or (iii) the total across every interface. For any aggregate @@ -1556,47 +2033,61 @@ links the per-interface and multi-interface load values. {::comment} - [MB70]: The causality effect may not be evident for the subset case, at least. + MB70: The causality effect may not be evident for the subset case, at least. - [VP]: Reformulate. + VP: Reformulate. - [MK]: Edited. + MK: Edited. {:/comment} -The per-interface value remains the primary one, consistent -with prevailing practice in [RFC 1242], [RFC 2544], and [RFC 2285]. + +  +: The per-interface value remains the primary unit, consistent +with prevailing practice in [RFC1242], [RFC2544], and [RFC2285]. + {::comment} - [MB71]: Which ones? + MB71: Which ones? - [VP]: List the common examples. + VP: List the common examples. - [MK]: Edited. + MK: Edited. {:/comment} -The last paragraph also applies to other terms related to Load. +  +: The last paragraph also applies to other terms related to Load. + +  +: For example, tests with symmetric bidirectional traffic +can report load-related values as "bidirectional load" +(double of "unidirectional load"). ### Trial Input Definition: -Trial Input is a composite quantity, consisting of two attributes: +  +: Trial Input is a composite quantity, consisting of two attributes: Trial Duration and Trial Load. Discussion: -When talking about multiple Trials, it is common to say "Trial Inputs" +  +: When talking about multiple Trials, it is common to say "Trial Inputs" to denote all corresponding Trial Input instances. -A Trial Input instance acts as the input for one call of the Measurer component. +  +: A Trial Input instance acts as the input for one call of the Measurer component. 
-Contrary to other composite quantities, MLRsearch Implementations -MUST NOT add optional attributes here. +  +: Contrary to other composite quantities, MLRsearch implementations +MUST NOT add optional attributes into Trial Input. This improves interoperability between various implementations of a Controller and a Measurer. -Note that both attributes are **intended** quantities, +  +: Note that both attributes are **intended** quantities, as only those can be fully controlled by the Controller. The actual offered quantities, as realized by the Measurer, can be different (and must be different if not multiplying into an integer number of frames), @@ -1607,51 +2098,71 @@ outside of the scope of this document. Definition: -Traffic Profile is a composite quantity containing +  +: Traffic Profile is a composite quantity containing all attributes other than Trial Load and Trial Duration, that are needed for unique determination of the Trial to be performed. Discussion: -All the attributes are assumed to be constant during the search, +  +: All the attributes are assumed to be constant during the Search, and the composite is configured on the Measurer by the Manager before the Search starts. This is why the Traffic Profile is not part of the Trial Input. -Therefore, implementations of the Manager and the Measurer +  +: Specification of traffic properties included in the Traffic Profile is +the responsibility of the Manager, but the specific configuration mechanisms +are outside of the scope of this document. + +  +: Informally, implementations of the Manager and the Measurer must be aware of their common set of capabilities, -{::comment} +so that a Traffic Profile instance uniquely defines the traffic during the Search. +Typically, Manager and Measurer implementations are tightly integrated. + +  +: Integration efforts between independent Manager and Measurer implementations +are outside of the scope of this document.
+An example standardization effort is [Vassilev], +a draft at the time of writing. + +{::comment} - [VP]: Nope. Say it is an integration effort. + VP: DONE210: Mention the YANG draft as a possible avenue? - [MK]: Edited. {:/comment} -so that Traffic Profile -instance uniquely defines the traffic during the Search making the Manager and the Measurer simple to integrate. -None of those capabilities -have to be known by the Controller implementations. -Specification of traffic properties included in the Traffic Profile is -out of scope of this document. +{::comment} + + MB72: Can we provide an example how to make that? + + VP: Nope. Say it is an integration effort. + + MK: Edited. + + VP: DONE. + +{:/comment} {::comment} - [MB73]: This is too vague. Unless we reword top better reflect the requirement, - I don’t think we can use the normative language here + MB73: This is too vague. Unless we reword top better reflect the requirement, + I don't think we can use the normative language here - [VP]: Reformulate. + VP: Reformulate. - [MK]: Edited. + MK: Edited. {:/comment} -Examples of traffic properties include: + +  +: Examples of traffic properties include: - Data link frame size - Fixed sizes as listed in Section 3.5 of [RFC1242] and in Section 9 of [RFC2544] - - mixed sizes as defined in [RFC6985] "IMIX Genome: Specification of - Variable Packet Size for Additional Testing" + - IMIX mixed sizes as defined in [RFC6985] - Frame formats and protocol addresses - Section 8, 12 and Appendix C of [RFC2544] - Symmetric bidirectional traffic @@ -1659,35 +2170,49 @@ Examples of traffic properties include: {::comment} - [MB74]: Inappropriate use of normative language + MB74: Inappropriate use of normative language - [VP]: Reformulate. + VP: Reformulate. - [MK]: Edited. + MK: Edited. -{:/comment +{:/comment} {::comment} - [MB75]: Idem as above. MUST is not appropriate here. + MB75: Idem as above. MUST is not appropriate here. + + VP: Reformulate. - [VP]: Reformulate. 
+ MK: Edited. - [MK]: Edited. -{:/comment +{:/comment} -Other traffic properties that need to be somehow specified in Traffic -Profile, if they apply to the test scenario, include: +  +: Other traffic properties that need to be somehow specified +in Traffic Profile, and MUST be mentioned in Test Report +if they apply to the benchmark, include: -- bidirectional traffic from Section 14 of [RFC2544], +  +: - bidirectional traffic from Section 14 of [RFC2544], - fully meshed traffic from Section 3.3.3 of [RFC2285], - modifiers from Section 11 of [RFC2544]. +- IP version mixing from Section 5.3 of [RFC8219]. + +{::comment} + + VP: Multiple traffic profiles (at least frame sizes) in RFC2544, + this is about single SUT+config+profile benchmark. + DONE: I thihnk the current sentences are good enough. + +{:/comment} ### Trial Forwarding Ratio Definition: -The Trial Forwarding Ratio is a dimensionless floating point value. +  +: The Trial Forwarding Ratio is a dimensionless floating point value. It MUST range between 0.0 and 1.0, both inclusive. It is calculated by dividing the number of frames successfully forwarded by the SUT @@ -1695,12 +2220,16 @@ by the total number of frames expected to be forwarded during the trial. Discussion: -For most Traffic Profiles, "expected to be forwarded" means +  +: For most Traffic Profiles, "expected to be forwarded" means "intended to get received by SUT from tester". -Only if this is not the case, the test report SHOULD describe the Traffic Profile +This SHOULD be the default interpretation. +Only if this is not the case, the test report MUST describe the Traffic Profile +in a detail sufficient to imply how Trial Forwarding Ratio should be calculated. + {::comment} - [MB76]: MUST is an absolute requirement (i.e., there is no exception): + MB76: MUST is an absolute requirement (i.e., there is no exception): 1. MUST This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification. 
@@ -1710,199 +2239,260 @@ Only if this is not the case, the test report SHOULD describe the Traffic Profil implications must be understood and carefully weighed before choosing a different course. - [VP]: Reformulate. + VP: Reformulate. + + MK: Edited. + + VP: DONE: Apply stricter conditional requirements. + E-mail: explain conditional requirements. - [MK]: Edited. {:/comment} -in a way that implies how Trial Forwarding Ratio should be calculated. -Trial Forwarding Ratio MAY be expressed in other units +  +: Trial Forwarding Ratio MAY be expressed in other units (e.g., as a percentage) in the test report. -Note that, contrary to Load terms, frame counts used to compute +  +: Note that, contrary to Load terms, frame counts used to compute Trial Forwarding Ratio are generally aggregates over all SUT output interfaces, as most test procedures verify all outgoing frames. +The procedure for [RFC2544] Throughput counts received frames, +so implicitly it implies bidirectional counts for bidirectional traffic, +even though the final value is "rate" that is still per-interface. + +  +: For example, in a test with symmetric bidirectional traffic, +if one direction is forwarded without losses, but the opposite direction +does not forward at all, the Trial Forwarding Ratio would be 0.5 (50%). + {::comment} - [MB77]: Should we call for more granularity to be provided/characterized? + MB77: Should we call for more granularity to be provided/characterized? - [VP]: No, include sentence on why. + VP: No, include sentence on why. - [MK]: What is the granularity that is needed here? The test + MK: What is the granularity that is needed here? The test procedure is about testing SUT as a single system, not parts of it. + VP: DONE: Add the missing sentence. + {:/comment} -For example, in a test with symmetric bidirectional traffic, -if one direction is forwarded without losses, but the opposite direction -does not forward at all, the trial forwarding ratio would be 0.5 (50%). 
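As an illustration of the aggregate-count computation discussed above, the following Python sketch derives a Trial Forwarding Ratio from per-direction frame counts. The function name, argument names, and the frame counts are assumptions made for this example and do not come from the draft; only the formula (frames forwarded divided by frames expected to be forwarded, summed over all SUT output interfaces) follows the text.

```python
def trial_forwarding_ratio(forwarded_counts, expected_counts):
    """Hypothetical helper: aggregate the counts first, then divide.

    forwarded_counts: frames successfully forwarded, one entry per direction.
    expected_counts: frames expected to be forwarded, one entry per direction.
    """
    expected = sum(expected_counts)
    if expected <= 0:
        raise ValueError("no frames were expected to be forwarded")
    ratio = sum(forwarded_counts) / expected
    # The definition requires a value between 0.0 and 1.0, both inclusive.
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("forwarding ratio outside [0.0, 1.0]")
    return ratio


# Symmetric bidirectional traffic with illustrative counts:
# one direction forwards everything, the other forwards nothing.
expected_per_direction = [1000, 1000]
forwarded_per_direction = [1000, 0]
ratio = trial_forwarding_ratio(forwarded_per_direction, expected_per_direction)
print(ratio)  # prints 0.5
```

Summing before dividing is what makes the result a single system-wide value instead of one value per interface.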
+  +: In future extensions, more general ways to compute Trial Forwarding Ratio +may be allowed, but the current MLRsearch Specification relies on this specific +averaged counters approach. ### Trial Loss Ratio Definition: - The Trial Loss Ratio is equal to one minus the Trial Forwarding Ratio. +  +: The Trial Loss Ratio is equal to one minus the Trial Forwarding Ratio. {::comment} - [MB78]: For all sections, please indent so that we separate the def/discussion vs. description + MB78: For all sections, please indent so that we separate the def/discussion vs. description - [VP]: Ok. + VP: Ok. - [MK]: Edited. Indented 2 spaces, will kramdown renderer take it? + MK: Edited. Indented 2 spaces, will kramdown renderer take it? -{:/comment} + VP: Applied the way from https://stackoverflow.com/a/59612110 instead. +{:/comment} Discussion: - 100% minus the Trial Forwarding Ratio, when expressed as a percentage. +  +: 100% minus the Trial Forwarding Ratio, when expressed as a percentage. - This is almost identical to Frame Loss Rate of [RFC1242](Section 3.6). - The only minor differences are that Trial Loss Ratio does not need to - be expressed as a percentage, and Trial Loss Ratio is explicitly - based on aggregate frame counts. +  +: This is almost identical to Frame Loss Rate of Section 3.6 of [RFC1242]. +The only minor differences are that Trial Loss Ratio does not need to +be expressed as a percentage, and Trial Loss Ratio is explicitly +based on averaged frame counts when more than one data stream is present. ### Trial Forwarding Rate Definition: -The Trial Forwarding Rate is a derived quantity, calculated by +  +: The Trial Forwarding Rate is a derived quantity, calculated by multiplying the Trial Load by the Trial Forwarding Ratio. Discussion: -This quantity is not identical -to the Forwarding Rate as defined in Section 3.6.1 of [RFC2285]. 
-Specifically, the latter is based on frame counts on one output interface only, -so each output interface can have different forwarding rate, -whereas the Trial Forwarding Rate is based on frame counts -aggregated over all SUT output interfaces, while still being a multiple of Load. +  +: This quantity differs from the Forwarding Rate described in Section +3.6.1 of [RFC2285]. Under the RFC 2285 method, each output interface is +measured separately, so every interface may report a distinct rate. The +Trial Forwarding Rate, by contrast, uses a single set of frame counts +and therefore yields one value that represents the whole system, +while still preserving the direct link to the per-interface load. + +  +: When the Traffic Profile is symmetric and bidirectional, as defined in +Section 14 of [RFC2544], the Trial Forwarding Rate is numerically equal +to the arithmetic average of the individual per-interface forwarding rates +that would be produced by the RFC 2285 procedure. -Consequently, for symmetric bidirectional Traffic Profiles (section 14 -of [RFC2544], {::comment} - [MB79]: Do we have an authoritative reference where this is defined? + MB79: Do we have an authoritative reference where this is defined? If not, please add an definition entry early in the terminology section. - [VP]: Add reference. + VP: Add reference. + + MK: Edited. Added reference to RFC2544. - [MK]: Edited. Added reference to RFC2544. {:/comment} -the Trial Forwarding Rate value is equal to the arithmetic average -of [RFC2285] Forwarding Rate values across all SUT output interfaces. {::comment} - [MB80]: Why both? + MB80: Why both? - [VP]: Add explanations to Traffic Profile subsection. + VP: Add explanations to Traffic Profile subsection. - [MK]: Edited. But shouldn't it say "sum of" instead of "arithmetic + MK: Edited. But shouldn't it say "sum of" instead of "arithmetic average"? 
Unless specified, Trial Forwarding Rate is an aggregate rate, not per interface, as it is representating capability of DUT/SUT not a subset of it associated with particular interface :) + + VP: DONE: Checked, it is average. + {:/comment} -Given that Trial Forwarding Rate is a quantity based on Load, -this quantity may be expressed using multi-interface values -in test report (e.g., as sum of per-interface forwarding rate values). +  +: For more complex traffic patterns, such as many-to-one as mentioned +in Section 3.3.2 Partially Meshed Traffic of [RFC2285], +the meaning of Trial Forwarding Rate is less straightforward. +For example, if two input interfaces receive one million frames per second each, +and a single interface outputs 1.4 million frames per second (fps), +Trial Load is 1 million fps, Trial Loss Ratio is 30%, +and Trial Forwarding Rate is 0.7 million fps. + +  +: Because this rate is anchored to the Load defined for one interface, +a test report MAY show it either as the single averaged figure just described, +or as the sum of the separate per-interface forwarding rates. +For the example above, the aggregate trial forwarding rate is 1.4 million fps. ### Trial Effective Duration Definition: -Trial Effective Duration is a time quantity related to the non-recurring trial, +  +: Trial Effective Duration is a time quantity related to a Trial, by default equal to the Trial Duration. + {::comment} - [MB81]: For the periodic/recurrences, does it cover only one recurrence + MB81: For the periodic/recurrences, does it cover only one recurrence or from start to last independent of in-between execution periods? - [VP]: Make sure Trial implies no recurrence. + VP: Make sure Trial implies no recurrence. - [MK]: Edited. BUT - Why do we need to state that. There is nothing in the text of + MK: Edited. BUT - Why do we need to state that. There is nothing in the text of Section 23 of RFC2544 and in above sections implying recurrences. 
Why then do we need to explicity say "no recurrence"? + + VP: DONE: After Trial is stable, simplify this sentence. + {:/comment} Discussion: -This is an optional feature. +  +: This is an optional feature. If the Measurer does not return any Trial Effective Duration value, the Controller MUST use the Trial Duration value instead. -Trial Effective Duration may be any positive time quantity +  +: Trial Effective Duration may be any positive time quantity +chosen by the Measurer to be used for time-based decisions in the Controller. + {::comment} - [MB82]: It is obvious, but should we say "positive"? + MB82: It is obvious, but should we say "positive"? - [VP]: Yes. + VP: Yes. + + MK: Edited. - [MK]: Edited. {:/comment} -chosen by the Measurer -to be used for time-based decisions in the Controller. -The test report MUST explain how the Measurer computes the returned +  +: The test report MUST explain how the Measurer computes the returned Trial Effective Duration values, if they are not always equal to the Trial Duration. -This feature can be beneficial for users of testing equipment +  +: This feature can be beneficial for time-critical benchmarks +designed to manage the overall search duration, +rather than solely the traffic portion of it. +An approach is to measure the duration of the whole trial (including all wait times) +and use that as the Trial Effective Duration. + {::comment} - [MB83]: To be defined early in the terminology section + MB83: To be defined early in the terminology section + + VP: Ok. - [VP]: Ok. + MK: Edited. - [MK]: Edited. {:/comment} -who wish to manage the overall search duration, -rather than solely the traffic portion of it. -An approach is to measure the duration of the whole trial (including all wait times) -and use that as the Trial Effective Duration. 
-This is also a way for the Measurer to inform the Controller about +  +: This is also a way for the Measurer to inform the Controller about its surprising behavior, for example, when rounding the Trial Duration value. ### Trial Output Definition: -Trial Output is a composite quantity consisting of several attributes. - +  +: Trial Output is a composite quantity consisting of several attributes. Required attributes are: Trial Loss Ratio, Trial Effective Duration and Trial Forwarding Rate. Discussion: -When referring to more than one trial, plural term “Trial Outputs” is +  +: When referring to more than one trial, plural term "Trial Outputs" is used to collectively describe multiple Trial Output instances. -Implementations may provide additional optional attributes. +  +: Measurer implementations may provide additional optional attributes. The Controller implementations SHOULD +ignore values of any optional attribute +they are not familiar with, +except when passing Trial Output instances to the Manager. + {::comment} - [MB84]: As we have an exception + MB84: As we have an exception - [VP]: Reformulate. + VP: Reformulate. Conditional MUST has an authoritative prescribed condition, SHOULD gives implementers freedom to choose their own conditions. - [MK]: Edited. + MK: Edited. + + VP: Done: Stricter conditional requirements not needed. + {:/comment} -ignore values of any optional attribute -they are not familiar with, -except when passing Trial Output instances to the Manager. -Example of an optional attribute: +  +: Example of an optional attribute: The aggregate number of frames expected to be forwarded during the trial, especially if it is not (a rounded-down value) implied by Trial Load and Trial Duration. 
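The composite quantities described so far can be sketched as small Python records. This is only an illustration under stated assumptions: the class names, field names, units (seconds, frames per second), and the helper `make_trial_output` are hypothetical and are not defined by the draft. The sketch shows the three required Trial Output attributes, the fallback from Trial Effective Duration to Trial Duration, and Trial Forwarding Rate computed as Trial Load multiplied by Trial Forwarding Ratio.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrialInput:
    """Hypothetical record: the two intended quantities, nothing else."""
    duration_s: float  # Trial Duration, must be positive
    load_fps: float    # Trial Load, per-interface frames per second


@dataclass(frozen=True)
class TrialOutput:
    """Hypothetical record holding the three required attributes."""
    loss_ratio: float
    effective_duration_s: float
    forwarding_rate_fps: float


def make_trial_output(
    trial_input: TrialInput,
    loss_ratio: float,
    effective_duration_s: Optional[float] = None,
) -> TrialOutput:
    if trial_input.duration_s <= 0.0:
        raise ValueError("Trial Duration must be positive")
    if not 0.0 <= loss_ratio <= 1.0:
        raise ValueError("Trial Loss Ratio must be within [0.0, 1.0]")
    # No Trial Effective Duration from the Measurer: use Trial Duration.
    if effective_duration_s is None:
        effective_duration_s = trial_input.duration_s
    # Trial Forwarding Ratio is one minus Trial Loss Ratio.
    forwarding_ratio = 1.0 - loss_ratio
    return TrialOutput(
        loss_ratio=loss_ratio,
        effective_duration_s=effective_duration_s,
        forwarding_rate_fps=trial_input.load_fps * forwarding_ratio,
    )


# Numbers from the many-to-one example: Trial Load of 1 million fps
# and Trial Loss Ratio of 30%. The 1-second duration is an assumption.
output = make_trial_output(TrialInput(duration_s=1.0, load_fps=1_000_000.0), 0.3)
print(output)
```

With these inputs the forwarding rate comes out at roughly 0.7 million fps, matching the figure given for that example, and the effective duration falls back to the 1-second Trial Duration.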
-While Section 3.5.2 of [RFC2285] requires the Offered Load value +  +: While Section 3.5.2 of [RFC2285] requires the Offered Load value to be reported for forwarding rate measurements, it is not required in MLRsearch Specification, as search results do not depend on it. @@ -1911,29 +2501,33 @@ as search results do not depend on it. Definition: -Trial Result is a composite quantity, +  +: Trial Result is a composite quantity, consisting of the Trial Input and the Trial Output. Discussion: -When referring to more than one trial, plural term “Trial Results” is +  +: When referring to more than one trial, plural term "Trial Results" is used to collectively describe multiple Trial Result instances. -While implementations SHOULD NOT include additional attributes -with independent values, {::comment} - [MB85]: Can we include a short sentence to explain the risk if not followed? + While Controller implementations SHOULD NOT include additional attributes + with independent values, they MAY include derived quantities. + + MB85: Can we include a short sentence to explain the risk if not followed? - [VP]: Now I think even SHOULD NOT is too strong. Either way, reformulate. + VP: Now I think even SHOULD NOT is too strong. Either way, reformulate. - [MK]: For Vratko. Isn't this already covered in Trial Output? What + MK: For Vratko. Isn't this already covered in Trial Output? What other optional attributes are applicable here, give examples? Otherwise it's too abstract, open-ended, ambiguous and so on ... Many other blue-sky and hand-wavy adjectives come to my mind :) + VP: DONE: Deleted + {:/comment} -they MAY include derived quantities. ## Goal Terms @@ -1948,7 +2542,7 @@ are necessarily vague, as their fundamental meaning is to act as coefficients in formulas for Controller Output, which are not defined yet. 
The discussions in this section relate the attributes to concepts mentioned in Section -[Identified Problems](#identified-problems), but even these discussion +[Overview of RFC 2544 Problems](#overview-of-rfc-2544-problems), but even these discussion paragraphs are short, informal, and mostly referencing later sections, where the impact on search results is discussed after introducing the complete set of auxiliary terms. @@ -1957,26 +2551,32 @@ the complete set of auxiliary terms. Definition: -Minimal value for Trial Duration that must be reached. +  +: Minimal value for Trial Duration that must be reached. The value MUST be positive. Discussion: -Certain trials must reach this minimum duration before a load can be +  +: Certain trials must reach this minimum duration before a load can be classified as a lower bound. {::comment} - [MB86]: I don’t parse this. + MB86: I don't parse this. + + VP: Reformulate. - [VP]: Reformulate. + MK: Edited. - [MK]: Edited. -{:/comment -The Controller may choose shorter durations, +{:/comment} + +  +: The Controller may choose shorter durations, results of those may be enough for classification as an Upper Bound. -It is RECOMMENDED for all search goals to share the same +  +: It is RECOMMENDED for all search goals to share the same Goal Final Trial Duration value. Otherwise, Trial Duration values larger than the Goal Final Trial Duration may occur, weakening the assumptions the [Load Classification Logic](#load-classification-logic) is based on. @@ -1985,36 +2585,44 @@ the [Load Classification Logic](#load-classification-logic) is based on. Definition: -A threshold value for a particular sum of Trial Effective Duration values. +  +: A threshold value for a particular sum of Trial Effective Duration values. The value MUST be positive. + {::comment} - [MB87]: I like this, but we should be consistent + MB87: I like this, but we should be consistent and mention it when appropriate for all other metrics - [VP]: Ok. Check everywhere. 
+ VP: Ok. Check everywhere. - [MK]: Checked all subsections under Goal Terms and Trial Terms. + MK: Checked all subsections under Goal Terms and Trial Terms. Applied as appropriate. + {:/comment} Discussion: -Informally, this prescribes the sufficient number of trials performed +  +: Informally, this prescribes the sufficient number of trials performed at a specific Trial Load and Goal Final Trial Duration during the search. -If the Goal Duration Sum is larger than the Goal Final Trial Duration, +  +: If the Goal Duration Sum is larger than the Goal Final Trial Duration, multiple trials may be needed to be performed at the same load. -Refer to Section [MLRsearch Compliant with TST009](#mlrsearch-compliant-with-tst009) +  +: Refer to Section [MLRsearch Compliant with TST009](#mlrsearch-compliant-with-tst009) for an example where the possibility of multiple trials at the same load is intended. -A Goal Duration Sum value shorter than the Goal Final Trial Duration +  +: A Goal Duration Sum value shorter than the Goal Final Trial Duration (of the same goal) could save some search time, but is NOT RECOMMENDED, as the time savings come at the cost of decreased repeatability. -In practice, the Search can spend less than Goal Duration Sum measuring +  +: In practice, the Search can spend less than Goal Duration Sum measuring a Load value when the results are particularly one-sided, but also, the Search can spend more than Goal Duration Sum measuring a Load when the results are balanced and include @@ -2024,18 +2632,22 @@ trials shorter than Goal Final Trial Duration. Definition: -A threshold value for Trial Loss Ratio values. +  +: A threshold value for Trial Loss Ratio values. The value MUST be non-negative and smaller than one. Discussion: -A trial with Trial Loss Ratio larger than this value +  +: A trial with Trial Loss Ratio larger than this value signals the SUT may be unable to process this Trial Load well enough. 
-See [Throughput with Non-Zero Loss](#throughput-with-non-zero-loss) +  +: See [Throughput with Non-Zero Loss](#throughput-with-non-zero-loss) for reasons why users may want to set this value above zero. -Since multiple trials may be needed for one Load value, +  +: Since multiple trials may be needed for one Load value, the Load Classification may be more complicated than mere comparison of Trial Loss Ratio to Goal Loss Ratio. @@ -2043,72 +2655,89 @@ of Trial Loss Ratio to Goal Loss Ratio. Definition: -A threshold value for a particular ratio of sums +  +: A threshold value for a particular ratio of sums of Trial Effective Duration values. The value MUST be non-negative and smaller than one. Discussion: -Informally, up to this proportion of Trial Results +  +: Informally, up to this proportion of Trial Results with Trial Loss Ratio above Goal Loss Ratio is tolerated at a Lower Bound. This is the full impact if every Trial was measured at Goal Final Trial Duration. The actual full logic is more complicated, as shorter Trials are allowed. -For explainability reasons, the RECOMMENDED value for exceed ratio is 0.5 (50%), +  +: For explainability reasons, the RECOMMENDED value for exceed ratio is 0.5 (50%), as in practice that value leads to the smallest variation in overall Search Duration. -Refer to Section [Exceed Ratio and Multiple Trials](#exceed-ratio-and-multiple-trials) +  +: Refer to Section [Exceed Ratio and Multiple Trials](#exceed-ratio-and-multiple-trials) for more details. ### Goal Width Definition: -A threshold value for deciding whether two Trial Load values are close enough. +  +: A threshold value for deciding whether two Trial Load values are close enough. This is an OPTIONAL attribute. If present, the value MUST be positive. Discussion: -Informally, this acts as a stopping condition, +  +: Informally, this acts as a stopping condition, controlling the precision of the search result. The search stops if every goal has reached its precision. 
-Implementations without this attribute +  +: Implementations without this attribute MUST provide the Controller with other means to control the search stopping conditions. -Absolute load difference and relative load difference are two popular choices, +  +: Absolute load difference and relative load difference are two popular choices, but implementations may choose a different way to specify width. -The test report MUST make it clear what specific quantity is used as Goal Width. +  +: The test report MUST make it clear what specific quantity is used as Goal Width. -It is RECOMMENDED to express Goal Width as a relative difference and +  +: It is RECOMMENDED to express Goal Width as a relative difference and setting it to a value not lower than the Goal Loss Ratio. -Refer to Section +  +: Refer to Section [Generalized Throughput](#generalized-throughput) for more elaboration on the reasoning. ### Goal Initial Trial Duration Definition: -Minimal value for Trial Duration suggested to use for this goal. +  +: Minimal value for Trial Duration suggested to use for this goal. If present, this value MUST be positive. Discussion: -This is an example of an optional Search Goal. +  +: This is an example of an optional Search Goal. -A typical default value is equal to the Goal Final Trial Duration value. +  +: A typical default value is equal to the Goal Final Trial Duration value. -Informally, this is the shortest Trial Duration the Controller should select +  +: Informally, this is the shortest Trial Duration the Controller should select when focusing on the goal. -Note that shorter Trial Duration values can still be used, +  +: Note that shorter Trial Duration values can still be used, for example, selected while focusing on a different Search Goal. Such results MUST be still accepted by the Load Classification logic. 
-Goal Initial Trial Duration is a mechanism for a user to discourage +  +: Goal Initial Trial Duration is a mechanism for a user to discourage trials with Trial Duration values deemed as too unreliable for a particular SUT and a given Search Goal. @@ -2116,139 +2745,199 @@ for a particular SUT and a given Search Goal. Definition: -The Search Goal is a composite quantity consisting of several attributes, +  +: The Search Goal is a composite quantity consisting of several attributes, some of them are required. -Required attributes: Goal Final Trial Duration, Goal Duration Sum, Goal +  +: Required attributes: Goal Final Trial Duration, Goal Duration Sum, Goal Loss Ratio and Goal Exceed Ratio. {::comment} - [MB88]: Listing the attributes this way allows to easily classify mandatory/optional. + MB88: Listing the attributes this way allows to easily classify mandatory/optional. However, this not followed in previous. Please pick your favorite approach and use it in a consistent manner in the document. - [VP]: Use this longer way everywhere (also saying if no other attributes could be added). + VP: Use this longer way everywhere (also saying if no other attributes could be added). Tangent: Be more lenient on attributes internal to Controller? - [MK]: Edited this one. Applied to subsections in Trial Terms and - Goal Terms as appropriate. TODO check if more places need this. + MK: Edited this one. Applied to subsections in Trial Terms and + Goal Terms as appropriate. + + WONTFIX211 check if more places need this. + Too late. {:/comment} -Optional attributes: Goal Initial Trial Duration and Goal Width. +  +: Optional attributes: Goal Initial Trial Duration and Goal Width. Discussion: -Implementations MAY add their own attributes. +  +: Implementations MAY add their own attributes. Those additional attributes may be required by an implementation even if they are not required by MLRsearch Specification. 
However, it is RECOMMENDED for those implementations to support missing attributes by providing typical default values. + {::comment} - [MB89]: I guess I understand what is meant here, but I think this should be reworded + MB89: I guess I understand what is meant here, but I think this should be reworded to avoid what can be seen as inconsistency: do not support vs. support a default. - [VP]: Yes, probably worth a separate subsection, + VP: Yes, probably worth a separate subsection, distinguishing automated implementations from manual processes. - [MK]: No separate subsection. We should state that that the listed + MK: No separate subsection. We should state that that the listed optional attributes should have documented default values. But i do not like the open-ended "Implementations MAY add their own attributes." Either examples are added or this sentence is removed. + VP: DONE: Check if Specification does not mention "implementation". + {:/comment} -For example, implementations with Goal Initial Trial Durations +  +: For example, implementations with Goal Initial Trial Durations may also require users to specify "how quickly" should Trial Durations increase. -Refer to Section [Compliance](#compliance) for important Search Goal settings. +  +: Refer to Section [Compliance](#compliance) for important Search Goal settings. ### Controller Input Definition: -Controller Input is a composite quantity +  +: Controller Input is a composite quantity required as an input for the Controller. The only REQUIRED attribute is a list of Search Goal instances. Discussion: -MLRsearch Implementations MAY use additional attributes. +  +: MLRsearch implementations MAY use additional attributes. Those additional attributes may be required by an implementation even if they are not required by MLRsearch Specification. 
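
Illustrative aside (not part of the patch): the value constraints stated in the Goal Terms subsections can be collected into one validating structure. The sketch below is a minimal Python rendering under stated assumptions: the class and field names are hypothetical, and defaulting Goal Initial Trial Duration to Goal Final Trial Duration follows the "typical default" mentioned in the text rather than any requirement.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SearchGoal:
    """Hypothetical Search Goal; four required and two optional attributes."""

    final_trial_duration: float
    duration_sum: float
    loss_ratio: float
    exceed_ratio: float
    initial_trial_duration: Optional[float] = None
    width: Optional[float] = None

    def __post_init__(self) -> None:
        if self.final_trial_duration <= 0:
            raise ValueError("Goal Final Trial Duration must be positive")
        if self.duration_sum <= 0:
            raise ValueError("Goal Duration Sum must be positive")
        if not 0 <= self.loss_ratio < 1:
            raise ValueError("Goal Loss Ratio must be in [0, 1)")
        if not 0 <= self.exceed_ratio < 1:
            raise ValueError("Goal Exceed Ratio must be in [0, 1)")
        if self.width is not None and self.width <= 0:
            raise ValueError("Goal Width, if present, must be positive")
        if self.initial_trial_duration is None:
            # Typical default: same as Goal Final Trial Duration.
            object.__setattr__(
                self, "initial_trial_duration", self.final_trial_duration
            )
        elif self.initial_trial_duration <= 0:
            raise ValueError("Goal Initial Trial Duration must be positive")


@dataclass(frozen=True)
class ControllerInput:
    """Hypothetical Controller Input; the goal list is the only required part."""

    goals: Tuple[SearchGoal, ...]
```

The numeric values used with such a structure (for example a 60 second final trial duration) would be a user choice, not something this sketch implies.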
-Formally, the Manager does not apply any Controller configuration +  +: Formally, the Manager does not apply any Controller configuration apart from one Controller Input instance. -For example, Traffic Profile is configured on the Measurer by the Manager, +  +: For example, Traffic Profile is configured on the Measurer by the Manager, without explicit assistance of the Controller. -The order of Search Goal instances in a list SHOULD NOT +  +: The order of Search Goal instances in a list SHOULD NOT have a big impact on Controller Output, -but MLRsearch Implementations MAY base their behavior on the order +but MLRsearch implementations MAY base their behavior on the order of Search Goal instances in a list. #### Max Load Definition: -Max Load is an optional attribute of Controller Input. +  +: Max Load is an optional attribute of Controller Input. It is the maximal value the Controller is allowed to use for Trial Load values. Discussion: -Max Load is an example of an optional attribute (outside the list of Search Goals) -required by some implementations of MLRsearch. +{::comment} + + VP: DONE: Use MUST NOT to make Controller behavior constrained, conditionally? + +{:/comment} -In theory, each search goal could have its own Max Load value, +  +: Max Load is an example of an optional attribute (outside the list of Search Goals) +required by some implementations of MLRsearch. + +  +: If the Max Load value is provided, Controller MUST NOT select +Trial Load values larger than that value. + +  +: In theory, each search goal could have its own Max Load value, but as all Trial Results are possibly affecting all Search Goals, it makes more sense for a single Max Load value to apply to all Search Goal instances. 
-While Max Load is a frequently used configuration parameter, already governed +  +: While Max Load is a frequently used configuration parameter, already governed (as maximum frame rate) by [RFC2544] (Section 20) and (as maximum offered load) by [RFC2285] (Section 3.5.3), some implementations may detect or discover it (instead of requiring a user-supplied value). -In MLRsearch Specification, one reason for listing +  +: In MLRsearch Specification, one reason for listing the [Relevant Upper Bound](#relevant-upper-bound) as a required attribute is that it makes the search result independent of Max Load value. -Given that Max Load is a quantity based on Load, -it is allowed to express this quantity using multi-interface values -in test report, e.g., as sum of per-interface maximal loads. +  +: Given that Max Load is a quantity based on Load, +Test Report MAY express this quantity using multi-interface values, +as sum of per-interface maximal loads. + +{::comment} + + VP: DONE: Use MAY. + +{:/comment} #### Min Load Definition: -Min Load is an optional attribute of Controller Input. +  +: Min Load is an optional attribute of Controller Input. It is the minimal value the Controller is allowed to use for Trial Load values. Discussion: -Min Load is another example of an optional attribute +{::comment} + + VP: DONE: Use MUST NOT? + +{:/comment} + +  +: Min Load is another example of an optional attribute required by some implementations of MLRsearch. Similarly to Max Load, it makes more sense to prescribe one common value, as opposed to using a different value for each Search Goal. -Min Load is mainly useful for saving time by failing early, +  +: If the Min Load value is provided, Controller MUST NOT select +Trial Load values smaller than that value. + +  +: Min Load is mainly useful for saving time by failing early, arriving at an Irregular Goal Result when Min Load gets classified as an Upper Bound. 
-For implementations, it is RECOMMENDED to require Min Load to be non-zero +  +: For implementations, it is RECOMMENDED to require Min Load to be non-zero and large enough to result in at least one frame being forwarded even at shortest allowed Trial Duration, so that Trial Loss Ratio is always well-defined, and the implementation can apply relative Goal Width safely. -Given that Min Load is a quantity based on Load, -it is allowed to express this quantity using multi-interface values -in test report, e.g., as sum of per-interface minimal loads. +  +: Given that Min Load is a quantity based on Load, +Test Report MAY express this quantity using multi-interface values, +as sum of per-interface minimal loads. + +{::comment} + + VP: DONE: Use MAY. + +{:/comment} ## Auxiliary Terms @@ -2305,74 +2994,104 @@ during the Search is rare in practice. Definition: -A Load value is called an Upper Bound if and only if it is classified -as such by [Appendix A](#appendix-a-load-classification) +  +: A Load value is called an Upper Bound if and only if it is classified +as such by [Appendix A](#load-classification-code) algorithm for the given Search Goal at the current moment of the Search. Discussion: -In more detail, the set of all Trial Result instances +  +: In more detail, the set of all Trial Result instances performed so far at the Trial Load (and any Trial Duration) is certain to fail to uphold all the requirements of the given Search Goal, mainly the Goal Loss Ratio in combination with the Goal Exceed Ratio. In this context, "certain to fail" relates to any possible results within the time remaining till Goal Duration Sum. -One search goal can have multiple different Trial Load values +  +: One search goal can have multiple different Trial Load values classified as its Upper Bounds. While search progresses and more trials are measured, any load value can become an Upper Bound in principle. 
-Moreover, a load can stop being an Upper Bound, but that +  +: Moreover, a Load can stop being an Upper Bound, but that can only happen when more than Goal Duration Sum of trials are measured (e.g., because another Search Goal needs more trials at this load). -In practice, the load becomes a Lower Bound (Section 4.6.2.2), -and we say the previous Upper Bound got Invalidated. +Informally, the previous Upper Bound got invalidated. +In practice, the Load frequently becomes a [Lower Bound](#lower-bound) instead. + +{::comment} + + VP: DONE: Reformulate to avoid the "we" construct. + + VP: DONE: Do we need Invalidation as a separate term? I guess no. + +{:/comment} #### Lower Bound Definition: -A Load value is called a Lower Bound if and only if it is classified -as such by [Appendix A](#appendix-a-load-classification) +  +: A Load value is called a Lower Bound if and only if it is classified +as such by [Appendix A](#load-classification-code) algorithm for the given Search Goal at the current moment of the search. Discussion: -In more detail, the set of all Trial Result instances +  +: In more detail, the set of all Trial Result instances performed so far at the Trial Load (and any Trial Duration) is certain to uphold all the requirements of the given Search Goal, mainly the Goal Loss Ratio in combination with the Goal Exceed Ratio. Here "certain to uphold" relates to any possible results within the time remaining till Goal Duration Sum. -One search goal can have multiple different Trial Load values +  +: One search goal can have multiple different Trial Load values classified as its Lower Bounds. As search progresses and more trials are measured, any load value can become a Lower Bound in principle. 
-No load can be both an Upper Bound and a Lower Bound for the same Search goal +  +: No load can be both an Upper Bound and a Lower Bound for the same Search goal at the same time, but it is possible for a larger load to be a Lower Bound while a smaller load is an Upper Bound. -Moreover, a load can stop being a Lower Bound, but that +  +: Moreover, a Load can stop being a Lower Bound, but that can only happen when more than Goal Duration Sum of trials are measured (e.g., because another Search Goal needs more trials at this load). -In that case, the load becomes an Upper Bound, -and we say the previous Lower Bound got Invalidated. +Informally, the previous Lower Bound got invalidated. +In practice, the Load frequently becomes an [Upper Bound](#upper-bound) instead. + +{::comment} + + Same as in upper bound: + + VP: DONE: Reformulate to avoid the "we" construct. + + VP: DONE: Do we need Invalidation as a separate term? I guess no. + +{:/comment} #### Undecided Definition: -A Load value is called Undecided if it is currently +  +: A Load value is called Undecided if it is currently neither an Upper Bound nor a Lower Bound. Discussion: -A Load value that has not been measured so far is Undecided. +  +: A Load value that has not been measured so far is Undecided. -It is possible for a Load to transition from an Upper Bound to Undecided +  +: It is possible for a Load to transition from an Upper Bound to Undecided by adding Short Trials with Low-Loss results. That is yet another reason for users to avoid using Search Goal instances with different Goal Final Trial Duration values. @@ -2394,36 +3113,49 @@ has its corresponding Goal Result instance. Definition: -The Relevant Upper Bound is the smallest Trial Load value +  +: The Relevant Upper Bound is the smallest Trial Load value classified as an Upper Bound for a given Search Goal at the end of the Search. 
Discussion: -If no measured load had enough High-Loss Trials, +  +: If no measured load had enough High-Loss Trials, the Relevant Upper Bound MAY be non-existent. For example, when Max Load is classified as a Lower Bound. -Conversely, when Relevant Upper Bound does exist, +  +: Conversely, when Relevant Upper Bound does exist, it is not affected by Max Load value. -Given that Relevant Upper Bound is a quantity based on Load, -it is allowed to express this quantity using multi-interface values -in test report, e.g., as sum of per-interface loads. +  +: Given that Relevant Upper Bound is a quantity based on Load, +Test Report MAY express this quantity using multi-interface values, +as sum of per-interface loads. + +{::comment} + + VP: DONE: Use MAY. + +{:/comment} ### Relevant Lower Bound Definition: -The Relevant Lower Bound is the largest Trial Load value +  +: The Relevant Lower Bound is the largest Trial Load value among those smaller than the Relevant Upper Bound, that got classified as a Lower Bound for a given Search Goal at the end of the search. Discussion: -If no load had enough Low-Loss Trials, the Relevant Lower Bound +  +: If no load had enough Low-Loss Trials, the Relevant Lower Bound MAY be non-existent. -Strictly speaking, if the Relevant Upper Bound does not exist, +  +: Strictly speaking, if the Relevant Upper Bound does not exist, the Relevant Lower Bound also does not exist. In a typical case, Max Load is classified as a Lower Bound, making it impossible to increase the Load to continue the search @@ -2431,43 +3163,63 @@ for an Upper Bound. Thus, it is not clear whether a larger value would be found for a Relevant Lower Bound if larger Loads were possible. -Given that Relevant Lower Bound is a quantity based on Load, -it is allowed to express this quantity using multi-interface values -in test report, e.g., as sum of per-interface loads. 
+  +: Given that Relevant Lower Bound is a quantity based on Load, +Test Report MAY express this quantity using multi-interface values, +as sum of per-interface loads. + +{::comment} + + VP: DONE: Use MAY. + +{:/comment} ### Conditional Throughput Definition: -Conditional Throughput is a value computed at the Relevant Lower Bound +  +: Conditional Throughput is a value computed at the Relevant Lower Bound according to algorithm defined in -[Appendix B](#appendix-b-conditional-throughput). +[Appendix B](#conditional-throughput-code). Discussion: -The Relevant Lower Bound is defined only at the end of the Search, +  +: The Relevant Lower Bound is defined only at the end of the Search, and so is the Conditional Throughput. But the algorithm can be applied at any time on any Lower Bound load, so the final Conditional Throughput value may appear sooner than at the end of a Search. -Informally, the Conditional Throughput should be +  +: Informally, the Conditional Throughput should be a typical Trial Forwarding Rate, expected to be seen at the Relevant Lower Bound of a given Search Goal. -But frequently it is only a conservative estimate thereof, -as MLRsearch Implementations tend to stop measuring more Trials +  +: But frequently it is only a conservative estimate thereof, +as MLRsearch implementations tend to stop measuring more Trials as soon as they confirm the value cannot get worse than this estimate within the Goal Duration Sum. -This value is RECOMMENDED to be used when evaluating repeatability -and comparability of different MLRsearch Implementations. +  +: This value is RECOMMENDED to be used when evaluating repeatability +and comparability of different MLRsearch implementations. + +  +: Refer to Section [Generalized Throughput](#generalized-throughput) for more details. + +  +: Given that Conditional Throughput is a quantity based on Load, +Test Report MAY express this quantity using multi-interface values, +as sum of per-interface forwarding rates. 
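
Illustrative aside (not part of the patch): once every measured Load has been classified for a Search Goal, the Relevant Upper Bound and Relevant Lower Bound definitions above reduce to a minimum and a constrained maximum. The sketch below assumes the classification itself (the Appendix A logic, not reproduced here) has already produced a mapping from Load to a label; the label strings and the function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical labels for the three possible classifications of a Load.
UPPER, LOWER, UNDECIDED = "upper", "lower", "undecided"


def relevant_bounds(
    classified: Dict[float, str],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (Relevant Lower Bound, Relevant Upper Bound); None if non-existent."""
    uppers = [load for load, kind in classified.items() if kind == UPPER]
    if not uppers:
        # Without a Relevant Upper Bound, the Relevant Lower Bound
        # strictly speaking does not exist either.
        return None, None
    rub = min(uppers)
    # Only Lower Bounds smaller than the Relevant Upper Bound qualify.
    lowers = [
        load for load, kind in classified.items() if kind == LOWER and load < rub
    ]
    rlb = max(lowers) if lowers else None
    return rlb, rub
```

Note that a Lower Bound located above some Upper Bound is simply skipped, which matches the remark that a larger load can be a Lower Bound while a smaller load is an Upper Bound.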
-Refer to Section [Generalized Throughput](#generalized-throughput) for more details. +{::comment} + + VP: DONE: Use MAY. -Given that Conditional Throughput is a quantity based on Load, -it is allowed to express this quantity using multi-interface values -in test report, e.g., as sum of per-interface forwarding rates. +{:/comment} ### Goal Results @@ -2480,57 +3232,69 @@ need to be supported. Definition: -Regular Goal Result is a composite quantity consisting of several attributes. +  +: Regular Goal Result is a composite quantity consisting of several attributes. Relevant Upper Bound and Relevant Lower Bound are REQUIRED attributes. Conditional Throughput is a RECOMMENDED attribute. -Stopping conditions for the corresponding Search Goal MUST -be satisfied to produce a Regular Goal Result. {::comment} - [MB90]: To do what? I’m afraid we need to explicit the meaning here. + MB90: To do what? I'm afraid we need to explicit the meaning here. - [VP]: Yes, reformulate. + VP: Yes, reformulate. + + MK: Edited. - [MK]: Edited. {:/comment} -Discussion: {::comment} - [MB91]: Isn’t this redundant with listing the bounds as required in the previous definition? + MB91: Isn't this redundant with listing the bounds as required in the previous definition? - [VP]: Do we need separation between may-not-exist and must-exist quantities? + VP: Do we need separation between may-not-exist and must-exist quantities? Either way, reformulate. - [MK]: Deleted. Agree with Med - Sentence was redundant as already + MK: Deleted. Agree with Med - Sentence was redundant as already covered by text in definition "Relevant Upper Bound and Relevant Lower Bound are REQUIRED attributes." + VP: WONTFIX212: Re-check. + Too late. + {:/comment} -If an implementation offers Goal Width as a Search Goal attribute, -the distance between the Relevant Lower Bound -and the Relevant Upper Bound MUST NOT be larger than the Goal Width, +Discussion: -Implementations MAY add their own attributes. 
+  +: Implementations MAY add their own attributes. -Test report MUST display Relevant Lower Bound. +  +: Test report MUST display Relevant Lower Bound. Displaying Relevant Upper Bound is RECOMMENDED, especially if the implementation does not use Goal Width. -For stopping conditions refer to Sections [Goal Width](#goal-width) and +  +: In general, stopping conditions for the corresponding Search Goal MUST +be satisfied to produce a Regular Goal Result. +Specifically, if an implementation offers Goal Width as a Search Goal attribute, +the distance between the Relevant Lower Bound +and the Relevant Upper Bound MUST NOT be larger than the Goal Width. + +  +: For stopping conditions refer to Sections [Goal Width](#goal-width) and [Stopping Conditions and Precision](#stopping-conditions-and-precision). #### Irregular Goal Result Definition: -Irregular Goal Result is a composite quantity. No attributes are required. +  +: Irregular Goal Result is a composite quantity. No attributes are required. Discussion: -It is RECOMMENDED to report any useful quantity even if it does not +  +: It is RECOMMENDED to report any useful quantity even if it does not satisfy all the requirements. For example, if Max Load is classified as a Lower Bound, it is fine to report it as an "effective" Relevant Lower Bound (although not a real one, as that requires @@ -2538,55 +3302,65 @@ Relevant Upper Bound which does not exist in this case), and compute Conditional Throughput for it. In this case, only the missing Relevant Upper Bound signals this result instance is irregular. -Similarly, if both relevant bounds exist, it is RECOMMENDED +  +: Similarly, if both relevant bounds exist, it is RECOMMENDED to include them as Irregular Goal Result attributes, -and let the Manager decide if their distance is too far for users' purposes. +and let the Manager decide if their distance is too far for Test Report purposes. 
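
Illustrative aside (not part of the patch): the distinction between a Regular and an Irregular Goal Result can be sketched as a single check. This sketch assumes an implementation that offers Goal Width and expresses it as a relative difference computed against the upper bound; the draft leaves the exact width quantity to the implementation, so that formula and the function name are assumptions.

```python
from typing import Optional


def is_regular_goal_result(
    relevant_lower_bound: Optional[float],
    relevant_upper_bound: Optional[float],
    goal_width: float,
) -> bool:
    """Decide whether the two relevant bounds form a Regular Goal Result.

    Assumes Goal Width is a relative difference, (upper - lower) / upper.
    """
    if relevant_lower_bound is None or relevant_upper_bound is None:
        # A missing bound means the result can only be irregular.
        return False
    distance = (relevant_upper_bound - relevant_lower_bound) / relevant_upper_bound
    # The distance between the bounds must not be larger than the Goal Width.
    return distance <= goal_width
```

With this check, the same pair of bounds can be regular for one Search Goal and irregular for another that differs only in its Goal Width value.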
-If test report displays some Irregular Goal Result attribute values, +  +: If Test Report displays some Irregular Goal Result attribute values, they MUST be clearly marked as coming from irregular results. -The implementation MAY define additional attributes. +  +: The implementation MAY define additional attributes, +for example explicit flags for expected situations, so the Manager logic can be simpler. #### Goal Result Definition: -Goal Result is a composite quantity. +  +: Goal Result is a composite quantity. Each instance is either a Regular Goal Result or an Irregular Goal Result. Discussion: -The Manager MUST be able to distinguish whether the instance is regular or not. +  +: The Manager MUST be able to distinguish whether the instance is regular or not. ### Search Result Definition: -The Search Result is a single composite object +  +: The Search Result is a single composite object that maps each Search Goal instance to a corresponding Goal Result instance. Discussion: -As an alternative to mapping, the Search Result may be represented +  +: As an alternative to mapping, the Search Result may be represented as an ordered list of Goal Result instances that appears in the exact sequence of their corresponding Search Goal instances. -When the Search Result is expressed as a mapping, it MUST contain an +  +: When the Search Result is expressed as a mapping, it MUST contain an entry for every Search Goal instance supplied in the Controller Input. {::comment} - [MB92]: To what? + MB92: To what? - [VP]: Subsections on quantities and interfaces should mention equivalent representations. + VP: Subsections on quantities and interfaces should mention equivalent representations. Then reformulate this. - [MK]: Edited. First two paragraphs in Discussion changed to make it + MK: Edited. First two paragraphs in Discussion changed to make it clearer. 
{:/comment} -Identical Goal Result instances MAY be listed for different Search Goals, +  +: Identical Goal Result instances MAY be listed for different Search Goals, but their status as regular or irregular may be different. For example, if two goals differ only in Goal Width value, and the relevant bound values are close enough according to only one of them. @@ -2595,32 +3369,53 @@ and the relevant bound values are close enough according to only one of them. Definition: -The Controller Output is a composite quantity returned from the Controller +  +: The Controller Output is a composite quantity returned from the Controller to the Manager at the end of the search. The Search Result instance is its only required attribute. Discussion: -MLRsearch Implementation MAY return additional data in the Controller Output, +  +: MLRsearch implementation MAY return additional data in the Controller Output, e.g., number of trials performed and the total Search Duration. -## MLRsearch Architecture +{::comment} + + VP: DONE low priority: Regular end, irregular exit, user abort. + Should not need new text, review related MD comments. + Maybe differentiate abort conditions, or at least make them explicitly vague? + +{:/comment} + +{::comment} + + VP: DONE elsewhere: Emphasize one controller call gives one benchmark. + Any recurring tests count as independent benchmarks. + +{:/comment} + +## Architecture Terms MLRsearch architecture consists of three main system components: -the Manager, the Controller, and the Measurer, defined in the following -subsections. +the Manager, the Controller, and the Measurer. +The components were introduced in [Architecture Overview](#architecture-overview), +and the following subsections finalize their definitions +using terms from previous sections. 
{::comment} - [MB93]: I guess these should be introduced before the attributes as these components + MB93: I guess these should be introduced before the attributes as these components are used in the description. Please reconsider the flow of the document. - [VP]: Reformulate this to clarify overview introduced, this finalizes the definition. + VP: Reformulate this to clarify overview introduced, this finalizes the definition. - [MK]: Edited. And I disagree. Three components of the architecture + MK: Edited. And I disagree. Three components of the architecture are listed, with definitions following. I do not envisage any problem from the reader perspective. + VP: DONE: Added a sentence. + {:/comment} Note that the architecture also implies the presence of other components, @@ -2629,62 +3424,72 @@ such as the SUT and the tester (as a sub-component of the Measurer). Communication protocols and interfaces between components are left unspecified. For example, when MLRsearch Specification mentions "Controller calls Measurer", +it is possible that the Controller notifies the Manager +to call the Measurer indirectly instead. In doing so, the Measurer implementations +can be fully independent from the Controller implementations, +e.g., developed in different programming languages. + {::comment} - [MB94]: Aha, this answers a comment I made earlier :) - Let’s save cycles for other readers and move all this + MB94: Aha, this answers a comment I made earlier :) + Let's save cycles for other readers and move all this section early in the document. - [VP]: Hmm, maybe a subsection of overview? + VP: Hmm, maybe a subsection of overview? Definitely something needs to be moved around. - [MK]: Edited. And addressed the original concern. See my note at MB93. + MK: Edited. And addressed the original concern. See my note at MB93. + + VP: DONE: Overview got updated. {:/comment} -it is possible that the Controller notifies the Manager -to call the Measurer indirectly instead. 
In doing so, the Measurer Implementations -can be fully independent from the Controller implementations, -e.g., developed in different programming languages. ### Measurer Definition: -The Measurer is a functional element that when called +  +: The Measurer is a functional element that when called with a [Trial Input](#trial-input) instance, performs one [Trial ](#trial) and returns a [Trial Output](#trial-output) instance. Discussion: -This definition assumes the Measurer is already initialized. +  +: This definition assumes the Measurer is already initialized. In practice, there may be additional steps before the Search, e.g., when the Manager configures the traffic profile (either on the Measurer or on its tester sub-component directly) and performs a warm-up (if the tester or the test procedure requires one). -It is the responsibility of the Measurer implementation to uphold +  +: It is the responsibility of the Measurer implementation to uphold any requirements and assumptions present in MLRsearch Specification, e.g., Trial Forwarding Ratio not being larger than one. -Implementers have some freedom. +  +: Implementers have some freedom. For example, Section 10 of [RFC2544] gives some suggestions (but not requirements) related to duplicated or reordered frames. Implementations are RECOMMENDED to document their behavior related to such freedoms in as detailed a way as possible. -It is RECOMMENDED to benchmark the test equipment first, +  +: It is RECOMMENDED to benchmark the test equipment first, e.g., connect sender and receiver directly (without any SUT in the path), find a load value that guarantees the Offered Load is not too far from the Intended Load and use that value as the Max Load value. When testing the real SUT, it is RECOMMENDED to turn any severe deviation between the Intended Load and the Offered Load into increased Trial Loss Ratio. 
-Neither of the two recommendations are made into mandatory requirements, +  +: Neither of the two recommendations is made into a mandatory requirement, because it is not easy to provide guidance about when the difference is severe enough, in a way that would be disentangled from other Measurer freedoms. -For a sample situation where the Offered Load cannot keep up +  +: For a sample situation where the Offered Load cannot keep up with the Intended Load, and the consequences on MLRsearch result, refer to Section [Hard Performance Limit](#hard-performance-limit). @@ -2692,7 +3497,8 @@ refer to Section [Hard Performance Limit](#hard-performance-limit). Definition: -The Controller is a functional element that, upon receiving a Controller +  +: The Controller is a functional element that, upon receiving a Controller Input instance, repeatedly generates Trial Input instances for the Measurer and collects the corresponding Trial Output instances. This cycle continues until the stopping conditions are met, at which point @@ -2701,24 +3507,27 @@ terminates. {::comment} - [MB95]: Till a stop? + MB95: Till a stop? - [VP]: Yes. + VP: Yes. - [MK]: Edited. It should be clear now. + MK: Edited. It should be clear now. {:/comment} Discussion: -Informally, the Controller has big freedom in selection of Trial Inputs, +  +: Informally, the Controller has considerable freedom in selecting Trial Inputs, and the implementations want to achieve all the Search Goals in the shortest average time. -The Controller's role in optimizing the overall Search Duration +  +: The Controller's role in optimizing the overall Search Duration distinguishes MLRsearch algorithms from simpler search procedures. -Informally, each implementation can have different stopping conditions. +  +: Informally, each implementation can have different stopping conditions. Goal Width is only one example. In practice, implementation details do not matter, as long as Goal Result instances are regular.
@@ -2727,31 +3536,43 @@ as long as Goal Result instances are regular. Definition: -The Manager is a functional element that is reponsible for +  +: The Manager is a functional element that is responsible for provisioning other components, calling a Controller component once, and for creating the test report following the reporting format as defined in Section 26 of [RFC2544]. Discussion: -The Manager initializes the SUT, the Measurer +  +: The Manager initializes the SUT, the Measurer (and the tester if independent from Measurer) with their intended configurations before calling the Controller. -Note that Section 7 of [RFC2544] already puts requirements on SUT setups: +  +: Note that Section 7 of [RFC2544] already puts requirements on SUT setups: + +  +: "It is expected that all of the tests will be run without changing the +configuration or setup of the DUT in any way other than that required +to do the specific test. For example, it is not acceptable to change +the size of frame handling buffers between tests of frame handling +rates or to disable all but one transport protocol when testing the +throughput of that protocol." - It is expected that all of the tests will be run without changing the - configuration or setup of the DUT in any way other than that required - to do the specific test. For example, it is not acceptable to change - the size of frame handling buffers between tests of frame handling - rates or to disable all but one transport protocol when testing the - throughput of that protocol. +{::comment} + + VP: DONE: Nested "definition list" does not work. Use quotes here? + +{:/comment} -It is REQUIRED for the test report to encompass all the SUT configuration +  +: It is REQUIRED for the test report to encompass all the SUT configuration details, including description of a "default" configuration common for most tests and configuration changes if required by a specific test.
-For example, Section 5.1.1 of [RFC5180] recommends testing jumbo frames +  +: For example, Section 5.1.1 of [RFC5180] recommends testing jumbo frames if SUT can forward them, even though they are outside the scope of the 802.3 IEEE standard. In this case, it is acceptable for the SUT default configuration to not support jumbo frames, @@ -2760,10 +3581,12 @@ as the handling of jumbo frames typically has different packet buffer requirements and potentially higher processing overhead. Non-jumbo frame sizes should also be tested on the jumbo-enabled setup. -The Manager does not need to be able to tweak any Search Goal attributes, +  +: The Manager does not need to be able to tweak any Search Goal attributes, but it MUST report all applied attribute values even if not tweaked. -A “user” — human or automated — invokes the Manager once to launch a +  +: A "user" - human or automated - invokes the Manager once to launch a single Search and receive its report. Every new invocation is treated as a fresh, independent Search; how the system behaves across multiple calls (for example, combining or comparing their results) is explicitly @@ -2771,23 +3594,23 @@ out of scope for this document. {::comment} - [MB96]: This answers a comment I have earlier. + MB96: This answers a comment I have earlier. Please move all these details to be provided early. - [VP]: Yes (covered by earlier comments). + VP: Yes (covered by earlier comments). - [MK]: Yes - covered by earlier edits. + MK: Yes - covered by earlier edits. {:/comment} {::comment} - [MB97]: Should there be a mode where conditional calls are invoked? + MB97: Should there be a mode where conditional calls are invoked? Or more generally to instruct some dependency? - [VP]: Explain in earlier subsections, repeats are out of scope. + VP: Explain in earlier subsections, repeats are out of scope. - [MK]: Edited. It should be clear now that repeats are out of scope. + MK: Edited. It should be clear now that repeats are out of scope. 
{:/comment} @@ -2830,21 +3653,24 @@ unconditionally compliant with Section 24 of [RFC2544]. - Goal Duration Sum = 60 seconds - Goal Loss Ratio = 0% - Goal Exceed Ratio = 0% + {::comment} - [MB98]: Not related but triggered by this, + MB98: Not related but triggered by this, can we have at the end of the document a table with all the default values/recommended for the various attributes defined in the document? - [VP]: Maybe? Revisit later to see if we have enough data to warrant table format. + VP: Maybe? Revisit later to see if we have enough data to warrant table format. - [MK]: TODO. This is not a bad idea. A section that in summary table + MK: WONTFIX213. This is not a bad idea. A section that in summary table lists common usage cases with recommended settings e.g. RFC2544, TST009, FD.io CSIT, examples of SUTs with certain behaviour e.g. suspected periodic SUT disruption. It will make it more concrete to the reader and verify their understanding of the spec. + VP: I think too low priority for draft11. + {:/comment} Goal Loss Ratio and Goal Exceed Ratio attributes, @@ -2892,19 +3718,30 @@ Goal Duration Sum is twice as long as Goal Final Trial Duration, so third full-length trial is never needed. # Methodology Rationale and Design Considerations + {::comment} - [MB99]: Please consider that a more explicit title that reflects the content. + MB99: Please consider that a more explicit title that reflects the content. + + VP: Yes, but not sure what would be a better title yet. - [VP]: Yes, but not sure what would be a better title yet. + MK: Edited. Also updated opening paragraph to motivate the reader. - [MK]: Edited. Also updated opening paragraph to motivate the reader. {:/comment} +{::comment} + + Manual processes, automation, implementation as library,... + + DONE: Recheck specification minimizes user/implementation discussions. + + DONE low priority: Add those discussions here somewhere is useful. + +{:/comment} This section explains the Why behind MLRsearch.
Building on the normative specification in Section -[MLRsearch Specification] (#mlrsearch-specification), +[MLRsearch Specification](#mlrsearch-specification), it contrasts MLRsearch with the classic [RFC2544] single-ratio binary-search procedure and walks through the key design choices: binary-search mechanics, stopping-rule precision, @@ -2944,11 +3781,11 @@ Search Goal, and the difference between the bounds implies whether the result precision is achieved. Therefore, it is not necessary to report the specific stopping condition used. -MLRsearch Implementations may use Goal Width +MLRsearch implementations may use Goal Width to allow direct control of result precision and indirect control of the Search Duration. -Other MLRsearch Implementations may use different stopping conditions: +Other MLRsearch implementations may use different stopping conditions: for example based on the Search Duration, trading off precision control for duration control. @@ -2960,15 +3797,17 @@ and its variance. ## Loss Ratios and Loss Inversion The biggest + {::comment} - [MB100]: We don’t need to say it if it is obvious ;) + MB100: We don't need to say it if it is obvious ;) - [VP]: Reformulate. + VP: Reformulate. - [MK]: Edited. + MK: Edited. {:/comment} + difference between MLRsearch and [RFC2544] binary search is in the goals of the search. [RFC2544] has a single goal, based on classifying a single full-length trial @@ -2982,21 +3821,23 @@ Each bound in [RFC2544] simple binary search is "hard", in the sense that all further Trial Load values are smaller than any current upper bound and larger than any current lower bound. -This is also possible for MLRsearch Implementations, +This is also possible for MLRsearch implementations, when the search is started with only one Search Goal instance. ### Multiple Goals and Loss Inversion MLRsearch Specification + {::comment} - [MB101]: Specification? + MB101: Specification? - [VP]: Ok. + VP: Ok. - [MK]: Edited. + MK: Edited. 
{:/comment} + supports multiple Search Goals, making the search procedure more complicated compared to binary search with single goal, but most of the complications do not affect the final results much. @@ -3056,7 +3897,7 @@ This also implies that if the SUT tested (or the Traffic Generator used) needs a warm-up, it should be warmed up before starting the Search, otherwise the first few measurements could become unjustly limiting. -For MLRsearch Implementations, it means it is better to measure +For MLRsearch implementations, it means it is better to measure at smaller Loads first, so bounds found earlier are less likely to get invalidated later. @@ -3065,16 +3906,18 @@ to get invalidated later. The idea of performing multiple Trials at the same Trial Load comes from a model where some Trial Results (those with high Trial Loss Ratio) are affected by infrequent effects, causing unsatisfactory repeatability + {::comment} - [MB102]: Or other similar terms, but not poor thing. + MB102: Or other similar terms, but not poor thing. Please consider the same change in other parts of the document. - [VP]: Ok, search&replace. + VP: Ok, search&replace. - [MK]: Edited. Searched and replaced all with unsatisfactory, unacceptable. + MK: Edited. Searched and replaced all with unsatisfactory, unacceptable. {:/comment} + of [RFC2544] Throughput results. Refer to Section [DUT in SUT](#dut-in-sut) for a discussion about noiseful and noiseless ends of the SUT performance spectrum. @@ -3118,16 +3961,16 @@ heuristics and controls are left to the discretion of the implementer. {::comment} - [MB103]: We may say that how this is exposed to a user/manager is implmentation specific. + MB103: We may say that how this is exposed to a user/manager is implmentation specific. - [VP]: Earlier subsection should explain when discussing implementations. + VP: Earlier subsection should explain when discussing implementations. - [MK]: Edited. + MK: Edited. 
{:/comment} -While MLRsearch Implementations are free to use any logic to select -Trial Input values, comparability between MLRsearch Implementations +While MLRsearch implementations are free to use any logic to select +Trial Input values, comparability between MLRsearch implementations is only assured when the Load Classification logic handles any possible set of Trial Results in the same way. @@ -3161,11 +4004,11 @@ performance limit. {::comment} - [MB104]: Not sure to parse this. + MB104: Not sure to parse this. - [VP]: Reformulate. + VP: Reformulate. - [MK]: Edited. + MK: Edited. {:/comment} @@ -3177,21 +4020,23 @@ e.g., a specific frames-per-second limit on the NIC (a common occurrence). Those limitations should be known and provided as Max Load, Section [Max Load](#max-load). + {::comment} - [MB105]: We may say that some implementation may expose their capabilities + MB105: We may say that some implementation may expose their capabilities using IPFIX/YANG, but such exposure is out of scope. - [VP]: Add capability exposition to earlier implementation subsections. + VP: Add capability exposition to earlier implementation subsections. Reformulate this sentence to be specific to hard limits. - [MK]: Edited. Capability exposition of SUT and DUT is out of scope + MK: Edited. Capability exposition of SUT and DUT is out of scope of this document. Do we need to state it in the opening somewhere? COTS NICs do not support network configuration protocols, they are configured using vendor specific registers and associated kernel or userspace drivers. {:/comment} + But if Max Load is set larger than what the interface can receive or transmit, there will be a "hard limit" behavior observed in Trial Results. @@ -3200,24 +4045,25 @@ Max Load is larger, and the Goal Loss Ratio is 0.5%. If DUT has no additional losses, 0.5% Trial Loss Ratio will be achieved at Relevant Lower Bound of 100.5025 Mfps. 
-Reporting a throughput that exceeds the SUT’s verified hard limit is -counter-intuitive. Accordingly, the RFC 2544 throughput metric should -be generalized — rather than relying solely on the Relevant Lower -Bound — to reflect realistic, limit-aware performance. +Reporting a throughput that exceeds the SUT's verified hard limit is +counter-intuitive. Accordingly, the [RFC2544] Throughput metric should +be generalized - rather than relying solely on the Relevant Lower +Bound - to reflect realistic, limit-aware performance. {::comment} - [MK]: Edited. Above paragraph was not reading well. Following from + MK: Edited. Above paragraph was not reading well. Following from MB105 I have updated it further to motivate generalization of throughput. {:/comment} + MLRsearch defines one such generalization, the [Conditional Throughput](#conditional-throughput). It is the Trial Forwarding Rate from one of the Full-Length Trials performed at the Relevant Lower Bound. The algorithm to determine which trial exactly is in -[Appendix B: Conditional Throughput](#appendix-b-conditional-throughput). +[Appendix B](#conditional-throughput-code). In the hard limit example, 100.5025 Mfps Load will still have only 100.0 Mfps forwarding rate, nicely confirming the known limitation. @@ -3230,9 +4076,16 @@ values just due to DUT performance variability. By comparing the best case (all Relevant Lower Bound trials have zero loss) and the worst case (all Trial Loss Ratios at Relevant Lower Bound -are equal to the Goal Loss Ratio), we find the possible Conditional Throughput +are equal to the Goal Loss Ratio), +one can prove that Conditional Throughput values may have up to the Goal Loss Ratio relative difference. +{::comment} + + VP: DONE: Reformulate to avoid "we" construct. 
+ +{:/comment} + Setting the Goal Width below the Goal Loss Ratio may cause the Conditional Throughput for a larger Goal Loss Ratio to become smaller than a Conditional Throughput for a goal with a lower Goal Loss Ratio, @@ -3242,7 +4095,7 @@ than the Goal Loss Ratio of the higher-loss Search Goal. Although Conditional Throughput can fluctuate from one run to the next, it still offers a more discriminating basis for comparison than the -Relevant Lower Bound — particularly when deterministic load selection +Relevant Lower Bound - particularly when deterministic load selection yields the same Lower Bound value across multiple runs. # MLRsearch Logic and Example @@ -3250,8 +4103,8 @@ yields the same Lower Bound value across multiple runs. This section uses informal language to describe two aspects of MLRsearch logic: Load Classification and Conditional Throughput, reflecting formal pseudocode representation provided in -[Appendix A: Load Classification](#appendix-a-load-classification) -and [Appendix B: Conditional Throughput](#appendix-b-conditional-throughput). +[Appendix A](#load-classification-code) +and [Appendix B](#conditional-throughput-code). This is followed by example search. The logic is equivalent but not identical to the pseudocode @@ -3264,109 +4117,111 @@ and uses more intuitive names for the intermediate values. Note: For explanation clarity variables are taged as (I)nput, (T)emporary, (O)utput. + {::comment} - [MB106]: Move this to the terminology/convention section + MB106: Move this to the terminology/convention section - [VP]: I do not think these flags fit into terminology. + VP: I do not think these flags fit into terminology. For this long list, maybe divide into sublists? - [MK]: I agree - this is does not belong to draft terminology + MK: I agree - this is does not belong to draft terminology section. And I agree, for readability we could split the long list into groups with meaningful headers. See my attempt to do so below. 
{:/comment} -### Collect Trial Results +- Collect Trial Results: -- Take all Trial Result instances (I) measured at a given load. + - Take all Trial Result instances (I) measured at a given load. -### Aggregate Trial Durations +- Aggregate Trial Durations: -- Full-length high-loss sum (T) is the sum of Trial Effective Duration - values of all full-length high-loss trials (I). -- Full-length low-loss sum (T) is the sum of Trial Effective Duration - values of all full-length low-loss trials (I). -- Short high-loss sum is the sum (T) of Trial Effective Duration values - of all short high-loss trials (I). -- Short low-loss sum is the sum (T) of Trial Effective Duration values - of all short low-loss trials (I). + - Full-length high-loss sum (T) is the sum of Trial Effective Duration + values of all full-length high-loss trials (I). + - Full-length low-loss sum (T) is the sum of Trial Effective Duration + values of all full-length low-loss trials (I). + - Short high-loss sum is the sum (T) of Trial Effective Duration values + of all short high-loss trials (I). + - Short low-loss sum is the sum (T) of Trial Effective Duration values + of all short low-loss trials (I). -### Derive Goal-Based Ratios +- Derive goal-based ratios: -- Subceed ratio (T) is One minus the Goal Exceed Ratio (I). -- Exceed coefficient (T) is the Goal Exceed Ratio divided by the subceed - ratio. + - Subceed ratio (T) is One minus the Goal Exceed Ratio (I). + - Exceed coefficient (T) is the Goal Exceed Ratio divided by the subceed + ratio. -### Balance Short-Trial Effects +- Balance short-trial effects: -- Balancing sum (T) is the short low-loss sum - multiplied by the exceed coefficient. -- Excess sum (T) is the short high-loss sum minus the balancing sum. -- Positive excess sum (T) is the maximum of zero and excess sum. + - Balancing sum (T) is the short low-loss sum + multiplied by the exceed coefficient. + - Excess sum (T) is the short high-loss sum minus the balancing sum. 
+ - Positive excess sum (T) is the maximum of zero and excess sum. -### Compute Effective Duration Totals +- Compute effective duration totals -- Effective high-loss sum (T) is the full-length high-loss sum - plus the positive excess sum. -- Effective full sum (T) is the effective high-loss sum - plus the full-length low-loss sum. -- Effective whole sum (T) is the larger of the effective full sum - and the Goal Duration Sum. -- Missing sum (T) is the effective whole sum minus the effective full sum. + - Effective high-loss sum (T) is the full-length high-loss sum + plus the positive excess sum. + - Effective full sum (T) is the effective high-loss sum + plus the full-length low-loss sum. + - Effective whole sum (T) is the larger of the effective full sum + and the Goal Duration Sum. + - Missing sum (T) is the effective whole sum minus the effective full sum. -### Estimate Exceed Ratios +- Estimate exceed ratios: -- Pessimistic high-loss sum (T) is the effective high-loss sum - plus the missing sum. -- Optimistic exceed ratio (T) is the effective high-loss sum - divided by the effective whole sum. -- Pessimistic exceed ratio (T) is the pessimistic high-loss sum - divided by the effective whole sum. + - Pessimistic high-loss sum (T) is the effective high-loss sum + plus the missing sum. + - Optimistic exceed ratio (T) is the effective high-loss sum + divided by the effective whole sum. + - Pessimistic exceed ratio (T) is the pessimistic high-loss sum + divided by the effective whole sum. -### Classify the Load +- Classify the Load: -- The load is classified as an Upper Bound (O) if the optimistic exceed - ratio is larger than the Goal Exceed Ratio. -- The load is classified as a Lower Bound (O) if the pessimistic exceed - ratio is not larger than the Goal Exceed Ratio. -- The load is classified as undecided (O) otherwise. + - The load is classified as an Upper Bound (O) if the optimistic exceed + ratio is larger than the Goal Exceed Ratio. 
+ - The load is classified as a Lower Bound (O) if the pessimistic exceed + ratio is not larger than the Goal Exceed Ratio. + - The load is classified as undecided (O) otherwise. ## Conditional Throughput Logic -### Collect Trial Results +- Collect Trial Results + + - Take all Trial Result instances (I) measured at a given Load. -- Take all Trial Result instances (I) measured at a given Load. +- Sum Full-Length Durations: -### Sum Full-Length Durations + - Full-length high-loss sum (T) is the sum of Trial Effective Duration + values of all full-length high-loss trials (I). + - Full-length low-loss sum (T) is the sum of Trial Effective Duration + values of all full-length low-loss trials (I). + - Full-length sum (T) is the full-length high-loss sum (I) plus the + full-length low-loss sum (I). -- Full-length high-loss sum (T) is the sum of Trial Effective Duration - values of all full-length high-loss trials (I). -- Full-length low-loss sum (T) is the sum of Trial Effective Duration - values of all full-length low-loss trials (I). -- Full-length sum (T) is the full-length high-loss sum (I) plus the - full-length low-loss sum (I). +- Derive initial thresholds: -### Derive Initial Thresholds + - Subceed ratio (T) is One minus the Goal Exceed Ratio (I). + - Remaining sum (T) initially is full-length sum multiplied by subceed + ratio. + - Current loss ratio (T) initially is 100%. -- Subceed ratio (T) is One minus the Goal Exceed Ratio (I) is called. -- Remaining sum (T) initially is full-lengths sum multiplied by subceed - ratio. -- Current loss ratio (T) initially is 100%. +- Iterate through ordered trials -### Iterate Through Ordered Trials + - For each full-length trial result, sorted in increasing order by Trial + Loss Ratio: -- For each full-length trial result, sorted in increasing order by Trial - Loss Ratio: - - If remaining sum is not larger than zero, exit the loop. - - Set current loss ratio to this trial's Trial Loss Ratio (I).
- - Decrease the remaining sum by this trial's Trial Effective Duration (I). + - If remaining sum is not larger than zero, exit the loop. + - Set current loss ratio to this trial's Trial Loss Ratio (I). + - Decrease the remaining sum by this trial's Trial Effective Duration (I). -### Compute Conditional Throughput +- Compute Conditional Throughput -- Current forwarding ratio (T) is One minus the current loss ratio. -- Conditional Throughput (T) is the current forwarding ratio multiplied - by the Load value. + - Current forwarding ratio (T) is One minus the current loss ratio. + - Conditional Throughput (T) is the current forwarding ratio multiplied + by the Load value. ### Conditional Throughput and Load Classification @@ -3460,140 +4315,426 @@ This can be possible only for non-zero Goal Loss Ratio, for example if SUT needs to "warm up" to best performance within each trial. Not commonly seen in practice. -## Example Search -{::comment} +# IANA Considerations - [MB107]: We may move this section to an appendix +This document does not make any request to IANA. - [VP]: Ok. +# Security Considerations - [MK]: TODO. Move to Appendix A, before the pseudocode Appendices. - Keeping it here for now to finish editing with clean change - tracking in gerrit. +Benchmarking activities as described in this memo are limited to +technology characterization of a DUT/SUT using controlled stimuli in a +laboratory environment, with dedicated address space and the constraints +specified in the sections above. -{:/comment} +The benchmarking network topology will be an independent test setup and +MUST NOT be connected to devices that may forward the test traffic into +a production network or misroute traffic to the test management network. -The following example Search is related to -one hypothetical run of a Search test procedure -that has been started with multiple Search Goals. 
-Several points in time are chosen, to show how the logic works, -with specific sets of Trial Result available. -The trial results themselves are not very realistic, as -the intention is to show several corner cases of the logic. +Further, benchmarking is performed on an "opaque" basis, relying +solely on measurements observable external to the DUT/SUT. -In all Trials, the Effective Trial Duration is equal to Trial Duration. +The DUT/SUT SHOULD NOT include features that serve only to boost +benchmark scores - such as a dedicated "fast-track" test mode that is +never used in normal operation. -Only one Trial Load is in focus, its value is one million frames per second. -Trial Results at other Trial Loads are not mentioned, -as the parts of logic present here do not depend on those. -In practice, Trial Results at other Load values would be present, -e.g., MLRsearch will look for a Lower Bound smaller than any Upper Bound found. +{::comment} -At any given moment, exactly one Search Goal is designated as in focus. -This designation affects only the Trial Duration chosen for new trials; -it does not alter the rest of the decision logic. + MB109: Some more elaboration is needed -An MLRsearch implementation is free to evaluate several goals -simultaneously — the “focus” mechanism is optional and appears here only -to show that a load can still be classified against goals that are not -currently in focus. + VP: This needs BMWG discussion as this chapter is a "boilerplate" + copied from earlier BMWG documents. -### Example Goals + MK: Edited -The following four Search Goal instances are selected for the example Search. -Each goal has a readable name and dense code, -the code is useful to show Search Goal attribute values. + VP: Ok, for draft11, but we can start discussing on bmwg for later versions. -As the variable "exceed coefficient" does not depend on trial results, -it is also precomputed here. 
+{:/comment} -Goal 1: +Any implications for network security arising from the DUT/SUT SHOULD be +identical in the lab and in production networks. - name: RFC2544 - Goal Final Trial Duration: 60s - Goal Duration Sum: 60s - Goal Loss Ratio: 0% - Goal Exceed Ratio: 0% - exceed coefficient: 0% / (100% / 0%) = 0.0 - code: 60f60d0l0e +{::comment} -Goal 2: + MB110: Why? We can accept some relax rule in controlled environnement, + but this not acceptable in deployement. I would adjust accordingly. - name: TST009 - Goal Final Trial Duration: 60s - Goal Duration Sum: 120s - Goal Loss Ratio: 0% - Goal Exceed Ratio: 50% - exceed coefficient: 50% / (100% - 50%) = 1.0 - code: 60f120d0l50e + VP: Explain and discuss in BMWG. -Goal 3: + MK: Keeping as is. It is a BMWG standard text that applies here. + You can see it verbatim in RFC 6815 (section 7), RFC 6414 (section 4.1), RFC + 9004 (section 8), and several BMWG Internet-Drafts. Its purpose is to + remind implementers and testers that the device under test must not + be re-configured into an unrealistic or less-secure state merely to + obtain benchmark data - a principle that complements the adjacent + sentence about avoiding "special benchmarking modes." Including + the sentence therefore maintains consistency with BMWG precedent + and reinforces a key security expectation. - name: 1s final - Goal Final Trial Duration: 1s - Goal Duration Sum: 120s - Goal Loss Ratio: 0.5% - Goal Exceed Ratio: 50% - exceed coefficient: 50% / (100% - 50%) = 1.0 - code: 1f120d.5l50e +{:/comment} -Goal 4: +{::comment} - name: 20% exceed - Goal Final Trial Duration: 60s - Goal Duration Sum: 60s - Goal Loss Ratio: 0.5% - Goal Exceed Ratio: 20% - exceed coefficient: 20% / (100% - 20%) = 0.25 - code: 60f60d0.5l20e + MB111: I would some text to basically + say that the benchmarking results should be adequately + protected and guards top prevent leaks to unauthorized + entities. 
+ Otherwise, the benchmark results can be used by + attacker to better adjust their attacks and perform + attacks that would lead to DDoS a node of the DUT in a + live network, infer the limitation of a DUT that can be + used for overflow attacks, etc. + Also, we can say that the benchmark is agnostic to trafic + and does not manipulate real traffic. As such, Privacy is + not a concern. -The first two goals are important for compliance reasons, -the other two cover less frequent cases. + VP: To BMWG. -### Example Trial Results + MK: Keeping as is. See my comments above at MB110. -The following six sets of trial results are selected for the example Search. -The sets are defined as points in time, describing which Trial Results -were added since the previous point. +{:/comment} -Each point has a readable name and dense code, -the code is useful to show Trial Output attribute values -and number of times identical results were added. +# Acknowledgements -Point 1: +Special wholehearted gratitude and thanks to the late Al Morton for his +thorough reviews filled with very specific feedback and constructive +guidelines. Thank You Al for the close collaboration over the years, Your Mentorship, +Your continuous unwavering encouragement full of empathy and energizing +positive attitude. Al, You are dearly missed. - name: first short good - goal in focus: 1s final (1f120d.5l50e) - added Trial Results: 59 trials, each 1 second and 0% loss - code: 59x1s0l +Thanks to Gabor Lencse, Giuseppe Fioccola and BMWG contributors for good +discussions and thorough reviews, guiding and helping us to improve the +clarity and formality of this document. -Point 2: +Many thanks to Alec Hothan of the OPNFV NFVbench project for a thorough +review and numerous useful comments and suggestions in the earlier +versions of this document. 
- name: first short bad - goal in focus: 1s final (1f120d.5l50e) - added Trial Result: one trial, 1 second, 1% loss - code: 59x1s0l+1x1s1l +--- back -Point 3: +# Load Classification Code - name: last short bad - goal in focus: 1s final (1f120d.5l50e) - added Trial Results: 59 trials, 1 second each, 1% loss each - code: 59x1s0l+60x1s1l +{::comment} -Point 4: + MB112: Move after references - name: last short good - goal in focus: 1s final (1f120d.5l50e) - added Trial Results: one trial 1 second, 0% loss - code: 60x1s0l+60x1s1l + VP: Ok. -Point 5: + MK: Move after references. - name: first long bad - goal in focus: TST009 (60f120d0l50e) - added Trial Results: one trial, 60 seconds, 0.1% loss - code: 60x1s0l+60x1s1l+1x60s.1l + VP: Done by moving "--- back" above. + +{:/comment} + +This appendix specifies how to perform the Load Classification. + +Any Trial Load value can be classified, +according to a given [Search Goal](#search-goal) instance. + +The algorithm uses (some subsets of) the set of all available Trial Results +from Trials measured at a given Load at the end of the Search. + +The block at the end of this appendix holds pseudocode +which computes two values, stored in variables named +`optimistic_is_lower` and `pessimistic_is_lower`. + +Although presented as pseudocode, the listing is syntactically valid +Python and can be executed without modification. + +{::comment} + + MB113: Where is that python code? + + VP: Reformulate. + + MK: Edited. + +{:/comment} + +If values of both variables are computed to be true, the Load in question +is classified as a Lower Bound according to the given Search Goal instance. +If values of both variables are false, the Load is classified as an Upper Bound. +Otherwise, the load is classified as Undecided. + +Some variable names are shortened to fit expressions in one line. 
+Namely, variables holding sum quantities end in `_s` instead of `_sum`, +and variables holding effective quantities start in `effect_` +instead of `effective_`. + +The pseudocode expects the following variables to hold the following values: + +- `goal_duration_s`: The Goal Duration Sum value of the given Search Goal. + +- `goal_exceed_ratio`: The Goal Exceed Ratio value of the given Search Goal. + +- `full_length_low_loss_s`: Sum of Trial Effective Durations across Trials + with Trial Duration at least equal to the Goal Final Trial Duration + and with Trial Loss Ratio not higher than the Goal Loss Ratio + (across Full-Length Low-Loss Trials). + +- `full_length_high_loss_s`: Sum of Trial Effective Durations across Trials + with Trial Duration at least equal to the Goal Final Trial Duration + and with Trial Loss Ratio higher than the Goal Loss Ratio + (across Full-Length High-Loss Trials). + +- `short_low_loss_s`: Sum of Trial Effective Durations across Trials + with Trial Duration shorter than the Goal Final Trial Duration + and with Trial Loss Ratio not higher than the Goal Loss Ratio + (across Short Low-Loss Trials). + +- `short_high_loss_s`: Sum of Trial Effective Durations across Trials + with Trial Duration shorter than the Goal Final Trial Duration + and with Trial Loss Ratio higher than the Goal Loss Ratio + (across Short High-Loss Trials). + +The code works correctly also when there are no Trial Results at a given Load. 
+ +~~~ python + +exceed_coefficient = goal_exceed_ratio / (1.0 - goal_exceed_ratio) +balancing_s = short_low_loss_s * exceed_coefficient +positive_excess_s = max(0.0, short_high_loss_s - balancing_s) +effect_high_loss_s = full_length_high_loss_s + positive_excess_s +effect_full_length_s = full_length_low_loss_s + effect_high_loss_s +effect_whole_s = max(effect_full_length_s, goal_duration_s) +quantile_duration_s = effect_whole_s * goal_exceed_ratio +pessimistic_high_loss_s = effect_whole_s - full_length_low_loss_s +pessimistic_is_lower = pessimistic_high_loss_s <= quantile_duration_s +optimistic_is_lower = effect_high_loss_s <= quantile_duration_s + +~~~ + +{::comment} + + MB114: May display this a table for better readability + + VP: Ok. + + MK: Disagree. Can we have it in a proper code block instead? + + VP: DONE: block with tags. + +{:/comment} + +# Conditional Throughput Code + +This section specifies an example of how to compute Conditional Throughput, +as referred to in Section [Conditional Throughput](#conditional-throughput). + +Any Load value can be used as the basis for the following computation, +but only the Relevant Lower Bound (at the end of the Search) +leads to the value called the Conditional Throughput for a given Search Goal. + +The algorithm uses (some subsets of) the set of all available Trial Results +from Trials measured at a given Load at the end of the Search. + +The block at the end of this appendix holds pseudocode +which computes a value stored as variable `conditional_throughput`. + +Although presented as pseudocode, the listing is syntactically valid +Python and can be executed without modification. + +Some variable names are shortened in order to fit expressions in one line. +Namely, variables holding sum quantities end in `_s` instead of `_sum`, +and variables holding effective quantities start in `effect_` +instead of `effective_`. 
+ +The pseudocode expects the following variables to hold the following values: + +- `goal_duration_s`: The Goal Duration Sum value of the given Search Goal. + +- `goal_exceed_ratio`: The Goal Exceed Ratio value of the given Search Goal. + +- `full_length_low_loss_s`: Sum of Trial Effective Durations across Trials + with Trial Duration at least equal to the Goal Final Trial Duration + and with Trial Loss Ratio not higher than the Goal Loss Ratio + (across Full-Length Low-Loss Trials). + +- `full_length_high_loss_s`: Sum of Trial Effective Durations across Trials + with Trial Duration at least equal to the Goal Final Trial Duration + and with Trial Loss Ratio higher than the Goal Loss Ratio + (across Full-Length High-Loss Trials). + +- `full_length_trials`: An iterable of all Trial Results from Trials + with Trial Duration at least equal to the Goal Final Trial Duration + (all Full-Length Trials), sorted by increasing Trial Loss Ratio. + One item `trial` is a composite with the following two attributes available: + + - `trial.loss_ratio`: The Trial Loss Ratio as measured for this Trial. + + - `trial.effect_duration`: The Trial Effective Duration of this Trial. + +The code works correctly only when there is at least one +Trial Result measured at a given Load. + +~~~ python + +full_length_s = full_length_low_loss_s + full_length_high_loss_s +whole_s = max(goal_duration_s, full_length_s) +remaining = whole_s * (1.0 - goal_exceed_ratio) +quantile_loss_ratio = None +for trial in full_length_trials: + if quantile_loss_ratio is None or remaining > 0.0: + quantile_loss_ratio = trial.loss_ratio + remaining -= trial.effect_duration + else: + break +else: + if remaining > 0.0: + quantile_loss_ratio = 1.0 +conditional_throughput = intended_load * (1.0 - quantile_loss_ratio) + +~~~ + +{::comment} + + MB115: Please use and markers. + + VP: Also table? Ok. + + MK: Not table, it's code. Can we have it in a proper code + block instead? + + VP: DONE: block with tags. 
+ +{:/comment} + +# Example Search + +{::comment} + + MB107: We may move this section to an appendix + + VP: Ok. + + MK: Move to Appendix A, before the pseudocode Appendices. + Keeping it here for now to finish editing with clean change + tracking in gerrit. + + VP: DONE: Appendix C now. A and B are for pseudocode as that is more important. + +{:/comment} + +The following example Search is related to +one hypothetical run of a Search test procedure +that has been started with multiple Search Goals. +Several points in time are chosen, to show how the logic works, +with specific sets of Trial Result available. +The trial results themselves are not very realistic, as +the intention is to show several corner cases of the logic. + +In all Trials, the Effective Trial Duration is equal to Trial Duration. + +Only one Trial Load is in focus, its value is one million frames per second. +Trial Results at other Trial Loads are not mentioned, +as the parts of logic present here do not depend on those. +In practice, Trial Results at other Load values would be present, +e.g., MLRsearch will look for a Lower Bound smaller than any Upper Bound found. + +At any given moment, exactly one Search Goal is designated as in focus. +This designation affects only the Trial Duration chosen for new trials; +it does not alter the rest of the decision logic. + +An MLRsearch implementation is free to evaluate several goals +simultaneously - the "focus" mechanism is optional and appears here only +to show that a load can still be classified against goals that are not +currently in focus. + +## Example Goals + +The following four Search Goal instances are selected for the example Search. +Each goal has a readable name and dense code, +the code is useful to show Search Goal attribute values. + +As the variable "exceed coefficient" does not depend on trial results, +it is also precomputed here. 
+ +Goal 1: + + name: RFC2544 + Goal Final Trial Duration: 60s + Goal Duration Sum: 60s + Goal Loss Ratio: 0% + Goal Exceed Ratio: 0% + exceed coefficient: 0% / (100% - 0%) = 0.0 + code: 60f60d0l0e + +Goal 2: + + name: TST009 + Goal Final Trial Duration: 60s + Goal Duration Sum: 120s + Goal Loss Ratio: 0% + Goal Exceed Ratio: 50% + exceed coefficient: 50% / (100% - 50%) = 1.0 + code: 60f120d0l50e + +Goal 3: + + name: 1s final + Goal Final Trial Duration: 1s + Goal Duration Sum: 120s + Goal Loss Ratio: 0.5% + Goal Exceed Ratio: 50% + exceed coefficient: 50% / (100% - 50%) = 1.0 + code: 1f120d.5l50e + +Goal 4: + + name: 20% exceed + Goal Final Trial Duration: 60s + Goal Duration Sum: 60s + Goal Loss Ratio: 0.5% + Goal Exceed Ratio: 20% + exceed coefficient: 20% / (100% - 20%) = 0.25 + code: 60f60d0.5l20e + +The first two goals are important for compliance reasons, +the other two cover less frequent cases. + +## Example Trial Results + +The following six sets of trial results are selected for the example Search. +The sets are defined as points in time, describing which Trial Results +were added since the previous point. + +Each point has a readable name and dense code, +the code is useful to show Trial Output attribute values +and number of times identical results were added.
+ +Point 1: + + name: first short good + goal in focus: 1s final (1f120d.5l50e) + added Trial Results: 59 trials, each 1 second and 0% loss + code: 59x1s0l + +Point 2: + + name: first short bad + goal in focus: 1s final (1f120d.5l50e) + added Trial Result: one trial, 1 second, 1% loss + code: 59x1s0l+1x1s1l + +Point 3: + + name: last short bad + goal in focus: 1s final (1f120d.5l50e) + added Trial Results: 59 trials, 1 second each, 1% loss each + code: 59x1s0l+60x1s1l + +Point 4: + + name: last short good + goal in focus: 1s final (1f120d.5l50e) + added Trial Results: one trial 1 second, 0% loss + code: 60x1s0l+60x1s1l + +Point 5: + + name: first long bad + goal in focus: TST009 (60f120d0l50e) + added Trial Results: one trial, 60 seconds, 0.1% loss + code: 60x1s0l+60x1s1l+1x60s.1l Point 6: @@ -3622,12 +4763,12 @@ Comments on point in time naming: is a High-Loss Trial for goals "RFC2544" and "TST009", but it is a Low-Loss Trial for the two other goals. -### Load Classification Computations +## Load Classification Computations This section shows how Load Classification logic is applied by listing all temporary values at the specific time point. -#### Point 1 +### Point 1 This is the "first short good" point. Code for available results is: 59x1s0l @@ -3650,19 +4791,23 @@ Pessimistic high-loss sum | 60s | 120s | 61s | 60s Optimistic exceed ratio | 0% | 0% | 0% | 0% Pessimistic exceed ratio | 100% | 100% | 50.833% | 100% Classification Result | Undecided | Undecided | Undecided | Undecided + {::comment} - [MB108]: Please add a table legend. Idem for all tables + MB108: Please add a table legend. Idem for all tables + + VP: Ok. Figure out how. - [VP]: Ok. Figure out how. + MK: Kramdown magic. - [MK]: TODO. Kramdown magic. + VP: WONTFIX as currently not possible without coding effort: + https://github.com/gettalong/kramdown/issues/593 {:/comment} This is the last point in time where all goals have this load as Undecided. 
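The Point 1 values above can be reproduced with a short sketch. It wraps the listing from the Load Classification Code appendix in a function; the `Goal` class, the `classify` function, the `GOALS` and `POINT_1` names and the `(duration, loss_ratio)` tuple layout are assumptions made for this illustration and are not part of the draft. Trial Effective Duration is taken to equal Trial Duration, as stated for this example Search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """Search Goal attributes (hypothetical helper, not defined by the draft)."""
    name: str
    final_trial_duration: float  # Goal Final Trial Duration, seconds
    duration_sum: float          # Goal Duration Sum, seconds
    loss_ratio: float            # Goal Loss Ratio, 0.005 means 0.5%
    exceed_ratio: float          # Goal Exceed Ratio, 0.5 means 50%


def classify(goal, trials):
    """Classify one Load from a list of (duration, loss_ratio) pairs."""
    # Sum durations into the four categories the appendix listing expects.
    sums = {(full, high): 0.0 for full in (True, False) for high in (True, False)}
    for duration, loss_ratio in trials:
        is_full = duration >= goal.final_trial_duration
        is_high = loss_ratio > goal.loss_ratio
        sums[is_full, is_high] += duration
    full_length_low_loss_s = sums[True, False]
    full_length_high_loss_s = sums[True, True]
    short_low_loss_s = sums[False, False]
    short_high_loss_s = sums[False, True]
    # From here on the lines mirror the appendix listing.
    exceed_coefficient = goal.exceed_ratio / (1.0 - goal.exceed_ratio)
    balancing_s = short_low_loss_s * exceed_coefficient
    positive_excess_s = max(0.0, short_high_loss_s - balancing_s)
    effect_high_loss_s = full_length_high_loss_s + positive_excess_s
    effect_full_length_s = full_length_low_loss_s + effect_high_loss_s
    effect_whole_s = max(effect_full_length_s, goal.duration_sum)
    quantile_duration_s = effect_whole_s * goal.exceed_ratio
    pessimistic_high_loss_s = effect_whole_s - full_length_low_loss_s
    pessimistic_is_lower = pessimistic_high_loss_s <= quantile_duration_s
    optimistic_is_lower = effect_high_loss_s <= quantile_duration_s
    if optimistic_is_lower and pessimistic_is_lower:
        return "Lower Bound"
    if not optimistic_is_lower and not pessimistic_is_lower:
        return "Upper Bound"
    return "Undecided"


# The four example goals, transcribed from the Example Goals section.
GOALS = [
    Goal("RFC2544", 60.0, 60.0, 0.0, 0.0),
    Goal("TST009", 60.0, 120.0, 0.0, 0.5),
    Goal("1s final", 1.0, 120.0, 0.005, 0.5),
    Goal("20% exceed", 60.0, 60.0, 0.005, 0.2),
]

# Point 1 (code 59x1s0l): 59 trials, each 1 second long with 0% loss.
POINT_1 = [(1.0, 0.0)] * 59

for goal in GOALS:
    print(goal.name, classify(goal, POINT_1))
```

Calling `classify(goal, POINT_1)` for each entry of `GOALS` yields Undecided four times, in line with the table for Point 1.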
-#### Point 2 +### Point 2 This is the "first short bad" point. Code for available results is: 59x1s0l+1x1s1l @@ -3695,7 +4840,7 @@ is classified as an Upper Bound for RFC2544 this early. This is an example how significant time can be saved, compared to 60-second trials. -#### Point 3 +### Point 3 This is the "last short bad" point. Code for available trial results is: 59x1s0l+60x1s1l @@ -3726,7 +4871,7 @@ but its result will decide the classification result. The "20% exceed" started to classify this load as an Upper Bound somewhere between points 2 and 3. -#### Point 4 +### Point 4 This is the "last short good" point. Code for available trial results is: 60x1s0l+60x1s1l @@ -3754,7 +4899,7 @@ The one missing trial for "1s final" was Low-Loss, half of trial results are Low-Loss which exactly matches 50% exceed ratio. This shows time savings are not guaranteed. -#### Point 5 +### Point 5 This is the "first long bad" point. Code for available trial results is: 60x1s0l+60x1s1l+1x60s.1l @@ -3790,7 +4935,7 @@ But the 0.1% Trial Loss Ratio is lower than "20% exceed" Goal Loss Ratio, so this unexpected Full-Length Low-Loss trial changed the classification result of this Load to Lower Bound. -#### Point 6 +### Point 6 This is the "first long good" point. Code for available trial results is: 60x1s0l+60x1s1l+1x60s.1l+1x60s0l @@ -3819,7 +4964,7 @@ This Load is now classified for all goals; the search may end. Or, more realistically, it can focus on larger load only, as the three goals will want an Upper Bound (unless this Load is Max Load). -### Conditional Throughput Computations +## Conditional Throughput Computations At the end of this hypothetical search, the "RFC2544" goal labels the load as an Upper Bound, making it ineligible for Conditional-Throughput @@ -3829,7 +4974,7 @@ can compute Conditional-Throughput values for each of them. (The load under discussion is 1 000 000 frames per second.) 
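The per-goal computations in this section can be checked with a sketch built on the listing from the Conditional Throughput Code appendix. The function name `conditional_throughput`, its parameter names, the `FINAL_TRIALS` name and the `(duration, loss_ratio)` tuple layout are assumptions made for this illustration. The sketch sums all Full-Length Trial durations directly, which equals the sum of the low-loss and high-loss parts used in the appendix, and it assumes Trial Effective Duration equals Trial Duration.

```python
def conditional_throughput(final_trial_duration, goal_duration_s,
                           goal_exceed_ratio, trials, intended_load):
    """Compute Conditional Throughput from (duration, loss_ratio) pairs.

    As in the appendix listing, at least one Full-Length Trial Result
    is expected at the given Load.
    """
    # Keep Full-Length Trials only, sorted by increasing Trial Loss Ratio.
    full_length_trials = sorted(
        (t for t in trials if t[0] >= final_trial_duration),
        key=lambda t: t[1],
    )
    full_length_s = sum(duration for duration, _ in full_length_trials)
    # From here on the lines mirror the appendix listing.
    whole_s = max(goal_duration_s, full_length_s)
    remaining = whole_s * (1.0 - goal_exceed_ratio)
    quantile_loss_ratio = None
    for duration, loss_ratio in full_length_trials:
        if quantile_loss_ratio is None or remaining > 0.0:
            quantile_loss_ratio = loss_ratio
            remaining -= duration
        else:
            break
    else:
        if remaining > 0.0:
            quantile_loss_ratio = 1.0
    return intended_load * (1.0 - quantile_loss_ratio)


# Trial Results at the last point (code 60x1s0l+60x1s1l+1x60s.1l+1x60s0l).
FINAL_TRIALS = (
    [(1.0, 0.0)] * 60 + [(1.0, 0.01)] * 60 + [(60.0, 0.001), (60.0, 0.0)]
)
LOAD = 1_000_000.0  # frames per second

tst009 = conditional_throughput(60.0, 120.0, 0.5, FINAL_TRIALS, LOAD)
one_s_final = conditional_throughput(1.0, 120.0, 0.5, FINAL_TRIALS, LOAD)
exceed_20 = conditional_throughput(60.0, 60.0, 0.2, FINAL_TRIALS, LOAD)
print(tst009, one_s_final, exceed_20)
```

With these inputs the "TST009" and "1s final" goals both come out at one million frames per second, and the "20% exceed" goal comes out slightly lower because its stricter Goal Exceed Ratio pulls the 0.1% loss trial into the quantile.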
-#### Goal 2 +### Goal 2 The Conditional Throughput is computed from sorted list of Full-Length Trial results. As TST009 Goal Final Trial Duration is 60 seconds, @@ -3856,7 +5001,7 @@ One has Trial Loss Ratio of 0%, the other of 0.1%. - Conditional Throughput is the current forwarding ratio multiplied by the Load value. - Conditional Throughput is one million frames per second. -#### Goal 3 +### Goal 3 The "1s final" has Goal Final Trial Duration of 1 second, so all 122 Trial Results are considered Full-Length Trials. @@ -3881,7 +5026,7 @@ The result does not depend on the order of 0% loss trials. - Set current loss ratio to this trial's Trial Loss Ratio which is 0%. - Decrease the remaining sum by this trial's Trial Effective Duration. - New remaining sum varies. -- After 61 trials, we have subtracted 60x1s + 1x60s from 120s, remaining 0s. +- After 61 trials, duration of 60x1s + 1x60s has been subtracted from 120s, leaving 0s. - For 62nd result (duration 60s, loss 0.1%): - Remaining sum is not larger than zero, exiting the loop. - Current loss ratio was most recently set to 0%. @@ -3890,7 +5035,7 @@ The result does not depend on the order of 0% loss trials. - Conditional Throughput is the current forwarding ratio multiplied by the Load value. - Conditional Throughput is one million frames per second. -#### Goal 4 +### Goal 4 The Conditional Throughput is computed from sorted list of Full-Length Trial results. As "20% exceed" Goal Final Trial Duration @@ -3924,368 +5069,35 @@ One has Trial Loss Ratio of 0%, the other of 0.1%. Due to stricter Goal Exceed Ratio, this Conditional Throughput is smaller than Conditional Throughput of the other two goals. -# IANA Considerations - -This document does not make any request to IANA.
- -# Security Considerations - -Benchmarking activities as described in this memo are limited to -technology characterization of a DUT/SUT using controlled stimuli in a -laboratory environment, with dedicated address space and the constraints -specified in the sections above. - -The benchmarking network topology will be an independent test setup and -MUST NOT be connected to devices that may forward the test traffic into -a production network or misroute traffic to the test management network. - -Further, benchmarking is performed on an "opaque" basis, relying -solely on measurements observable external to the DUT/SUT. - -The DUT/SUT SHOULD NOT include features that serve only to boost -benchmark scores — such as a dedicated “fast-track” test mode that is -never used in normal operation. - -{::comment} - - [MB109]: Some more elaboration is needed - - [VP]: This needs BMWG discussion as this chapter is a “boilerplate” - copied from earlier BMWG documents. - - [MK]: Edited -{:/comment} - -Any implications for network security arising from the DUT/SUT SHOULD be -identical in the lab and in production networks. - -{::comment} - - [MB110]: Why? We can accept some relax rule in controlled environnement, - but this not acceptable in deployement. I would adjust accordingly. - - [VP]: Explain and discuss in BMWG. - - [MK]: Keeping as is. It is a BMWG standard text that applies here. - You can see it verbatim in RFC 6815 (§7), RFC 6414 (§4.1), RFC - 9004 (§8), and several BMWG Internet-Drafts. Its purpose is to - remind implementers and testers that the device under test must not - be re-configured into an unrealistic or less-secure state merely to - obtain benchmark data — a principle that complements the adjacent - sentence about avoiding “special benchmarking modes.” Including - the sentence therefore maintains consistency with BMWG precedent - and reinforces a key security expectation. 
- -{:/comment} - -{::comment} - - [MB111]: I would some text to basically - say that the benchmarking results should be adequately - protected and guards top prevent leaks to unauthorized - entities. - Otherwise, the benchmark results can be used by - attacker to better adjust their attacks and perform - attacks that would lead to DDoS a node of the DUT in a - live network, infer the limitation of a DUT that can be - used for overflow attacks, etc. - Also, we can say that the benchmark is agnostic to trafic - and does not manipulate real traffic. As such, Privacy is - not a concern. - - [VP]: To BMWG. - - [MK]: Keeping as is. See my comments above at MB110. - -{:/comment} - -# Acknowledgements - -Special wholehearted gratitude and thanks to the late Al Morton for his -thorough reviews filled with very specific feedback and constructive -guidelines. Thank You Al for the close collaboration over the years, Your Mentorship, -Your continuous unwavering encouragement full of empathy and energizing -positive attitude. Al, You are dearly missed. - -Thanks to Gabor Lencse, Giuseppe Fioccola and BMWG contributors for good -discussions and thorough reviews, guiding and helping us to improve the -clarity and formality of this document. - -Many thanks to Alec Hothan of the OPNFV NFVbench project for a thorough -review and numerous useful comments and suggestions in the earlier -versions of this document. - -# Appendix A: Load Classification - -{::comment} - - [MB112]: Move after references - - [VP]: Ok. - - [MK]: TODO. Move after references. -{:/comment} - -This appendix specifies how to perform the Load Classification. - -Any Trial Load value can be classified, -according to a given [Search Goal](#search-goal) instance. - -The algorithm uses (some subsets of) the set of all available Trial Results -from Trials measured at a given Load at the end of the Search. 
- -The block at the end of this appendix holds pseudocode -which computes two values, stored in variables named -`optimistic_is_lower` and `pessimistic_is_lower`. - -Although presented as pseudocode, the listing is syntactically valid -Python and can be executed without modification. -{::comment} - - [MB113]: Where is that python code? - - [VP]: Reformulate. - - [MK]: Edited. -{:/comment} - -If values of both variables are computed to be true, the Load in question -is classified as a Lower Bound according to the given Search Goal instance. -If values of both variables are false, the Load is classified as an Upper Bound. -Otherwise, the load is classified as Undecided. - -Some variable names are shortened to fit expressions in one line. -Namely, variables holding sum quantities end in `_s` instead of `_sum`, -and variables holding effective quantities start in `effect_` -instead of `effective_`. - -The pseudocode expects the following variables to hold the following values: - -- `goal_duration_s`: The Goal Duration Sum value of the given Search Goal. - -- `goal_exceed_ratio`: The Goal Exceed Ratio value of the given Search Goal. - -- `full_length_low_loss_s`: Sum of Trial Effective Durations across Trials - with Trial Duration at least equal to the Goal Final Trial Duration - and with Trial Loss Ratio not higher than the Goal Loss Ratio - (across Full-Length Low-Loss Trials). - -- `full_length_high_loss_s`: Sum of Trial Effective Durations across Trials - with Trial Duration at least equal to the Goal Final Trial Duration - and with Trial Loss Ratio higher than the Goal Loss Ratio - (across Full-Length High-Loss Trials). - -- `short_low_loss_s`: Sum of Trial Effective Durations across Trials - with Trial Duration shorter than the Goal Final Trial Duration - and with Trial Loss Ratio not higher than the Goal Loss Ratio - (across Short Low-Loss Trials). 
- -- `short_high_loss_s`: Sum of Trial Effective Durations across Trials - with Trial Duration shorter than the Goal Final Trial Duration - and with Trial Loss Ratio higher than the Goal Loss Ratio - (across Short High-Loss Trials). - -The code works correctly also when there are no Trial Results at a given Load. - -~~~ python -exceed_coefficient = goal_exceed_ratio / (1.0 - goal_exceed_ratio) - -balancing_s = short_low_loss_s * exceed_coefficient - -positive_excess_s = max(0.0, short_high_loss_s - balancing_s) - -effect_high_loss_s = full_length_high_loss_s + positive_excess_s - -effect_full_length_s = full_length_low_loss_s + effect_high_loss_s - -effect_whole_s = max(effect_full_length_s, goal_duration_s) - -quantile_duration_s = effect_whole_s * goal_exceed_ratio - -pessimistic_high_loss_s = effect_whole_s - full_length_low_loss_s - -pessimistic_is_lower = pessimistic_high_loss_s <= quantile_duration_s - -optimistic_is_lower = effect_high_loss_s <= quantile_duration_s -~~~ {::comment} - [MB114]: May display this a table for better readability - - [VP]: Ok. + WONTFIX214: There are long lines. + Too late for draft-11. - [MK]: TODO. Disagree. Can we have it in a proper code block instead? {:/comment} -# Appendix B: Conditional Throughput - -This section specifies an example of how to compute Conditional Throughput, -as referred to in Section [Conditional Throughput](#conditional-throughput). - -Any Load value can be used as the basis for the following computation, -but only the Relevant Lower Bound (at the end of the Search) -leads to the value called the Conditional Throughput for a given Search Goal. - -The algorithm uses (some subsets of) the set of all available Trial Results -from Trials measured at a given Load at the end of the Search. - -The block at the end of this appendix holds pseudocode -which computes a value stored as variable `conditional_throughput`. 
- -Although presented as pseudocode, the listing is syntactically valid -Python and can be executed without modification. - -Some variable names are shortened in order to fit expressions in one line. -Namely, variables holding sum quantities end in `_s` instead of `_sum`, -and variables holding effective quantities start in `effect_` -instead of `effective_`. - -The pseudocode expects the following variables to hold the following values: - -- `goal_duration_s`: The Goal Duration Sum value of the given Search Goal. - -- `goal_exceed_ratio`: The Goal Exceed Ratio value of the given Search Goal. - -- `full_length_low_loss_s`: Sum of Trial Effective Durations across Trials - with Trial Duration at least equal to the Goal Final Trial Duration - and with Trial Loss Ratio not higher than the Goal Loss Ratio - (across Full-Length Low-Loss Trials). - -- `full_length_high_loss_s`: Sum of Trial Effective Durations across Trials - with Trial Duration at least equal to the Goal Final Trial Duration - and with Trial Loss Ratio higher than the Goal Loss Ratio - (across Full-Length High-Loss Trials). - -- `full_length_trials`: An iterable of all Trial Results from Trials - with Trial Duration at least equal to the Goal Final Trial Duration - (all Full-Length Trials), sorted by increasing Trial Loss Ratio. - One item `trial` is a composite with the following two attributes available: - - - `trial.loss_ratio`: The Trial Loss Ratio as measured for this Trial. - - - `trial.effect_duration`: The Trial Effective Duration of this Trial. - -The code works correctly only when there is at least one -Trial Result measured at a given Load. 
- -~~~ python -full_length_s = full_length_low_loss_s + full_length_high_loss_s -whole_s = max(goal_duration_s, full_length_s) -remaining = whole_s * (1.0 - goal_exceed_ratio) -quantile_loss_ratio = None -for trial in full_length_trials: - if quantile_loss_ratio is None or remaining > 0.0: - quantile_loss_ratio = trial.loss_ratio - remaining -= trial.effect_duration - else: - break -else: - if remaining > 0.0: - quantile_loss_ratio = 1.0 -conditional_throughput = intended_load * (1.0 - quantile_loss_ratio) -~~~ {::comment} - [MB115]: Please use and markers. - - [VP]: Also table? Ok. - - [MK]: TODO. Not table, it's code. Can we have it in a proper code - block instead? + VP: DONE: Fix warnings from kramdown. {:/comment} -# Index - -{::comment} - - TODO-P2: There are long lines. - -{:/comment} - -{::comment} - - [MB112]: Move after references - - [VP]: Ok. - - [MK]: TODO - -{:/comment} - -- Bound: Lower Bound or Upper Bound. -- Bounds: Lower Bound and Upper Bound. -- Conditional Throughput: defined in [Conditional Throughput](#conditional-throughput), discussed in [Generalized Throughput](#generalized-throughput). -- Controller: introduced in [Overview ](#overview), defined in [Controller ](#controller). -- Controller Input: defined in [Controller Input](#controller-input). -- Controller Output: defined in [Controller Output](#controller-output). -- Full-Length Trial: defined in [Full-Length Trial](#full-length-trial). -- Goal Duration Sum: defined in [Goal Duration Sum](#goal-duration-sum), discussed in [Exceed Ratio and Multiple Trials](#exceed-ratio-and-multiple-trials). -- Goal Exceed Ratio: defined in [Goal Exceed Ratio](#goal-exceed-ratio), discussed in [Exceed Ratio and Multiple Trials](#exceed-ratio-and-multiple-trials). -- Goal Final Trial Duration: defined in [Goal Final Trial Duration](#goal-final-trial-duration). -- Goal Initial Trial Duration: defined in [Goal Initial Trial Duration](#goal-initial-trial-duration). 
-- Goal Loss Ratio: defined in [Goal Loss Ratio](#goal-loss-ratio). -- Goal Result: defined in [Goal Result](#goal-result). -- Goal Width: defined in [Goal Width](#goal-width). -- Exceed Probability: defined in [Exceed Probability](#exceed-probability) -- High-Loss Trial: defined in [High-Loss Trial](#high-loss-trial). -- Intended Load: defined in [RFC2285] (Section 3.5.1). -- Irregular Goal Result: defined in [Irregular Goal Result](#irregular-goal-result). -- Load: introduced in [Trial Load](#trial-load). -- Load Classification: Introduced in [Overview ](#overview), defined in [Load Classification](#load-classification), discussed in [Load Classification Logic](#load-classification-logic). -- Loss Inversion: Situation introduced in [Inconsistent Trial Results](#inconsistent-trial-results), defined in [Loss Ratios and Loss Inversion](#loss-ratios-and-loss-inversion). -- Low-Loss Trial: defined in [Low-Loss Trial](#low-loss-trial). -- Lower Bound: defined in [Lower Bound](#lower-bound). -- Manager: introduced in [Overview ](#overview), defined in [Manager ](#manager). -- Max Load: defined in [Max Load](#max-load). -- Measurer: introduced in [Overview ](#overview), defined in [Measurer ](#measurer). -- Min Load: defined in [Min Load](#min-load). -- MLRsearch Specification: introduced in [Purpose and Scope](#purpose-and-scope) - and in [Overview ](#overview), defined in [Test Procedure Compliant with MLRsearch](#test-procedure-compliant-with-mlrsearch). -- MLRsearch Implementation: defined in [Test Procedure Compliant with MLRsearch](#test-procedure-compliant-with-mlrsearch). -- Offered Load: defined in [RFC2285] (Section 3.5.2). -- Regular Goal Result: defined in [Regular Goal Result](#regular-goal-result). -- Relevant Bound: Relevant Lower Bound or Relevant Upper Bound. -- Relevant Bounds: Relevant Lower Bound and Relevant Upper Bound. 
-- Relevant Lower Bound: defined in [Relevant Lower Bound](#relevant-lower-bound), discussed in [Conservativeness and Relevant Bounds](#conservativeness-and-relevant-bounds).
-- Relevant Upper Bound: defined in [Relevant Upper Bound](#relevant-upper-bound).
-- Search: defined in [Overview ](#overview).
-- Search Duration: introduced in [Purpose and Scope](#purpose-and-scope) and in [Long Search Duration](#long-search-duration), discussed in [Stopping Conditions and Precision](#stopping-conditions-and-precision).
-- Search Goal: defined in [Search Goal](#search-goal).
-- Search Result: defined in [Search Result](#search-result).
-- Short Trial: defined in [Short Trial](#short-trial).
-- Throughput: defined in Section 3.17 of [RFC1242], Methodology specified in Section 26.1 of [RFC2544].
-- Trial: defined in [Trial ](#trial).
-- Trial Duration: defined in [Trial Duration](#trial-duration).
-- Trial Effective Duration: defined in [Trial Effective Duration](#trial-effective-duration).
-- Trial Forwarding Rate: defined in [Trial Forwarding Rate](#trial-forwarding-rate).
-- Trial Forwarding Ratio: defined in [Trial Forwarding Ratio](#trial-forwarding-ratio).
-- Trial Input: defined in [Trial Input](#trial-input).
-- Trial Loss Ratio: defined in [Trial Loss Ratio](#trial-loss-ratio).
-- Trial Load: defined in [Trial Load](#trial-load).
-- Trial Output: defined in [Trial Output](#trial-output).
-- Trial Result: defined in [Trial Result](#trial-result).
-- Undecided: defined in [Undecided ](#undecided).
-- Upper Bound: defined in [Upper Bound](#upper-bound).
-
---- back
-
 {::comment}
     [Final checklist.]
-    [VP] Final Checks. Only mark as done when there are no active todos above.
+    VP WONTFIX215 Final Checks. Only mark as done when there are no active todos above.
-    [VP] Rename chapter/sub-/section to better match their content.
+    VP Rename chapter/sub-/section to better match their content.
-    MKP3 [VP] TODO: Recheck the definition dependencies go bottom-up.
+    MKP3 VP WONTFIX216: Recheck the definition dependencies go bottom-up.
-    [VP] TODO: Unify external reference style (brackets, spaces, section numbers and names).
+    VP DONE217: Unify external reference style (brackets, spaces, section numbers and names).
-    MKP2 [VP] TODO: Capitalization of New Terms: useful when editing and reviewing,
+    MKP2 VP DONE218: Capitalization of New Terms: useful when editing and reviewing,
     but I still vote to remove capitalization before final submit,
     because all other RFCs I see only capitalize due to being section title.
-    [VP] TODO: If time permits, keep improving formal style (e.g., using AI).
+    VP WONTFIX219: If time permits, keep improving formal style (e.g., using AI).
 {:/comment}
diff --git a/docs/ietf/process.txt b/docs/ietf/process.txt
index 869c66b613..e6cb7d7d89 100644
--- a/docs/ietf/process.txt
+++ b/docs/ietf/process.txt
@@ -25,47 +25,9 @@ $ sudo gem install kramdown-rfc
 $ kdrfc --version
 
 Main:
-$ kdrfc draft-ietf-bmwg-mlrsearch-10.md
+$ kdrfc draft-ietf-bmwg-mlrsearch-11.md
 
 If that complains, do it manually at https://author-tools.ietf.org/
 
 Finally:
 https://author-tools.ietf.org/idnits
-
-
-Now that I have forced the Canvas editor into source-editing style (as opposed to rendered style),
-I have real tasks for you, they involve multiple document files.
-I am working on an upcoming IETF document, currently still in the draft state.
-I have attached two files, one is a markdown source of the document as uploaded recently (draft version 10).
-The other attached file is a .doc feedback from AD editor (in e-mails signing as "Med").
-The attached markdown file already contains few comments, enclosed between tags `{::comment}` and `{:/comment}`.
-The .doc file contains in-place edits proposed by the AD editor, and also Word-specific review-comments from him and my reactions.
-I have copied the work-in-progress state of the markdown source into the Canvas editor.
-The issue is the whole markdown file is too long for your output tokens limit.
-That means everytime I type something into this main chat (Enter a prompt for Gemini) dialog
-and you decide to edit something in the Canvas, you spent 20 minutes writing mostly unedited text into Canvas,
-during which the Editor is in rendering mode (making your outputs not usable as markdown source yet)
-but you run out of tokens, GUI reports "Network error" and I cannot easily make the editor switch back to source-editing mode.
-At which point I have to create a new conversation and start over.
-So, I would appreciate you not touching the Canvas content, at all.
-Even if I tell you otherwise, you should ask for confirmation first.
-The other dialog (Ask Gemini), when only a small part of text is selected in Canvas, works only as a pointer.
-If you decide to edit the selected area you still attempt to write the whole file and fail.
-The current editing phase focuses on copying review-comments from the .doc file into markdown as kramdown-friendly comment blocks.
-I already see at least two mistakes, missing or misplaced comment from Med or reply from me (VP).
-What is the first mistake you see in the Canvas state compared to .doc review comments?
-(You can write the correction to the main chat, but be careful not to touch Canvas content.)
-
-
-
-(do not forget to enable canvas)
-
-I have an example markdown source below, including a Python code sub-block. Put that markdown source into Canvas, as a new code immersive with the markdown language type.
-
-### Example subsection
-
-``` python
-print("Output text example.")
-```
-
-Trailing text example.