From: Vratko Polak Date: Tue, 2 Sep 2025 12:21:09 +0000 (+0200) Subject: fix(ietf): Remove comment blocks from mlrsearch X-Git-Url: https://gerrit.fd.io/r/gitweb?a=commitdiff_plain;h=refs%2Fchanges%2F50%2F43650%2F3;p=csit.git fix(ietf): Remove comment blocks from mlrsearch Not squashing as reviewers got specific URLs. Change-Id: I567319b808e62a1cb4bc2ac9609bbe80b817b63f Signed-off-by: Vratko Polak --- diff --git a/docs/ietf/draft-ietf-bmwg-mlrsearch-12.md b/docs/ietf/draft-ietf-bmwg-mlrsearch-12.md index a85a6ff5a4..00a7d88620 100644 --- a/docs/ietf/draft-ietf-bmwg-mlrsearch-12.md +++ b/docs/ietf/draft-ietf-bmwg-mlrsearch-12.md @@ -39,28 +39,9 @@ normative: informative: RFC5180: -# Comment: This part before first --- is not markdown but YAML, so comments need different escape. -#{::comment} -# -# MB116: Please move to information, as this was provided only as an example. -# -# VP: Ok. -# -# MK: Moved. -# -#{:/comment} RFC6349: RFC6985: RFC8219: -#{::comment} -# -# MB117: Idem as the other entry. -# -# VP: Ok. -# -# MK: Moved. -# -#{:/comment} TST009: target: https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.04.01_60/gs_NFV-TST009v030401p.pdf title: "TST 009" @@ -78,23 +59,6 @@ informative: Lencze-Shima: target: https://datatracker.ietf.org/doc/html/draft-lencse-bmwg-rfc2544-bis-00 title: "An Upgrade to Benchmarking Methodology for Network Interconnect Devices - expired" -#{::comment} -# -# MB118: This was expired since 2020. Please remove. Idem for all similar entries -# -# VP: Hmm, ok. -# -# MK: Disagree. It is still a useful reference. Marking as expired, -# but keeping it here. Can we add following entry: -# -# [Lencze-Shima] Lencse, G., "Benchmarking Methodology for IP -# Forwarding Devices - RFC 2544bis", Work in Progress, -# Internet-Draftdraft-lencse-bmwg-rfc2544-bis-009 March 2015. -# (Expired.) -# -# VP: DONE201: Delete or update or replace the active item. -# -#{:/comment} Lencze-Kovacs-Shima: target: http://dx.doi.org/10.11601/ijates.v9i2.288 title: "Gaming with the Throughput and the Latency Benchmarking Measurement Procedures of RFC 2544" @@ -114,63 +78,11 @@ defining a new methodology called Multiple Loss Ratio search support multiple loss ratio searches, and improve result repeatability and comparability. -{::comment} - - MB2: The abstract should self-contained. Hence the need to expand the RFC title. - - VP: Ok. - - MK: Ok. Edited. - -{:/comment} - -{::comment} - - MB1: This may trigger automatically a comment whether we change (update or amend) any of RFC2544 text. - Do we? - - VP: Pending BMWG decision. - VP: For draft11: Officially independent. - - MK: MLRsearch extends RFC2544. Does not change it, nor does it amend it. - - VP: The idea was to extend in sense of adding one new benchmark. - But as we added more requirements and possible deviations around trials, - the new methodology is independent from (while possible to combine with) RFC2544. - - MK: LGTM. - -{:/comment} - MLRsearch is motivated by the pressing need to address the challenges of evaluating and testing the various data plane solutions, especially in software- based networking systems based on Commercial Off-the-Shelf (COTS) CPU hardware vs purpose-built ASIC / NPU / FPGA hardware. -{::comment} - - MB3: What is meant here? What is specific to these systems? - Do we need to have this mention at this stage? - - VP: Do not distinguish in abstract - - MK: Updated text to focus on COTS hardware vs purpose-built - hardware. Let us know if this requires further text in abstract. 
- (We should keep it concise.) - -{:/comment} - -{::comment} - - MB4: Too detailed for an abstract. Can be mentioned in an overview/introduction section - - VP: Agreed, we no not need to list the options here. - - MK: OK. - MK: Removed. - -{:/comment} - --- middle {::comment} @@ -197,65 +109,14 @@ Network Function running on shared servers in the compute cloud). ## Purpose -{::comment} - - MK: Suggest to change title to Purpose, as it does not provide - brief overview of the document's structure and key content areas. - - VP: Done. - -{:/comment} - The purpose of this document is to describe the Multiple Loss Ratio search (MLRsearch) methodology, optimized for determining data plane throughput in software-based networking devices and functions. -{::comment} - - MB6: Should be defined. - Not sure what is specific as any networking device is a software-based device. Even hardware, it is not more than frozen software ;) - - VP: We can mention "noisiness" here, not sure how detailed - - MK: Good point. Added text clarifying the difference. See if this - is good enough, or does this need any more explanation. - MK: Edited. - -{:/comment} - Applying the vanilla throughput binary search, as specified for example in [TST009] and [RFC2544] to software devices under test (DUTs) results in several problems: -{::comment} - - MB7: Can we have an explicit reference for the method? - - VP: Need to search but should be doable - - MK: RFC2544 mentions binary-search style procedure without fully - specifying the algorithm. The only other standard that defines is - ETSI GS NFV-TST 009 - adding it here. - MK: Edited. - - VP: Removed RFC 2544 as I understand MB wants reference to specifics. - - MK: But section 24 of RFC 2544 does list "binary search", and it is - this that has been adopted as a defacto standard. Re-adding it back - in :) - -{:/comment} - -{::comment} - - MB8: Expand - - VP: Ok (point to DUT). - - MK: Edited. - -{:/comment} - - Binary search takes long as most trials are done far from the eventually found throughput. - The required final trial duration and pauses between trials @@ -270,40 +131,6 @@ to software devices under test (DUTs) results in several problems: throughput metric can no longer be pinned to a single, unambiguous value.) -{::comment} - - MB9: Can we have a public reference to share here? - - VP: Need to search but should be doable). - - MK: Removed "too". Explanation and public references are provided - in the Identified Problems section. - MK: Edited. - -{:/comment} - -{::comment} - - MB10: What is meant there? - - VP: Expand (industry). - - MK: Improved clarity, by referring to loss tolerance. Added references. - MK: Edited. - -{:/comment} - -{::comment} - - MB11: Can we expand on this one? - - VP: Some soft intro to inconsistent trials may be needed here. - - MK: Added text in brackets. See if it is sufficient. - MK: Edited. - -{:/comment} - To address these problems, early MLRsearch implementations employed the following enhancements: @@ -322,65 +149,9 @@ early MLRsearch implementations employed the following enhancements: 5. Apply several time-saving load selection heuristics that deliberately prevent the bounds from narrowing unnecessarily. -{::comment} - - MB12: There is no such section in the document. - Do you meant Section 3.6.2 of [RFC2285]? - If so, please update accordingly. - Idem for all similar occurrences in the document. Thanks. - - VP: Clarify. Check for every external section referenced. - - MK: Yes Section 3.6.2 of [RFC2285] defining FRMOL. - MK: Edited. 
- -{:/comment} - -{::comment} - - MB13: Maximizing means? - - VP: Reformulate. - - MK: Edited. - -{:/comment} - -{::comment} - - VP: Item 3 is also mostly out of scope, - if we do not count Goal Initial Trial Duration - (it is and example of optional attribute, not a recommendation). - - TODO202: Either say the list talks about CSIT implementation, - or downgrade item 3 to level of item 5 (example optimization - that is ultimately out of scope of MLRsearch Specification). - -{:/comment} - Enhacements 1, 2 and partly 4 are formalized as MLRsearch Specification within this document, other implementation details are out the scope. -{::comment} - - MB14: Which ones? - - VP: Describe the lists better so "some" is not needed here. - - MK: Edited. - -{:/comment} - -{::comment} - - MB15: Where? In this document? - - VP: Yes. - - MK: Edited. - -{:/comment} - The remaining enhancements are treated as implementation details, thus achieving high comparability without limiting future improvements. @@ -396,88 +167,8 @@ Exact settings are not specified, but see the discussion in [Overview of RFC 2544 Problems](#overview-of-rfc-2544-problems) for the impact of different settings on result quality. -{::comment} - - MB16: Where are those defined? Please add a pointer to the appropriate section. - - VP: Add pointer. - - MK: DONE203. We do not have any section in this document covering - implementation details, as these are out of scope. Shall we add a - note to that regard? - -{:/comment} - -{::comment} - - MB17: "flexibe" is ambiguous. Simply, state what we do. - - VP: Reformulate. - - MK: Edited. - -{:/comment} - -{::comment} - - MB18: Add pointers where this is further elaborated. - - VP: Point to specific subsection. - - MK: Added. - -{:/comment} - This document does not change or obsolete any part of [RFC2544]. -{::comment} - - MB19: List the set of terms/definitions used in this document. - I guess we should at least leverage terms defined in 2544/1242. - - VP: Move list of terms here? - - MK: Relevant existing terms, including the ones from rfcs 1242, - 2285 and 2544, are captured in section 4.3 Existing Terms, followed - by the new terms that form the MLRsearch Specification. We went - through quite a few iterations of getting it right, including a - separate terminology section at the beginning of the document, and - following BMWG comments and reviews ended up with the current - document structure. Reworking it back is substantial work - - MK: Instead I propose we list one liners explaining the term in - the context of the benchmarking domain. - - VP: See the comment in first Specification paragraph. - For specific MB comment, I propose to say no edit needed, - but ask on bmwg mailer to confirm. - -{:/comment} - -{::comment} - - MB20: Also, please add a statement that the convention used in bmwg - are followed here as well (def, discussion, etc.) - - VP: Ok - - MK: The Requirements Language text is the standard one we use in - BMWG. There are no any strict BMWG conventions that are followed in - this document. Rather, the convention used for terms that are - specific to this document, is described in the Section 4 of this - document, and forms part of the MLRsearch Specification. - - VP: I think this is done, covered by edits elsewhere. - -{:/comment} - -{::comment} - - WONTFIX204: Update the subsection above when the subsections below are complete enough. - Too late. 
- -{:/comment} - ## Positioning within BMWG Methodologies The Benchmarking Methodology Working Group (BMWG) produces recommendations (RFCs) @@ -530,18 +221,6 @@ has increased both the number of performance tests required to verify the DUT update and the frequency of running those tests. This makes the overall test execution time even more important than before. -{::comment} - - MB21: Is this really new? - - VP: Not sure, ask Maciek - - MK: Changed "emergence" to "proliferation". And yes, the - proliferation and their importance is new. - MK: Edited. - -{:/comment} - The throughput definition per [RFC2544] restricts the potential for time-efficiency improvements. The bisection method, when used in a manner unconditionally compliant @@ -550,70 +229,12 @@ with [RFC2544], is excessively slow due to two main factors. Firstly, a significant amount of time is spent on trials with loads that, in retrospect, are far from the final determined throughput. -{::comment} - - MB22: Won't age well - - VP: I agree, should be reformulated, not sure how. - - MK: Accepted proposed text change. - MK: Edited. - -{:/comment} - -{::comment} - - MB23: Concretely, be affirmative if we provide an elaborated def, - otherwise this statement can be removed. - - VP: Reformulate to affirm and point. - - MK: Agree. This is problem statement, not solution description, so - removed this paragraph. - MK: Removed. - -{:/comment} - -{::comment} - - MB24: Can we have a reference? - - VP: Find references. - - MK: Added wording connecting to the following paragraphs with - explanations. - MK: Edited. - -{:/comment} - -{::comment} - - MB25: Define "users". - - VP: Yes, we should be more careful around role names. - - MK: Added text. - MK: Edited. - -{:/comment} - Secondly, [RFC2544] does not specify any stopping condition for throughput search, so users of testing equipment implementing the procedure already have access to a limited trade-off between search duration and achieved precision. However, each of the full 60-second trials doubles the precision. -{::comment} - - MB26: Can we include a reminder of the 2544 search basics? (no need to be verbose, though)? - - VP: Maybe, not sure how feasible. - - MK: Added. - MK: Edited. - -{:/comment} - As such, not many trials can be removed without a substantial loss of precision. For reference, here is a brief [RFC2544] throughput binary @@ -634,17 +255,6 @@ DUT as: - The network frame forwarding device to which stimulus is offered and response measured Section 3.1.1 of [RFC2285]. -{::comment} - - MB27: Double check - - VP: Ok. - - MK: Checked. OK. - MK: Edited. - -{:/comment} - SUT as: - The collective set of network devices as a single entity to which @@ -661,51 +271,16 @@ DUT, but the entire execution environment: host hardware, firmware and kernel/hypervisor services, as well as any other software workloads that share the same CPUs, memory and I/O resources. -{::comment} - - MB28: This makes assumptions on the software architecture. We need to make sure this is generic enough. - For example, what is a server? Etc. - Does it applies to container, microservice, SF a la RFC7665, VNF a la ETSI, etc.? - - VP: Ask Maciek. - - MK: Rewritten it a bit to make it more generic. See if this helps. - MK: Edited. - -{:/comment} - Given that a SUT is a shared multi-tenant environment, the DUT might inadvertently experience interference from the operating system or other software operating on the same server. -{::comment} - - MB29: Such as? - - VP: We should reformulate. 
Other components may differ (give few examples) but interference is general. - - MK: Removed surplus text, as it is now explained in preceding paragraph. - MK: Edited. - -{:/comment} - Some of this interference can be mitigated. For instance, in multi-core CPU systems, pinning DUT program threads to specific CPU cores and isolating those cores can prevent context switching. -{::comment} - - MB30: If many? Or do we assume there are always many? - - VP: Reformulate. - - MK: Made it explicit for this paragraph. - MK: Edited. - -{:/comment} - Despite taking all feasible precautions, some adverse effects may still impact the DUT's network performance. In this document, these effects are collectively @@ -747,17 +322,6 @@ to be observable, this time because minor noise events almost always occur during each trial, nudging the measured performance slightly below the theoretical maximum. -{::comment} - - MB31: I don't parse this one. Please reword. - - VP: Ok. - - MK: Rephrased. Hope it reads better now. - MK: Edited. - -{:/comment} - Unless specified otherwise, this document's focus is on the potentially observable ends of the SUT performance spectrum, as opposed to the extreme ones. @@ -769,58 +333,12 @@ as there are no realistic enough models that would be capable to distinguish SUT noise from DUT fluctuations (based on the available literature at the time of writing). -{::comment} - - MB32: As we need to reflect the view of the WG/IETF, not only authors - - VP: Ask Maciek. - - MK: Proposed text looks good. OK. - MK: Edited. - -{:/comment} - Provided SUT execution environment and any co-resident workloads place only negligible demands on SUT shared resources, so that the DUT remains the principal performance limiter, the DUT's ideal noiseless performance is defined as the noiseless end of the SUT performance spectrum. -{::comment} - - MB33: That is? - - VP: Reformulate. - - MK: Clarified. - MK: Edited. - -{:/comment} - -{::comment} - - MB34: Please avoid "we" constructs. - - VP: Ok. Search and replace all into passive voice. - - MK: for the whole document. - - VP: Done here, created separate comments elsewhere. - -{:/comment} - -{::comment} - - MB35: Can we cite an example? - - VP: Yes for latency - - MK: Focus of mlrsearch is finding throughput. On 2nd thought, - removing reference to latency as it is not applicable. - MK: Edited. - -{:/comment} - Note that by this definition, DUT noiseless performance also minimizes the impact of DUT fluctuations, as much as realistically possible for a given trial duration. @@ -835,16 +353,6 @@ explicitly model SUT-generated noise, enabling to derive surrogate metrics that approximate the (proxies for) DUT noiseless performance across a range of SUT noise-tolerance levels. -{::comment} - - MB36: ? - - VP: Reformulate. - - MK: Edited. - -{:/comment} - ## Repeatability and Comparability [RFC2544] does not suggest repeating throughput search. Also, note that @@ -884,17 +392,6 @@ An alternative option is to simply run a search multiple times, and report some statistics (e.g., average and standard deviation, and/or percentiles like p95). -{::comment} - - MB37: What about at some other representative percentiles? - - VP: Ok. - - MK: Added percentiles. - MK: Edited. - -{:/comment} - This can be used for a subset of tests deemed more important, but it makes the search duration problem even more pronounced. @@ -919,77 +416,19 @@ Motivations are many: - Networking protocols tolerate frame loss better, compared to the time when [RFC1242] and [RFC2544] were specified. 
-{::comment} - - MB38: 1242 was also modern at the time they were published ;) - This can be easily stale. Let's avoid that - - VP: Ok. - - MK: OK. - -{:/comment} - -- Increased link speeds require trials sending way more frames within the same duration, - increasing the chance of a small SUT performance fluctuation - being enough to cause frame loss. - -{::comment} - - MB39: Won't age well. - - VP: Ok, but some things did change over time (in focus if not in existence). Ask Maciek. - - MK: Edited. - -{:/comment} - -- Because noise-related drops usually arrive in small bursts, their - impact on the trial's overall frame loss ratio is diluted by the - longer intervals in which the SUT operates close to its noiseless - performance; consequently, the averaged Trial Loss Ratio can still - end up below the specified Goal Loss Ratio value. - -{::comment} - - MB40: Please split. Too long - - VP: At this point we probably should add a subsection somewhere, - discussing how short-time performance may fluctuate within reasonable-duration trial - (even as short as 1s). - - MK: Split with some rewording. - MK: Edited. - -{:/comment} - -- If an approximation of the SUT noise impact on the Trial Loss Ratio is known, - it can be set as the Goal Loss Ratio (see definitions of - Trial and Goal terms in [Trial Terms](#trial-terms) and [Goal Terms](#goal-terms)). - -{::comment} - - MB41: Help readers find where to look for an authoritative definition. - - VP: The original paragraph maybe describes periodic processes eating CPU or even impact - of reconfiguration during traffic, but both may be too exotic for this specification. - I recommend to delete this paragraph. Otherwise, add link. - - MK: Added. - MK: Edited. - -{:/comment} - -{::comment} - - MB42: Help readers find where to look for an authoritative definition. - - VP: Add link if not deleted? +- Increased link speeds require trials sending way more frames within the same duration, + increasing the chance of a small SUT performance fluctuation + being enough to cause frame loss. - MK: Added. - MK: Edited. +- Because noise-related drops usually arrive in small bursts, their + impact on the trial's overall frame loss ratio is diluted by the + longer intervals in which the SUT operates close to its noiseless + performance; consequently, the averaged Trial Loss Ratio can still + end up below the specified Goal Loss Ratio value. -{:/comment} +- If an approximation of the SUT noise impact on the Trial Loss Ratio is known, + it can be set as the Goal Loss Ratio (see definitions of + Trial and Goal terms in [Trial Terms](#trial-terms) and [Goal Terms](#goal-terms)). - For more information, see an earlier draft [Lencze-Shima] (Section 5) and references there. @@ -999,18 +438,6 @@ support for non-zero loss goals makes a search algorithm more user-friendly. [RFC2544] throughput is not user-friendly in this regard. -{::comment} - - MB43: We cant claim that - - VP: Ok, but also current sentence has circular dependency between non-zero rates - and specific user-friendliness. Reformulate. - - MK: done. - MK: Edited. - -{:/comment} - Furthermore, allowing users to specify multiple loss ratio values, and enabling a single search to find all relevant bounds, significantly enhances the usefulness of the search algorithm. @@ -1036,18 +463,6 @@ Section 3 of [RFC6349] for loss ratios acceptable for an accurate measurement of TCP throughput, and [Ott-Mathis-Semke-Mahdavi] for models and calculations of TCP performance in presence of packet loss. 
-{::comment} - - MB44: Among? - Also, indicate "at the time of writing". - - VP: Ok. - - MK: done. - MK: Edited. - -{:/comment} - ## Inconsistent Trial Results While performing throughput search by executing a sequence of @@ -1073,28 +488,6 @@ where two successive zero-loss trials are recommended, presumably because after one zero-loss trial there can be a subsequent inconsistent non-zero-loss trial. -{::comment} - - MB45: ?? - - VP: Full reference is needed. - - MK: done. - MK: Edited. - -{:/comment} - -{::comment} - - MB46: ?? - - VP: Also full reference. - - MK: done. - MK: Edited. - -{:/comment} - A robust throughput search algorithm needs to decide how to continue the search in the presence of such inconsistencies. Definitions of throughput in [RFC1242] and [RFC2544] are not specific enough @@ -1111,35 +504,11 @@ Relevant Lower Bound is the MLRsearch term that addresses this problem. # Requirements Language -{::comment} - - MB5: Move after the intro - - VP: Ok. - - MK: OK. - MK: Moved. - - VP: Currently the "intro" is quite long, so moved after "problems" now - so this is situated closer to Specification. - -{:/comment} - The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, [RFC2119] and [RFC8174] when, and only when, they appear in all capitals, as shown here. -{::comment} - - The two references have to come one after another to avoid boilerplate nit, - but the xml2rfc processing (web service) is buggy and strips rfc2119 brackets. - Luckily, having this comment here avoids the bug and creates correct .xml file. - - VP: DONE: Verify the .txt render is still ok. - -{:/comment} - This document is categorized as an Informational RFC. While it does not mandate the adoption of the MLRsearch methodology, it uses the normative language of BCP 14 to provide an unambiguous specification. @@ -1148,14 +517,6 @@ it MUST adhere to all the absolute requirements defined herein. The use of normative language is intended to promote repeatable and comparable results among those who choose to implement this methodology. -{::comment} - - VP: WONTFIX205: Mention conditional requirements if not clear from usage. - For example, RFC 2544 Trial requirements must be either honored or deviations must be reported. - Only mention in e-mail. - -{:/comment} - # MLRsearch Specification This chapter provides all technical definitions @@ -1172,86 +533,12 @@ fully specified and discussed in their own subsections, under sections titled "Terms". This way, the list of terms is visible in table of contents. -{::comment} - - VP: DONE: Explain there is no separate list for terminology, - subsections under "Terms" sections are terminology items, - packaged together with related requirements and discussions. - - MK: Edited. Reworded to avoid repetition of subsections and such. - Should read better now. - -{:/comment} - -{::comment} - - VP: DONE206: Should we explicitly mention that - the table of contents also acts as a terminology list? - -{:/comment} - -{::comment} - - MB47: Please move this to the terminology section - where we can group all conventions used in the document. - - VP: Ok. - - MK: There is no terminology section per se in this - document. See my note to your comments in the Requirements Language - section. 
- -{:/comment} - Each per term subsection contains a short *Definition* paragraph containing a minimal definition and all strict requirements, followed by *Discussion* paragraphs focusing on important consequences and recommendations. Requirements about how other components can use the defined quantity are also included in the discussion. -{::comment} - - MB48: Not sure this brings much - - VP: Ok, delete. - - MK: done. - MK: Edited. - -{:/comment} - -{::comment} - - Terminology structure - Second proposal: - - The normative part of the MLRsearch specification can be decomposed - into a directed acyclic graph, where each node is a "term" - with its definition and requirements. The links in the graph are - dependencies, "later" term can only be fully defined - when all its "earlier" terms are already defined. - - Some terms define composite quantities, subsections could be used - to hold definitions of all the attributes. - - For readability, informative "discussion" text could be added, - but frequently it is convenient to use a later term - when discussing an earlier term. - - The currect structure of sections is a compromise between these motivations. - - VP: Describe this informal principle in official text? - VP: DONE: Not in draft, maybe in e-mail. - - VP: Distinguish requirements from other discussions? - VP: DONE: Not in draft11, maybe in darf12? - - MK: I suggest we refrain from introducing and using DAG or other - analogies from graph theory to explain how terms and terminology - are managed in this document. - -{:/comment} - ## Scope This document specifies the Multiple Loss Ratio search (MLRsearch) methodology. @@ -1314,14 +601,6 @@ The following aspects are explicitly out of the normative scope of this document ## Architecture Overview -{::comment} - - DONE: Calls/invocations, interfaces. - Low priority but useful for cleaning up the language. - Maybe already fixed by MB49? - -{:/comment} - Although the normative text references only terminology that has already been introduced, explanatory passages beside it sometimes profit from terms that are defined later in the document. To keep the initial @@ -1336,13 +615,6 @@ is purely conceptual; actual implementations need not exchange explicit messages. When the text contrasts alternative behaviours, it refers to the different implementations of the same component. -{::comment} - - VP: DONE207: Re-check usage of Implementation. - Mention in e-mail. - -{:/comment} - A test procedure is considered compliant with the MLRsearch Specification if it can be conceptually decomposed into the abstract components defined herein, and each component satisfies the @@ -1363,36 +635,6 @@ The Manager calls a Controller once, and the Controller then invokes the Measurer repeatedly until Controler decides it has enough information to return outputs. -{::comment} - - MB49: Invoke? - Maybe better to clarify what is actually meant by "calls". - - VP: Mention function calls in first sentence of this subsection. - - MK: done. - MK: Edited. - - VP: Imperative programming introduced, "calls" is not correct. - - MK: OK. - -{:/comment} - -{::comment} - - MB50: Is there only one? Always? - Include a provision to have many - - VP: Add a sentence about one search. - Complete test suite may perform multiple searches, using maybe different controllers - - MK: Not sure what you mean. It already says "stopping conditions", - implying there are many. - MK: New text LGTM. - -{:/comment} - The part during which the Controller invokes the Measurer is termed the Search. 
Any work the Manager performs either before invoking the Controller or after Controller returns, falls outside the scope of the @@ -1401,17 +643,6 @@ Search. MLRsearch Specification prescribes Regular Search Results and recommends corresponding search completion conditions. -{::comment} - - MB51: Does this also cover "abort" (before completion) to handle some error conditions? - Or this is more a "stop execution"? - - VP: Add sentences about regular exits, irregular errors and user aborts? - - MK: Stop execution of the search. - -{:/comment} - Irregular Search Results are also allowed, they have different requirements and their corresponding stopping conditions are out of scope. @@ -1425,31 +656,6 @@ according to Goal Width, the Regular Goal Result is found. Search stops when all Regular Goal Results are found, or when some Search Goals are proven to have only Irregular Goal Results. -{::comment} - - Note: - This comment was about load classifications being equivalent among implementations. - We deleted tat sentence, keeping this block just for tracking purposes. - - MB52: Do we have taxonomoy/means to make that equivalence easy to put in place? - - VP: Add links to Goal Result or Load Classification. - But maybe this sentence is not needed in this subsection? - - MK: It is covered in Sections 4.6.2 Load Classification and 6.1 - Load Classification Logic and 6.4.3 Load Classification - Computations. - -{:/comment} - -{::comment} - - Repeating the same search is possible, this is about single search. - DONE by mentioning repeated benchmarks. - - -{:/comment} - ### Test Report A primary responsibility of the Manager is to produce a Test Report, @@ -1476,19 +682,6 @@ MLRsearch Specification by itself does not guarantee that the Search ends in finite time, as the freedom the Controller has for Load selection also allows for clearly deficient choices. -{::comment} - - MB53: I suggest we be factual and avoid use of "believe" and so on. - - VP: Ok. - - MK: Ok. - MK: Removed. - - VP: Verified, currently fixed everywhere. - -{:/comment} - For deeper insights on these matters, refer to [FDio-CSIT-MLRsearch]. The primary MLRsearch implementation, used as the prototype @@ -1500,18 +693,6 @@ MLRsearch Specification uses a number of specific quantities, some of them can be expressed in several different units. -{::comment} - - MB54: "S" is used in the previous section, - Please pick one form and be consistent through the document. - - VP: S - - MK: MLRsearch Specification. Done all. - MK: Edited. - -{:/comment} - In general, MLRsearch Specification does not require particular units to be used, but it is REQUIRED for the test report to state all the units. For example, ratio quantities can be dimensionless numbers between zero and one, @@ -1521,17 +702,6 @@ For convenience, a group of quantities can be treated as a composite quantity. One constituent of a composite quantity is called an attribute. A group of attribute values is called an instance of that composite quantity. -{::comment} - - MB55: Please check - - VP: Reformulate. - - MK: Fixed punctuation and broken sentence. - MK: Edited. - -{:/comment} - Some attributes may depend on others and can be calculated from other attributes. Such quantities are called derived quantities. @@ -1559,17 +729,6 @@ value of "duration" is expected to be equal to "final duration" value. ## Existing Terms -{::comment} - - MB56: I would delete. - - VP: Not sure yet. - - MK: Edited, instead of deleting. - MK: Edited. 
- -{:/comment} - This specification relies on the following three documents that should be consulted before attempting to make use of this document: @@ -1587,28 +746,6 @@ be consulted before attempting to make use of this document: Definitions of some central terms from above documents are copied and discussed in the following subsections. -{::comment} - - MB57: Please move this to a terminology section suggested above - - VP: Ok for paragraph text... - - MK: See my note re your comment to the Requirements Language - section. We ended up keeping the Existing Terms section just before - the MLRsearch specific terms for clarity and easier reading, based - on feedback from BMWG. - -{:/comment} - -{::comment} - - Some discussed aspects are important and specific to MLRsearch Specification, - that is why these terms get full subsections, instead of just external references. - - VP: DONE208: Downgrade the last sentence from draft to e-mail? - -{:/comment} - ### SUT Defined in Section 3.1.2 of [RFC2285] as follows. @@ -1624,21 +761,6 @@ Discussion:   : An SUT consisting of a single network device is allowed by this definition. -{::comment} - - MB58: Do we need to include this? - I would only introduce deviation from base specs. - - VP: Ok on deviation, not sure on base definition. - - MK: We do need to include this, as the SUT and DUT terms are used - repeatedly and are fundamental to understanding this - specification. - - VP: Edited. - -{:/comment} -   : In software-based networking SUT may comprise multitude of networking applications and the entire host hardware and software @@ -1658,19 +780,6 @@ Definition: : The network forwarding device to which stimulus is offered and response measured. -{::comment} - - MB59: This reasons about "device", should we say that we extends this to "function"? - - VP: Yes. Extend discussion. If device requires medium/cables, - function can be working with something software-like - (packet vectors, shared memory regions). - - MK: added text covering this. - MK: Edited. - -{:/comment} - Discussion:   @@ -1683,16 +792,6 @@ MLRsearch Specification, but is of key relevance for its motivation. The device can represent a software-based networking functions running on commodity x86/ARM CPUs (vs purpose-built ASIC / NPU / FPGA). -{::comment} - - MB60: Idem as SUT - - VP: Yes. - - MK: See my note re SUT. - -{:/comment} -   : A well-designed SUTs should have the primary DUT as their performance bottleneck. The ways to achieve that are outside of MLRsearch Specification scope. @@ -1743,20 +842,6 @@ Discussion: as sent and received by a tester, as implicitly defined in Section 6 of [RFC2544]. -{::comment} - - MB61: Is there any aspect new to MLRS? - - VP: No, make clear. - - MK: Yes, it is covered in detail in the following sections. The - important part in this section, apart from quoting the original - definition, is the discussion part, that sets the convention of how - deviations from the original definition are captured in this - document. - -{:/comment} -   : The definition describes some traits, not using capitalized verbs to signify strength of the requirements. @@ -1766,33 +851,6 @@ but any such deviation MUST be described explicitly in the Test Report. It is still RECOMMENDED to not deviate from the description, as any deviation weakens comparability. -{::comment} - - MB62: Not a normative language - - VP: Reformulate. - - MK: ok. changed from ALLOWED to allowed. is anything else needed? - MK: Edited. - - VP: I feel this is important. 
Not only as a notable deviation from RFC 2544, - but also as an example of normative language usage. - Where RFC 2544 says you MUST do A or you CANNOT do B, - MLRsearch may say there are specific conditions where you do not have to do A or can de B. - Med had few comments like "since there is exception, the requirement is not universal", - and I say "there are clear conditions, the requirement is universal if the conditions are satisfied". - - VP: Contruct appropriate "conditional requirement" sentence. - DONE: Sentence improved. - -{:/comment} - -{::comment} - - VP: DONE: No time to mention "allowed if worse" principle in draft11 cycle. - -{:/comment} -   : An example of deviation from [RFC2544] is using shorter wait times, compared to those described in phases a), b), d) and e). @@ -1806,64 +864,22 @@ any such time-sensitive per-trial configuration method, with bridge MAC learning being only one possible examples. Appendix C.2.4.1 of [RFC2544] lists another example: ARP with wait time of 5 seconds. -{::comment} - - MB63: Not a normative term - - VP: Ok. - - MK: ok. MB and MK edits applied. - MK: Edited. - -{:/comment} - -{::comment} - - VP: DONE: Emphasize that this is a single trial. - Any recurring tests count as separate trials, - because they give different results. - -{:/comment} -   : Some methodologies describe recurring tests. If those are based on Trials, they are treated as multiple independent Trials. ## Trial Terms -{::comment} - - WONTFIX209: Separate short descriptions from further discussions. Everywhere. - - Too late for draft-11. - -{:/comment} - This section defines new and redefine existing terms for quantities relevant as inputs or outputs of a Trial, as used by the Measurer component. This includes also any derived quantities related to results of one Trial. ### Trial Duration -Definition: - -  -: Trial Duration is the intended duration of the phase c) of a Trial. - -{::comment} - - MB64: Does this cover also recurrences? - See, e.g., draft-ietf-netmod-schedule-yang-05 - A Common YANG Data Model for Scheduling - or draft-ietf-opsawg-scheduling-oam-tests-00? - - VP: No, mention that probably already in trial definition. - - MK: No, it does not cover recurrences as specified in above two - drafts, as it does involve scheduled events. - - VP: Created comment block at appropriate subsections. +Definition: -{:/comment} +  +: Trial Duration is the intended duration of the phase c) of a Trial. Discussion: @@ -1877,17 +893,6 @@ nearest integer in seconds. In that case, it is RECOMMENDED to give such inputs to the Controller so that the Controller only uses the accepted values. -{::comment} - - MB65: To? - - VP: Reformulate. - - MK: Edited "proposes" => "uses". - MK: Edited. - -{:/comment} - ### Trial Load Definition: @@ -1911,18 +916,6 @@ as specified in Section 3.4 of [RFC1242]). Informally, Traffic Load is a single number that can "scale" any traffic pattern as long as the intuition of load intended against a single interface can be applied. -{::comment} - - MB66: Please fix all similar ones in the doc - - VP: Ok. - - MK: ok. fixed only here for now. - MK: DONE fix everywhere. - MK: Edited. - -{:/comment} -   : It MAY be possible to use a Trial Load value to describe a non-constant traffic (using average load when the traffic consists of repeated bursts of frames @@ -1949,45 +942,6 @@ Trial Load is the data rate per direction, half of aggregate data rate. : Traffic patterns where a single Trial Load does not describe their scaling cannot be used for MLRsearch benchmarks. 
-{::comment} - - VP: DONE: Put bursty and other non-constant loads outside of the scope? - -{:/comment} - -{::comment} - - VP: DONE: What about multiple-interface loads if not equal among interfaces? - - VP: DONE: What about interfaces with different medium capacity (bandwidth or pps)? - -{:/comment} - -{::comment} - - MB67: Example of an example. :) Please reword. - - VP: Ok. - - MK: Edited. - -{:/comment} - -{::comment} - - MB68: Can we also cover load percentiles? - The avg may not be representative to stress functions - with anti-ddos guards, for example. - - VP: Not here. The average woks with aggregate counters used in loss definition. - Maybe discuss anti-ddos in Traffic Profile subsection. - - MK: Definition of burst traffic profiles is out of scope. - - VP: DONE: Re-check the current text. - -{:/comment} -   : Similarly to Trial Duration, some Measurers MAY limit the possible values of Trial Load. Contrary to Trial Duration, @@ -1995,27 +949,10 @@ documenting such behavior in the test report is OPTIONAL. This is because the load differences are negligible (and frequently undocumented) in practice. -{::comment} - - MB69: Inappropriate use of normative language - - VP: Maybe disagree? - Reformulate other parts to stress test report is subject to requirements. - - MK: Edited. - -{:/comment} -   : The Controller MAY select Trial Load and Trial Duration values in a way that would not be possible to achieve using any integer number of data frames. -{::comment} - - VP: DONE: Use normative MAY somewhere. - -{:/comment} -   : If a particular Trial Load value is not tied to a single Trial, e.g., if there are no Trials yet or if there are multiple Trials, @@ -2031,30 +968,10 @@ port), or (iii) the total across every interface. For any aggregate load value, the report MUST also give the fixed conversion factor that links the per-interface and multi-interface load values. -{::comment} - - MB70: The causality effect may not be evident for the subset case, at least. - - VP: Reformulate. - - MK: Edited. - -{:/comment} -   : The per-interface value remains the primary unit, consistent with prevailing practice in [RFC1242], [RFC2544], and [RFC2285]. -{::comment} - - MB71: Which ones? - - VP: List the common examples. - - MK: Edited. - -{:/comment} -   : The last paragraph also applies to other terms related to Load. @@ -2128,35 +1045,6 @@ are outside of the scope of this document. An example standardization effort is [Vassilev], a draft at the time of writing. -{::comment} - - VP: DONE210: Mention the YANG draft as a possible avenue? - -{:/comment} - -{::comment} - - MB72: Can we provide an example how to make that? - - VP: Nope. Say it is an integration effort. - - MK: Edited. - - VP: DONE. - -{:/comment} - -{::comment} - - MB73: This is too vague. Unless we reword top better reflect the requirement, - I don't think we can use the normative language here - - VP: Reformulate. - - MK: Edited. - -{:/comment} -   : Examples of traffic properties include: - Data link frame size @@ -2168,26 +1056,6 @@ a draft at the time of writing. - Symmetric bidirectional traffic - Section 14 of [RFC2544]. -{::comment} - - MB74: Inappropriate use of normative language - - VP: Reformulate. - - MK: Edited. - -{:/comment} - -{::comment} - - MB75: Idem as above. MUST is not appropriate here. - - VP: Reformulate. - - MK: Edited. 
- -{:/comment} -   : Other traffic properties that need to be somehow specified in Traffic Profile, and MUST be mentioned in Test Report @@ -2199,14 +1067,6 @@ if they apply to the benchmark, include: - modifiers from Section 11 of [RFC2544]. - IP version mixing from Section 5.3 of [RFC8219]. -{::comment} - - VP: Multiple traffic profiles (at least frame sizes) in RFC2544, - this is about single SUT+config+profile benchmark. - DONE: I thihnk the current sentences are good enough. - -{:/comment} - ### Trial Forwarding Ratio Definition: @@ -2227,27 +1087,6 @@ This SHOULD be the default interpretation. Only if this is not the case, the test report MUST describe the Traffic Profile in a detail sufficient to imply how Trial Forwarding Ratio should be calculated. -{::comment} - - MB76: MUST is an absolute requirement (i.e., there is no exception): - 1. MUST This word, or the terms "REQUIRED" or "SHALL", - mean that the definition is an absolute requirement of - the specification. - SHOULD This word, or the adjective "RECOMMENDED", - mean that there may exist valid reasons in particular - circumstances to ignore a particular item, but the full - implications must be understood and carefully weighed - before choosing a different course. - - VP: Reformulate. - - MK: Edited. - - VP: DONE: Apply stricter conditional requirements. - E-mail: explain conditional requirements. - -{:/comment} -   : Trial Forwarding Ratio MAY be expressed in other units (e.g., as a percentage) in the test report. @@ -2265,20 +1104,6 @@ even though the final value is "rate" that is still per-interface. if one direction is forwarded without losses, but the opposite direction does not forward at all, the Trial Forwarding Ratio would be 0.5 (50%). -{::comment} - - MB77: Should we call for more granularity to be provided/characterized? - - VP: No, include sentence on why. - - MK: What is the granularity that is needed here? The test - procedure is about testing SUT as a single system, not parts of - it. - - VP: DONE: Add the missing sentence. - -{:/comment} -   : In future extensions, more general ways to compute Trial Forwarding Ratio may be allowed, but the current MLRsearch Specification relies on this specific @@ -2291,18 +1116,6 @@ Definition:   : The Trial Loss Ratio is equal to one minus the Trial Forwarding Ratio. -{::comment} - - MB78: For all sections, please indent so that we separate the def/discussion vs. description - - VP: Ok. - - MK: Edited. Indented 2 spaces, will kramdown renderer take it? - - VP: Applied the way from https://stackoverflow.com/a/59612110 instead. - -{:/comment} - Discussion:   @@ -2338,32 +1151,6 @@ Section 14 of [RFC2544], the Trial Forwarding Rate is numerically equal to the arithmetic average of the individual per-interface forwarding rates that would be produced by the RFC 2285 procedure. -{::comment} - - MB79: Do we have an authoritative reference where this is defined? - If not, please add an definition entry early in the terminology section. - - VP: Add reference. - - MK: Edited. Added reference to RFC2544. - -{:/comment} - -{::comment} - - MB80: Why both? - - VP: Add explanations to Traffic Profile subsection. - - MK: Edited. But shouldn't it say "sum of" instead of "arithmetic - average"? Unless specified, Trial Forwarding Rate is an aggregate - rate, not per interface, as it is representating capability of - DUT/SUT not a subset of it associated with particular interface :) - - VP: DONE: Checked, it is average. 
- -{:/comment} -   : For more complex traffic patterns, such as many-to-one as mentioned in Section 3.3.2 Partially Meshed Traffic of [RFC2285], @@ -2387,21 +1174,6 @@ Definition: : Trial Effective Duration is a time quantity related to a Trial, by default equal to the Trial Duration. -{::comment} - - MB81: For the periodic/recurrences, does it cover only one recurrence - or from start to last independent of in-between execution periods? - - VP: Make sure Trial implies no recurrence. - - MK: Edited. BUT - Why do we need to state that. There is nothing in the text of - Section 23 of RFC2544 and in above sections implying recurrences. - Why then do we need to explicity say "no recurrence"? - - VP: DONE: After Trial is stable, simplify this sentence. - -{:/comment} - Discussion:   @@ -2413,16 +1185,6 @@ the Controller MUST use the Trial Duration value instead. : Trial Effective Duration may be any positive time quantity chosen by the Measurer to be used for time-based decisions in the Controller. -{::comment} - - MB82: It is obvious, but should we say "positive"? - - VP: Yes. - - MK: Edited. - -{:/comment} -   : The test report MUST explain how the Measurer computes the returned Trial Effective Duration values, if they are not always @@ -2435,16 +1197,6 @@ rather than solely the traffic portion of it. An approach is to measure the duration of the whole trial (including all wait times) and use that as the Trial Effective Duration. -{::comment} - - MB83: To be defined early in the terminology section - - VP: Ok. - - MK: Edited. - -{:/comment} -   : This is also a way for the Measurer to inform the Controller about its surprising behavior, for example, when rounding the Trial Duration value. @@ -2471,20 +1223,6 @@ ignore values of any optional attribute they are not familiar with, except when passing Trial Output instances to the Manager. -{::comment} - - MB84: As we have an exception - - VP: Reformulate. - Conditional MUST has an authoritative prescribed condition, - SHOULD gives implementers freedom to choose their own conditions. - - MK: Edited. - - VP: Done: Stricter conditional requirements not needed. - -{:/comment} -   : Example of an optional attribute: The aggregate number of frames expected to be forwarded during the trial, @@ -2511,24 +1249,6 @@ Discussion: : When referring to more than one trial, plural term "Trial Results" is used to collectively describe multiple Trial Result instances. -{::comment} - - While Controller implementations SHOULD NOT include additional attributes - with independent values, they MAY include derived quantities. - - MB85: Can we include a short sentence to explain the risk if not followed? - - VP: Now I think even SHOULD NOT is too strong. Either way, reformulate. - - MK: For Vratko. Isn't this already covered in Trial Output? What - other optional attributes are applicable here, give examples? - Otherwise it's too abstract, open-ended, ambiguous and so on ... - Many other blue-sky and hand-wavy adjectives come to my mind :) - - VP: DONE: Deleted - -{:/comment} - ## Goal Terms This section defines new terms for quantities relevant (directly or indirectly) @@ -2561,16 +1281,6 @@ Discussion: : Certain trials must reach this minimum duration before a load can be classified as a lower bound. -{::comment} - - MB86: I don't parse this. - - VP: Reformulate. - - MK: Edited. - -{:/comment} -   : The Controller may choose shorter durations, results of those may be enough for classification as an Upper Bound. 
@@ -2589,18 +1299,6 @@ Definition: : A threshold value for a particular sum of Trial Effective Duration values. The value MUST be positive. -{::comment} - - MB87: I like this, but we should be consistent - and mention it when appropriate for all other metrics - - VP: Ok. Check everywhere. - - MK: Checked all subsections under Goal Terms and Trial Terms. - Applied as appropriate. - -{:/comment} - Discussion:   @@ -2753,23 +1451,6 @@ some of them are required. : Required attributes: Goal Final Trial Duration, Goal Duration Sum, Goal Loss Ratio and Goal Exceed Ratio. -{::comment} - - MB88: Listing the attributes this way allows to easily classify mandatory/optional. - However, this not followed in previous. Please pick your favorite approach - and use it in a consistent manner in the document. - - VP: Use this longer way everywhere (also saying if no other attributes could be added). - Tangent: Be more lenient on attributes internal to Controller? - - MK: Edited this one. Applied to subsections in Trial Terms and - Goal Terms as appropriate. - - WONTFIX211 check if more places need this. - Too late. - -{:/comment} -   : Optional attributes: Goal Initial Trial Duration and Goal Width. @@ -2782,24 +1463,6 @@ even if they are not required by MLRsearch Specification. However, it is RECOMMENDED for those implementations to support missing attributes by providing typical default values. -{::comment} - - MB89: I guess I understand what is meant here, but I think this should be reworded - to avoid what can be seen as inconsistency: do not support vs. support a default. - - VP: Yes, probably worth a separate subsection, - distinguishing automated implementations from manual processes. - - MK: No separate subsection. We should state that that the listed - optional attributes should have documented default values. But i do - not like the open-ended "Implementations MAY add their own - attributes." Either examples are added or this sentence is - removed. - - VP: DONE: Check if Specification does not mention "implementation". - -{:/comment} -   : For example, implementations with Goal Initial Trial Durations may also require users to specify "how quickly" should Trial Durations increase. @@ -2847,12 +1510,6 @@ It is the maximal value the Controller is allowed to use for Trial Load values. Discussion: -{::comment} - - VP: DONE: Use MUST NOT to make Controller behavior constrained, conditionally? - -{:/comment} -   : Max Load is an example of an optional attribute (outside the list of Search Goals) required by some implementations of MLRsearch. @@ -2884,12 +1541,6 @@ is that it makes the search result independent of Max Load value. Test Report MAY express this quantity using multi-interface values, as sum of per-interface maximal loads. -{::comment} - - VP: DONE: Use MAY. - -{:/comment} - #### Min Load Definition: @@ -2900,12 +1551,6 @@ It is the minimal value the Controller is allowed to use for Trial Load values. Discussion: -{::comment} - - VP: DONE: Use MUST NOT? - -{:/comment} -   : Min Load is another example of an optional attribute required by some implementations of MLRsearch. @@ -2933,12 +1578,6 @@ and the implementation can apply relative Goal Width safely. Test Report MAY express this quantity using multi-interface values, as sum of per-interface minimal loads. -{::comment} - - VP: DONE: Use MAY. 
- -{:/comment} - ## Auxiliary Terms While the terms defined in this section are not strictly needed @@ -3022,14 +1661,6 @@ can only happen when more than Goal Duration Sum of trials are measured Informally, the previous Upper Bound got invalidated. In practice, the Load frequently becomes a [Lower Bound](#lower-bound) instead. -{::comment} - - VP: DONE: Reformulate to avoid the "we" construct. - - VP: DONE: Do we need Invalidation as a separate term? I guess no. - -{:/comment} - #### Lower Bound Definition: @@ -3067,16 +1698,6 @@ can only happen when more than Goal Duration Sum of trials are measured Informally, the previous Lower Bound got invalidated. In practice, the Load frequently becomes an [Upper Bound](#upper-bound) instead. -{::comment} - - Same as in upper bound: - - VP: DONE: Reformulate to avoid the "we" construct. - - VP: DONE: Do we need Invalidation as a separate term? I guess no. - -{:/comment} - #### Undecided Definition: @@ -3131,13 +1752,7 @@ it is not affected by Max Load value.   : Given that Relevant Upper Bound is a quantity based on Load, Test Report MAY express this quantity using multi-interface values, -as sum of per-interface loads. - -{::comment} - - VP: DONE: Use MAY. - -{:/comment} +as sum of per-interface loads. ### Relevant Lower Bound @@ -3168,12 +1783,6 @@ for a Relevant Lower Bound if larger Loads were possible. Test Report MAY express this quantity using multi-interface values, as sum of per-interface loads. -{::comment} - - VP: DONE: Use MAY. - -{:/comment} - ### Conditional Throughput Definition: @@ -3215,12 +1824,6 @@ and comparability of different MLRsearch implementations. Test Report MAY express this quantity using multi-interface values, as sum of per-interface forwarding rates. -{::comment} - - VP: DONE: Use MAY. - -{:/comment} - ### Goal Results MLRsearch Specification is based on a set of requirements @@ -3237,32 +1840,6 @@ Definition: Relevant Upper Bound and Relevant Lower Bound are REQUIRED attributes. Conditional Throughput is a RECOMMENDED attribute. -{::comment} - - MB90: To do what? I'm afraid we need to explicit the meaning here. - - VP: Yes, reformulate. - - MK: Edited. - -{:/comment} - -{::comment} - - MB91: Isn't this redundant with listing the bounds as required in the previous definition? - - VP: Do we need separation between may-not-exist and must-exist quantities? - Either way, reformulate. - - MK: Deleted. Agree with Med - Sentence was redundant as already - covered by text in definition "Relevant Upper Bound and Relevant - Lower Bound are REQUIRED attributes." - - VP: WONTFIX212: Re-check. - Too late. - -{:/comment} - Discussion:   @@ -3347,18 +1924,6 @@ sequence of their corresponding Search Goal instances. : When the Search Result is expressed as a mapping, it MUST contain an entry for every Search Goal instance supplied in the Controller Input. -{::comment} - - MB92: To what? - - VP: Subsections on quantities and interfaces should mention equivalent representations. - Then reformulate this. - - MK: Edited. First two paragraphs in Discussion changed to make it - clearer. - -{:/comment} -   : Identical Goal Result instances MAY be listed for different Search Goals, but their status as regular or irregular may be different. @@ -3380,21 +1945,6 @@ Discussion: : MLRsearch implementation MAY return additional data in the Controller Output, e.g., number of trials performed and the total Search Duration. -{::comment} - - VP: DONE low priority: Regular end, irregular exit, user abort. 
- Should not need new text, review related MD comments. - Maybe differentiate abort conditions, or at least make them explicitly vague? - -{:/comment} - -{::comment} - - VP: DONE elsewhere: Emphasize one controller call gives one benchmark. - Any recurring tests count as independent benchmarks. - -{:/comment} - ## Architecture Terms MLRsearch architecture consists of three main system components: @@ -3403,21 +1953,6 @@ The components were introduced in [Architecture Overview](#architecture-overview and the following subsections finalize their definitions using terms from previous sections. -{::comment} - - MB93: I guess these should be introduced before the attributes as these components - are used in the description. Please reconsider the flow of the document. - - VP: Reformulate this to clarify overview introduced, this finalizes the definition. - - MK: Edited. And I disagree. Three components of the architecture - are listed, with definitions following. I do not envisage any - problem from the reader perspective. - - VP: DONE: Added a sentence. - -{:/comment} - Note that the architecture also implies the presence of other components, such as the SUT and the tester (as a sub-component of the Measurer). @@ -3429,21 +1964,6 @@ to call the Measurer indirectly instead. In doing so, the Measurer implementatio can be fully independent from the Controller implementations, e.g., developed in different programming languages. -{::comment} - - MB94: Aha, this answers a comment I made earlier :) - Let's save cycles for other readers and move all this - section early in the document. - - VP: Hmm, maybe a subsection of overview? - Definitely something needs to be moved around. - - MK: Edited. And addressed the original concern. See my note at MB93. - - VP: DONE: Overview got updated. - -{:/comment} - ### Measurer Definition: @@ -3505,16 +2025,6 @@ cycle continues until the stopping conditions are met, at which point the Controller produces a final Controller Output instance and terminates. -{::comment} - - MB95: Till a stop? - - VP: Yes. - - MK: Edited. It should be clear now. - -{:/comment} - Discussion:   @@ -3560,12 +2070,6 @@ the size of frame handling buffers between tests of frame handling rates or to disable all but one transport protocol when testing the throughput of that protocol." -{::comment} - - VP: DONE: Nested "definition list" does not work. Use quotes here? - -{:/comment} -   : It is REQUIRED for the test report to encompass all the SUT configuration details, including description of a "default" configuration common for most tests @@ -3592,28 +2096,6 @@ as a fresh, independent Search; how the system behaves across multiple calls (for example, combining or comparing their results) is explicitly out of scope for this document. -{::comment} - - MB96: This answers a comment I have earlier. - Please move all these details to be provided early. - - VP: Yes (covered by earlier comments). - - MK: Yes - covered by earlier edits. - -{:/comment} - -{::comment} - - MB97: Should there be a mode where conditional calls are invoked? - Or more generally to instruct some dependency? - - VP: Explain in earlier subsections, repeats are out of scope. - - MK: Edited. It should be clear now that repeats are out of scope. - -{:/comment} - ## Compliance This section discusses compliance relations between MLRsearch @@ -3654,25 +2136,6 @@ unconditionally compliant with Section 24 of [RFC2544]. 
- Goal Loss Ratio = 0% - Goal Exceed Ratio = 0% -{::comment} - - MB98: Not related but triggered by this, - can we have at the end of the document a table with all - the default values/recommended for the various - attributes defined in the document? - - VP: Maybe? Revisit later to see if we have enough data to warrant table format. - - MK: WONTFIX213. This is not a bad idea. A section that in summary table - lists common usage cases with recommended settings e.g. RFC2544, - TST009, FD.io CSIT, examples of SUTs with certain behaviour e.g. - suspected periodic SUT disruption. It will make it more concrete to - the reader and verify their understanding of the spec. - - VP: I think too low priority for draft11. - -{:/comment} - Goal Loss Ratio and Goal Exceed Ratio attributes, are enough to make the Search Goal conditionally compliant. Adding Goal Final Trial Duration @@ -3719,26 +2182,6 @@ so third full-length trial is never needed. # Methodology Rationale and Design Considerations -{::comment} - - MB99: Please consider that a more explicit title that reflects the content. - - VP: Yes, but not sure what would be a better title yet. - - MK: Edited. Also updated opening paragraph to motivate the reader. - -{:/comment} - -{::comment} - - Manual processes, automation, implementation as library,... - - DONE: Recheck specification minimizes user/iplementation discussions. - - DONE low priority: Add those discussions here somewhere is useful. - -{:/comment} - This section explains the Why behind MLRsearch. Building on the normative specification in Section [MLRsearch Specification](#mlrsearch-specification), @@ -3798,16 +2241,6 @@ and its variance. The biggest -{::comment} - - MB100: We don't need to say it if it is obvious ;) - - VP: Reformulate. - - MK: Edited. - -{:/comment} - difference between MLRsearch and [RFC2544] binary search is in the goals of the search. [RFC2544] has a single goal, based on classifying a single full-length trial @@ -3828,16 +2261,6 @@ when the search is started with only one Search Goal instance. MLRsearch Specification -{::comment} - - MB101: Specification? - - VP: Ok. - - MK: Edited. - -{:/comment} - supports multiple Search Goals, making the search procedure more complicated compared to binary search with single goal, but most of the complications do not affect the final results much. @@ -3907,17 +2330,6 @@ The idea of performing multiple Trials at the same Trial Load comes from a model where some Trial Results (those with high Trial Loss Ratio) are affected by infrequent effects, causing unsatisfactory repeatability -{::comment} - - MB102: Or other similar terms, but not poor thing. - Please consider the same change in other parts of the document. - - VP: Ok, search&replace. - - MK: Edited. Searched and replaced all with unsatisfactory, unacceptable. - -{:/comment} - of [RFC2544] Throughput results. Refer to Section [DUT in SUT](#dut-in-sut) for a discussion about noiseful and noiseless ends of the SUT performance spectrum. @@ -3959,16 +2371,6 @@ An MLRsearch implementation MAY expose configuration parameters that decide whether, when, and how short trial durations are used. The exact heuristics and controls are left to the discretion of the implementer. -{::comment} - - MB103: We may say that how this is exposed to a user/manager is implmentation specific. - - VP: Earlier subsection should explain when discussing implementations. - - MK: Edited. 
- -{:/comment} - While MLRsearch implementations are free to use any logic to select Trial Input values, comparability between MLRsearch implementations is only assured when the Load Classification logic @@ -4002,16 +2404,6 @@ The clearest illustration - and the chief reason for adopting a generalized throughput definition - is the presence of a hard performance limit. -{::comment} - - MB104: Not sure to parse this. - - VP: Reformulate. - - MK: Edited. - -{:/comment} - ### Hard Performance Limit Even if bandwidth of a medium allows higher traffic forwarding performance, @@ -4021,22 +2413,6 @@ e.g., a specific frames-per-second limit on the NIC (a common occurrence). Those limitations should be known and provided as Max Load, Section [Max Load](#max-load). -{::comment} - - MB105: We may say that some implementation may expose their capabilities - using IPFIX/YANG, but such exposure is out of scope. - - VP: Add capability exposition to earlier implementation subsections. - Reformulate this sentence to be specific to hard limits. - - MK: Edited. Capability exposition of SUT and DUT is out of scope - of this document. Do we need to state it in the opening somewhere? - COTS NICs do not support network configuration protocols, - they are configured using vendor specific registers and associated - kernel or userspace drivers. - -{:/comment} - But if Max Load is set larger than what the interface can receive or transmit, there will be a "hard limit" behavior observed in Trial Results. @@ -4050,14 +2426,6 @@ counter-intuitive. Accordingly, the [RFC2544] Throughput metric should be generalized - rather than relying solely on the Relevant Lower Bound - to reflect realistic, limit-aware performance. -{::comment} - - MK: Edited. Above paragraph was not reading well. Following from - MB105 I have updated it further to motivate generalization of - throughput. - -{:/comment} - MLRsearch defines one such generalization, the [Conditional Throughput](#conditional-throughput). It is the Trial Forwarding Rate from one of the Full-Length Trials @@ -4080,12 +2448,6 @@ are equal to the Goal Loss Ratio), one can prove that Conditional Throughput values may have up to the Goal Loss Ratio relative difference. -{::comment} - - VP: DONE: Reformulate to avoid "we" construct. - -{:/comment} - Setting the Goal Width below the Goal Loss Ratio may cause the Conditional Throughput for a larger Goal Loss Ratio to become smaller than a Conditional Throughput for a goal with a lower Goal Loss Ratio, @@ -4118,19 +2480,6 @@ and uses more intuitive names for the intermediate values. Note: For explanation clarity variables are taged as (I)nput, (T)emporary, (O)utput. -{::comment} - - MB106: Move this to the terminology/convention section - - VP: I do not think these flags fit into terminology. - For this long list, maybe divide into sublists? - - MK: I agree - this is does not belong to draft terminology - section. And I agree, for readability we could split the long list - into groups with meaningful headers. See my attempt to do so below. - -{:/comment} - - Collect Trial Results: - Take all Trial Result instances (I) measured at a given load. @@ -4337,62 +2686,9 @@ The DUT/SUT SHOULD NOT include features that serve only to boost benchmark scores - such as a dedicated "fast-track" test mode that is never used in normal operation. -{::comment} - - MB109: Some more elaboration is needed - - VP: This needs BMWG discussion as this chapter is a "boilerplate" - copied from earlier BMWG documents. 
- - MK: Edited - - VP: Ok, for draft11, but we can start discussing on bmwg for later versions. - -{:/comment} - Any implications for network security arising from the DUT/SUT SHOULD be identical in the lab and in production networks. -{::comment} - - MB110: Why? We can accept some relax rule in controlled environnement, - but this not acceptable in deployement. I would adjust accordingly. - - VP: Explain and discuss in BMWG. - - MK: Keeping as is. It is a BMWG standard text that applies here. - You can see it verbatim in RFC 6815 (section 7), RFC 6414 (section 4.1), RFC - 9004 (section 8), and several BMWG Internet-Drafts. Its purpose is to - remind implementers and testers that the device under test must not - be re-configured into an unrealistic or less-secure state merely to - obtain benchmark data - a principle that complements the adjacent - sentence about avoiding "special benchmarking modes." Including - the sentence therefore maintains consistency with BMWG precedent - and reinforces a key security expectation. - -{:/comment} - -{::comment} - - MB111: I would some text to basically - say that the benchmarking results should be adequately - protected and guards top prevent leaks to unauthorized - entities. - Otherwise, the benchmark results can be used by - attacker to better adjust their attacks and perform - attacks that would lead to DDoS a node of the DUT in a - live network, infer the limitation of a DUT that can be - used for overflow attacks, etc. - Also, we can say that the benchmark is agnostic to trafic - and does not manipulate real traffic. As such, Privacy is - not a concern. - - VP: To BMWG. - - MK: Keeping as is. See my comments above at MB110. - -{:/comment} - # Acknowledgements Special wholehearted gratitude and thanks to the late Al Morton for his @@ -4413,18 +2709,6 @@ versions of this document. # Load Classification Code -{::comment} - - MB112: Move after references - - VP: Ok. - - MK: Move after references. - - VP: Done by moving "--- back" above. - -{:/comment} - This appendix specifies how to perform the Load Classification. Any Trial Load value can be classified, @@ -4440,16 +2724,6 @@ which computes two values, stored in variables named Although presented as pseudocode, the listing is syntactically valid Python and can be executed without modification. -{::comment} - - MB113: Where is that python code? - - VP: Reformulate. - - MK: Edited. - -{:/comment} - If values of both variables are computed to be true, the Load in question is classified as a Lower Bound according to the given Search Goal instance. If values of both variables are false, the Load is classified as an Upper Bound. @@ -4503,18 +2777,6 @@ optimistic_is_lower = effect_high_loss_s <= quantile_duration_s ~~~ -{::comment} - - MB114: May display this a table for better readability - - VP: Ok. - - MK: Disagree. Can we have it in a proper code block instead? - - VP: DONE: block with tags. - -{:/comment} - # Conditional Throughput Code This section specifies an example of how to compute Conditional Throughput, @@ -4585,35 +2847,8 @@ conditional_throughput = intended_load * (1.0 - quantile_loss_ratio) ~~~ -{::comment} - - MB115: Please use and markers. - - VP: Also table? Ok. - - MK: Not table, it's code. Can we have it in a proper code - block instead? - - VP: DONE: block with tags. - -{:/comment} - # Example Search -{::comment} - - MB107: We may move this section to an appendix - - VP: Ok. - - MK: Move to Appendix A, before the pseudocode Appendices. 
- Keeping it here for now to finish editing with clean change - tracking in gerrit. - - VP: DONE: Appendix C now. A and B are for pseudocode as that is more important. - -{:/comment} - The following example Search is related to one hypothetical run of a Search test procedure that has been started with multiple Search Goals. @@ -4792,19 +3027,6 @@ Optimistic exceed ratio | 0% | 0% | 0% | 0% Pessimistic exceed ratio | 100% | 100% | 50.833% | 100% Classification Result | Undecided | Undecided | Undecided | Undecided -{::comment} - - MB108: Please add a table legend. Idem for all tables - - VP: Ok. Figure out how. - - MK: Kramdown magic. - - VP: WONTFIX as currently not possible without coding effort: - https://github.com/gettalong/kramdown/issues/593 - -{:/comment} - This is the last point in time where all goals have this load as Undecided. ### Point 2 @@ -5068,36 +3290,3 @@ One has Trial Loss Ratio of 0%, the other of 0.1%. Due to stricter Goal Exceed Ratio, this Conditional Throughput is smaller than Conditional Throughput of the other two goals. - -{::comment} - - WONTFIX214: There are long lines. - Too late for draft-11. - -{:/comment} - -{::comment} - - VP: DONE: Fix warnings from kramdown. - -{:/comment} - -{::comment} - - [Final checklist.] - - VP WONTFIX215 Final Checks. Only mark as done when there are no active todos above. - - VP Rename chapter/sub-/section to better match their content. - - MKP3 VP WONTFIX216: Recheck the definition dependencies go bottom-up. - - VP DONE217: Unify external reference style (brackets, spaces, section numbers and names). - - MKP2 VP DONE218: Capitalization of New Terms: useful when editing and reviewing, - but I still vote to remove capitalization before final submit, - because all other RFCs I see only capitalize due to being section title. - - VP WONTFIX219: If time permits, keep improving formal style (e.g., using AI). - -{:/comment} diff --git a/docs/ietf/draft-ietf-bmwg-mlrsearch-12.xml b/docs/ietf/draft-ietf-bmwg-mlrsearch-12.xml index 2291be6431..0598881865 100644 --- a/docs/ietf/draft-ietf-bmwg-mlrsearch-12.xml +++ b/docs/ietf/draft-ietf-bmwg-mlrsearch-12.xml @@ -47,7 +47,7 @@ - + This document specifies extensions to "Benchmarking Methodology for Network Interconnect Devices" (RFC 2544) throughput search by @@ -56,8 +56,6 @@ defining a new methodology called Multiple Loss Ratio search support multiple loss ratio searches, and improve result repeatability and comparability. - - MLRsearch is motivated by the pressing need to address the challenges of evaluating and testing the various data plane solutions, especially in software- based networking systems based on Commercial Off-the-Shelf @@ -65,8 +63,6 @@ software- based networking systems based on Commercial Off-the-Shelf - - @@ -76,7 +72,7 @@ software- based networking systems based on Commercial Off-the-Shelf - +
Introduction @@ -91,18 +87,14 @@ Network Function running on shared servers in the compute cloud).

 Purpose
 
-
 The purpose of this document is to describe the Multiple Loss Ratio search
 (MLRsearch) methodology, optimized for determining
 data plane throughput in software-based networking devices and functions.
 
-
 Applying the vanilla throughput binary search, as specified for example in
  and 
 to software devices under test (DUTs)
 results in several problems:
 
-
-
 
 Binary search takes a long time,
 as most trials are done far from the eventually found throughput.
 
 
 The required final trial duration and pauses between trials
@@ -119,9 +111,6 @@ 
 throughput metric can no longer be pinned to a single,
 unambiguous value.)
 
-
-
-
 To address these problems,
 early MLRsearch implementations employed the following enhancements:
 
@@ -150,14 +139,9 @@ in Section 3.6.2 of , to initialize bounds.
 prevent the bounds from narrowing unnecessarily.
 
-
-
-
 Enhancements 1, 2 and partly 4 are formalized as the MLRsearch Specification
 within this document; other implementation details are out of scope.
 
-
-
 The remaining enhancements are treated as implementation details,
 thus achieving high comparability without limiting future improvements.
 
@@ -173,14 +157,8 @@ Exact settings are not specified, but see the discussion
 in Overview of RFC 2544 Problems
 for the impact of different settings on result quality.
 
-
-
-
 This document does not change or obsolete any part of .
 
-
-
-
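For concreteness, here is a minimal sketch (not taken from any cited methodology text) of the vanilla bisection that the list above critiques; the function and parameter names, and the stand-in Measurer in the usage line, are illustrative assumptions. Each iteration spends a full-length trial even when the probed load is still far from the final result, which is the first problem listed above.

~~~
# Illustrative sketch only: a plain RFC 2544-style zero-loss binary search.
# measure_loss_ratio(load, duration_s) stands in for a hypothetical Measurer call.

def binary_search_throughput(min_load, max_load, measure_loss_ratio,
                             final_trial_duration_s=60.0, precision=0.1e6):
    """Return the highest load with zero observed loss, within precision."""
    lower, upper = min_load, max_load
    while upper - lower > precision:
        mid = (lower + upper) / 2.0
        loss_ratio = measure_loss_ratio(mid, final_trial_duration_s)
        if loss_ratio == 0.0:
            lower = mid  # zero loss: throughput is at least mid
        else:
            upper = mid  # any loss: throughput is below mid
    return lower

if __name__ == "__main__":
    # Stand-in Measurer that loses frames above a hypothetical 9.5 Mfps limit.
    print(binary_search_throughput(
        0.0, 20e6, lambda load, duration: 0.0 if load <= 9.5e6 else 0.01))
~~~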
Positioning within BMWG Methodologies @@ -236,7 +214,6 @@ has increased both the number of performance tests required to verify the DUT update and the frequency of running those tests. This makes the overall test execution time even more important than before. - The throughput definition per restricts the potential for time-efficiency improvements. The bisection method, when used in a manner unconditionally compliant @@ -245,17 +222,12 @@ with , is excessively slow due to two main factors Firstly, a significant amount of time is spent on trials with loads that, in retrospect, are far from the final determined throughput. - - - - Secondly, does not specify any stopping condition for throughput search, so users of testing equipment implementing the procedure already have access to a limited trade-off between search duration and achieved precision. However, each of the full 60-second trials doubles the precision. - As such, not many trials can be removed without a substantial loss of precision. For reference, here is a brief throughput binary @@ -281,7 +253,6 @@ the highest zero-loss rate for every mandatory frame size. response measured Section 3.1.1 of . - SUT as: @@ -300,19 +271,16 @@ DUT, but the entire execution environment: host hardware, firmware and kernel/hypervisor services, as well as any other software workloads that share the same CPUs, memory and I/O resources. - Given that a SUT is a shared multi-tenant environment, the DUT might inadvertently experience interference from the operating system or other software operating on the same server. - Some of this interference can be mitigated. For instance, in multi-core CPU systems, pinning DUT program threads to specific CPU cores and isolating those cores can prevent context switching. - Despite taking all feasible precautions, some adverse effects may still impact the DUT's network performance. In this document, these effects are collectively @@ -354,7 +322,6 @@ to be observable, this time because minor noise events almost always occur during each trial, nudging the measured performance slightly below the theoretical maximum. - Unless specified otherwise, this document's focus is on the potentially observable ends of the SUT performance spectrum, as opposed to the extreme ones. @@ -366,16 +333,12 @@ as there are no realistic enough models that would be capable to distinguish SUT noise from DUT fluctuations (based on the available literature at the time of writing). - Provided SUT execution environment and any co-resident workloads place only negligible demands on SUT shared resources, so that the DUT remains the principal performance limiter, the DUT's ideal noiseless performance is defined as the noiseless end of the SUT performance spectrum. - - - Note that by this definition, DUT noiseless performance also minimizes the impact of DUT fluctuations, as much as realistically possible for a given trial duration. @@ -390,7 +353,6 @@ explicitly model SUT-generated noise, enabling to derive surrogate metrics that approximate the (proxies for) DUT noiseless performance across a range of SUT noise-tolerance levels. -
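As a back-of-the-envelope illustration of the noise discussion above (all numbers hypothetical), even a noise event that drops every frame for a few milliseconds leaves only a small overall Trial Loss Ratio once averaged over a long trial:

~~~
# Hypothetical numbers, for illustration only.
trial_duration_s = 60.0
burst_duration_s = 0.005       # 5 ms noise event dropping all frames
offered_load_fps = 10_000_000  # 10 Mfps intended load

frames_offered = offered_load_fps * trial_duration_s
frames_lost = offered_load_fps * burst_duration_s
trial_loss_ratio = frames_lost / frames_offered
print(trial_loss_ratio)  # 8.33e-05, i.e. well under 0.01% despite total loss during the burst
~~~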
Repeatability and Comparability @@ -431,7 +393,6 @@ as less dependent on the SUT noise. report some statistics (e.g., average and standard deviation, and/or percentiles like p95). - This can be used for a subset of tests deemed more important, but it makes the search duration problem even more pronounced. @@ -462,34 +423,17 @@ non-zero loss ratio as the goal for their load search. Networking protocols tolerate frame loss better, compared to the time when and were specified. - - - - Increased link speeds require trials sending way more frames within the same duration, increasing the chance of a small SUT performance fluctuation being enough to cause frame loss. - - - - Because noise-related drops usually arrive in small bursts, their impact on the trial's overall frame loss ratio is diluted by the longer intervals in which the SUT operates close to its noiseless performance; consequently, the averaged Trial Loss Ratio can still end up below the specified Goal Loss Ratio value. - - - - If an approximation of the SUT noise impact on the Trial Loss Ratio is known, it can be set as the Goal Loss Ratio (see definitions of Trial and Goal terms in Trial Terms and Goal Terms). - - - - - For more information, see an earlier draft (Section 5) and references there. @@ -499,7 +443,6 @@ support for non-zero loss goals makes a search algorithm more user-friendly. throughput is not user-friendly in this regard. - Furthermore, allowing users to specify multiple loss ratio values, and enabling a single search to find all relevant bounds, significantly enhances the usefulness of the search algorithm. @@ -525,7 +468,6 @@ Section 3 of for loss ratios acceptable for an ac measurement of TCP throughput, and for models and calculations of TCP performance in presence of packet loss. -
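Where repeated searches are affordable, the statistics suggested earlier in this section (average, standard deviation, percentiles such as p95) can be computed directly from the per-run results; a minimal sketch with hypothetical values follows.

~~~
# Minimal sketch: summarizing repeated benchmark results (hypothetical values).
import statistics

throughputs_mfps = [14.21, 14.05, 13.97, 14.33, 14.10]  # one value per repeated search

average = statistics.mean(throughputs_mfps)
stdev = statistics.stdev(throughputs_mfps)
p95 = statistics.quantiles(throughputs_mfps, n=20)[18]  # 95th percentile estimate
print(f"avg={average:.2f} Mfps, stdev={stdev:.2f}, p95={p95:.2f}")
~~~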
Inconsistent Trial Results @@ -554,8 +496,6 @@ where two successive zero-loss trials are recommended, presumably because after one zero-loss trial there can be a subsequent inconsistent non-zero-loss trial. - - A robust throughput search algorithm needs to decide how to continue the search in the presence of such inconsistencies. Definitions of throughput in and are not specific enough @@ -574,13 +514,11 @@ inconsistent trial results remains an open problem.
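To make the inconsistency concrete: a trial at a lower load may show loss while a trial at a higher load shows none, so the observed results no longer point to a single throughput value. The following small sketch, using hypothetical data and helper names, merely flags such non-monotonic pairs:

~~~
# Hypothetical trial results: (load in Mfps, observed loss ratio).
trial_results = [(8.0, 0.0), (9.0, 0.002), (10.0, 0.0)]

def find_inconsistencies(results):
    """Yield pairs where a lower load lost frames but a higher load did not."""
    ordered = sorted(results)
    for i, (low_load, low_loss) in enumerate(ordered):
        for high_load, high_loss in ordered[i + 1:]:
            if low_loss > 0.0 and high_loss == 0.0:
                yield (low_load, high_load)

print(list(find_inconsistencies(trial_results)))  # [(9.0, 10.0)]
~~~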
Requirements Language - The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, and when, and only when, they appear in all capitals, as shown here. - This document is categorized as an Informational RFC. While it does not mandate the adoption of the MLRsearch methodology, it uses the normative language of BCP 14 to provide an unambiguous specification. @@ -589,7 +527,6 @@ it MUST adhere to all the absolute requirements defined herein. The use of normative language is intended to promote repeatable and comparable results among those who choose to implement this methodology. -
MLRsearch Specification @@ -607,17 +544,12 @@ fully specified and discussed in their own subsections, under sections titled "Terms". This way, the list of terms is visible in table of contents. - - - Each per term subsection contains a short Definition paragraph containing a minimal definition and all strict requirements, followed by Discussion paragraphs focusing on important consequences and recommendations. Requirements about how other components can use the defined quantity are also included in the discussion. - -
Scope This document specifies the Multiple Loss Ratio search (MLRsearch) methodology. @@ -682,7 +614,6 @@ within a multiple-goal search are considered outside the normative scope of this

 Architecture Overview
 
-
 Although the normative text references only terminology that has already
 been introduced, explanatory passages beside it sometimes profit from
 terms that are defined later in the document. To keep the initial
@@ -697,7 +628,6 @@ is purely conceptual; actual implementations need not exchange explicit
 messages. When the text contrasts alternative behaviours, it refers to
 the different implementations of the same component.
 
-
 A test procedure is considered compliant with the MLRsearch Specification
 if it can be conceptually decomposed into the abstract components defined
 herein, and each component satisfies the
@@ -718,8 +648,6 @@ by calling Controller once for each benchmark.
 and the Controller then invokes the Measurer repeatedly
 until the Controller decides it has enough information to return outputs.
 
-
-
 The part during which the Controller invokes the Measurer is termed the
 Search. Any work the Manager performs either before invoking the
 Controller or after the Controller returns falls outside the scope of the
@@ -728,7 +656,6 @@ Search.
 
 MLRsearch Specification prescribes Regular Search Results and recommends
 corresponding search completion conditions.
 
-
 Irregular Search Results are also allowed; they have different requirements
 and their corresponding stopping conditions are out of scope.
 
@@ -742,8 +669,6 @@ according to Goal Width,
 the Regular Goal Result is found.
 Search stops when all Regular Goal Results are found,
 or when some Search Goals are proven to have only Irregular Goal Results.
 
-
-
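The division of work described above can be sketched as a plain call structure. The names and attributes below are illustrative assumptions, not part of the MLRsearch Specification; the point is only that the Manager calls the Controller once per benchmark, and the Controller invokes the Measurer repeatedly during the Search.

~~~
# Illustrative call structure only; names and attributes are not normative.

def example_measurer(trial_load, trial_duration):
    """Stand-in Measurer: performs one Trial and returns its Trial Result."""
    return {"load": trial_load, "duration": trial_duration, "loss_ratio": 0.0}

def example_controller(controller_input, measurer):
    """Stand-in Controller: keeps invoking the Measurer during the Search."""
    trial_results = []
    for load in controller_input["loads_to_try"]:   # real load selection is adaptive
        trial_results.append(measurer(load, controller_input["trial_duration"]))
    # A real Controller would classify Loads and stop once every Search Goal
    # has a Regular Goal Result (or is proven to have only Irregular ones).
    return {"goal_results": {}, "trials_performed": len(trial_results)}

def example_manager():
    """Stand-in Manager: one Controller call corresponds to one benchmark."""
    controller_input = {"loads_to_try": [5e6, 7.5e6, 8.75e6], "trial_duration": 1.0}
    controller_output = example_controller(controller_input, example_measurer)
    print(controller_output)  # the Manager would render this into the Test Report

example_manager()
~~~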
Test Report A primary responsibility of the Manager is to produce a Test Report, @@ -771,7 +696,6 @@ if they apply to a MLRsearch benchmark. the Search ends in finite time, as the freedom the Controller has for Load selection also allows for clearly deficient choices. - For deeper insights on these matters, refer to . The primary MLRsearch implementation, used as the prototype @@ -785,7 +709,6 @@ for this specification, is . uses a number of specific quantities, some of them can be expressed in several different units. - In general, MLRsearch Specification does not require particular units to be used, but it is REQUIRED for the test report to state all the units. For example, ratio quantities can be dimensionless numbers between zero and one, @@ -795,7 +718,6 @@ but may be expressed as percentages instead. One constituent of a composite quantity is called an attribute. A group of attribute values is called an instance of that composite quantity. - Some attributes may depend on others and can be calculated from other attributes. Such quantities are called derived quantities. @@ -825,7 +747,6 @@ value of "duration" is expected to be equal to "final duration&qu
Existing Terms - This specification relies on the following three documents that should be consulted before attempting to make use of this document: @@ -843,8 +764,6 @@ benchmarking situations in a more precise way. Definitions of some central terms from above documents are copied and discussed in the following subsections. - -
SUT Defined in Section 3.1.2 of as follows. @@ -866,10 +785,6 @@ as a single entity and response measured.
An SUT consisting of a single network device is allowed by this definition.
- - - -
 
In software-based networking SUT may comprise multitude of @@ -898,7 +813,6 @@ to which stimulus is offered and response measured.
- Discussion:
@@ -914,10 +828,6 @@ MLRsearch Specification, but is of key relevance for its motivation. The device can represent a software-based networking functions running on commodity x86/ARM CPUs (vs purpose-built ASIC / NPU / FPGA). -
- - -
 
 A well-designed SUT should have the primary DUT as its
 performance bottleneck.
@@ -985,10 +895,6 @@ shown in the Test Frame Formats document.
 as sent and received by a tester,
 as implicitly defined in Section 6 of .
-
- - -
 
The definition describes some traits, not using capitalized verbs @@ -999,11 +905,6 @@ but any such deviation MUST be described explicitly in the Test Report. It is still RECOMMENDED to not deviate from the description, as any deviation weakens comparability.
-
- - - -
 
An example of deviation from is using shorter wait times, @@ -1019,11 +920,6 @@ any such time-sensitive per-trial configuration method, with bridge MAC learning being only one possible examples. Appendix C.2.4.1 of lists another example: ARP with wait time of 5 seconds.
-
- - - -
 
Some methodologies describe recurring tests. @@ -1035,7 +931,6 @@ If those are based on Trials, they are treated as multiple independent Trials.
Trial Terms - This section defines new and redefine existing terms for quantities relevant as inputs or outputs of a Trial, as used by the Measurer component. This includes also any derived quantities related to results of one Trial. @@ -1051,7 +946,6 @@ This includes also any derived quantities related to results of one Trial.
- Discussion:
@@ -1069,7 +963,6 @@ only uses the accepted values.
-
Trial Load @@ -1101,10 +994,6 @@ as specified in Section 3.4 of ). Informally, Traffic Load is a single number that can "scale" any traffic pattern as long as the intuition of load intended against a single interface can be applied. - - - -
 
It MAY be possible to use a Trial Load value to describe a non-constant traffic @@ -1135,13 +1024,6 @@ Trial Load is the data rate per direction, half of aggregate data rate. Traffic patterns where a single Trial Load does not describe their scaling cannot be used for MLRsearch benchmarks.
-
- - - - - -
 
Similarly to Trial Duration, some Measurers MAY limit the possible values @@ -1150,19 +1032,11 @@ documenting such behavior in the test report is OPTIONAL. This is because the load differences are negligible (and frequently undocumented) in practice.
-
- - -
 
The Controller MAY select Trial Load and Trial Duration values in a way that would not be possible to achieve using any integer number of data frames.
-
- - -
 
If a particular Trial Load value is not tied to a single Trial, @@ -1180,19 +1054,11 @@ port), or (iii) the total across every interface. For any aggregate load value, the report MUST also give the fixed conversion factor that links the per-interface and multi-interface load values.
-
- - -
 
The per-interface value remains the primary unit, consistent with prevailing practice in , , and .
-
- - -
 
The last paragraph also applies to other terms related to Load. @@ -1292,12 +1158,6 @@ are outside of the scope of this document. An example standardization effort is , a draft at the time of writing.
-
- - - - -
 
Examples of traffic properties include: @@ -1310,11 +1170,6 @@ a draft at the time of writing. - Symmetric bidirectional traffic - Section 14 of .
-
- - - -
 
Other traffic properties that need to be somehow specified @@ -1332,7 +1187,6 @@ if they apply to the benchmark, include:
-
Trial Forwarding Ratio @@ -1360,10 +1214,6 @@ This SHOULD be the default interpretation. Only if this is not the case, the test report MUST describe the Traffic Profile in a detail sufficient to imply how Trial Forwarding Ratio should be calculated. - - - -
 
Trial Forwarding Ratio MAY be expressed in other units @@ -1384,10 +1234,6 @@ even though the final value is "rate" that is still per-interface. if one direction is forwarded without losses, but the opposite direction does not forward at all, the Trial Forwarding Ratio would be 0.5 (50%).
-
- - -
 
In future extensions, more general ways to compute Trial Forwarding Ratio @@ -1408,7 +1254,6 @@ averaged counters approach.
- Discussion:
@@ -1457,11 +1302,6 @@ Section 14 of , the Trial Forwarding Rate is numer to the arithmetic average of the individual per-interface forwarding rates that would be produced by the RFC 2285 procedure. -
- - - -
 
For more complex traffic patterns, such as many-to-one as mentioned @@ -1494,7 +1334,6 @@ by default equal to the Trial Duration.
- Discussion:
@@ -1509,10 +1348,6 @@ the Controller MUST use the Trial Duration value instead. Trial Effective Duration may be any positive time quantity chosen by the Measurer to be used for time-based decisions in the Controller. -
- - -
 
The test report MUST explain how the Measurer computes the returned @@ -1527,10 +1362,6 @@ rather than solely the traffic portion of it. An approach is to measure the duration of the whole trial (including all wait times) and use that as the Trial Effective Duration.
-
- - -
 
This is also a way for the Measurer to inform the Controller about @@ -1568,10 +1399,6 @@ ignore values of any optional attribute they are not familiar with, except when passing Trial Output instances to the Manager.
-
- - -
 
Example of an optional attribute: @@ -1611,7 +1438,6 @@ used to collectively describe multiple Trial Result instances.
-
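For readers who prefer code, the Trial-level quantities defined above can be pictured as two small records. The field names and units below are illustrative only; the specification defines the quantities, not any particular serialization.

~~~
# Illustrative representation of Trial-level quantities; field names are not normative.
from dataclasses import dataclass

@dataclass
class TrialInput:
    trial_load: float        # e.g. frames per second per interface
    trial_duration: float    # intended traffic duration, in seconds

@dataclass
class TrialResult:
    trial_input: TrialInput
    trial_loss_ratio: float          # lost frames divided by offered frames
    trial_effective_duration: float  # time used for duration-sum decisions

    @property
    def trial_forwarding_rate(self) -> float:
        """Trial Forwarding Ratio multiplied by Trial Load."""
        return (1.0 - self.trial_loss_ratio) * self.trial_input.trial_load

example = TrialResult(TrialInput(trial_load=10e6, trial_duration=60.0),
                      trial_loss_ratio=0.0005, trial_effective_duration=61.0)
print(example.trial_forwarding_rate)  # 9995000.0
~~~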
Goal Terms @@ -1652,10 +1478,6 @@ The value MUST be positive. Certain trials must reach this minimum duration before a load can be classified as a lower bound. - - - -
 
The Controller may choose shorter durations, @@ -1683,7 +1505,6 @@ The value MUST be positive.
- Discussion:
@@ -1897,10 +1718,6 @@ some of them are required. Required attributes: Goal Final Trial Duration, Goal Duration Sum, Goal Loss Ratio and Goal Exceed Ratio. -
- - -
 
Optional attributes: Goal Initial Trial Duration and Goal Width. @@ -1918,10 +1735,6 @@ even if they are not required by MLRsearch Specification. However, it is RECOMMENDED for those implementations to support missing attributes by providing typical default values.
-
- - -
 
For example, implementations with Goal Initial Trial Durations @@ -1989,7 +1802,6 @@ It is the maximal value the Controller is allowed to use for Trial Load values.< Discussion: -
 
@@ -2030,7 +1842,6 @@ as sum of per-interface maximal loads.
-
Min Load @@ -2046,7 +1857,6 @@ It is the minimal value the Controller is allowed to use for Trial Load values.< Discussion: -
 
@@ -2082,7 +1892,6 @@ as sum of per-interface minimal loads.
-
@@ -2185,7 +1994,6 @@ In practice, the Load frequently becomes a Lower Boun -
Lower Bound @@ -2235,7 +2043,6 @@ In practice, the Load frequently becomes an Upper Bou -
Undecided @@ -2315,7 +2122,6 @@ as sum of per-interface loads. -
Relevant Lower Bound @@ -2356,7 +2162,6 @@ as sum of per-interface loads. -
Conditional Throughput @@ -2412,7 +2217,6 @@ as sum of per-interface forwarding rates. -
Goal Results @@ -2434,8 +2238,6 @@ Conditional Throughput is a RECOMMENDED attribute. - - Discussion:
@@ -2557,10 +2359,6 @@ sequence of their corresponding Search Goal instances. When the Search Result is expressed as a mapping, it MUST contain an entry for every Search Goal instance supplied in the Controller Input. -
- - -
 
Identical Goal Result instances MAY be listed for different Search Goals, @@ -2594,8 +2392,6 @@ e.g., number of trials performed and the total Search Duration.
- -
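The mapping requirement above can be checked mechanically. The sketch below uses hypothetical goal identifiers and result structures; it only verifies that every Search Goal instance from the Controller Input has an entry in the Search Result.

~~~
# Illustrative check only; the representation of goals and results is not normative.

def validate_search_result(search_goals, search_result):
    """Every Search Goal instance from the Controller Input needs a Goal Result entry."""
    missing = [goal for goal in search_goals if goal not in search_result]
    if missing:
        raise ValueError(f"Search Result lacks entries for goals: {missing}")
    return True

goals = ("goal_0.0%_loss", "goal_0.5%_loss")  # hypothetical identifiers
result = {"goal_0.0%_loss": {"relevant_lower_bound": 9.5e6},
          "goal_0.5%_loss": {"relevant_lower_bound": 11.2e6}}
print(validate_search_result(goals, result))  # True
~~~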
Architecture Terms @@ -2606,7 +2402,6 @@ The components were introduced in Architect and the following subsections finalize their definitions using terms from previous sections. - Note that the architecture also implies the presence of other components, such as the SUT and the tester (as a sub-component of the Measurer). @@ -2618,7 +2413,6 @@ to call the Measurer indirectly instead. In doing so, the Measurer implementatio can be fully independent from the Controller implementations, e.g., developed in different programming languages. -
Measurer Definition: @@ -2698,7 +2492,6 @@ terminates. - Discussion:
@@ -2759,10 +2552,6 @@ the size of frame handling buffers between tests of frame handling rates or to disable all but one transport protocol when testing the throughput of that protocol." -
- - -
 
It is REQUIRED for the test report to encompass all the SUT configuration @@ -2795,8 +2584,6 @@ out of scope for this document.
- -
Compliance @@ -2842,7 +2629,6 @@ unconditionally compliant with Section 24 of . Goal Exceed Ratio = 0% - Goal Loss Ratio and Goal Exceed Ratio attributes, are enough to make the Search Goal conditionally compliant. Adding Goal Final Trial Duration @@ -2895,8 +2681,6 @@ so third full-length trial is never needed.
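Expressed as configuration, the unconditionally compliant settings discussed in this section might look as follows. The attribute spelling is an illustrative assumption; the values follow the compliance discussion above (60-second full-length trials, zero Goal Loss Ratio and zero Goal Exceed Ratio, with the Goal Duration Sum equal to the final trial duration so that a single full-length trial decides each load).

~~~
# Illustrative Search Goal settings for an RFC 2544-style zero-loss search;
# attribute spelling is an assumption, values follow the compliance discussion above.
rfc2544_compliant_goal = {
    "goal_final_trial_duration": 60.0,  # seconds, full-length trials
    "goal_duration_sum": 60.0,          # one full-length trial per load is enough
    "goal_loss_ratio": 0.0,             # zero-loss criterion
    "goal_exceed_ratio": 0.0,           # no trial may exceed the loss ratio
}
~~~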
Methodology Rationale and Design Considerations - - This section explains the Why behind MLRsearch. Building on the normative specification in Section MLRsearch Specification, @@ -2958,7 +2742,6 @@ and its variance. The biggest - difference between MLRsearch and binary search is in the goals of the search. has a single goal, based on classifying a single full-length trial @@ -2980,7 +2763,6 @@ when the search is started with only one Search Goal instance. MLRsearch Specification - supports multiple Search Goals, making the search procedure more complicated compared to binary search with single goal, but most of the complications do not affect the final results much. @@ -3054,7 +2836,6 @@ to get invalidated later. a model where some Trial Results (those with high Trial Loss Ratio) are affected by infrequent effects, causing unsatisfactory repeatability - of Throughput results. Refer to Section DUT in SUT for a discussion about noiseful and noiseless ends of the SUT performance spectrum. @@ -3097,7 +2878,6 @@ when Short Trials are used. decide whether, when, and how short trial durations are used. The exact heuristics and controls are left to the discretion of the implementer. - While MLRsearch implementations are free to use any logic to select Trial Input values, comparability between MLRsearch implementations is only assured when the Load Classification logic @@ -3132,7 +2912,6 @@ for Loads with multiple Trials and a non-zero Goal Loss Ratio. generalized throughput definition - is the presence of a hard performance limit. -
Hard Performance Limit Even if bandwidth of a medium allows higher traffic forwarding performance, @@ -3142,7 +2921,6 @@ e.g., a specific frames-per-second limit on the NIC (a common occurrence). Those limitations should be known and provided as Max Load, Section Max Load. - But if Max Load is set larger than what the interface can receive or transmit, there will be a "hard limit" behavior observed in Trial Results. @@ -3156,7 +2934,6 @@ counter-intuitive. Accordingly, the Throughput me be generalized - rather than relying solely on the Relevant Lower Bound - to reflect realistic, limit-aware performance. - MLRsearch defines one such generalization, the Conditional Throughput. It is the Trial Forwarding Rate from one of the Full-Length Trials @@ -3180,7 +2957,6 @@ are equal to the Goal Loss Ratio), one can prove that Conditional Throughput values may have up to the Goal Loss Ratio relative difference. - Setting the Goal Width below the Goal Loss Ratio may cause the Conditional Throughput for a larger Goal Loss Ratio to become smaller than a Conditional Throughput for a goal with a lower Goal Loss Ratio, @@ -3216,7 +2992,6 @@ and uses more intuitive names for the intermediate values. Note: For explanation clarity variables are taged as (I)nput, (T)emporary, (O)utput. - Collect Trial Results: Take all Trial Result instances (I) measured at a given load. @@ -3431,12 +3206,9 @@ solely on measurements observable external to the DUT/SUT. benchmark scores - such as a dedicated "fast-track" test mode that is never used in normal operation. - Any implications for network security arising from the DUT/SUT SHOULD be identical in the lab and in production networks. - -
Acknowledgements @@ -3560,11 +3332,10 @@ versions of this document. - +
Load Classification Code - This appendix specifies how to perform the Load Classification. Any Trial Load value can be classified, @@ -3580,7 +3351,6 @@ which computes two values, stored in variables named Although presented as pseudocode, the listing is syntactically valid Python and can be executed without modification. - If values of both variables are computed to be true, the Load in question is classified as a Lower Bound according to the given Search Goal instance. If values of both variables are false, the Load is classified as an Upper Bound. @@ -3631,7 +3401,6 @@ optimistic_is_lower = effect_high_loss_s <= quantile_duration_s ]]> -
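The listing above yields two boolean values; the prose mapping of those values to classifications can itself be written as a tiny helper. The parameter names below follow the optimistic/pessimistic naming visible in the listing and should be adjusted if an implementation names its variables differently.

~~~
# Minimal companion to the listing above: mapping its two boolean outputs
# to the classification names used in this document.
def classification(optimistic_is_lower: bool, pessimistic_is_lower: bool) -> str:
    if optimistic_is_lower and pessimistic_is_lower:
        return "Lower Bound"
    if not optimistic_is_lower and not pessimistic_is_lower:
        return "Upper Bound"
    return "Undecided"

print(classification(True, True))   # Lower Bound
print(classification(True, False))  # Undecided (not enough trial time measured yet)
~~~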
Conditional Throughput Code @@ -3700,11 +3469,9 @@ conditional_throughput = intended_load * (1.0 - quantile_loss_ratio) ]]> -
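As a quick sanity check of the listing's final line, with hypothetical values of a 10 Mfps Relevant Lower Bound and a quantile loss ratio of 0.2%, the Conditional Throughput comes out slightly below the intended load:

~~~
# Hypothetical numbers, exercising only the final formula of the listing above.
intended_load = 10_000_000     # frames per second, the Relevant Lower Bound
quantile_loss_ratio = 0.002    # 0.2%
conditional_throughput = intended_load * (1.0 - quantile_loss_ratio)
print(conditional_throughput)  # 9980000.0
~~~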
Example Search - The following example Search is related to one hypothetical run of a Search test procedure that has been started with multiple Search Goals. @@ -3972,7 +3739,6 @@ Code for available results is: 59x1s0l Undecided - This is the last point in time where all goals have this load as Undecided.
@@ -4603,9 +4369,6 @@ One has Trial Loss Ratio of 0%, the other of 0.1%. Due to stricter Goal Exceed Ratio, this Conditional Throughput is smaller than Conditional Throughput of the other two goals. - - -
@@ -4614,1085 +4377,799 @@ is smaller than Conditional Throughput of the other two goals.