X-Git-Url: https://gerrit.fd.io/r/gitweb?p=csit.git;a=blobdiff_plain;f=docs%2Fietf%2Fdraft-vpolak-mkonstan-bmwg-mlrsearch-01.md;fp=docs%2Fietf%2Fdraft-vpolak-mkonstan-bmwg-mlrsearch-01.md;h=0000000000000000000000000000000000000000;hp=561365ef2e298a2af959680c1d9b9e3bb6451a28;hb=681443b9fdf79463edae4726d8f7b27c9dfced02;hpb=3be03603203d84ad49fa3fd6f376ecbf0395361b diff --git a/docs/ietf/draft-vpolak-mkonstan-bmwg-mlrsearch-01.md b/docs/ietf/draft-vpolak-mkonstan-bmwg-mlrsearch-01.md deleted file mode 100644 index 561365ef2e..0000000000 --- a/docs/ietf/draft-vpolak-mkonstan-bmwg-mlrsearch-01.md +++ /dev/null @@ -1,362 +0,0 @@ ---- -title: Multiple Loss Ratio Search for Packet Throughput (MLRsearch) -# abbrev: MLRsearch -docname: draft-vpolak-mkonstan-bmwg-mlrsearch-01 -date: 2019-03-27 - -ipr: trust200902 -area: ops -wg: Benchmarking Working Group -kw: Internet-Draft -cat: info - -coding: us-ascii -pi: # can use array (if all yes) or hash here -# - toc -# - sortrefs -# - symrefs - toc: yes - sortrefs: # defaults to yes - symrefs: yes - -author: - - - ins: M. Konstantynowicz - name: Maciek Konstantynowicz - org: Cisco Systems - role: editor - email: mkonstan@cisco.com - - - ins: V. Polak - name: Vratko Polak - org: Cisco Systems - role: editor - email: vrpolak@cisco.com - -normative: - RFC2544: - RFC8174: - -informative: - - ---- abstract - -This document proposes changes to [RFC2544], specifically to packet -throughput search methodology, by defining a new search algorithm -referred to as Multiple Loss Ratio search (MLRsearch for short). Instead -of relying on binary search with pre-set starting offered load, it -proposes a novel approach discovering the starting point in the initial -phase, and then searching for packet throughput based on defined packet -loss ratio (PLR) input criteria and defined final trial duration time. -One of the key design principles behind MLRsearch is minimizing the -total test duration and searching for multiple packet throughput rates -(each with a corresponding PLR) concurrently, instead of doing it -sequentially. - -The main motivation behind MLRsearch is the new set of challenges and -requirements posed by NFV (Network Function Virtualization), -specifically software based implementations of NFV data planes. Using -[RFC2544] in the experience of the authors yields often not repetitive -and not replicable end results due to a large number of factors that are -out of scope for this draft. MLRsearch aims to address this challenge and -define a common (standard?) way to evaluate NFV packet throughput -performance that takes into account varying characteristics of NFV -systems under test. - ---- middle - -# Terminology - -* NDR - Non-Drop Rate, a packet throughput metric with Packet Loss Ratio - equal zero (a zero packet loss), expressed in packets-per-second - (pps). NDR packet throughput has an associated metric oftentimes - referred to as NDR bandwidth expressed in bits-per-second (bps), and - calculated as a product of: - * NDR packet rate for specific packet (frame) size, and - * Packet (L2 frame size) size in bits plus any associated L1 overhead. -* PLR - Packet Loss Ratio, a packet loss metric calculated as a ratio of - (packets_transmitted - packets_received) to packets_transmitted, over - the test trial duration. -* PDR - Partial-Drop Rate, a packet throughput metric with Packet Loss - Ratio greater than zero (a non-zero packet loss), expressed in - packets-per-second (pps). PDR packet throughput has an associated - metric oftentimes referred to as PDR bandwidth expressed in bits-per- - second (bps), and calculated as a product of: - * PDR packet rate for specific packet (frame) size, and - * Packet (L2 frame size) size in bits plus any associated L1 overhead. - -# MLRsearch Background - -Multiple Loss Rate search (MLRsearch) is a packet throughput search -algorithm suitable for deterministic (as opposed to probabilistic) -systems. MLRsearch discovers multiple packet throughput rates in a -single search, each rate associated with a distinct Packet Loss Ratio -(PLR) criteria. - -Two popular names for particular PLR criteria are Non-Drop Rate (NDR, -with PLR=0, zero packet loss) and Partial Drop Rate (PDR, with PLR>0, -non-zero packet loss). MLRsearch discovers NDR and PDR in a single -search reducing required execution time compared to separate binary -searches for NDR and PDR. MLRsearch reduces execution time even further -by relying on shorter trial durations of intermediate steps, with only -the final measurements conducted at the specified final trial duration. -This results in the shorter overall search execution time when compared -to a standard NDR/PDR binary search, while guaranteeing the same or -similar results. -(TODO: Specify "standard" in the previous sentence.) - -If needed, MLRsearch can be easily adopted to discover more throughput -rates with different pre-defined PLRs. - -Unless otherwise noted, all throughput rates are *always* bi-directional -aggregates of two equal (symmetric) uni-directional packet rates -received and reported by an external traffic generator. - -# MLRsearch Overview - -The main properties of MLRsearch: - -* MLRsearch is a duration aware multi-phase multi-rate search algorithm. - * Initial phase determines promising starting interval for the search. - * Intermediate phases progress towards defined final search criteria. - * Final phase executes measurements according to the final search - criteria. -* Initial phase: - * Uses link rate as a starting transmit rate and discovers the Maximum - Receive Rate (MRR) used as an input to the first intermediate phase. -* Intermediate phases: - * Start with initial trial duration (in the first phase) and converge - geometrically towards the final trial duration (in the final phase). - * Track two values for NDR and two for PDR. - * The values are called (NDR or PDR) lower_bound and upper_bound. - * Each value comes from a specific trial measurement - (most recent for that transmit rate), - and as such the value is associated with that measurement's duration and loss. - * A bound can be invalid, for example if NDR lower_bound - has been measured with nonzero loss. - * Invalid bounds are not real boundaries for the searched value, - but are needed to track interval widths. - * Valid bounds are real boundaries for the searched value. - * Each non-initial phase ends with all bounds valid. - * Start with a large (lower_bound, upper_bound) interval width and - geometrically converge towards the width goal (measurement resolution) - of the phase. Each phase halves the previous width goal. - * Use internal and external searches: - * External search - measures at transmit rates outside the (lower_bound, - upper_bound) interval. Activated when a bound is invalid, - to search for a new valid bound by doubling the interval width. - It is a variant of "exponential search". - * Internal search - "binary search", measures at transmit rates within the - (lower_bound, upper_bound) valid interval, halving the interval width. -* Final phase - * Executed with the final test trial duration, and the final width - goal that determines resolution of the overall search. -* Intermediate phases together with the final phase are called - non-initial phases. - -The main benefits of MLRsearch vs. binary search include: - -* In general MLRsearch is likely to execute more search trials overall, but - less trials at a set final duration. -* In well behaving cases it greatly reduces (>50%) the overall duration - compared to a single PDR (or NDR) binary search duration, - while finding multiple drop rates. -* In all cases MLRsearch yields the same or similar results to binary search. -* Note: both binary search and MLRsearch are susceptible to reporting - non-repeatable results across multiple runs for very bad behaving - cases. - -Caveats: - -* Worst case MLRsearch can take longer than a binary search e.g. in case of - drastic changes in behaviour for trials at varying durations. - -# Sample Implementation - -Following is a brief description of a sample MLRsearch implementation -based on the open-source code running in FD.io CSIT project as part of a -Continuous Integration / Continuous Development (CI/CD) framework. - -## Input Parameters - -1. **maximum_transmit_rate** - maximum packet transmit rate to be used by - external traffic generator, limited by either the actual Ethernet - link rate or traffic generator NIC model capabilities. Sample - defaults: 2 * 14.88 Mpps for 64B 10GE link rate, - 2 * 18.75 Mpps for 64B 40GE NIC maximum rate. -2. **minimum_transmit_rate** - minimum packet transmit rate to be used for - measurements. MLRsearch fails if lower transmit rate needs to be - used to meet search criteria. Default: 2 * 10 kpps (could be higher). -3. **final_trial_duration** - required trial duration for final rate - measurements. Default: 30 sec. -4. **initial_trial_duration** - trial duration for initial MLRsearch phase. - Default: 1 sec. -5. **final_relative_width** - required measurement resolution expressed as - (lower_bound, upper_bound) interval width relative to upper_bound. - Default: 0.5%. -6. **packet_loss_ratio** - maximum acceptable PLR search criteria for - PDR measurements. Default: 0.5%. -7. **number_of_intermediate_phases** - number of phases between the initial - phase and the final phase. Impacts the overall MLRsearch duration. - Less phases are required for well behaving cases, more phases - may be needed to reduce the overall search duration for worse behaving cases. - Default (2). (Value chosen based on limited experimentation to date. - More experimentation needed to arrive to clearer guidelines.) - -## Initial phase - -1. First trial measures at maximum rate and discovers MRR. - * *in*: trial_duration = initial_trial_duration. - * *in*: offered_transmit_rate = maximum_transmit_rate. - * *do*: single trial. - * *out*: measured loss ratio. - * *out*: mrr = measured receive rate. -2. Second trial measures at MRR and discovers MRR2. - * *in*: trial_duration = initial_trial_duration. - * *in*: offered_transmit_rate = MRR. - * *do*: single trial. - * *out*: measured loss ratio. - * *out*: mrr2 = measured receive rate. -3. Third trial measures at MRR2. - * *in*: trial_duration = initial_trial_duration. - * *in*: offered_transmit_rate = MRR2. - * *do*: single trial. - * *out*: measured loss ratio. - -## Non-initial phases - -1. Main loop: - * *in*: trial_duration for the current phase. - Set to initial_trial_duration for the first intermediate phase; - to final_trial_duration for the final phase; - or to the element of interpolating geometric sequence - for other intermediate phases. - For example with two intermediate phases, trial_duration - of the second intermediate phase is the geometric average - of initial_strial_duration and final_trial_duration. - * *in*: relative_width_goal for the current phase. - Set to final_relative_width for the final phase; - doubled for each preceding phase. - For example with two intermediate phases, - the first intermediate phase uses quadruple of final_relative_width - and the second intermediate phase uses double of final_relative_width. - * *in*: ndr_interval, pdr_interval from the previous main loop iteration - or the previous phase. - If the previous phase is the initial phase, both intervals have - lower_bound = MRR2, uper_bound = MRR. - Note that the initial phase is likely to create intervals with invalid bounds. - * *do*: According to the procedure described in point 2, - either exit the phase (by jumping to 1.g.), - or prepare new transmit rate to measure with. - * *do*: Perform the trial measurement at the new transmit rate - and trial_duration, compute its loss ratio. - * *do*: Update the bounds of both intervals, based on the new measurement. - The actual update rules are numerous, as NDR external search - can affect PDR interval and vice versa, but the result - agrees with rules of both internal and external search. - For example, any new measurement below an invalid lower_bound - becomes the new lower_bound, while the old measurement - (previously acting as the invalid lower_bound) - becomes a new and valid upper_bound. - Go to next iteration (1.c.), taking the updated intervals as new input. - * *out*: current ndr_interval and pdr_interval. - In the final phase this is also considered - to be the result of the whole search. - For other phases, the next phase loop is started - with the current results as an input. -2. New transmit rate (or exit) calculation (for 1.d.): - * If there is an invalid bound then prepare for external search: - * *If* the most recent measurement at NDR lower_bound transmit rate - had the loss higher than zero, then - the new transmit rate is NDR lower_bound - decreased by two NDR interval widths. - * Else, *if* the most recent measurement at PDR lower_bound - transmit rate had the loss higher than PLR, then - the new transmit rate is PDR lower_bound - decreased by two PDR interval widths. - * Else, *if* the most recent measurement at NDR upper_bound - transmit rate had no loss, then - the new transmit rate is NDR upper_bound - increased by two NDR interval widths. - * Else, *if* the most recent measurement at PDR upper_bound - transmit rate had the loss lower or equal to PLR, then - the new transmit rate is PDR upper_bound - increased by two PDR interval widths. - * If interval width is higher than the current phase goal: - * Else, *if* NDR interval does not meet the current phase width goal, - prepare for internal search. The new transmit rate is - (NDR lower bound + NDR upper bound) / 2. - * Else, *if* PDR interval does not meet the current phase width goal, - prepare for internal search. The new transmit rate is - (PDR lower bound + PDR upper bound) / 2. - * Else, *if* some bound has still only been measured at a lower duration, - prepare to re-measure at the current duration (and the same transmit rate). - The order of priorities is: - * NDR lower_bound, - * PDR lower_bound, - * NDR upper_bound, - * PDR upper_bound. - * *Else*, do not prepare any new rate, to exit the phase. - This ensures that at the end of each non-initial phase - all intervals are valid, narrow enough, and measured - at current phase trial duration. - -# Known Implementations - -The only known working implementation of MLRsearch is in Linux Foundation -FD.io CSIT project. https://wiki.fd.io/view/CSIT. https://git.fd.io/csit/. - -## FD.io CSIT Implementation Deviations - -This document so far has been describing a simplified version of MLRsearch algorithm. -The full algorithm as implemented contains additional logic, -which makes some of the details (but not general ideas) above incorrect. -Here is a short description of the additional logic as a list of principles, -explaining their main differences from (or additions to) the simplified description, -but without detailing their mutual interaction. - -1. Logarithmic transmit rate. - In order to better fit the relative width goal, - the interval doubling and halving is done differently. - For example, the middle of 2 and 8 is 4, not 5. -2. Optimistic maximum rate. - The increased rate is never higher than the maximum rate. - Upper bound at that rate is always considered valid. -3. Pessimistic minimum rate. - The decreased rate is never lower than the minimum rate. - If a lower bound at that rate is invalid, - a phase stops refining the interval further (until it gets re-measured). -4. Conservative interval updates. - Measurements above current upper bound never update a valid upper bound, - even if drop ratio is low. - Measurements below current lower bound always update any lower bound - if drop ratio is high. -5. Ensure sufficient interval width. - Narrow intervals make external search take more time to find a valid bound. - If the new transmit increased or decreased rate would result in width - less than the current goal, increase/decrease more. - This can happen if the measurement for the other interval - makes the current interval too narrow. - Similarly, take care the measurements in the initial phase - create wide enough interval. -6. Timeout for bad cases. - The worst case for MLRsearch is when each phase converges to intervals - way different than the results of the previous phase. - Rather than suffer total search time several times larger - than pure binary search, the implemented tests fail themselves - when the search takes too long (given by argument *timeout*). - -# IANA Considerations - -.. - -# Security Considerations - -.. - -# Acknowledgements - -.. - ---- back