docs/ietf/draft-vpolak-bmwg-plrsearch-00.md

   1 ---
   2 title: Probabilistic Loss Ratio Search for Packet Throughput (PLRsearch)
   3 # abbrev: PLRsearch
   4 docname: draft-vpolak-bmwg-plrsearch-00
   5 date: 2018-11-13
   6
   7 ipr: trust200902
   8 area: ops
   9 wg: Benchmarking Working Group
  10 kw: Internet-Draft
  11 cat: info
  12
  13 coding: us-ascii
  14 pi:    # can use array (if all yes) or hash here
  15 #  - toc
  16 #  - sortrefs
  17 #  - symrefs
  18   toc: yes
  19   sortrefs:   # defaults to yes
  20   symrefs: yes
  21
  22 author:
  23       -
  24         ins: M. Konstantynowicz
  25         name: Maciek Konstantynowicz
  26         org: Cisco Systems
  27         role: editor
  28         email: mkonstan@cisco.com
  29       -
  30         ins: V. Polak
  31         name: Vratko Polak
  32         org: Cisco Systems
  33         role: editor
  34         email: vrpolak@cisco.com
  35
  36 normative:
  37   RFC2544:
  38   RFC8174:
  39
  40 informative:
  41
  42 --- abstract
  43
  44 This document addresses challenges while applying methodologies
  45 described in [RFC2544] to benchmarking NFV (Network Function
  46 Virtualization) over an extended period of time, sometimes referred to
  47 as "soak testing". More specifically to benchmarking software based
  48 implementations of NFV data planes. Packet throughput search approach
  49 proposed by this document assumes that system under test is
  50 probabilistic in nature, and not deterministic.
  51
  52 --- middle
  53
  54 # Motivation
  55
  56 Network providers are interested in throughput a device can sustain.
  57
  58 RFC 2544 assumes loss ratio is given by a deterministic function of
  59 offered load. But NFV software devices are not deterministic (enough).
  60 This leads for deterministic algorithms (such as MLRsearch with single
  61 trial) to return results, which when repeated show relatively high
  62 standard deviation, thus making it harder to tell what "the throughput"
  63 actually is.
  64
  65 We need another algorithm, which takes this indeterminism into account.
  66
  67 # Model
  68
  69 Each algorithm searches for an answer to a precisely formulated
  70 question. When the question involves indeterministic systems, it has to
  71 specify probabilities (or prior distributions) which are tied to a
  72 specific probabilistic model. Different models will have different
  73 number (and meaning) of parameters. Complicated (but more realistic)
  74 models have many parameters, and the math involved can be very
  75 complicated. It is better to start with simpler probabilistic model, and
  76 only change it when the output of the simpler algorithm is not stable or
  77 useful enough.
  78
  79 TODO: Refer to packet forwarding terminology, such as "offered load" and
  80 "loss ratio".
  81
  82 TODO: Mention that no packet duplication is expected (or is filtered
  83 out).
  84
  85 TODO: Define critical load and critical region earlier.
  86
  87 This document is focused on algorithms related to packet loss count
  88 only. No latency (or other information) is taken into account. For
  89 simplicity, only one type of measurement is considered: dynamically
  90 computed offered load, constant within trial measurement of
  91 predetermined trial duration.
  92
  93 Also, running longer trials (in some situations) could be more efficient,
  94 but in order to perform trial at multiple offered loads withing critical region,
  95 trial durations should be kept as short as possible.
  96
  97 # Poisson Distribution
  98
  99 TODO: Give link to more officially published literature about Poisson
 100 distribution.
 101
 102 Note-1: that the algorithm makes an assumption that packet traffic
 103 generator detects duplicate packets on receive detection, and reports
 104 this as an error.
 105
 106 Note-2: Binomial distribution is a better fit compared to Poisson
 107 distribution (acknowledging that the number of packets lost cannot be
 108 higher than the number of packets offered), but the difference tends to
 109 be relevant in loads far above the critical region, so using Poisson
 110 distribution helps the algorithm focus on critical region better.
 111
 112 When comparing different offered loads, the average loss per second is
 113 assumed to increase, but the (deterministic) function from offered load
 114 into average loss rate is otherwise unknown.
 115
 116 Given a loss target (configurable, by default one packet lost per
 117 second), there is an unknown offered load when the average is exactly
 118 that. We call that the "critical load". If critical load seems higher
 119 than maximum offerable load, we should use the maximum offerable load to
 120 make search output more stable.
 121
 122 Of course, there are great many increasing functions. The offered load
 123 has to be chosen for each trial, and the computed posterior distribution
 124 of critical load can change with each trial result.
 125
 126 To make the space of possible functions more tractable, some other
 127 simplifying assumption is needed. As the algorithm will be examining
 128 (also) loads close to the critical load, linear approximation to the
 129 function (TODO: name the function) in the critical region is important.
 130 But as the search algorithm needs to evaluate the function also far
 131 away from the critical region, the approximate function has to be well-
 132 behaved for every positive offered load, specifically it cannot predict
 133 non-positive packet loss rate.
 134
 135 Within this document, "fitting function" is the name for such well-behaved
 136 function which approximates the unknown function in the critical region.
 137
 138 Results from trials far from the critical region are likely to affect
 139 the critical rate estimate negatively, as the fitting function does not
 140 need to be a good approximation there. Instead of discarding some
 141 results, or "suppressing" their impact with ad-hoc methods (other than
 142 using Poisson distribution instead of binomial) is not used, as such
 143 methods tend to make the overall search unstable. We rely on most of
 144 measurements being done (eventually) within the critical region, and
 145 overweighting far-off measurements (eventually) for well-behaved fitting
 146 functions.
 147
 148 # Fitting Function Coefficients Distribution
 149
 150 To accomodate systems with different behaviours, the fitting function is
 151 expected to have few numeric parameters affecting its shape (mainly
 152 affecting the linear approximation in the critical region).
 153
 154 The general search algorithm can use whatever increasing fitting
 155 function, some specific functions can be described later.
 156
 157 TODO: Describe sigmoid-based and erf-based functions.
 158
 159 It is up to implementer to chose a fitting function and prior
 160 distribution of its parameters. The rest of this document assumes each
 161 parameter is independently and uniformly distributed over common
 162 interval. Implementers are to add non-linear transformations into their
 163 fitting functions if their prior is different.
 164
 165 TODO: Move the following sentence into more appropriate place.
 166
 167 Speaking about new trials, each next trial will be done at offered load
 168 equal to the current average of the critical load.
 169
 170 Exit condition is either critical load stdev becoming small enough, or
 171 overal search time becoming long enough.
 172
 173 The algorithm should report both avg and stdev for critical load. If the
 174 reported averages follow a trend (without reaching equilibrium), avg and
 175 stdev should refer to the equilibrium estibated based on the trend, not
 176 to immediate posterior values.
 177
 178 TODO: Explicitly mention the iterative character of the search.
 179
 180 # Algorithm Formulas
 181
 182 ## Integration
 183
 184 The posterior distributions for fitting function parameters will not be
 185 integrable in general.
 186
 187 The search algorithm utilises the fact that trial measurement takes some
 188 time, so this time can be used for numeric integration (using suitable
 189 method, such as Monte Carlo) to achieve sufficient precision.
 190
 191 ## Optimizations
 192
 193 After enough trials, the posterior distribution will be concentrated in
 194 a narrow area of parameter space. The integration method could take
 195 advantage of that.
 196
 197 Even in the concentrated area, the likelihood can be quite small, so the
 198 integration algorithm should track the logarithm of the likelihood, and
 199 also avoid underflow errors bu ther means.
 200
 201 # Known Implementations
 202
 203 The only known working implementatin of Probabilistic Loss Ratio Search
 204 for Packet Throughput is in Linux Foundation FD.io CSIT project. https://wiki.fd.io/view/CSIT. https://git.fd.io/csit/.
 205
 206 ## FD.io CSIT Implementation Specifics
 207
 208 In a sample implemenation in FD.io CSIT project, there is around 0.5
 209 second delay between trials due to restrictons imposed by packet traffic
 210 generator in use (T-Rex), avoiding that delay is out of scope of this
 211 document.
 212
 213 TODO: Describe how the current integration algorithm finds the
 214 concentrated area.
 215
 216 # IANA Considerations
 217
 218 ..
 219
 220 # Security Considerations
 221
 222 ..
 223
 224 # Acknowledgements
 225
 226 ..
 227
 228 --- back