docs/cpta/introduction/index.rst

   1 Introduction
   2 ============
   3
   4 Purpose
   5 -------
   6
   7 With the growing number of features and code changes in the FD.io VPP data
   8 plane codebase, it is increasingly difficult to measure and detect VPP data
   9 plane performance changes. Similarly, once a degradation is detected, it is
  10 getting harder to bisect the source code in search of the offending change.
  11 The problem is compounded by the large number of compute platforms that VPP
  12 runs on, including Intel Xeon, Intel Atom and ARM AArch64.
  13
  14 Existing FD.io CSIT continuous performance trending test jobs help, but they
  15 rely on human review for anomaly detection. That is error-prone and
  16 unreliable, all the more so as the volume of data generated by these jobs is
  17 growing exponentially.
  18
  19 The proposed solution is to eliminate the human factor and fully automate
  20 performance trending, regression and progression detection, and bisecting.
  21
  22 This document describes a high-level design of a system for continuous
  23 performance measuring, trending and change detection for the FD.io VPP SW
  24 data plane. It builds upon the existing CSIT framework with extensions to its
  25 throughput testing methodology, the CSIT data analytics engine
  26 (PAL – Presentation-and-Analytics-Layer) and the associated Jenkins job
  27 definitions.
  28
  29 Continuous Performance Trending and Analysis
  30 --------------------------------------------
  31
  32 The proposed design replaces the existing CSIT performance trending jobs and
  33 tests with a new Performance Trending (PT) CSIT module and a separate
  34 Performance Analysis (PA) module. PA ingests results from PT, then analyses,
  35 detects and reports any performance anomalies using historical trending data
  36 and statistical metrics. PA also produces trending graphs with summary and
  37 drill-down views across all specified tests, which the FD.io developer and
  38 user community can review and inspect regularly.
  39
  40 Trend Analysis
  41 ``````````````
  42
  43 All measured performance trend data is treated as time-series data that can be
  44 modelled using a normal distribution. After trimming the outliers, the average
  45 and the deviations from it are used to detect performance change anomalies,
  46 following the three-sigma rule of thumb (a.k.a. the 68-95-99.7 rule).
  47
  48 Analysis Metrics
  49 ````````````````
  50
  51 The following statistical metrics are proposed as performance trend indicators
  52 over the rolling window of the last <N> sets of historical measurement data:
  53
  54     #. Quartiles Q1, Q2, Q3 – three points dividing a ranked data set into
  55        four equal parts; Q2 is the median of the data.
  56     #. Interquartile Range (IQR = Q3 - Q1) – measure of variability, used
  57        here to eliminate outliers.
  58     #. Outliers – extreme values that are at least 1.5*IQR below Q1, or at
  59        least 1.5*IQR above Q3.
  60     #. Trimmed Moving Average (TMA) – average across the data set of the rolling
  61        window of <N> values without the outliers. Used here to calculate TMSD.
  62     #. Trimmed Moving Standard Deviation (TMSD) – standard deviation over the
  63        data set of the rolling window of <N> values without the outliers,
  64        requires calculating TMA. Used here for anomaly detection.
  65     #. Moving Median (MM) – median across the data set of the rolling window of
  66        <N> values with all data points, including the outliers. Used here for
  67        anomaly detection.
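
     The metrics above can be sketched in Python as follows. This is a minimal
     illustration, not the PAL implementation: the function name
     ``trend_metrics``, the sample values, the quartile method (``inclusive``)
     and the use of the population standard deviation are all assumptions, since
     the text does not specify them. Values lying exactly on an outlier fence
     are kept in this sketch, so that an IQR of zero does not empty the trimmed
     set.

     .. code-block:: python

         import statistics


         def trend_metrics(window):
             """Compute trend metrics for one rolling window of <N> measurements."""
             # Quartiles; the "inclusive" method is an assumption of this sketch.
             q1, q2, q3 = statistics.quantiles(window, n=4, method="inclusive")
             iqr = q3 - q1
             low_fence = q1 - 1.5 * iqr
             high_fence = q3 + 1.5 * iqr
             # Assumption: values on a fence are kept, values beyond it are outliers.
             trimmed = [x for x in window if low_fence <= x <= high_fence]
             outliers = [x for x in window if x < low_fence or x > high_fence]
             tma = statistics.fmean(trimmed)
             # Assumption: population standard deviation of the trimmed values.
             tmsd = statistics.pstdev(trimmed, mu=tma)
             # MM uses all data points, including the outliers.
             mm = statistics.median(window)
             return {"q1": q1, "q2": q2, "q3": q3, "iqr": iqr,
                     "outliers": outliers, "tma": tma, "tmsd": tmsd, "mm": mm}


         # Hypothetical throughput samples (e.g. in Mpps) with one obvious outlier.
         samples = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 4.0]
         metrics = trend_metrics(samples)
         print(metrics["outliers"], metrics["tma"], metrics["tmsd"], metrics["mm"])

     In this sample the value 4.0 falls below the lower fence, so it is excluded
     from TMA and TMSD but still contributes to MM.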
  68
  69 Anomaly Detection
  70 `````````````````
  71
  72 Based on the assumption that all performance measurements can be modelled using
  73 a normal distribution, the three-sigma rule of thumb is proposed as the main
  74 criterion for anomaly detection.
  75
  76 The three-sigma rule of thumb, also known as the 68–95–99.7 rule, is a
  77 shorthand for the percentage of values in a normal distribution that lie
  78 within a band around the mean that is two, four or six standard deviations
  79 wide. More precisely, 68.27%, 95.45% and 99.73% of the values should lie
  80 within one, two and three standard deviations of the mean, see figure below.
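
     The percentages quoted above can be reproduced from the error function:
     for a normal distribution, the fraction of values within k standard
     deviations of the mean is erf(k / sqrt(2)). A small Python check, where the
     helper name ``coverage`` is introduced for illustration only:

     .. code-block:: python

         import math


         def coverage(k):
             """Fraction of a normal distribution within k sigma of the mean."""
             return math.erf(k / math.sqrt(2))


         for k in (1, 2, 3):
             print(f"{k} sigma: {coverage(k) * 100:.2f}%")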
  81
  82 To check a test result with value X against the defined trend analysis
  83 metrics and detect anomalies, three simple evaluation criteria are proposed:
  84
  85 ::
  86
  87     Test Result Evaluation      Reported Result     Reported Reason     Trending Graph Markers
  88     ==========================================================================================
  89           Normal                      Pass              Normal            Part of plot line
  90           Regression                  Fail              Regression        Red circle
  91           Progression                 Pass              Progression       Green circle
  92
  93 Jenkins job cumulative results:
  94
  95     #. Pass - if all detection results are Pass or Warning.
  96     #. Fail - if any detection result is Fail.
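
     As an illustration, the evaluation table and the cumulative job result rule
     above could be expressed as follows. The names ``EVALUATION_TABLE`` and
     ``job_result`` are hypothetical and not taken from CSIT. The sketch treats
     every detection result other than Fail as non-failing, which also covers
     the Warning value mentioned in the rule but not defined in the table.

     .. code-block:: python

         # Evaluation -> (reported result, trending graph marker), as in the table.
         EVALUATION_TABLE = {
             "Normal": ("Pass", "Part of plot line"),
             "Regression": ("Fail", "Red circle"),
             "Progression": ("Pass", "Green circle"),
         }


         def job_result(detection_results):
             """Return "Fail" if any detection result is Fail, otherwise "Pass"."""
             if any(result == "Fail" for result in detection_results):
                 return "Fail"
             return "Pass"


         # Hypothetical evaluations of three tests in one job run.
         evaluations = ["Normal", "Progression", "Regression"]
         reported = [EVALUATION_TABLE[e][0] for e in evaluations]
         print(reported, job_result(reported))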
  97
  98 Performance Trending (PT)
  99 `````````````````````````
 100
 101 CSIT PT runs regular performance test jobs that find MRR, PDR and NDR per test
 102 case. PT is designed as follows:
 103
 104     #. PT job triggers:
 105
 106         #. Periodic, e.g. daily.
 107         #. On demand, triggered via Gerrit.
 108         #. Other periodic TBD.
 109
 110     #. Measurements and calculations per test case:
 111
 112         #. MRR – Max Received Rate:
 113
 114             #. Measured: Unlimited tolerance of packet loss.
 115             #. Send packets at link rate, count total received packets, divide
 116                by test trial period.
 117
 118         #. Optimized binary search bounds for PDR and NDR tests:
 119
 120             #. Calculated: High and low bounds for binary search based on MRR
 121                and pre-defined Packet Loss Ratio (PLR).
 122             #. HighBound=MRR, LowBound=to-be-determined.
 123             #. PLR – acceptable loss ratio for PDR tests, currently set to 0.5%
 124                for all performance tests.
 125
 126         #. PDR and NDR:
 127
 128             #. Run binary search within the calculated bounds, find PDR and NDR.
 129             #. Measured: PDR (Partial Drop Rate) – limited non-zero tolerance
 130                of packet loss.
 131             #. Measured: NDR (Non-Drop Rate) – zero packet loss.
 132
 133     #. Archive MRR, PDR and NDR per test case.
 134     #. Archive counters collected at MRR, PDR and NDR.
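
     The measurement steps above can be sketched as follows. This is a
     simplified illustration under stated assumptions, not the CSIT search
     code: ``max_received_rate``, ``search_rate`` and ``toy_loss_ratio`` are
     hypothetical names, the device model and link rate are made up, and the
     low bound of 0 is only a placeholder because the text leaves LowBound to
     be determined.

     .. code-block:: python

         def max_received_rate(received_packets, trial_duration_s):
             """MRR: packets received during the trial divided by its duration."""
             return received_packets / trial_duration_s


         def search_rate(loss_ratio_at, low, high, max_loss_ratio, precision):
             """Binary search for the highest rate with an acceptable loss ratio.

             loss_ratio_at(rate) stands for running one trial at the given rate
             and returning the measured loss ratio (hypothetical callback).
             """
             while high - low > precision:
                 mid = (low + high) / 2
                 if loss_ratio_at(mid) <= max_loss_ratio:
                     low = mid   # acceptable loss, search higher
                 else:
                     high = mid  # too much loss, search lower
             return low


         def toy_loss_ratio(rate_pps):
             """Made-up device: no loss up to 6 Mpps, half of the excess dropped."""
             if rate_pps <= 6e6:
                 return 0.0
             return 0.5 * (rate_pps - 6e6) / rate_pps


         LINK_RATE_PPS = 10e6  # hypothetical link rate
         TRIAL_S = 10          # hypothetical trial duration in seconds

         sent = LINK_RATE_PPS * TRIAL_S
         received = sent * (1 - toy_loss_ratio(LINK_RATE_PPS))
         mrr = max_received_rate(received, TRIAL_S)

         # HighBound = MRR; LowBound = 0 is a placeholder (to be determined).
         ndr = search_rate(toy_loss_ratio, 0.0, mrr, 0.0, precision=1e3)
         pdr = search_rate(toy_loss_ratio, 0.0, mrr, 0.005, precision=1e3)
         print(mrr, ndr, pdr)

     With this toy model the NDR search converges near 6 Mpps, and the PDR
     search, using the 0.5% PLR from the text, ends slightly above it.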
 135
 136 Performance Analysis (PA)
 137 `````````````````````````
 138
 139 CSIT PA runs performance analysis, change detection and trending using the
 140 specified trend analysis metrics over the rolling window of the last <N> sets
 141 of historical measurement data. PA is defined as follows:
 142
 143     #. PA job triggers:
 144
 145         #. By PT job at its completion.
 146         #. On demand, triggered via Gerrit.
 147         #. Other periodic TBD.
 148
 149     #. Download and parse archived historical data and the new data:
 150
 151         #. New data from the latest PT job is evaluated against the rolling
 152            window of <N> sets of historical data.
 153         #. Download RF output.xml files and compressed archived data.
 154         #. Parse out the data, filtering for the test cases listed in the PA
 155            specification (part of the CSIT PAL specification file).
 156
 157     #. Calculate trend metrics for the rolling window of <N> sets of historical data:
 158
 159         #. Calculate quartiles Q1, Q2, Q3.
 160         #. Trim outliers using IQR.
 161         #. Calculate TMA and TMSD.
 162         #. Calculate normal trending range per test case based on TMA and TMSD.
 163
 164     #. Evaluate new test data against trend metrics:
 165
 166         #. If within the range of (TMA +/- 3*TMSD) => Result = Pass,
 167            Reason = Normal.
 168         #. If below the range => Result = Fail, Reason = Regression.
 169         #. If above the range => Result = Pass, Reason = Progression.
 170
 171     #. Generate and publish results:
 172
 173         #. Relay evaluation result to job result.
 174         #. Generate a new set of trend analysis summary graphs and drill-down
 175            graphs.
 176
 177             #. Summary graphs to include measured values with Normal,
 178                Progression and Regression markers. MM shown in the background if
 179                possible.
 180             #. Drill-down graphs to include MM, TMA and TMSD.
 181
 182         #. Publish trend analysis graphs in HTML format.
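
     The evaluation step of PA can be sketched as a small function. The name
     ``evaluate`` and the sample numbers are hypothetical, and treating values
     exactly on the range boundary as Normal is an assumption, since the text
     does not say how boundary values are handled.

     .. code-block:: python

         def evaluate(value, tma, tmsd):
             """Classify a new result against the range TMA +/- 3*TMSD."""
             low = tma - 3 * tmsd
             high = tma + 3 * tmsd
             if value < low:
                 return ("Fail", "Regression")
             if value > high:
                 return ("Pass", "Progression")
             # Assumption: boundary values count as within the range.
             return ("Pass", "Normal")


         # Hypothetical trend metrics: TMA = 10.0, TMSD = 0.2, range 9.4 to 10.6.
         for new_value in (10.1, 9.0, 11.0):
             print(new_value, evaluate(new_value, tma=10.0, tmsd=0.2))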