With the increasing number of features and code changes in the FD.io VPP data
plane codebase, it is increasingly difficult to measure and detect VPP data
plane performance changes. Similarly, once a degradation is detected, it is
getting harder to bisect the source code in search of the bad code change or
addition. The problem is further compounded by the large number of compute
platforms that VPP runs on, including Intel Xeon, Intel Atom and Arm AArch64.

Existing FD.io CSIT continuous performance trending test jobs help, but they
rely on humans for anomaly detection, and as such are error-prone and
unreliable, as the volume of data generated by these jobs is growing.

The proposed solution is to eliminate the human factor and fully automate
performance trending, regression and progression detection, as well as
bisecting.

This document describes a high-level design of a system for continuous
measuring, trending and performance change detection for the FD.io VPP SW data
plane. It builds upon the existing CSIT framework with extensions to its
throughput testing methodology, the CSIT data analytics engine
(PAL – Presentation-and-Analytics-Layer) and associated Jenkins jobs.

Continuous Performance Trending and Analysis
--------------------------------------------

The proposed design replaces the existing CSIT performance trending jobs and
tests with a new Performance Trending (PT) CSIT module and a separate
Performance Analysis (PA) module. PA ingests results from PT and analyses,
detects and reports any performance anomalies using historical trending data
and statistical metrics. PA also produces trending graphs with summary and
drill-down views across all specified tests that can be reviewed and inspected
regularly by the FD.io developer and user community.

All measured performance trend data is treated as time-series data that can be
modelled using a normal distribution. After trimming the outliers, the average
and the deviations from the average are used to detect performance change
anomalies following the three-sigma rule of thumb (a.k.a. the 68-95-99.7 rule).

The following statistical metrics are proposed as performance trend indicators
over the rolling window of the last <N> sets of historical measurement data:

#. Quartiles Q1, Q2, Q3 – three points dividing a ranked data set
   into four equal parts; Q2 is the median of the data.
#. Inter Quartile Range IQR=Q3-Q1 – measure of variability, used here to
   eliminate outliers.
#. Outliers – extreme values that are at least 1.5*IQR below Q1, or at
   least 1.5*IQR above Q3.
#. Trimmed Moving Average (TMA) – average across the data set of the rolling
   window of <N> values without the outliers. Used here to calculate TMSD.
#. Trimmed Moving Standard Deviation (TMSD) – standard deviation over the
   data set of the rolling window of <N> values without the outliers,
   requires calculating TMA. Used here for anomaly detection.
#. Moving Median (MM) – median across the data set of the rolling window of
   <N> values with all data points, including the outliers. Used here for
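
As an illustration, the metrics listed above could be computed for a single
rolling window roughly as in the following Python sketch. The function name
``trend_metrics``, the "inclusive" quartile interpolation, the use of the
population standard deviation and the fallback for a zero IQR are all
assumptions made for this example; the design does not specify them.

```python
import statistics


def trend_metrics(window):
    """Compute the proposed trend metrics for one rolling window of samples."""
    # Quartiles Q1, Q2, Q3 (the interpolation method is an assumption).
    q1, q2, q3 = statistics.quantiles(window, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Outliers are at least 1.5*IQR below Q1 or at least 1.5*IQR above Q3.
    outliers = [x for x in window if not low < x < high]
    trimmed = [x for x in window if low < x < high]
    if not trimmed:
        # Assumption: when IQR is zero, keep all samples instead of none.
        trimmed, outliers = list(window), []
    return {
        "q1": q1, "q2": q2, "q3": q3, "iqr": iqr,
        "outliers": outliers,
        "tma": statistics.fmean(trimmed),    # Trimmed Moving Average
        "tmsd": statistics.pstdev(trimmed),  # Trimmed Moving Std Deviation
        "mm": statistics.median(window),     # Moving Median, outliers included
    }


# Hypothetical throughput samples with one obvious low outlier (5.0).
metrics = trend_metrics([10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 5.0, 10.3])
```

In this hypothetical window the value 5.0 is trimmed before TMA and TMSD are
calculated, while MM is still taken over all eight samples.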

Based on the assumption that all performance measurements can be modelled using
a normal distribution, the three-sigma rule of thumb is proposed as the main
criterion for anomaly detection.

The three-sigma rule of thumb, a.k.a. the 68–95–99.7 rule, is a shorthand for
the percentage of values in a normal distribution that lie within a band around
the average (mean) with a width of two, four and six standard deviations.
More accurately, 68.27%, 95.45% and 99.73% of the result values should lie
within one, two and three standard deviations of the mean respectively, see
figure below.
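
The percentages quoted above follow from the normal distribution: the
probability of a value lying within k standard deviations of the mean is
erf(k / sqrt(2)). The short Python check below reproduces them; the helper
name ``coverage_within`` is introduced only for this illustration.

```python
import math


def coverage_within(k):
    """Probability that a normal variable lies within k sigma of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage_within(k) * 100:.2f}%")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```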

To verify compliance of a test result with value X against the defined trend
analysis metrics and to detect anomalies, three simple evaluation criteria are
proposed:

======================  ===============  ===============  ======================
Test Result Evaluation  Reported Result  Reported Reason  Trending Graph Markers
======================  ===============  ===============  ======================
Normal                  Pass             Normal           Part of plot line
Regression              Fail             Regression       Red circle
Progression             Pass             Progression      Green circle
======================  ===============  ===============  ======================

Jenkins job cumulative results:

#. Pass – if all detection results are Pass or Warning.
#. Fail – if any detection result is Fail.
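
A minimal Python sketch of how the evaluation table and the cumulative job
result might be encoded is shown below. The names ``EVALUATION`` and
``job_result`` are hypothetical. Since the table defines no Warning outcome,
the sketch simply treats every result other than Fail as passing.

```python
# Reported result and trending graph marker per evaluation outcome,
# taken from the evaluation table.
EVALUATION = {
    "Normal": ("Pass", "Part of plot line"),
    "Regression": ("Fail", "Red circle"),
    "Progression": ("Pass", "Green circle"),
}


def job_result(reasons):
    """Return the cumulative job result for a list of per-test reasons."""
    results = [EVALUATION[reason][0] for reason in reasons]
    # The job fails if any single detection result is Fail.
    return "Fail" if "Fail" in results else "Pass"


print(job_result(["Normal", "Progression"]))  # Pass
print(job_result(["Normal", "Regression"]))   # Fail
```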

Performance Trending (PT)
`````````````````````````

CSIT PT runs regular performance test jobs finding MRR, PDR and NDR per test
case. PT is designed as follows:

#. PT job triggers:

   #. Periodic, e.g. daily.
   #. On-demand, Gerrit-triggered.
   #. Other periodic, TBD.

#. Measurements and calculations per test case:

   #. MRR, Max Received Rate:

      #. Measured: unlimited tolerance of packet loss.
      #. Send packets at link rate, count total received packets, divide
         by test trial period.

   #. Optimized binary search bounds for PDR and NDR tests:

      #. Calculated: high and low bounds for binary search based on MRR
         and pre-defined Packet Loss Ratio (PLR).
      #. HighBound=MRR, LowBound=to-be-determined.
      #. PLR – acceptable loss ratio for PDR tests, currently set to 0.5%
         for all performance tests.

   #. PDR and NDR:

      #. Run binary search within the calculated bounds, find PDR and NDR.
      #. Measured: PDR, Partial Drop Rate – limited non-zero tolerance of
         packet loss.
      #. Measured: NDR, Non Drop Rate – zero packet loss.

#. Archive MRR, PDR and NDR per test case.
#. Archive counters collected at MRR, PDR and NDR.
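
The measurement steps above can be sketched in Python as follows. This is an
illustration only, not the CSIT implementation: the device model
(``forwarded``, ``loss_ratio``, ``LINK_RATE``, ``KNEE``), the function names,
the search precision and the low bound of zero are all hypothetical, since the
design leaves LowBound to be determined.

```python
LINK_RATE = 14_880_000.0  # pps, hypothetical link rate
KNEE = 7_000_000.0        # pps, hypothetical zero-loss capacity of the device


def forwarded(rate):
    """Hypothetical device: forwards all traffic up to KNEE, 10% beyond it."""
    return rate if rate <= KNEE else KNEE + 0.1 * (rate - KNEE)


def loss_ratio(rate):
    """Loss ratio of one simulated trial at the given offered rate."""
    return (rate - forwarded(rate)) / rate


def mrr(received_packets, trial_duration_s):
    """Max Received Rate: packets received at link rate per second."""
    return received_packets / trial_duration_s


def search_rate(measure, low, high, max_loss_ratio, precision):
    """Binary search for the highest rate with loss within max_loss_ratio.

    Assumes that `low` passes; `measure(rate)` runs one trial and returns
    the observed loss ratio.
    """
    if measure(high) <= max_loss_ratio:
        return high
    while high - low > precision:
        mid = (low + high) / 2
        if measure(mid) <= max_loss_ratio:
            low = mid   # mid passes, search higher
        else:
            high = mid  # mid fails, search lower
    return low


# MRR: send at link rate for 10 seconds and count received packets.
rate_mrr = mrr(forwarded(LINK_RATE) * 10.0, 10.0)
# NDR tolerates zero loss, PDR tolerates PLR = 0.5%; HighBound = MRR.
ndr = search_rate(loss_ratio, 0.0, rate_mrr, 0.0, 1000.0)
pdr = search_rate(loss_ratio, 0.0, rate_mrr, 0.005, 1000.0)
```

Using MRR as the upper bound shortens the search compared to starting from the
link rate, because in this model neither NDR nor PDR can exceed MRR.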

Performance Analysis (PA)
`````````````````````````

CSIT PA runs performance analysis, change detection and trending using the
specified trend analysis metrics over the rolling window of the last <N> sets
of historical measurement data. PA is defined as follows:

#. PA job triggers:

   #. By PT job at its completion.
   #. On-demand, Gerrit-triggered.
   #. Other periodic, TBD.

#. Download and parse archived historical data and the new data:

   #. New data from the latest PT job is evaluated against the rolling window
      of <N> sets of historical data.
   #. Download RF output.xml files and compressed archived data.
   #. Parse out the data, filtering the test cases listed in the PA
      specification (part of the CSIT PAL specification file).

#. Calculate trend metrics for the rolling window of <N> sets of historical
   data:

   #. Calculate quartiles Q1, Q2, Q3.
   #. Trim outliers using IQR.
   #. Calculate TMA and TMSD.
   #. Calculate normal trending range per test case based on TMA and TMSD.

#. Evaluate new test data against trend metrics:

   #. If within the range of (TMA +/- 3*TMSD) => Result = Pass,
      Reason = Normal.
   #. If below the range => Result = Fail, Reason = Regression.
   #. If above the range => Result = Pass, Reason = Progression.

#. Generate and publish results:

   #. Relay evaluation result to job result.
   #. Generate a new set of trend analysis summary graphs and drill-down
      graphs.

      #. Summary graphs to include measured values with Normal,
         Progression and Regression markers. MM shown in the background if
         possible.
      #. Drill-down graphs to include MM, TMA and TMSD.

   #. Publish trend analysis graphs in HTML format.
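
The calculation and evaluation steps of PA can be sketched end to end in
Python as below. This is a simplified illustration under stated assumptions:
the function names ``evaluate`` and ``analyse`` are hypothetical, the quartile
method and the population standard deviation are choices made for the example,
and the sample data is invented. Downloading, parsing and graph generation are
not covered.

```python
import statistics


def evaluate(history, new_value):
    """Classify new_value against the trend of one historical window."""
    q1, _, q3 = statistics.quantiles(history, n=4, method="inclusive")
    margin = 1.5 * (q3 - q1)
    # Trim outliers using IQR; keep all samples if trimming removes everything.
    trimmed = [x for x in history if q1 - margin < x < q3 + margin]
    trimmed = trimmed or list(history)
    tma = statistics.fmean(trimmed)
    tmsd = statistics.pstdev(trimmed)
    # Normal trending range is TMA +/- 3*TMSD.
    if new_value < tma - 3 * tmsd:
        return "Fail", "Regression"
    if new_value > tma + 3 * tmsd:
        return "Pass", "Progression"
    return "Pass", "Normal"


def analyse(samples, window):
    """Evaluate each sample against the `window` samples preceding it."""
    return [evaluate(samples[i - window:i], samples[i])
            for i in range(window, len(samples))]


# Hypothetical series: stable values, then a drop (8.0) and a jump (12.0).
samples = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9, 10.05, 8.0, 12.0]
for verdict in analyse(samples, window=8):
    print(verdict)
# ('Pass', 'Normal')
# ('Fail', 'Regression')
# ('Pass', 'Progression')
```

Note that the regressed value 8.0 enters the history of the following
evaluation but is trimmed as an outlier, so it does not distort the range used
to judge the next sample.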