Introduction
============

Purpose
-------

With the increasing number of features and code changes in the FD.io VPP data
plane codebase, it is increasingly difficult to measure and detect VPP data
plane performance changes. Similarly, once a degradation is detected, it is
getting harder to bisect the source code in search of the bad code change or
addition. The problem is further compounded by the large set of compute
platforms that VPP runs on, including Intel Xeon, Intel Atom and ARM AArch64.

Existing FD.io CSIT continuous performance trending test jobs help, but they
rely on human factors for anomaly detection and as such are error-prone and
unreliable, as the volume of data generated by these jobs is growing
exponentially.

The proposed solution is to eliminate the human factor and fully automate
performance trending, regression and progression detection, as well as
bisecting.

This document describes a high-level design of a system for continuous
measurement, trending and performance change detection for the FD.io VPP SW
data plane. It builds upon the existing CSIT framework with extensions to its
throughput testing methodology, the CSIT data analytics engine
(PAL – Presentation-and-Analytics-Layer) and the associated Jenkins job
definitions.

Continuous Performance Trending and Analysis
--------------------------------------------

The proposed design replaces the existing CSIT performance trending jobs and
tests with a new Performance Trending (PT) CSIT module and a separate
Performance Analysis (PA) module that ingests results from PT and analyses,
detects and reports any performance anomalies using historical trending data
and statistical metrics. PA also produces trending graphs with summary and
drill-down views across all specified tests, which can be reviewed and
inspected regularly by the FD.io developer and user community.

Trend Analysis
``````````````

All measured performance trend data is treated as time-series data that can be
modelled using a normal distribution. After trimming the outliers, the average
and deviations from the average are used for detecting performance change
anomalies following the three-sigma rule of thumb (a.k.a. 68-95-99.7 rule).

Analysis Metrics
````````````````

The following statistical metrics are proposed as performance trend indicators
over the rolling window of the last sets of historical measurement data (a
code sketch of these calculations follows the list):

    #. Quartiles Q1, Q2, Q3 – three points dividing a ranked data set into
       four equal parts; Q2 is the median of the data.
    #. Inter-Quartile Range IQR = Q3 - Q1 – a measure of variability, used
       here to eliminate outliers.
    #. Outliers – extreme values that are at least 1.5*IQR below Q1, or at
       least 1.5*IQR above Q3.
    #. Trimmed Moving Average (TMA) – average across the data set of the
       rolling window of values without the outliers. Used here to calculate
       TMSD.
    #. Trimmed Moving Standard Deviation (TMSD) – standard deviation over the
       data set of the rolling window of values without the outliers; requires
       calculating TMA first. Used here for anomaly detection.
    #. Moving Median (MM) – median across the data set of the rolling window
       of values with all data points, including the outliers. Used here for
       anomaly detection.
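For illustration, the metrics above can be computed as follows. This is a
minimal Python/numpy sketch, not the actual CSIT PAL code; the function name
``trend_metrics`` and the representation of the rolling window as a flat list
of samples are assumptions made for the example::

    import numpy as np

    def trend_metrics(samples):
        """Compute Q1/Q2/Q3, TMA, TMSD and MM over one rolling window."""
        data = np.asarray(samples, dtype=float)
        q1, q2, q3 = np.percentile(data, [25, 50, 75])
        iqr = q3 - q1
        # Outliers: at least 1.5*IQR below Q1, or at least 1.5*IQR above Q3.
        trimmed = data[(data >= q1 - 1.5 * iqr) & (data <= q3 + 1.5 * iqr)]
        tma = trimmed.mean()    # Trimmed Moving Average (outliers removed)
        tmsd = trimmed.std()    # Trimmed Moving Standard Deviation
        mm = np.median(data)    # Moving Median (all data points kept)
        return q1, q2, q3, tma, tmsd, mm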
Anomaly Detection
`````````````````

Based on the assumption that all performance measurements can be modelled
using a normal distribution, the three-sigma rule of thumb is proposed as the
main criterion for anomaly detection.

The three-sigma rule of thumb, a.k.a. the 68-95-99.7 rule, is a shorthand used
to capture the percentage of values that lie within a band around the average
(mean) in a normal distribution with a width of two, four and six standard
deviations. More precisely, 68.27%, 95.45% and 99.73% of the result values
should lie within one, two or three standard deviations of the mean,
respectively.

To verify compliance of a test result with value X against the defined trend
analysis metrics and to detect anomalies, three simple evaluation criteria are
proposed (illustrated by the sketch below the table)::

    Test Result Evaluation  Reported Result  Reported Reason  Trending Graph Markers
    ================================================================================
    Normal                  Pass             Normal           Part of plot line
    Regression              Fail             Regression       Red circle
    Progression             Pass             Progression      Green circle
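A minimal sketch of this classification, assuming the trending range
(TMA +/- 3*TMSD) has already been calculated per the metrics above; the
function name ``evaluate`` is an assumption made for the example, not a PAL
API::

    def evaluate(value, tma, tmsd):
        """Classify one test result against the range TMA +/- 3*TMSD."""
        if value < tma - 3 * tmsd:
            return "Fail", "Regression"    # below the normal trending range
        if value > tma + 3 * tmsd:
            return "Pass", "Progression"   # above the normal trending range
        return "Pass", "Normal"            # within the normal trending range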
Jenkins job cumulative results:

    #. Pass - if all detection results are Pass or Warning.
    #. Fail - if any detection result is Fail.

Performance Trending (PT)
`````````````````````````

CSIT PT runs regular performance test jobs that find MRR, PDR and NDR per test
case. PT is designed as follows:

    #. PT job triggers:

        #. Periodic, e.g. daily.
        #. On-demand, gerrit triggered.
        #. Other periodic triggers TBD.

    #. Measurements and calculations per test case:

        #. MRR – Max Received Rate:

            #. Measured: unlimited tolerance of packet loss.
            #. Send packets at link rate, count the total received packets,
               divide by the test trial period.

        #. Optimized binary search bounds for PDR and NDR tests:

            #. Calculated: high and low bounds for the binary search, based
               on MRR and a pre-defined Packet Loss Ratio (PLR).
            #. HighBound=MRR, LowBound=to-be-determined.
            #. PLR – acceptable loss ratio for PDR tests, currently set to
               0.5% for all performance tests.

        #. PDR and NDR:

            #. Run a binary search within the calculated bounds to find PDR
               and NDR.
            #. Measured: PDR – Partial Drop Rate – limited non-zero tolerance
               of packet loss.
            #. Measured: NDR – Non Drop Rate – zero packet loss.

    #. Archive MRR, PDR and NDR per test case.
    #. Archive counters collected at MRR, PDR and NDR.

Performance Analysis (PA)
`````````````````````````

CSIT PA runs performance analysis, change detection and trending using the
specified trend analysis metrics over the rolling window of the last sets of
historical measurement data. PA is defined as follows:

    #. PA job triggers:

        #. By a PT job at its completion.
        #. Manually from the Jenkins UI.

    #. Download and parse archived historical data and the new data:

        #. New data from the latest PT job is evaluated against the rolling
           window of sets of historical data.
        #. Download RF output.xml files and compressed archived data.
        #. Parse out the data, filtering test cases listed in the PA
           specification (part of the CSIT PAL specification file).

    #. Calculate trend metrics for the rolling window of sets of historical
       data:

        #. Calculate quartiles Q1, Q2, Q3.
        #. Trim outliers using the IQR.
        #. Calculate TMA and TMSD.
        #. Calculate the normal trending range per test case based on TMA
           and TMSD.

    #. Evaluate the new test data against the trend metrics:

        #. If within the range of (TMA +/- 3*TMSD) => Result = Pass,
           Reason = Normal.
        #. If below the range => Result = Fail, Reason = Regression.
        #. If above the range => Result = Pass, Reason = Progression.

    #. Generate and publish results:

        #. Relay the evaluation result to the job result.
        #. Generate a new set of trend analysis summary graphs and drill-down
           graphs:

            #. Summary graphs to include measured values with Normal,
               Progression and Regression markers; MM shown in the background
               if possible.
            #. Drill-down graphs to include MM, TMA and TMSD.

        #. Publish the trend analysis graphs in HTML format.

VPP Performance Dashboard
=========================

Description
-----------

Performance dashboard tables provide the latest VPP throughput trend, trend
compliance and detected anomalies, all on a per VPP test case basis. Linked
trendline graphs enable further drill-down into trendline compliance, the
sequence and nature of anomalies, as well as pointers to performance test
builds/logs and VPP builds. Performance trending is currently based on the
Maximum Receive Rate (MRR) tests. MRR tests measure the packet forwarding
rate under the maximum load offered by the traffic generator over a set trial
duration, regardless of packet loss. See the :ref:`trending_methodology`
section for more detail, including the trend and anomaly calculations.

Data samples are generated by the CSIT VPP performance trending jobs executed
twice a day (target start: every 12 hrs, at 02:00 and 14:00 UTC). All trend
and anomaly evaluation is based on a rolling window of data samples covering
the last 7 days.

Failed tests
------------

The table lists the tests that failed over the runs of the trending jobs.

Legend to the table:

    - **Test Case**: name of the FD.io CSIT test case, naming convention
      `here `_.
    - **Fails [#]**: number of fails of the test over the period.
    - **Last Fail [Date]**: date and time when the test failed the last time.
    - **Last Fail [VPP Build]**: VPP build that was tested when the test
      failed the last time.
    - **Last Fail [CSIT Build]**: the last CSIT build where the test failed.

.. include:: ../../../_build/_static/vpp/failed-tests.rst

Dashboard
---------

Legend to the tables (the relative-change arithmetic is sketched after the
list):

    - **Test Case**: name of the FD.io CSIT test case, naming convention
      `here `_.
    - **Trend [Mpps]**: last value of the performance trend.
    - **Short-Term Change [%]**: relative change of the last trend value vs.
      the trend value from last week.
    - **Long-Term Change [%]**: relative change of the last trend value vs.
      the maximum of the trend values over the last quarter, excluding the
      last week.
    - **Regressions [#]**: number of regressions detected.
    - **Progressions [#]**: number of progressions detected.
    - **Outliers [#]**: number of outliers detected.
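For instance, the relative changes can be computed as follows. This is a
hypothetical sketch of the arithmetic with made-up trend values, not the PAL
implementation; the sign convention (negative = lower than the reference) is
an assumption::

    def relative_change(nominal, value):
        """Relative change of value vs. a nominal reference value, in %."""
        return round(100.0 * (value - nominal) / nominal, 2)

    # Hypothetical trend values in Mpps:
    last_trend = 9.9           # latest trend value
    week_ago_trend = 10.0      # trend value from one week back
    quarter_max_trend = 12.0   # quarterly maximum, excluding the last week

    short_term = relative_change(week_ago_trend, last_trend)     # -1.0 %
    long_term = relative_change(quarter_max_trend, last_trend)   # -17.5 %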
Tested VPP worker-thread-core combinations (1t1c, 2t2c, 4t4c) are listed in
separate tables in sections 1.x, followed by the trending methodology in
section 2 and the trendline graphs in sections 3.x. The performance test data
used for the trendline graphs is provided in sections 4.x.

VPP worker on 1t1c
``````````````````

.. include:: ../../../_build/_static/vpp/performance-trending-dashboard-1t1c.rst

VPP worker on 2t2c
``````````````````

.. include:: ../../../_build/_static/vpp/performance-trending-dashboard-2t2c.rst

VPP worker on 4t4c
``````````````````

.. include:: ../../../_build/_static/vpp/performance-trending-dashboard-4t4c.rst