===================================================
Presentation and Analytics Layer - Low Level Design
===================================================

Table of contents
-----------------

.. toctree::
    :maxdepth: 3

Overview
--------

The presentation and analytics layer (PAL) is the fourth layer of the CSIT
hierarchy. The model of the presentation and analytics layer consists of four
sub-layers, from bottom to top:

- sL1 - Data - input data to be processed:

  - Static content - .rst text files, .svg static figures, and other files
    stored in the CSIT git repository.
  - Data to process - .xml files generated by Jenkins jobs executing tests,
    stored as robot results files (output.xml).
  - Specification - .yaml file with the models of report elements (tables,
    plots, layout, ...) generated by this tool. There is also the
    configuration of the tool and the specification of input data (jobs and
    builds).

- sL2 - Data processing

  - The data are read from the specified input files (.xml) and stored as
    multi-indexed pandas.Series.
  - This layer also provides an interface to the input data and the
    filtering of the input data.

- sL3 - Data presentation - This layer generates the elements specified in
  the specification file:

  - Tables: .csv files linked to static .rst files
  - Plots: .html files generated using plot.ly, linked to static .rst files

- sL4 - Report generation - Sphinx generates the required formats and
  versions:

  - formats: html, pdf
  - versions: minimal, full (TODO: define the names and scope of versions)

Data
----

Report Specification
````````````````````

The report specification file defines which data is used and which outputs
are generated. It is human readable and structured, and it is easy to add,
remove or change items in it. The specification includes:

- Specification of the environment
- Configuration of the debug mode (optional)
- Specification of input data (jobs, builds, files, ...)
- Specification of the output

  - What and how is generated:

    - What: plots, tables
    - How: specification of all properties and parameters

- .yaml format

Structure of the specification file
'''''''''''''''''''''''''''''''''''

The specification file is organized as a list of dictionaries distinguished
by the type:

| -
|   type: "environment"
|
| -
|   type: "debug"
|
| -
|   type: "input"
|
| -
|   type: "output"
|
| -
|   type: "table"
|
| -
|   type: "plot"

Each type represents a section. The sections "environment", "debug", "input"
and "output" appear only once in the specification; "table" and "plot" can
appear multiple times. The sections "debug", "table" and "plot" are optional.

Table(s) and plot(s) are referred to as "elements" in this text. It is
possible to define and implement other elements if needed.
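As an illustration only, such a specification could be read with PyYAML along
these lines; the helper function and the grouping by type are assumptions of
this sketch, not the actual PAL implementation:

.. code-block:: python

    import yaml  # PyYAML

    def read_specification(spec_file):
        """Read the specification file and group its sections by type.

        Returns a dictionary mapping a section type (e.g. "environment",
        "table") to the list of sections of that type.
        """
        with open(spec_file, "r") as file_handler:
            # The specification is a list of dictionaries, each of them
            # carrying a "type" key.
            sections = yaml.safe_load(file_handler)

        grouped = dict()
        for section in sections:
            grouped.setdefault(section["type"], list()).append(section)
        return grouped

    # Example usage:
    # spec = read_specification("specification.yaml")
    # tables = spec.get("table", [])  # 0 or more "table" sections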
Section: Environment
''''''''''''''''''''

This section has these parts:

- type: "environment" - says that this is the section "environment"
- configuration - configuration of the PAL
- paths - paths used by the PAL
- urls - urls pointing to the data sources
- make-dirs - a list of the directories to be created by the PAL while
  preparing the environment
- remove-dirs - a list of the directories to be removed while cleaning the
  environment
- build-dirs - a list of the directories where the results are stored

The structure of the section "Environment" is as follows (example):

| -
|   type: "environment"
|   configuration:
|     # Debug mode:
|     # If the section "type: debug" is missing, CFG[DEBUG] is set to 0.
|     CFG[DEBUG]: 1
|
|   paths:
|     DIR[WORKING]: "_tmp"
|     DIR[BUILD,HTML]: "_build"
|     DIR[BUILD,LATEX]: "_build_latex"
|     DIR[RST]: "../../../docs/report"
|
|     DIR[WORKING,DATA]: "{DIR[WORKING]}/data"
|
|     DIR[STATIC]: "{DIR[BUILD,HTML]}/_static"
|     DIR[STATIC,VPP]: "{DIR[STATIC]}/vpp"
|     DIR[STATIC,DPDK]: "{DIR[STATIC]}/dpdk"
|     DIR[STATIC,ARCH]: "{DIR[STATIC]}/archive"
|     DIR[STATIC,TREND]: "{DIR[STATIC]}/trending"
|
|     DIR[PLOT,VPP]: "{DIR[WORKING]}/vpp_plot"
|     DIR[PLOT,DPDK]: "{DIR[WORKING]}/dpdk_plot"
|
|     DIR[DTR]: "{DIR[RST]}/detailed_test_results"
|     DIR[DTR,PERF,DPDK]: "{DIR[DTR]}/dpdk_performance_results"
|     DIR[DTR,PERF,VPP]: "{DIR[DTR]}/vpp_performance_results"
|     DIR[DTR,PERF,HC]: "{DIR[DTR]}/honeycomb_performance_results"
|     DIR[DTR,FUNC,VPP]: "{DIR[DTR]}/vpp_functional_results"
|     DIR[DTR,FUNC,HC]: "{DIR[DTR]}/honeycomb_functional_results"
|     DIR[DTR,FUNC,NSHSFC]: "{DIR[DTR]}/nshsfc_functional_results"
|     DIR[DTR,PERF,VPP,IMPRV]: "{DIR[RST]}/vpp_performance_tests/performance_improvements"
|
|     DIR[DTC]: "{DIR[RST]}/test_configuration"
|     DIR[DTC,PERF,VPP]: "{DIR[DTC]}/vpp_performance_configuration"
|     DIR[DTC,FUNC,VPP]: "{DIR[DTC]}/vpp_functional_configuration"
|
|     DIR[DTO]: "{DIR[RST]}/test_operational_data"
|     DIR[DTO,PERF,VPP]: "{DIR[DTO]}/vpp_performance_operational_data"
|
|     DIR[CSS_PATCH_FILE]: "{DIR[STATIC]}/theme_overrides.css"
|
|   urls:
|     URL[JENKINS,CSIT]: "https://jenkins.fd.io/view/csit/job"
|     URL[JENKINS,HC]: "https://jenkins.fd.io/view/hc2vpp/job"
|
|   make-dirs:
|     # List the directories which are created while preparing the
|     # environment.
|     # All directories MUST be defined in the "paths" section.
|     - "DIR[WORKING,DATA]"
|     - "DIR[STATIC,VPP]"
|     - "DIR[STATIC,DPDK]"
|     - "DIR[STATIC,ARCH]"
|     - "DIR[STATIC,TREND]"
|     - "DIR[PLOT,VPP]"
|     - "DIR[PLOT,DPDK]"
|     - "DIR[BUILD,LATEX]"
|
|   remove-dirs:
|     # List the directories which are deleted while cleaning the
|     # environment.
|     # All directories MUST be defined in the "paths" section.
|     - "DIR[WORKING]"
|
|   build-dirs:
|     # List the directories where the results (build) are stored.
|     # All directories MUST be defined in the "paths" section.
|     - "DIR[BUILD,HTML]"
|     - "DIR[BUILD,LATEX]"

It is possible to use already defined items in the definition of other
items, e.g.:

| DIR[WORKING,DATA]: "{DIR[WORKING]}/data"

will be automatically changed to

| DIR[WORKING,DATA]: "_tmp/data"
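A minimal sketch of how this substitution might be implemented (the function
name and the fixed-point loop are assumptions of this sketch, not the actual
PAL code):

.. code-block:: python

    import re

    def expand_paths(paths):
        """Repeatedly replace "{TAG}" references in the path values by the
        value defined for TAG, until no reference can be resolved further.
        """
        pattern = re.compile(r"{([^{}]+)}")
        resolved = dict(paths)
        changed = True
        while changed:
            changed = False
            for tag, value in resolved.items():
                # Unknown tags are left in place unchanged.
                new_value = pattern.sub(
                    lambda match: resolved.get(match.group(1),
                                               match.group(0)),
                    value)
                if new_value != value:
                    resolved[tag] = new_value
                    changed = True
        return resolved

    # expand_paths({"DIR[WORKING]": "_tmp",
    #               "DIR[WORKING,DATA]": "{DIR[WORKING]}/data"})
    # -> {"DIR[WORKING]": "_tmp", "DIR[WORKING,DATA]": "_tmp/data"}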
Section: Debug mode
'''''''''''''''''''

This section is optional. It configures the debug mode, which is used when
we do not want to download the data files but want to use local files
instead. If the debug mode is configured, the "input" section is ignored.

This section has these parts:

- type: "debug" - says that this is the section "debug"
- general

  - input-format - xml or zip
  - extract - if "zip" is defined as the input format, this file is
    extracted from the zip file, otherwise this parameter is ignored

- builds - a list of builds whose data is used. The job name must be given
  as the key, followed by the list of builds and their output files.

The structure of the section "Debug" is as follows (example):

| -
|   type: "debug"
|   general:
|     input-format: "xml"  # zip or xml
|     extract: "output.xml"  # Only for zip
|   builds:
|     # The files must be in the directory DIR[WORKING,DATA]
|     csit-vpp-perf-1707-all:
|     -
|       build: 17
|       file: "{DIR[WORKING,DATA]}/csit-vpp-perf-1707-all__17__output.xml"

Section: Input
''''''''''''''

This section is mandatory if the debug mode is not used. It defines the data
which will be used to generate the elements.

This section has these parts:

- type: "input" - says that this section is the "input"
- general - parameters common to all builds:

  - file-name: the file to be downloaded
  - download-path: the path to be appended to the url pointing to the file,
    e.g.: "{job}/{build}/robot/report/*zip*/{filename}"; {job}, {build} and
    {filename} are replaced by the proper values defined in this section
  - extract: the file to be extracted from the downloaded zip file, e.g.:
    "output.xml"; if an xml file is downloaded, this parameter is ignored.

- builds - a list of jobs (keys) and builds whose output data will be
  downloaded

The structure of the section "Input" is as follows (example from the 17.07
report):

| -
|   type: "input"  # Ignored in the debug mode
|   general:
|     file-name: "robot-plugin.zip"
|     download-path: "{job}/{build}/robot/report/*zip*/{filename}"
|     extract: "output.xml"
|   builds:
|     csit-vpp-perf-1707-all:
|     - 9
|     - 10
|     - 13
|     - 14
|     - 15
|     - 16
|     - 17
|     - 18
|     - 19
|     - 21
|     - 22
|     csit-dpdk-perf-1707-all:
|     - 1
|     - 2
|     - 3
|     - 4
|     - 5
|     - 6
|     - 7
|     - 8
|     - 9
|     - 10
|     csit-vpp-functional-1707-ubuntu1604-virl:
|     - lastSuccessfulBuild
|     hc2vpp-csit-perf-master-ubuntu1604:
|     - 8
|     - 9
|     hc2vpp-csit-integration-1707-ubuntu1604:
|     - lastSuccessfulBuild
|     csit-nsh_sfc-verify-func-1707-ubuntu1604-virl:
|     - 2
|     csit-vpp-perf-1704-all:
|     - 6
|     - 7
|     - 8
|     - 9
|     - 10
|     - 12
|     - 14
|     - 15
|     - 16
|     - 17
|     csit-dpdk-perf-1704-all:
|     - 1
|     - 2
|     - 3
|     - 4
|     - 6
|     - 7
|     - 8
|     - 9
|     - 10
|     - 11
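The url of each file to download can then be composed from the Jenkins base
url (defined in the "environment" section) and the filled-in download-path.
A minimal sketch, assuming the base url is passed in by the caller:

.. code-block:: python

    def download_urls(input_section, base_url):
        """Yield (job, build, url) for every build listed in the "input"
        section of the specification.
        """
        general = input_section["general"]
        file_name = general["file-name"]
        download_path = general["download-path"]
        for job, builds in input_section["builds"].items():
            for build in builds:
                # Replace {job}, {build} and {filename} in the template.
                path = download_path.format(
                    job=job, build=build, filename=file_name)
                yield job, build, "{0}/{1}".format(base_url, path)

    # Example usage with URL[JENKINS,CSIT] from the "environment" section:
    # for job, build, url in download_urls(
    #         input_section, "https://jenkins.fd.io/view/csit/job"):
    #     ...  # download url into DIR[WORKING,DATA]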
Section: Output
'''''''''''''''

This section specifies which format(s) will be generated (html, pdf) and
which versions will be generated for each format.

This section has these parts:

- type: "output" - says that this section is the "output"
- format: html or pdf
- version: defined for each format separately

The structure of the section "Output" is as follows (example):

| -
|   type: "output"
|   format:
|     html:
|     - full
|     pdf:
|     - full
|     - minimal

TODO: define the names of versions

Content of "minimal" version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TODO: define the name and content of this version

Section: Table
''''''''''''''

This section defines a table to be generated. There can be 0 or more "table"
sections.

This section has these parts:

- type: "table" - says that this section defines a table
- algorithm: the algorithm used to generate the table. The other parameters
  in this section must provide all information needed by the used algorithm.
- template: (optional) a .csv file used as a template while generating the
  table
- output-file-format: (optional) the format of the output file.
- output-file: the file which the table will be written to
- columns: specification of the table columns
- data: specification of the jobs and builds whose data is used to generate
  the table
- filter: a filter based on tags, applied on the input data
- parameters: only these parameters will be put to the output data structure

The structure of the section "Table" is as follows (example):

| -
|   type: "table"
|   title: "Performance improvements"
|   algorithm: "performance-improvements"
|   template: "templates/tmpl_performance_improvements.csv"
|   output-file-format: "csv"
|   output-file: "{DIR[WORKING]}/path/to/my_table.csv"
|   columns:
|   -
|     title: "VPP Functionality"
|     data: "template 2"
|   -
|     title: "Test Name"
|     data: "template 3"
|   -
|     title: "VPP-17.04 mean [Mpps]"
|     data: "vpp 1704 performance mean"
|   -
|     title: "VPP-17.07 mean [Mpps]"
|     data: "vpp 1707 performance mean"
|   -
|     title: "VPP-17.07 stdev [Mpps]"
|     data: "vpp 1707 performance stdev"
|   -
|     title: "17.04 to 17.07 change"
|     data: "change-relative 4 5"
|   rows: "generated"
|   data:
|     csit-vpp-perf-1707-all:
|     - 13
|     - 16
|     - 17
|   # Keep this formatting, the filter is enclosed with " (quotation mark)
|   # and each tag is enclosed with ' (apostrophe).
|   filter: "'64B' and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"
|   parameters:
|   - "throughput"
|   - "latency"

Section: Plot
'''''''''''''

This section defines a plot to be generated. There can be 0 or more "plot"
sections.

This section has these parts:

- type: "plot" - says that this section defines a plot
- output-file-format: (optional) the format of the output file.
- output-file: the file which the plot will be written to
- plot-type: the type of the plot. The other parameters in this section must
  provide all information needed by plot.ly to generate the plot, for
  example:

  - x-axis: x-axis title
  - y-axis: y-axis title

- data: specification of the jobs and builds whose data is used to generate
  the plot
- filter: a filter applied on the input data

The structure of the section "Plot" is as follows (example):

| -
|   type: "plot"
|   plot-type: "performance-box"  # box, line
|   output-file-format: "html"
|   output-file: "{DIR[WORKING]}/path/to/my_plot.html"
|   plot-title: "plot title"
|   x-axis: "x-axis title"
|   y-axis: "y-axis title"
|   data:
|     csit-vpp-perf-1707-all:
|     - 9
|     - 10
|     - 13
|     - 14
|     - 15
|     - 16
|     - 17
|     - 18
|     - 19
|     - 21
|   filter:
|   - "'64B' and 'BASE' and 'NDRDISC' and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"
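As an illustration only, a "performance-box" plot like the one specified
above could be generated with plot.ly along these lines; the layout of the
data passed in ({test name: list of throughput samples}) is an assumption of
this sketch:

.. code-block:: python

    import plotly.graph_objs as go
    from plotly.offline import plot

    def generate_performance_box(plot_spec, data):
        """Generate a box plot according to a "plot" section of the
        specification and store it as a .html file.
        """
        # One box per test, one sample per processed build.
        traces = [go.Box(y=samples, name=test_name)
                  for test_name, samples in data.items()]
        layout = go.Layout(title=plot_spec["plot-title"],
                           xaxis={"title": plot_spec["x-axis"]},
                           yaxis={"title": plot_spec["y-axis"]})
        plot(go.Figure(data=traces, layout=layout),
             filename=plot_spec["output-file"],
             auto_open=False)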
Static content
``````````````

- Manually created / edited files
- .rst files, static .csv files, static pictures (.svg), ...
- Stored in CSIT gerrit

No more details about the static content are given in this document.

Data to process
```````````````

The PAL processes the test results and other information produced by the
Jenkins jobs executing tests. The data are currently stored as robot results
in Jenkins (TODO: store the data in Nexus), either as .zip and / or .xml
files.

Data processing
---------------

As the first step, the data are downloaded and stored locally (typically on
a Jenkins slave). If .zip files are used, the given .xml files are extracted
for further processing.

Parsing of the .xml files is performed by a class derived from
"robot.api.ResultVisitor"; only the necessary methods are overridden. All
necessary data, and only the necessary data, is extracted from the .xml
files and stored in a structured form.

The parsed data are stored as the multi-indexed pandas.Series data type. Its
structure is as follows:

| <job name>
|   <build>
|     <metadata>
|     <suites>
|     <tests>

"job name", "build", "metadata", "suites" and "tests" are the indexes used
to access the data. For example:

| data =
|
| job 1 name:
|   build 1:
|     metadata: metadata
|     suites: suites
|     tests: tests
|   ...
|   build N:
|     metadata: metadata
|     suites: suites
|     tests: tests
| ...
| job M name:
|   build 1:
|     metadata: metadata
|     suites: suites
|     tests: tests
|   ...
|   build N:
|     metadata: metadata
|     suites: suites
|     tests: tests

Using the indexes data["job 1 name"]["build 1"]["tests"] (e.g.
data["csit-vpp-perf-1704-all"]["17"]["tests"]) we get a list of all tests
together with their test data.

The data will not be accessed directly using the indexes, but using getters
and filters.

**Structure of metadata:**

| "metadata": {
|     "version": "VPP version",
|     "job": "Jenkins job name",
|     "build": "Information about the build"
| },

**Structure of suites:**

| "suites": {
|     "Suite name 1": {
|         "doc": "Suite 1 documentation"
|     }
|     "Suite name N": {
|         "doc": "Suite N documentation"
|     }
| }

**Structure of tests:**

| "tests": {
|     "ID": {
|         "name": "Test name",
|         "parent": "Name of the parent of the test",
|         "tags": ["tag 1", "tag 2", "tag n"],
|         "type": "PDR" | "NDR",
|         "throughput": {
|             "value": int,
|             "unit": "pps" | "bps" | "percentage"
|         },
|         "latency": {
|             "direction1": {
|                 "100": {
|                     "min": int,
|                     "avg": int,
|                     "max": int
|                 },
|                 "50": {  # Only for NDR
|                     "min": int,
|                     "avg": int,
|                     "max": int
|                 },
|                 "10": {  # Only for NDR
|                     "min": int,
|                     "avg": int,
|                     "max": int
|                 }
|             },
|             "direction2": {
|                 "100": {
|                     "min": int,
|                     "avg": int,
|                     "max": int
|                 },
|                 "50": {  # Only for NDR
|                     "min": int,
|                     "avg": int,
|                     "max": int
|                 },
|                 "10": {  # Only for NDR
|                     "min": int,
|                     "avg": int,
|                     "max": int
|                 }
|             }
|         },
|         "lossTolerance": "lossTolerance",  # Only for PDR
|         "vat-history": {
|             "DUT1": "DUT1 VAT History",
|             "DUT2": "DUT2 VAT History"
|         },
|         "show-run": "Show Run"
|     },
|     "ID": {
|         # next test
|     }
| }

Note: ID is the lowercase full path to the test.

Data filtering
``````````````

The first step when generating an element is getting the data needed to
construct the element. The data are filtered from the processed input data.

The data filtering is based on:

- job name(s)
- build number(s)
- tag(s)
- required data - only this data is included in the output.

WARNING: The filtering is based on tags, so be careful with tagging.

For example, an element whose specification includes:

| data:
|   csit-vpp-perf-1707-all:
|   - 9
|   - 10
|   - 13
|   - 14
|   - 15
|   - 16
|   - 17
|   - 18
|   - 19
|   - 21
| filter:
| - "'64B' and 'BASE' and 'NDRDISC' and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"

will be constructed using data from the job "csit-vpp-perf-1707-all", all
listed builds, and the tests whose list of tags fulfils the condition
specified in the filter (one possible way to evaluate such a filter is
sketched below).

The output data structure for filtered test data is:

| - job 1
|   - build 1
|     - test 1 ID:
|       - parameter 1
|       - parameter 2
|       ...
|       - parameter n
|     ...
|     - test n ID:
|       ...
|   ...
|   - build n
| ...
| - job n
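Since the filter expression only contains quoted tags, parentheses and the
operators "and", "or" and "not", one possible way to evaluate it against the
tags of a test is to substitute each tag literal by True or False and then
evaluate the remaining boolean expression. A minimal sketch (the function
name is an assumption; the actual PAL filtering may be implemented
differently):

.. code-block:: python

    import re

    def tags_match(tag_filter, tags):
        """Return True if the set of tags satisfies the filter expression.

        Each 'TAG' literal is replaced by True/False depending on whether
        the tag is present, then the resulting and/or/not expression is
        evaluated.
        """
        expression = re.sub(r"'([^']*)'",
                            lambda match: str(match.group(1) in tags),
                            tag_filter)
        # The expression now contains only True/False, and/or/not and
        # parentheses.
        return eval(expression)

    # tags_match("'64B' and '1T1C' and not 'VHOST'",
    #            {"64B", "1T1C", "L2XCFWD"})  # -> True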
Data analytics
``````````````

The data analytics part implements:

- methods to compute statistical data from the filtered input data
- trending
- etc.

Data presentation
-----------------

This layer generates the plots and tables according to the report models
specified in the specification file. The elements are generated using the
algorithms and the data specified in their models.

Tables
``````

- Tables are generated by algorithms implemented in PAL; the model includes
  the algorithm and all necessary information.
- output format: csv
- generated tables are stored in the specified directories and linked to the
  .rst files

Plots
`````

- plot.ly is currently used to generate plots; the model includes the type
  of the plot and all necessary information.
- output format: html
- generated plots are stored in the specified directories and linked to the
  .rst files

Report generation
-----------------

The report is generated using Sphinx and the Read the Docs template. PAL
generates the html and pdf formats. It is possible to define the content of
the report by specifying the version (TODO: define the names and content of
versions).

The process
```````````

1. Read the specification.
2. Read the input data.
3. Process the input data.
4. For each element (plot, table) defined in the specification:

   a. Get the data needed to construct the element, using a filter.
   b. Generate the element.
   c. Store the element.

5. Generate the report.
6. Store the report (Nexus).

The process is model driven. The elements' models (tables, plots and the
report itself) are defined in the specification file. The script reads the
elements' models from the specification file and generates the elements, as
sketched below.

It is easy to add elements to be generated: if a new kind of element is
required, only a new algorithm needs to be implemented and integrated.
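A high-level sketch of this model-driven main loop, with hypothetical
generator functions registered per element type (the names and the dispatch
table are illustrations, not the actual PAL code):

.. code-block:: python

    def generate_table(element, data):
        """Hypothetical table generator - writes a .csv file."""
        ...

    def generate_plot(element, data):
        """Hypothetical plot generator - writes a .html file."""
        ...

    def generate_elements(specification, data, filter_data):
        """Dispatch each element model in the specification to the
        algorithm that generates it; filter_data implements the tag-based
        filtering described above.
        """
        generators = {
            "table": generate_table,
            "plot": generate_plot,
        }
        for element in specification:
            generator = generators.get(element.get("type"))
            if generator is None:
                # "environment", "input", "output", ... are not elements.
                continue
            generator(element, filter_data(data, element))

Adding a new kind of element then amounts to implementing its generator and
registering it in the dispatch table.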