resources/tools/presentation/lld.rst

   1 ===================================================
   2 Presentation and Analytics Layer - Low Level Design
   3 ===================================================
   4
   5 Table of contents
   6 -----------------
   7
   8  .. toctree:: .
   9     :maxdepth: 3
  10
  11
  12 Overview
  13 --------
  14
  15 The presentation and analytics layer (PAL) is the fourth layer of the CSIT
  16 hierarchy. The model of the presentation and analytics layer consists of four
  17 sub-layers, from bottom to top:
  18
  19  - sL1 - Data - input data to be processed:
  20
  21    - Static content - .rst text files, .svg static figures, and other files
  22      stored in the CSIT git repository.
  23    - Data to process - .xml files generated by Jenkins jobs executing tests,
  24      stored as robot results files (output.xml).
  25    - Specification - .yaml file with the models of report elements (tables,
  26      plots, layout, ...) generated by this tool. There is also the configuration
  27      of the tool and the specification of input data (jobs and builds).
  28
  29  - sL2 - Data processing
  30
  31    - The data are read from the specified input files (.xml) and stored as
  32      multi-indexed `pandas.Series <https://pandas.pydata.org/pandas-docs/stable/
  33      generated/pandas.Series.html>`_.
  34    - This layer also provides an interface to the input data and filtering of
  35      the input data.
  36
  37  - sL3 - Data presentation - This layer generates the elements specified in the
  38    specification file:
  39
  40    - Tables: .csv files linked to static .rst files
  41    - Plots: .html files generated using plot.ly linked to static .rst files
  42
  43  - sL4 - Report generation - Sphinx generates required formats and versions:
  44
  45    - formats: html, pdf
  46    - versions: minimal, full (TODO: define the names and scope of versions)
  47
  48
  49 Data
  50 ----
  51
  52 Report Specification
  53 ````````````````````
  54
  55 The report specification file defines which data is used and which outputs are
  56 generated. It is human readable and structured. It is easy to add / remove /
  57 change items. The specification includes:
  58
  59  - Specification of the environment
  60  - Configuration of debug mode (optional)
  61  - Specification of input data (jobs, builds, files, ...)
  62  - Specification of the output
  63  - What is generated and how:

  64    - What: plots, tables
  65    - How: specification of all properties and parameters

  66  - .yaml format
  67
  68 Structure of the specification file
  69 '''''''''''''''''''''''''''''''''''
  70
  71 The specification file is organized as a list of dictionaries distinguished by
  72 the type:
  73
  74  | -
  75  |   type: "environment"
  76  |
  77  | -
  78  |   type: "debug"
  79  |
  80  | -
  81  |   type: "input"
  82  |
  83  | -
  84  |   type: "output"
  85  |
  86  | -
  87  |   type: "table"
  88  |
  89  | -
  90  |   type: "plot"
  91
  92 Each type represents a section. The sections "environment", "debug", "input" and
  93 "output" appear only once in the specification; "table" and "plot" can appear
  94 multiple times.
  95
  96 Sections "debug", "table" and "plot" are optional.
  97
  98 Table(s) and plot(s) are referred to as "elements" in this text. It is possible
  99 to define and implement other elements if needed.
 100
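The rules above can be checked mechanically once the .yaml file is loaded. The
following is a minimal sketch, not PAL code: the name ``check_specification``
is hypothetical, and the specification is assumed to be already parsed into a
list of dictionaries. It also applies the rule from the "Input" section that
"input" may be missing when "debug" is present.

```python
from collections import Counter

# Section types named in the text; anything else is reported as unknown.
SINGLE = {"environment", "debug", "input", "output"}  # at most once
REPEATABLE = {"table", "plot"}                         # any number of times


def check_specification(spec):
    """Return a list of problems; an empty list means the structure is valid."""
    problems = []
    counts = Counter(section.get("type") for section in spec)
    for sec_type, count in counts.items():
        if sec_type not in SINGLE | REPEATABLE:
            problems.append(f"unknown section type: {sec_type!r}")
        elif sec_type in SINGLE and count > 1:
            problems.append(f"section {sec_type!r} defined {count} times")
    for sec_type in ("environment", "output"):
        if counts[sec_type] == 0:
            problems.append(f"mandatory section {sec_type!r} is missing")
    # "input" is ignored in the debug mode, so it is required only without it.
    if counts["input"] == 0 and counts["debug"] == 0:
        problems.append("mandatory section 'input' is missing")
    return problems
```

A specification with one "environment", one "input", one "output" and any
number of "plot" sections yields an empty list of problems.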
 101
 102 Section: Environment
 103 ''''''''''''''''''''
 104
 105 This section has these parts:
 106
 107  - type: "environment" - says that this is the section "environment"
 108  - configuration - configuration of the PAL
 109  - paths - paths used by the PAL
 110  - urls - urls pointing to the data sources
 111  - make-dirs - a list of the directories to be created by the PAL while
 112    preparing the environment
 113  - remove-dirs - a list of the directories to be removed while cleaning the
 114    environment
 115  - build-dirs - a list of the directories where the results are stored
 116
 117 The structure of the section "Environment" is as follows (example):
 118
 119  | -
 120  |   type: "environment"
 121  |   configuration:
 122  |     # Debug mode:
 123  |     # If the section "type: debug" is missing, CFG[DEBUG] is set to 0.
 124  |     CFG[DEBUG]: 1
 125  |
 126  |   paths:
 127  |     DIR[WORKING]: "_tmp"
 128  |     DIR[BUILD,HTML]: "_build"
 129  |     DIR[BUILD,LATEX]: "_build_latex"
 130  |     DIR[RST]: "../../../docs/report"
 131  |
 132  |     DIR[WORKING,DATA]: "{DIR[WORKING]}/data"
 133  |
 134  |     DIR[STATIC,VPP]: "{DIR[STATIC]}/vpp"
 135  |     DIR[STATIC,ARCH]: "{DIR[STATIC]}/archive"
 136  |     DIR[STATIC,TREND]: "{DIR[STATIC]}/trending"
 137  |
 138  |     DIR[PLOT,DPDK]: "{DIR[WORKING]}/dpdk_plot"
 139  |
 140  |     DIR[DTR]: "{DIR[RST]}/detailed_test_results"
 141  |     DIR[DTR,PERF,DPDK]: "{DIR[DTR]}/dpdk_performance_results"
 142  |     DIR[DTR,PERF,VPP]: "{DIR[DTR]}/vpp_performance_results"
 143  |     DIR[DTR,PERF,HC]: "{DIR[DTR]}/honeycomb_performance_results"
 144  |     DIR[DTR,FUNC,VPP]: "{DIR[DTR]}/vpp_functional_results"
 145  |     DIR[DTR,FUNC,HC]: "{DIR[DTR]}/honeycomb_functional_results"
 146  |     DIR[DTR,FUNC,NSHSFC]: "{DIR[DTR]}/nshsfc_functional_results"
 147  |     DIR[DTR,PERF,VPP,IMPRV]: "{DIR[RST]}/vpp_performance_tests/performance_improvements"
 148  |
 149  |     DIR[DTC]: "{DIR[RST]}/test_configuration"
 150  |     DIR[DTC,PERF,VPP]: "{DIR[DTC]}/vpp_performance_configuration"
 151  |     DIR[DTC,FUNC,VPP]: "{DIR[DTC]}/vpp_functional_configuration"
 152  |
 153  |     DIR[DTO]: "{DIR[RST]}/test_operational_data"
 154  |     DIR[DTO,PERF,VPP]: "{DIR[DTO]}/vpp_performance_operational_data"
 155  |
 156  |     DIR[CSS_PATCH_FILE]: "{DIR[STATIC]}/theme_overrides.css"
 157  |
 158  |   urls:
 159  |     URL[JENKINS,CSIT]: "https://jenkins.fd.io/view/csit/job"
 160  |     URL[JENKINS,HC]: "https://jenkins.fd.io/view/hc2vpp/job"
 161  |
 162  |   make-dirs:
 163  |   # List the directories which are created while preparing the environment.
 164  |   # All directories MUST be defined in "paths" section.
 165  |   - "DIR[WORKING,DATA]"
 166  |   - "DIR[STATIC,VPP]"
 167  |   - "DIR[STATIC,DPDK]"
 168  |   - "DIR[STATIC,ARCH]"
 169  |   - "DIR[STATIC,TREND]"
 170  |   - "DIR[PLOT,VPP]"
 171  |   - "DIR[PLOT,DPDK]"
 172  |   - "DIR[BUILD,LATEX]"
 173  |
 174  |   remove-dirs:
 175  |   # List the directories which are deleted while cleaning the environment.
 176  |   # All directories MUST be defined in "paths" section.
 177  |   - "DIR[WORKING]"
 178  |
 179  |   build-dirs:
 180  |   # List the directories where the results (build) are stored.
 181  |   # All directories MUST be defined in "paths" section.
 182  |   - "DIR[BUILD,HTML]"
 183  |   - "DIR[BUILD,LATEX]"
 184
 185 It is possible to use defined items in the definition of other items, e.g.:
 186
 187  | DIR[WORKING,DATA]: "{DIR[WORKING]}/data"
 188
 189 will be automatically changed to
 190
 191  | DIR[WORKING,DATA]: "_tmp/data"
 192
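The substitution can be sketched in a few lines of Python. This is an
illustration under assumptions, not the PAL implementation: the name
``resolve_items`` and the regular expression are hypothetical, and references
to items that are not defined are left unchanged.

```python
import re

# Matches references such as {DIR[WORKING]} or {DIR[DTR,PERF,VPP]}.
PLACEHOLDER = re.compile(r"\{([A-Z]+\[[A-Z_,]+\])\}")


def resolve_items(items):
    """Expand "{KEY}" references between the items of one mapping."""
    resolved = dict(items)
    # One pass per item is enough for any chain of references.
    for _ in range(len(resolved)):
        changed = False
        for key, value in resolved.items():
            new_value = PLACEHOLDER.sub(
                lambda m: resolved.get(m.group(1), m.group(0)), value)
            if new_value != value:
                resolved[key] = new_value
                changed = True
        if not changed:
            break
    return resolved


paths = resolve_items({
    "DIR[WORKING]": "_tmp",
    "DIR[WORKING,DATA]": "{DIR[WORKING]}/data",
})
print(paths["DIR[WORKING,DATA]"])  # _tmp/data
```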
 193
 194 Section: Debug mode
 195 '''''''''''''''''''
 196
 197 This section is optional and configures the debug mode. It is used when the
 198 data files should not be downloaded and local files are used instead.
 199
 200 If the debug mode is configured, the "input" section is ignored.
 201
 202 This section has these parts:
 203
 204  - type: "debug" - says that this is the section "debug"
 205  - general
 206
 207    - input-format - xml or zip
 208    - extract - if "zip" is defined as the input format, this file is extracted
 209      from the zip file, otherwise this parameter is ignored
 210
 211  - builds - list of builds whose data is used. The job name must be defined as
 212    the key, followed by the list of builds and their output files.
 213
 214 The structure of the section "Debug" is as follows (example):
 215
 216  | -
 217  |   type: "debug"
 218  |   general:
 219  |     input-format: "xml"  # zip or xml
 220  |     extract: "output.xml"  # Only for zip
 221  |   builds:
 222  |     # The files must be in the directory DIR[WORKING,DATA]
 223  |     csit-vpp-perf-1704-all:
 224  |     -
 225  |       build: 17
 226  |       file: "{DIR[WORKING,DATA]}/csit-vpp-perf-1707-all__17__output.xml"
 227
 228
 229 Section: Input
 230 ''''''''''''''
 231
 232 This section is mandatory if the debug mode is not used, and defines the data
 233 which will be used to generate elements.
 234
 235 This section has these parts:
 236
 237  - type: "input" - says that this section is the "input"
 238  - general - parameters common to all builds:
 239
 240    - file-name: file to be downloaded
 241    - download-path: path to be added to url pointing to the file, e.g.:
 242      "{job}/{build}/robot/report/*zip*/{filename}"; {job}, {build} and
 243      {filename} are replaced by proper values defined in this section
 244    - extract: file to be extracted from downloaded zip file, e.g.: "output.xml";
 245      if xml file is downloaded, this parameter is ignored.
 246
 247  - builds - list of jobs (keys) and builds whose output data will be downloaded
 248
 249 The structure of the section "Input" is as follows (example from 17.07 report):
 250
 251  | -
 252  |   type: "input"  # Ignored in the debug mode
 253  |   general:
 254  |     file-name: "robot-plugin.zip"
 255  |     download-path: "{job}/{build}/robot/report/*zip*/{filename}"
 256  |     extract: "output.xml"
 257  |   builds:
 258  |     csit-vpp-perf-1707-all:
 259  |     - 9
 260  |     - 10
 261  |     - 13
 262  |     - 14
 263  |     - 15
 264  |     - 16
 265  |     - 17
 266  |     - 18
 267  |     - 19
 268  |     - 21
 269  |     - 22
 270  |     csit-dpdk-perf-1707-all:
 271  |     - 1
 272  |     - 2
 273  |     - 3
 274  |     - 4
 275  |     - 5
 276  |     - 6
 277  |     - 7
 278  |     - 8
 279  |     - 9
 280  |     - 10
 281  |     csit-vpp-functional-1707-ubuntu1604-virl:
 282  |     - lastSuccessfulBuild
 283  |     hc2vpp-csit-perf-master-ubuntu1604:
 284  |     - 8
 285  |     - 9
 286  |     hc2vpp-csit-integration-1707-ubuntu1604:
 287  |     - lastSuccessfulBuild
 288  |     csit-nsh_sfc-verify-func-1707-ubuntu1604-virl:
 289  |     - 2
 290  |     csit-vpp-perf-1704-all:
 291  |     - 6
 292  |     - 7
 293  |     - 8
 294  |     - 9
 295  |     - 10
 296  |     - 12
 297  |     - 14
 298  |     - 15
 299  |     - 16
 300  |     - 17
 301  |     csit-dpdk-perf-1704-all:
 302  |     - 1
 303  |     - 2
 304  |     - 3
 305  |     - 4
 306  |     - 6
 307  |     - 7
 308  |     - 8
 309  |     - 9
 310  |     - 10
 311  |     - 11
 312
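How the "download-path" template turns into a URL can be sketched as follows.
The function names are hypothetical, and joining the result to a base URL such
as URL[JENKINS,CSIT] is an assumption about how the pieces fit together; the
document does not say which URL is used for which job.

```python
def build_download_url(base_url, download_path, job, build, filename):
    """Fill the download-path template and append it to the base URL."""
    path = download_path.format(job=job, build=build, filename=filename)
    return f"{base_url.rstrip('/')}/{path}"


def list_downloads(base_url, general, builds):
    """Yield (job, build, url) for every build listed in the "input" section."""
    for job, build_list in builds.items():
        for build in build_list:
            yield job, build, build_download_url(
                base_url, general["download-path"], job, build,
                general["file-name"])


general = {
    "file-name": "robot-plugin.zip",
    "download-path": "{job}/{build}/robot/report/*zip*/{filename}",
}
for job, build, url in list_downloads(
        "https://jenkins.fd.io/view/csit/job", general,
        {"csit-vpp-perf-1707-all": [9, 10]}):
    print(url)
```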
 313
 314 Section: Output
 315 '''''''''''''''
 316
 317 This section specifies which format(s) will be generated (html, pdf) and which
 318 versions for each format will be generated.
 319
 320 This section has these parts:
 321
 322  - type: "output" - says that this section is the "output"
 323  - format: html or pdf
 324  - version: defined for each format separately
 325
 326 The structure of the section "Output" is as follows (example):
 327
 328  | -
 329  |   type: "output"
 330  |   format:
 331  |     html:
 332  |     - full
 333  |     pdf:
 334  |     - full
 335  |     - minimal
 336
 337 TODO: define the names of versions
 338
 339
 340 Content of "minimal" version
 341 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 342
 343 TODO: define the name and content of this version
 344
 345
 346 Section: Table
 347 ''''''''''''''
 348
 349 This section defines a table to be generated. There can be 0 or more "table"
 350 sections.
 351
 352 This section has these parts:
 353
 354  - type: "table" - says that this section defines a table
 355  - algorithm: Algorithm which is used to generate the table. The other
 356    parameters in this section must provide all information needed by the used
 357    algorithm.
 358  - template: (optional) a .csv file used as a template while generating the
 359    table
 360  - output-file-format: (optional) format of the output file.
 361  - output-file: file which the table will be written to
 362  - columns: specification of table columns
 363  - data: Specify the jobs and builds whose data is used to generate the table
 364  - filter: filter based on tags applied on the input data
 365  - parameters: Only these parameters will be put to the output data structure
 366
 367 The structure of the section "Table" is as follows (example):
 368
 369  | -
 370  |   type: "table"
 371  |   title: "Performance improvements"
 372  |   algorithm: "performance-improvements"
 373  |   template: "templates/tmpl_performance_improvements.csv"
 374  |   output-file-format: "csv"
 375  |   output-file: "{DIR[WORKING]}/path/to/my_table.csv"
 376  |   columns:
 377  |   -
 378  |     title: "VPP Functionality"
 379  |     data: "template 2"
 380  |   -
 381  |     title: "Test Name"
 382  |     data: "template 3"
 383  |   -
 384  |     title: "VPP-17.04 mean [Mpps]"
 385  |     data: "vpp 1704 performance mean"
 386  |   -
 387  |     title: "VPP-17.07 mean [Mpps]"
 388  |     data: "vpp 1707 performance mean"
 389  |   -
 390  |     title: "VPP-17.07 stdev [Mpps]"
 391  |     data: "vpp 1707 performance stdev"
 392  |   -
 393  |     title: "17.04 to 17.07 change"
 394  |     data: "change-relative 4 5"
 395  |   rows: "generated"
 396  |   data:
 397  |     csit-vpp-perf-1707-all:
 398  |     - 13
 399  |     - 16
 400  |     - 17
 401  |   # Keep this formatting, the filter is enclosed with " (quotation mark) and
 402  |   # each tag is enclosed with ' (apostrophe).
 403  |   filter: "'64B' and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"
 404  |   parameters:
 405  |   - "throughput"
 406  |   - "latency"
 407
 408
 409 Section: Plot
 410 '''''''''''''
 411
 412 This section defines a plot to be generated. There can be 0 or more "plot"
 413 sections.
 414
 415 This section has these parts:
 416
 417  - type: "plot" - says that this section defines a plot
 418  - output-file-format: (optional) format of the output file.
 419  - output-file: file which the plot will be written to
 420  - plot-type: Type of the plot. The other parameters in this section must
 421    provide all information needed by plot.ly to generate the plot. For example:
 422
 423    - x-axis: x-axis title
 424    - y-axis: y-axis title
 425
 426  - data: Specify the jobs and builds whose data is used to generate the plot
 427  - filter: filter applied on the input data
 428
 429 The structure of the section "Plot" is as follows (example):
 430
 431  | -
 432  |   type: "plot"
 433  |   plot-type: "performance-box"   # box, line
 434  |   output-file-format: "html"
 435  |   output-file: "{DIR[WORKING]}/path/to/my_plot.html"
 436  |   plot-title: "plot title"
 437  |   x-axis: "x-axis title"
 438  |   y-axis: "y-axis title"
 439  |   data:
 440  |     csit-vpp-perf-1707-all:
 441  |     - 9
 442  |     - 10
 443  |     - 13
 444  |     - 14
 445  |     - 15
 446  |     - 16
 447  |     - 17
 448  |     - 18
 449  |     - 19
 450  |     - 21
 451  |   filter:
 452  |     - "'64B' and 'BASE' and 'NDRDISC' and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"
 453
 454
 455 Static content
 456 ``````````````
 457
 458  - Manually created / edited files
 459  - .rst files, static .csv files, static pictures (.svg), ...
 460  - Stored in CSIT gerrit
 461
 462 This document gives no further details about the static content.
 463
 464
 465 Data to process
 466 ```````````````
 467
 468 The PAL processes test results and other information produced by Jenkins jobs.
 469 The data are now stored as robot results in Jenkins (TODO: store the data in
 470 nexus) as .zip and / or .xml files.
 471
 472
 473 Data processing
 474 ---------------
 475
 476 As the first step, the data are downloaded and stored locally (typically on a
 477 Jenkins slave). If .zip files are used, the given .xml files are extracted for
 478 further processing.
 479
 480 Parsing of the .xml files is performed by a class derived from
 481 "robot.api.ResultVisitor"; only the necessary methods are overridden. All the
 482 necessary data, and nothing else, is extracted from the .xml file and stored in
 483 a structured form.
 483
 484 The parsed data are stored as the multi-indexed pandas.Series data type. Its
 485 structure is as follows:
 486
 487  | <job name>
 488  |   <build>
 489  |     <metadata>
 490  |     <suites>
 491  |     <tests>
 492
 493 "job name", "build", "metadata", "suites", "tests" are indexes to access the
 494 data. For example:
 495
 496  | data =
 497  |
 498  | job 1 name:
 499  |   build 1:
 500  |     metadata: metadata
 501  |     suites: suites
 502  |     tests: tests
 503  |   ...
 504  |   build N:
 505  |     metadata: metadata
 506  |     suites: suites
 507  |     tests: tests
 508  | ...
 509  | job M name:
 510  |   build 1:
 511  |     metadata: metadata
 512  |     suites: suites
 513  |     tests: tests
 514  |   ...
 515  |   build N:
 516  |     metadata: metadata
 517  |     suites: suites
 518  |     tests: tests
 519
 520 Using indexes data["job 1 name"]["build 1"]["tests"] (e.g.:
 521 data["csit-vpp-perf-1704-all"]["17"]["tests"]) we get a list of all tests with
 522 all their data.
 523
 524 Data will not be accessed directly using indexes, but through getters and
 525 filters.
 526
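A minimal sketch of such getters is given here. The class and method names are
hypothetical, and plain nested dictionaries stand in for the multi-indexed
pandas.Series so that the example has no dependencies.

```python
class InputData:
    """Read-only access to parsed data: job -> build -> metadata/suites/tests."""

    def __init__(self, data):
        self._data = data

    def builds(self, job):
        """Return the builds available for the given job."""
        return list(self._data[job])

    def metadata(self, job, build):
        # Builds are indexed by strings, e.g. "17", as in the example above.
        return self._data[job][str(build)]["metadata"]

    def suites(self, job, build):
        return self._data[job][str(build)]["suites"]

    def tests(self, job, build):
        return self._data[job][str(build)]["tests"]
```
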
 527 **Structure of metadata:**
 528
 529  | "metadata": {
 530  |     "version": "VPP version",
 531  |     "job": "Jenkins job name",
 532  |     "build": "Information about the build"
 533  | },
 534
 535 **Structure of suites:**
 536
 537  | "suites": {
 538  |     "Suite name 1": {
 539  |         "doc": "Suite 1 documentation"
 540  |     },
 541  |     "Suite name N": {
 542  |         "doc": "Suite N documentation"
 543  |     }
      | }
 544
 545 **Structure of tests:**
 546
 547  | "tests": {
 548  |     "ID": {
 549  |         "name": "Test name",
 550  |         "parent": "Name of the parent of the test",
 551  |         "tags": ["tag 1", "tag 2", "tag n"],
 552  |         "type": "PDR" | "NDR",
 553  |         "throughput": {
 554  |             "value": int,
 555  |             "unit": "pps" | "bps" | "percentage"
 556  |         },
 557  |         "latency": {
 558  |             "direction1": {
 559  |                 "100": {
 560  |                     "min": int,
 561  |                     "avg": int,
 562  |                     "max": int
 563  |                 },
 564  |                 "50": {  # Only for NDR
 565  |                     "min": int,
 566  |                     "avg": int,
 567  |                     "max": int
 568  |                 },
 569  |                 "10": {  # Only for NDR
 570  |                     "min": int,
 571  |                     "avg": int,
 572  |                     "max": int
 573  |                 }
 574  |             },
 575  |             "direction2": {
 576  |                 "100": {
 577  |                     "min": int,
 578  |                     "avg": int,
 579  |                     "max": int
 580  |                 },
 581  |                 "50": {  # Only for NDR
 582  |                     "min": int,
 583  |                     "avg": int,
 584  |                     "max": int
 585  |                 },
 586  |                 "10": {  # Only for NDR
 587  |                     "min": int,
 588  |                     "avg": int,
 589  |                     "max": int
 590  |                 }
 591  |             }
 592  |         },
 593  |         "lossTolerance": "lossTolerance",  # Only for PDR
 594  |         "vat-history": {
 595  |             "DUT1": " DUT1 VAT History",
 596  |             "DUT2": " DUT2 VAT History"
 597  |         },
 598  |         "show-run": "Show Run"
 599  |     },
 600  |     "ID": {
 601  |         # next test
 602  |     }
      | }
 603
 604 Note: ID is the lowercase full path to the test.
 605
 606
 607 Data filtering
 608 ``````````````
 609
 610 The first step when generating an element is getting the data needed to
 611 construct the element. The data are filtered from the processed input data.
 612
 613 The data filtering is based on:
 614
 615  - job name(s)
 616  - build number(s)
 617  - tag(s)
 618  - required data - only this data is included in the output.
 619
 620 WARNING: The filtering is based on tags, so be careful with tagging.
 621
 622 For example, the element whose specification includes:
 623
 624  |   data:
 625  |     csit-vpp-perf-1707-all:
 626  |     - 9
 627  |     - 10
 628  |     - 13
 629  |     - 14
 630  |     - 15
 631  |     - 16
 632  |     - 17
 633  |     - 18
 634  |     - 19
 635  |     - 21
 636  |   filter:
 637  |     - "'64B' and 'BASE' and 'NDRDISC' and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"
 638
 639 will be constructed using data from the job "csit-vpp-perf-1707-all", all listed
 640 builds and the tests whose list of tags fulfils the condition specified in the
 641 filter.
 642
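One way to evaluate such a filter is to rewrite every quoted tag into a
membership test and evaluate the resulting boolean expression. The sketch
below is an assumption about a possible approach, not a description of the PAL
code; the name ``matches`` is hypothetical, and ``eval()`` is tolerable only
because specification files are trusted input.

```python
import re

# A tag is anything enclosed in apostrophes, e.g. '64B'.
TAG = re.compile(r"'([^']*)'")


def matches(filter_expr, tags):
    """Return True if the tags of a test satisfy the filter expression."""
    # "'64B' and not 'VHOST'" -> "('64B' in tags) and not ('VHOST' in tags)"
    condition = TAG.sub(lambda m: f"({m.group(0)} in tags)", filter_expr)
    # Evaluate without builtins; only the name "tags" is available.
    return bool(eval(condition, {"__builtins__": {}}, {"tags": set(tags)}))


FILTER = ("'64B' and 'BASE' and 'NDRDISC' and '1T1C' and "
          "('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'")
print(matches(FILTER, ["64B", "BASE", "NDRDISC", "1T1C", "L2XCFWD"]))  # True
```
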
 643 The output data structure for filtered test data is:
 644
 645  | - job 1
 646  |   - build 1
 647  |     - test 1 ID:
 648  |       - parameter 1
 649  |       - parameter 2
 650  |       ...
 651  |       - parameter n
 652  |     ...
 653  |     - test n ID:
 654  |     ...
 655  |   ...
 656  |   - build n
 657  | ...
 658  | - job n
 659
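The output structure above can be built with nested dictionaries. In this
sketch the function name ``filter_data`` and its arguments are hypothetical:
``data`` is the parsed input indexed by job and build, ``builds`` is the "data"
part of an element, ``tag_predicate`` decides from the tags whether a test is
kept, and ``parameters`` lists the required data.

```python
def filter_data(data, builds, tag_predicate, parameters):
    """Return job -> build -> test ID -> selected parameters."""
    result = {}
    for job, build_list in builds.items():
        result[job] = {}
        for build in build_list:
            tests = data[job][str(build)]["tests"]
            result[job][str(build)] = {
                # Keep only the requested parameters of each matching test.
                test_id: {p: test[p] for p in parameters if p in test}
                for test_id, test in tests.items()
                if tag_predicate(test["tags"])
            }
    return result
```

Tests that do not match the predicate are omitted entirely, and parameters
that a test does not have are skipped rather than reported as errors.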
 660
 661 Data analytics
 662 ``````````````
 663
 664 Data analytics part implements:
 665
 666  - methods to compute statistical data from the filtered input data
 667  - trending
 668  - etc.
 669
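As an illustration of the statistical methods, the sketch below computes the
mean, the standard deviation and a relative change, which correspond to the
column kinds used in the table example ("mean", "stdev", "change-relative").
The function names are hypothetical, and the use of the sample standard
deviation is an assumption; the document does not say which variant PAL uses.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) of the given values."""
    # A single value has no spread, so its deviation is reported as 0.0.
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def relative_change(reference, value):
    """Return the change of ``value`` against ``reference`` in percent."""
    return (value - reference) / reference * 100.0


old_mean, _ = summarize([8.0, 8.0, 8.0])
new_mean, _ = summarize([9.0, 10.0, 11.0])
print(relative_change(old_mean, new_mean))  # 25.0
```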
 670
 671 Data presentation
 672 -----------------
 673
 674 This layer generates the plots and tables according to the report models
 675 specified in the specification file. The elements are generated using the
 676 algorithms and data specified in their models.
 677
 678 Tables
 679 ``````
 680
 681  - tables are generated by algorithms implemented in PAL, the model includes the
 682    algorithm and all necessary information.
 683  - output format: csv
 684  - generated tables are stored in specified directories and linked to .rst files
 685
 686
 687 Plots
 688 `````
 689
 690  - `plot.ly <https://plot.ly/>`_ is currently used to generate plots, the model
 691    includes the type of plot and all necessary information.
 692  - output format: html
 693  - generated plots are stored in specified directories and linked to .rst files
 694
 695
 696 Report generation
 697 -----------------
 698
 699 The report is generated using Sphinx and the Read the Docs template. PAL
 700 generates the html and pdf formats. It is possible to define the content of the
 701 report by specifying the version (TODO: define the names and content of versions).
 702
 703 The process
 704 ```````````
 705
 706 1. Read the specification
 707 2. Read the input data
 708 3. Process the input data
 709 4. For each element (plot, table) defined in the specification:
 710
 711    a. Get the data needed to construct the element using a filter
 712    b. Generate the element
 713    c. Store the element
 714
 715 5. Generate the report
 716 6. Store the report (Nexus)
 717
 718 The process is model driven. The elements’ models (tables, plots and the report
 719 itself) are defined in the specification file. The script reads the elements’
 720 models from the specification file and generates the elements.
 721
 722 It is easy to add elements to be generated. If a new kind of element is
 723 required, only a new algorithm needs to be implemented and integrated.
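
A model-driven loop of this kind can be sketched with a registry that maps the
element type to its generator, so that a new kind of element needs only one
more registered function. All names here are hypothetical, and the generators
return a short description instead of writing files.

```python
GENERATORS = {}


def register(element_type):
    """Decorator that registers a generator for one element type."""
    def decorator(func):
        GENERATORS[element_type] = func
        return func
    return decorator


@register("table")
def generate_table(model, data):
    return f"table {model['output-file']} from {len(data)} tests"


@register("plot")
def generate_plot(model, data):
    return f"plot {model['output-file']} from {len(data)} tests"


def generate_elements(spec, get_data):
    """Steps 4a to 4c: filter the data, generate and collect each element."""
    outputs = []
    for model in spec:
        generator = GENERATORS.get(model["type"])
        if generator is None:  # "environment", "input", "output", ...
            continue
        outputs.append(generator(model, get_data(model)))
    return outputs
```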