X-Git-Url: https://gerrit.fd.io/r/gitweb?a=blobdiff_plain;f=docs%2Freport%2Fintroduction%2Fmethodology_telemetry.rst;fp=docs%2Freport%2Fintroduction%2Fmethodology_telemetry.rst;h=c10f99affe1fa0f7bedcd1f4c2cee56e781d8449;hb=14fe52e81c59a5a3558f4f587e56b75e388191dc;hp=dcd2d06541fc1b6fb7d4a3ca577dcd938f140c2a;hpb=c6bfe865e4a62dda2c5e635df53083e909a6558b;p=csit.git diff --git a/docs/report/introduction/methodology_telemetry.rst b/docs/report/introduction/methodology_telemetry.rst index dcd2d06541..c10f99affe 100644 --- a/docs/report/introduction/methodology_telemetry.rst +++ b/docs/report/introduction/methodology_telemetry.rst @@ -37,7 +37,8 @@ Telemetry module in CSIT currently support only Gauge, Counter and Info. Example metric file ~~~~~~~~~~~~~~~~~~~ -``` +:: + # HELP calls_total Number of calls total # TYPE calls_total counter calls_total{name="api-rx-from-ring",state="active",thread_id="0",thread_lcore="1",thread_name="vpp_main"} 0.0 @@ -60,7 +61,7 @@ Example metric file calls_total{name="ip4-lookup",state="active",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0 calls_total{name="ip4-rewrite",state="active",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0 calls_total{name="unix-epoll-input",state="polling",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 1.0 -``` + Anatomy of existing CSIT telemetry implementation ------------------------------------------------- @@ -81,7 +82,8 @@ them. MRR measurement ~~~~~~~~~~~~~~~ -``` +:: + traffic_start(r=mrr) traffic_stop |< measure >| | | | (r=mrr) | | pre_run_stat post_run_stat | pre_stat | | post_stat @@ -89,23 +91,23 @@ MRR measurement --o--------o---------------o---------o-------o--------+-------------------+------o------------> t -Legend: - - pre_run_stat - - vpp-clear-runtime - - post_run_stat - - vpp-show-runtime - - bash-perf-stat // if extended_debug == True - - pre_stat - - vpp-clear-stats - - vpp-enable-packettrace // if extended_debug == True - - vpp-enable-elog - - post_stat - - vpp-show-stats - - vpp-show-packettrace // if extended_debug == True - - vpp-show-elog -``` - -``` + Legend: + - pre_run_stat + - vpp-clear-runtime + - post_run_stat + - vpp-show-runtime + - bash-perf-stat // if extended_debug == True + - pre_stat + - vpp-clear-stats + - vpp-enable-packettrace // if extended_debug == True + - vpp-enable-elog + - post_stat + - vpp-show-stats + - vpp-show-packettrace // if extended_debug == True + - vpp-show-elog + +:: + |< measure >| | (r=mrr) | | | @@ -114,12 +116,13 @@ Legend: | | | | --o------------------------o------------------------o------------------------o---> t -``` + MLR measurement ~~~~~~~~~~~~~~~ -``` +:: + |< measure >| traffic_start(r=pdr) traffic_stop traffic_start(r=ndr) traffic_stop |< [ latency ] >| | (r=mlr) | | | | | | .9/.5/.1/.0 | | | | pre_run_stat post_run_stat | | pre_run_stat post_run_stat | | | @@ -127,21 +130,20 @@ MLR measurement --+-------------------+----o--------o---------------o---------o--------------o--------o---------------o---------o------------[---------------------]---> t -Legend: - - pre_run_stat - - vpp-clear-runtime - - post_run_stat - - vpp-show-runtime - - bash-perf-stat // if extended_debug == True - - pre_stat - - vpp-clear-stats - - vpp-enable-packettrace // if extended_debug == True - - vpp-enable-elog - - post_stat - - vpp-show-stats - - vpp-show-packettrace // if extended_debug == True - - vpp-show-elog -``` + Legend: + - pre_run_stat + - vpp-clear-runtime + - post_run_stat + - vpp-show-runtime + - bash-perf-stat // if extended_debug == True + - pre_stat + - vpp-clear-stats + - vpp-enable-packettrace // if extended_debug == True + - vpp-enable-elog + - post_stat + - vpp-show-stats + - vpp-show-packettrace // if extended_debug == True + - vpp-show-elog Improving existing solution @@ -171,7 +173,8 @@ integration with post processing module. MRR measurement ~~~~~~~~~~~~~~~ -``` +:: + traffic_start(r=mrr) traffic_stop |< measure >| | | | (r=mrr) | | |< stat_runtime >| | stat_pre_trial | | stat_post_trial @@ -179,18 +182,19 @@ MRR measurement ----o---+--------------------------+---o-------------o------------+-------------------+-----o-------------> t -Legend: - - stat_runtime - - vpp-runtime - - stat_pre_trial - - vpp-clear-stats - - vpp-enable-packettrace // if extended_debug == True - - stat_post_trial - - vpp-show-stats - - vpp-show-packettrace // if extended_debug == True -``` - -``` + Legend: + - stat_runtime + - vpp-runtime + - stat_pre_trial + - vpp-clear-stats + - vpp-enable-packettrace // if extended_debug == True + - stat_post_trial + - vpp-show-stats + - vpp-show-packettrace // if extended_debug == True + + +:: + |< measure >| | (r=mrr) | | | @@ -199,9 +203,9 @@ Legend: | | | | --o------------------------o------------------------o------------------------o---> t -``` -``` +:: + |< stat_runtime >| | | |< program0 >|< program1 >|< programN >| @@ -209,13 +213,13 @@ Legend: | | | | --o------------------------o------------------------o------------------------o---> t -``` MLR measurement ~~~~~~~~~~~~~~~ -``` +:: + |< measure >| traffic_start(r=pdr) traffic_stop traffic_start(r=ndr) traffic_stop |< [ latency ] >| | (r=mlr) | | | | | | .9/.5/.1/.0 | | | | |< stat_runtime >| | | |< stat_runtime >| | | | @@ -223,281 +227,12 @@ MLR measurement --+-------------------+-----o---+--------------------------+---o--------------o---+--------------------------+---o-----------[---------------------]---> t -Legend: - - stat_runtime - - vpp-runtime - - stat_pre_trial - - vpp-clear-stats - - vpp-enable-packettrace // if extended_debug == True - - stat_post_trial - - vpp-show-stats - - vpp-show-packettrace // if extended_debug == True -``` - - -Tooling -------- - -Prereqisities: -- bpfcc-tools -- python-bpfcc -- libbpfcc -- libbpfcc-dev -- libclang1-9 libllvm9 - -```bash - $ sudo apt install bpfcc-tools python3-bpfcc libbpfcc libbpfcc-dev libclang1-9 libllvm9 -``` - - -Configuration -------------- - -```yaml - logging: - version: 1 - formatters: - console: - format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s' - prom: - format: '%(message)s' - handlers: - console: - class: logging.StreamHandler - level: INFO - formatter: console - stream: ext://sys.stdout - prom: - class: logging.handlers.RotatingFileHandler - level: INFO - formatter: prom - filename: /tmp/metric.prom - mode: w - loggers: - prom: - handlers: [prom] - level: INFO - propagate: False - root: - level: INFO - handlers: [console] - scheduler: - duration: 1 - programs: - - name: bundle_bpf - metrics: - counter: - - name: cpu_cycle - documentation: Cycles processed by CPUs - namespace: bpf - labelnames: - - name - - cpu - - pid - - name: cpu_instruction - documentation: Instructions retired by CPUs - namespace: bpf - labelnames: - - name - - cpu - - pid - - name: llc_reference - documentation: Last level cache operations by type - namespace: bpf - labelnames: - - name - - cpu - - pid - - name: llc_miss - documentation: Last level cache operations by type - namespace: bpf - labelnames: - - name - - cpu - - pid - events: - - type: 0x0 # HARDWARE - name: 0x0 # PERF_COUNT_HW_CPU_CYCLES - target: on_cpu_cycle - table: cpu_cycle - - type: 0x0 # HARDWARE - name: 0x1 # PERF_COUNT_HW_INSTRUCTIONS - target: on_cpu_instruction - table: cpu_instruction - - type: 0x0 # HARDWARE - name: 0x2 # PERF_COUNT_HW_CACHE_REFERENCES - target: on_cache_reference - table: llc_reference - - type: 0x0 # HARDWARE - name: 0x3 # PERF_COUNT_HW_CACHE_MISSES - target: on_cache_miss - table: llc_miss - code: | - #include - #include - - const int max_cpus = 256; - - struct key_t { - int cpu; - int pid; - char name[TASK_COMM_LEN]; - }; - - BPF_HASH(llc_miss, struct key_t); - BPF_HASH(llc_reference, struct key_t); - BPF_HASH(cpu_instruction, struct key_t); - BPF_HASH(cpu_cycle, struct key_t); - - static inline __attribute__((always_inline)) void get_key(struct key_t* key) { - key->cpu = bpf_get_smp_processor_id(); - key->pid = bpf_get_current_pid_tgid(); - bpf_get_current_comm(&(key->name), sizeof(key->name)); - } - - int on_cpu_cycle(struct bpf_perf_event_data *ctx) { - struct key_t key = {}; - get_key(&key); - - cpu_cycle.increment(key, ctx->sample_period); - return 0; - } - int on_cpu_instruction(struct bpf_perf_event_data *ctx) { - struct key_t key = {}; - get_key(&key); - - cpu_instruction.increment(key, ctx->sample_period); - return 0; - } - int on_cache_reference(struct bpf_perf_event_data *ctx) { - struct key_t key = {}; - get_key(&key); - - llc_reference.increment(key, ctx->sample_period); - return 0; - } - int on_cache_miss(struct bpf_perf_event_data *ctx) { - struct key_t key = {}; - get_key(&key); - - llc_miss.increment(key, ctx->sample_period); - return 0; - } -``` - -CSIT captured metrics ---------------------- - -SUT -~~~ - -Compute resource -________________ - -- BPF /process - - BPF_HASH(llc_miss, struct key_t); - - BPF_HASH(llc_reference, struct key_t); - - BPF_HASH(cpu_instruction, struct key_t); - - BPF_HASH(cpu_cycle, struct key_t); - -Memory resource -_______________ - -- BPF /process - - tbd - -Network resource -________________ - -- BPF /process - - tbd - -DUT VPP metrics -~~~~~~~~~~~~~~~ - -Compute resource -________________ - -- runtime /node `show runtime` - - calls - - vectors - - suspends - - clocks - - vectors_calls -- perfmon /bundle - - inst-and-clock node intel-core instructions/packet, cycles/packet and IPC - - cache-hierarchy node intel-core cache hits and misses - - context-switches thread linux per-thread context switches - - branch-mispred node intel-core Branches, branches taken and mis-predictions - - page-faults thread linux per-thread page faults - - load-blocks node intel-core load operations blocked due to various uarch reasons - - power-licensing node intel-core Thread power licensing - - memory-bandwidth system intel-uncore memory reads and writes per memory controller channel - -Memory resource - tbd -_____________________ - -- memory /segment `show memory verbose api-segment stats-segment main-heap` - - total - - used - - free - - trimmable - - free-chunks - - free-fastbin-blks - - max-total-allocated -- physmem `show physmem` - - pages - - subpage-size - -Network resource -________________ - -- counters /node `show node counters` - - count - - severity -- hardware /interface `show interface` - - rx_stats - - tx_stats -- packets /interface `show hardware` - - rx_packets - - rx_bytes - - rx_errors - - tx_packets - - tx_bytes - - tx_errors - - drops - - punt - - ip4 - - ip6 - - rx_no_buf - - rx_miss - - -DUT DPDK metrics - tbd -~~~~~~~~~~~~~~~~~~~~~~ - -Compute resource -________________ - -- BPF /process - - BPF_HASH(llc_miss, struct key_t); - - BPF_HASH(llc_reference, struct key_t); - - BPF_HASH(cpu_instruction, struct key_t); - - BPF_HASH(cpu_cycle, struct key_t); - -Memory resource -_______________ - -- BPF /process - - tbd - -Network resource -________________ - -- packets /interface - - inPackets - - outPackets - - inBytes - - outBytes - - outErrorPackets - - dropPackets + Legend: + - stat_runtime + - vpp-runtime + - stat_pre_trial + - vpp-clear-stats + - vpp-enable-packettrace // if extended_debug == True + - stat_post_trial + - vpp-show-stats + - vpp-show-packettrace // if extended_debug == True