docs/report/introduction/methodology_trex_traffic_generator.rst

   1 TRex Traffic Generator
   2 ----------------------
   3
   4 Usage
   5 ~~~~~
   6
   7 `TRex traffic generator <https://trex-tgn.cisco.com>`_ is used for the majority
   8 of CSIT performance tests. TRex is used in multiple types of performance tests,
   9 see :ref:`data_plane_throughput` for more detail.
  10
  11 TRex is installed and run on the TG compute node.
  12 Versioning, installation and startup are documented in
  13 :ref:`test_environment_tg`.
  14
  15 Traffic modes
  16 ~~~~~~~~~~~~~
  17
  18 TRex is primarily used in two (mutually incompatible) modes.
  19
  20 Stateless mode
  21 ______________
  22
  23 Sometimes abbreviated as STL.
  24 A mode with high performance, which is unable to react to incoming traffic.
  25 We use this mode whenever it is possible.
  26 A typical test where this mode is not applicable is NAT44ED,
  27 as the DUT does not assign deterministic outside address+port combinations,
  28 so we are unable to create traffic that does not lose packets
  29 in the out2in direction.
  30
  31 Measurement results are based on simple L2 counters
  32 (opackets, ipackets) for each traffic direction.
  33
  34 Stateful mode
  35 _____________
  36
  37 A mode capable of reacting to incoming traffic.
  38 Contrary to the stateless mode, only UDP and TCP are supported
  39 (carried over IPv4 or IPv6 packets).
  40 Performance is limited, as TRex needs to do more CPU processing.
  41 TRex supports two subtypes of stateful traffic;
  42 CSIT uses ASTF (Advanced STateFul mode).
  43
  44 This mode is suitable for NAT44ED tests, as clients send packets from inside,
  45 and servers react to them, so they see the outside address and port to respond to.
  46 Also, they do not send traffic before NAT44ED has opened the sessions.
  47
  48 When possible, L2 counters (opackets, ipackets) are used.
  49 Some tests need L7 counters, which track protocol state (e.g. TCP),
  50 but the values are less than reliable at high loads.
  51
  52 Traffic Continuity
  53 ~~~~~~~~~~~~~~~~~~
  54
  55 Generated traffic is either continuous, or limited.
  56 Both modes support both continuities in principle.
  57
  58 Continuous traffic
  59 __________________
  60
  61 Traffic is started without any size goal.
  62 Traffic is ended based on the time duration hinted by the search algorithm.
  63 This is useful when DUT behavior does not depend on the traffic duration.
  64 This is the default for stateless mode.
  65
  66 Limited traffic
  67 _______________
  68
  69 Traffic has a defined size goal; duration is computed based on the goal.
  70 Traffic is ended when the size goal is reached,
  71 or when the computed duration is reached.
  72 This is useful when DUT behavior depends on traffic size,
  73 e.g. a target number of sessions, each to be hit once.
  74 This is used mainly for stateful mode.
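
As a rough illustration of how a duration can be derived from a size goal,
the following minimal sketch divides a transaction count by the requested
rate. The function name and its arguments are hypothetical and are not taken
from the CSIT code base.

```python
def limited_trial_duration(transaction_count, tps):
    """Return the time (in seconds) needed to open each transaction once.

    Both arguments are illustrative assumptions: transaction_count is the
    size goal (e.g. the number of sessions), tps is the requested rate.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return transaction_count / tps


# Example: a size goal of 64512 sessions opened at 10000 transactions/s.
duration = limited_trial_duration(64512, 10000.0)
print(f"computed duration: {duration:.4f} s")
```

In this sketch the traffic would end once the size goal is reached or once
the computed duration elapses, whichever the traffic generator reports first.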
  75
  76 Traffic synchronicity
  77 ~~~~~~~~~~~~~~~~~~~~~
  78
  79 Traffic can be generated synchronously (test waits for duration)
  80 or asynchronously (test operates during traffic and stops traffic explicitly).
  81
  82 Synchronous traffic
  83 ___________________
  84
  85 Trial measurement is driven by a given (or precomputed) duration,
  86 with no activity from the test driver during the traffic.
  87 Used for most trials.
  88
  89 Asynchronous traffic
  90 ____________________
  91
  92 Traffic is started, but then the test driver is free to perform
  93 other actions, before stopping the traffic explicitly.
  94 This is used mainly by reconf tests, but also by some trials
  95 used for runtime telemetry.
  96
  97 Traffic profiles
  98 ~~~~~~~~~~~~~~~~
  99
 100 TRex supports several ways to define the traffic.
 101 CSIT uses small Python modules based on Scapy as definitions.
 102 Details of traffic profiles depend on modes (STL or ASTF),
 103 but some are common for both modes.
 104
 105 Search algorithms are intentionally unaware of the traffic mode used,
 106 so CSIT defines some terms to use instead of mode-specific TRex terms.
 107
 108 Transactions
 109 ____________
 110
 111 A TRex traffic profile defines a small number of behaviors,
 112 in CSIT called transaction templates. Traffic profiles also instruct
 113 TRex how to create a large number of transactions based on the templates.
 114
 115 Continuous traffic loops over the generated transactions.
 116 Limited traffic usually executes each transaction once.
 117
 118 Currently, ASTF profiles define one transaction template each.
 119 The number of packets expected per transaction varies based on profile details,
 120 as does the criterion for when a transaction is considered successful.
 121
 122 Stateless transactions are just one packet (sent from one TG port,
 123 successful if received on the other TG port).
 124 Thus unidirectional stateless profiles define one transaction template,
 125 bidirectional stateless profiles define two transaction templates.
 126
 127 TPS multiplier
 128 ______________
 129
 130 TRex aims to open transactions specified by the profile at a steady rate.
 131 While TRex allows the transaction template to define its intended "cps" value,
 132 CSIT does not specify it, so the default value of 1 is applied,
 133 meaning TRex will open one transaction per second (and transaction template)
 134 by default. But CSIT invocation uses the "multiplier" (mult) argument
 135 when starting the traffic, which multiplies the cps value,
 136 meaning it acts as TPS (transactions per second) input.
 137
 138 With a slight abuse of nomenclature, bidirectional stateless tests
 139 set the "packets per transaction" value to 2, just to keep the TPS semantics
 140 as a unidirectional input value.
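
The relation between cps, the multiplier, and the resulting packet rate can
be sketched as below. This is only an illustration of the arithmetic
described above; the function names are hypothetical and do not come from
TRex or CSIT.

```python
def effective_tps(mult, cps=1.0):
    """Transactions per second opened for one transaction template.

    cps defaults to 1, matching the default described in the text,
    so the multiplier directly acts as the TPS input.
    """
    return cps * mult


def packet_rate(tps, packets_per_transaction):
    """Aggregate packets per second implied by a TPS value."""
    return tps * packets_per_transaction


tps = effective_tps(mult=1_000_000)
# Unidirectional stateless profile: one packet per transaction.
print(packet_rate(tps, 1))
# Bidirectional stateless profile: "packets per transaction" set to 2,
# so the same unidirectional TPS yields twice the aggregate packet rate.
print(packet_rate(tps, 2))
```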
 141
 142 Duration stretching
 143 ___________________
 144
 145 TRex can be IO-bound, CPU-bound, or have any other reason
 146 why it is not able to generate the traffic at the requested TPS.
 147 Some conditions are detected, leading to TRex failure,
 148 for example when the bandwidth does not fit into the line capacity.
 149 But many reasons are not detected.
 150
 151 Unfortunately, TRex frequently reacts by not honoring the duration
 152 in synchronous mode, taking longer to send the traffic,
 153 leading to a lower than requested load offered to the DUT.
 154 This usually breaks assumptions used in search algorithms,
 155 so it has to be avoided.
 156
 157 For stateless traffic, the behavior is quite deterministic,
 158 so the workaround is to apply a fictional TPS limit (max_rate)
 159 to search algorithms, usually depending only on the NIC used.
 160
 161 For stateful traffic the behavior is not deterministic enough,
 162 for example the limit for TCP traffic depends on DUT packet loss.
 163 In CSIT we decided to use logic similar to asynchronous traffic.
 164 The traffic driver sleeps for a time, then stops the traffic explicitly.
 165 The library that parses counters into measurement results
 166 then usually treats unsent packets as lost.
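
To illustrate what treating unsent packets as lost means for the result,
the sketch below compares a loss count based on packets actually sent with
one based on packets that should have been sent. All names and numbers are
hypothetical; this is not the CSIT counter-parsing library.

```python
def loss_vs_sent(opackets, ipackets):
    """Loss relative to what TG actually sent (ignores unsent packets)."""
    return max(opackets - ipackets, 0)


def loss_vs_expected(tps, duration, packets_per_transaction, ipackets):
    """Loss relative to what TG was asked to send.

    Packets TG failed to send within the duration count as lost.
    """
    expected = int(tps * duration * packets_per_transaction)
    return max(expected - ipackets, 0)


# Assumed example: 1000 TPS for 1 s, 2 packets per transaction,
# but TG only managed to send 1800 packets, of which 1790 were received.
print(loss_vs_sent(opackets=1800, ipackets=1790))
print(loss_vs_expected(1000, 1.0, 2, ipackets=1790))
```

With the second approach, a traffic generator that cannot keep up shows up
as packet loss instead of silently lowering the offered load.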
 167
 168 We have added IP4base tests for every NAT44ED test,
 169 so that users can compare results.
 170 If the results are very similar, it is probable TRex was the bottleneck.
 171
 172 Startup delay
 173 _____________
 174
 175 By investigating TRex behavior, it was found that TRex does not start
 176 the traffic in ASTF mode immediately. There is a delay of zero traffic,
 177 after which the traffic rate ramps up to the defined TPS value.
 178
 179 It is possible to poll for counters during the traffic
 180 (first nonzero means traffic has started),
 181 but that was found to influence the NDR results.
 182
 183 Thus a "sleep and stop" strategy is used, which needs a correction
 184 to the computed duration so traffic is stopped after the intended
 185 duration of real traffic. Luckily, it turns out this correction
 186 depends on neither the traffic profile nor the CPU used by TRex,
 187 so a fixed constant (0.1115 seconds) works well.
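
A minimal sketch of the "sleep and stop" idea with the fixed correction is
shown below. Only the 0.1115 second constant comes from the text; the
function name, the ``stop_traffic`` callback, and the injectable ``sleep``
argument are assumptions made for illustration.

```python
import time

# Fixed startup delay correction quoted in the text (seconds).
STARTUP_DELAY = 0.1115


def sleep_and_stop(duration, stop_traffic, sleep=time.sleep):
    """Wait for the intended traffic duration plus the startup delay,
    then stop the traffic explicitly.

    stop_traffic is a hypothetical callback standing in for whatever
    call stops the traffic generator; sleep is injectable for testing.
    Returns the total time the driver was asked to sleep.
    """
    total = duration + STARTUP_DELAY
    sleep(total)
    stop_traffic()
    return total


# Usage with stand-in callbacks, so nothing actually sleeps here.
events = []
sleep_and_stop(1.0, lambda: events.append("stopped"), sleep=events.append)
print(events)
```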
 188
 189 The result computations need a precise enough duration of the real traffic;
 190 luckily, the server side of TRex has a precise enough counter for that.
 191
 192 It is unknown whether stateless traffic profiles also exhibit a startup delay.
 193 Unfortunately, stateless mode does not have a similarly precise duration counter,
 194 so some results (mostly MRR) are affected by less precise duration measurement
 195 in the Python part of CSIT code.
 196
 197 Measuring Latency
 198 ~~~~~~~~~~~~~~~~~
 199
 200 If measurement of latency is requested, two more packet streams are
 201 created (one for each direction) with TRex flow_stats parameter set to
 202 STLFlowLatencyStats. In that case, returned statistics will also include
 203 min/avg/max latency values and encoded HDRHistogram data.