docs/gettingstarted/developers/infrastructure.md

   1 VPPINFRA (Infrastructure)
   2 =========================
   3
   4 The files associated with the VPP Infrastructure layer are located in
   5 the ./src/vppinfra folder.
   6
   7 Vppinfra is a collection of basic C-library services, quite
   8 sufficient to build standalone programs to run directly on bare metal.
   9 It also provides high-performance dynamic arrays, hashes, bitmaps,
  10 high-precision real-time clock support, fine-grained event-logging, and
  11 data structure serialization.
  12
  13 One fair comment / fair warning about vppinfra: you can't always tell a
  14 macro from an inline function from an ordinary function simply by name.
  15 Macros are used to avoid function calls in the typical case, and to
  16 cause (intentional) side-effects.
  17
  18 Vppinfra has been around for almost 20 years and tends not to change
  19 frequently. The VPP Infrastructure layer contains the following
  20 functions:
  21
  22 Vectors
  23 -------
  24
  25 Vppinfra vectors are ubiquitous dynamically resized arrays with
  26 user-defined "headers". Many vppinfra data structures (e.g. hash, heap,
  27 pool) are vectors with various different headers.
  28
  29 The memory layout looks like this:
  30
  31 ```
  32                    User header (optional, uword aligned)
  33                    Alignment padding (if needed)
  34                    Vector length in elements
  35  User's pointer -> Vector element 0
  36                    Vector element 1
  37                    ...
  38                    Vector element N-1
  39 ```
  40
  41 As shown above, the vector APIs deal with pointers to the 0th element of
  42 a vector. Null pointers are valid vectors of length zero.
  43
  44 To avoid thrashing the memory allocator, one often resets the length of
  45 a vector to zero while retaining the memory allocation. Set the vector
  46 length field to zero via the vec\_reset\_length(v) macro. \[Use the
  47 macro! It's smart about NULL pointers.\]
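
The layout above can be sketched in plain C. This is a minimal illustration under stated assumptions, not vppinfra's implementation: the `toy_vec_*` names are hypothetical, the hidden header holds only a length and a capacity, and the optional user header and alignment padding are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hidden header stored immediately before element 0. */
typedef struct
{
  size_t len; /* number of elements in use */
  size_t cap; /* number of elements allocated */
} toy_vec_header_t;

/* Step back from the user's pointer to the hidden header. */
static toy_vec_header_t *
toy_vec_header (int *v)
{
  return ((toy_vec_header_t *) v) - 1;
}

/* A NULL pointer is a valid vector of length zero. */
static size_t
toy_vec_len (int *v)
{
  return v ? toy_vec_header (v)->len : 0;
}

/* Append one element; returns the (possibly moved) user pointer. */
static int *
toy_vec_add1 (int *v, int value)
{
  size_t len = toy_vec_len (v);
  size_t cap = v ? toy_vec_header (v)->cap : 0;

  if (len == cap)
    {
      /* Grow by roughly 3/2, with a small minimum. */
      size_t new_cap = cap < 4 ? 4 : cap + cap / 2;
      toy_vec_header_t *h = realloc (v ? toy_vec_header (v) : NULL,
                                     sizeof (*h) + new_cap * sizeof (int));
      assert (h != NULL);
      h->cap = new_cap;
      v = (int *) (h + 1);
    }
  toy_vec_header (v)->len = len + 1;
  v[len] = value;
  return v;
}

/* Keep the allocation, forget the contents; safe on NULL. */
static void
toy_vec_reset_length (int *v)
{
  if (v)
    toy_vec_header (v)->len = 0;
}

static void
toy_vec_free (int *v)
{
  if (v)
    free (toy_vec_header (v));
}
```

Starting from `int *v = NULL;`, repeated `v = toy_vec_add1 (v, x);` calls grow the vector, and `toy_vec_reset_length (v)` empties it without handing memory back to the allocator.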
  48
  49 Typically, the user header is not present. User headers allow for other
  50 data structures to be built atop vppinfra vectors. Users may specify the
  51 alignment of the first data element of a vector via the vec\_\*\_aligned
  52 macros.
  53
  54 Vector elements can be any C type (e.g. int, double, struct bar). This
  55 is also true for data types built atop vectors (e.g. heap, pool, etc.).
  56 Many macros have \_a variants supporting alignment of vector elements
  57 and \_h variants supporting non-zero-length vector headers. The \_ha
  58 variants support both. Additionally, cacheline alignment within a
  59 vector element structure can be specified using the
  60 CLIB\_CACHE\_LINE\_ALIGN\_MARK macro.
  61
  62 Inconsistent usage of header and/or alignment related macro variants
  63 will cause delayed, confusing failures.
  64
  65 Standard programming error: memorize a pointer to the ith element of a
  66 vector, and then expand the vector. Vectors expand by 3/2, so such code
  67 may appear to work for a period of time. Correct code almost always
  68 memorizes vector **indices** which are invariant across reallocations.
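
A minimal sketch of that advice, using plain `realloc` rather than vppinfra vectors (the helper names `grow_ints` and `read_after_growth` are hypothetical). It records an index before growing the array and uses it afterward; a saved element pointer could dangle at the same point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: resize an int array, returning the new base pointer. */
static int *
grow_ints (int *base, size_t new_count)
{
  int *p = realloc (base, new_count * sizeof (int));
  assert (p != NULL);
  return p;
}

/* Remember an element by index, grow the array, then read it back. */
static int
read_after_growth (void)
{
  int *v = grow_ints (NULL, 4);
  size_t saved_index = 2; /* invariant across reallocation */
  int value;

  v[saved_index] = 42;
  /* An "int *saved = &v[2]" taken here may dangle after the next line. */
  v = grow_ints (v, 4096);
  value = v[saved_index]; /* re-derive the address from the new base */
  free (v);
  return value;
}
```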
  69
  70 In typical application images, one supplies a set of global functions
  71 designed to be called from gdb. Here are a few examples:
  72
  73 -   vl(v) - prints vec\_len(v)
  74 -   pe(p) - prints pool\_elts(p)
  75 -   pifi(p, index) - prints pool\_is\_free\_index(p, index)
  76 -   debug\_hex\_bytes (p, nbytes) - hex memory dump nbytes starting at p
  77
  78 Use the "show gdb" debug CLI command to print the current set.
  79
  80 Bitmaps
  81 -------
  82
  83 Vppinfra bitmaps are dynamic, built using the vppinfra vector APIs.
  84 Quite handy for a variety of jobs.
  85
  86 Pools
  87 -----
  88
  89 Vppinfra pools combine vectors and bitmaps to rapidly allocate and free
  90 fixed-size data structures with independent lifetimes. Pools are perfect
  91 for allocating per-session structures.
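
The idea can be sketched in a few lines of standalone C. This is an illustrative toy, not the vppinfra pool implementation: it assumes a fixed capacity, uses a single 64-bit word as the allocation bitmap, and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_POOL_SIZE 64

typedef struct
{
  uint32_t session_id;
} toy_session_t;

typedef struct
{
  toy_session_t elts[TOY_POOL_SIZE]; /* fixed-size elements */
  uint64_t in_use;                   /* bit i set => elts[i] is allocated */
} toy_pool_t;

/* Allocate the lowest free slot; returns its index, or -1 if full. */
static int
toy_pool_get (toy_pool_t *p)
{
  for (int i = 0; i < TOY_POOL_SIZE; i++)
    if (!(p->in_use & (1ULL << i)))
      {
        p->in_use |= 1ULL << i;
        return i;
      }
  return -1;
}

/* Return a slot to the pool so that a later get can reuse it. */
static void
toy_pool_put (toy_pool_t *p, int index)
{
  assert (index >= 0 && index < TOY_POOL_SIZE);
  p->in_use &= ~(1ULL << index);
}

/* Nonzero when the slot at 'index' is not currently allocated. */
static int
toy_pool_is_free_index (const toy_pool_t *p, int index)
{
  return !(p->in_use & (1ULL << index));
}
```

Callers hold on to the returned index rather than a pointer, and each element can be freed independently of the others.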
  92
  93 Hashes
  94 ------
  95
  96 Vppinfra provides several hash flavors. Data plane problems involving
  97 packet classification / session lookup often use
  98 ./src/vppinfra/bihash\_template.\[ch\] bounded-index extensible
  99 hashes. These templates are instantiated multiple times, to efficiently
 100 service different fixed-key sizes.
 101
 102 Bihashes are thread-safe. Read-locking is not required. A simple
 103 spin-lock ensures that only one thread writes an entry at a time.
 104
 105 The original vppinfra hash implementation in
 106 ./src/vppinfra/hash.\[ch\] is simple to use, and is often used in
 107 control-plane code which needs exact-string-matching.
 108
 109 In either case, one almost always looks up a key in a hash table to
 110 obtain an index in a related vector or pool. The APIs are simple enough,
 111 but one must take care when using the unmanaged arbitrary-sized key
 112 variant. Hash\_set\_mem (hash\_table, key\_pointer, value) memorizes
 113 key\_pointer. It is usually a bad mistake to pass the address of a
 114 vector element as the second argument to hash\_set\_mem. It is perfectly
 115 fine to memorize constant string addresses in the text segment.
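
The hazard of memorizing a key pointer can be shown with a standalone toy. It is only a sketch: a small linear table stands in for the hash, and `toy_set_mem` / `toy_get_mem` are hypothetical names, not the vppinfra API. The table stores the pointer it is given, so changing the bytes behind that pointer silently changes the key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_TABLE_SIZE 8

typedef struct
{
  const char *key; /* the pointer is stored, the string is NOT copied */
  int value;
} toy_entry_t;

typedef struct
{
  toy_entry_t entries[TOY_TABLE_SIZE];
  int n_entries;
} toy_table_t;

/* Memorize key_pointer itself, in the spirit of hash_set_mem. */
static void
toy_set_mem (toy_table_t *t, const char *key_pointer, int value)
{
  assert (t->n_entries < TOY_TABLE_SIZE);
  t->entries[t->n_entries].key = key_pointer;
  t->entries[t->n_entries].value = value;
  t->n_entries++;
}

/* Linear search comparing string contents; returns -1 when not found. */
static int
toy_get_mem (const toy_table_t *t, const char *key)
{
  for (int i = 0; i < t->n_entries; i++)
    if (strcmp (t->entries[i].key, key) == 0)
      return t->entries[i].value;
  return -1;
}
```

A key passed as a string literal stays valid for the life of the program. A key that lives in a reusable buffer (or in a vector element that may move) stops matching as soon as that memory is rewritten.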
 116
 117 Timekeeping
 118 -----------
 119
 120 Vppinfra includes high-precision, low-cost timing services. The
 121 datatype clib_time_t and associated functions reside in
 122 ./src/vppinfra/time.\[ch\]. Call clib_time_init (clib_time_t \*cp) to
 123 initialize the clib_time_t object.
 124
 125 Clib_time_init(...) can use a variety of different ways to establish
 126 the hardware clock frequency. At the end of the day, vppinfra
 127 timekeeping takes the attitude that the operating system's clock is
 128 the closest thing to a gold standard it has handy.
 129
 130 When properly configured, NTP maintains kernel clock synchronization
 131 with a highly accurate off-premises reference clock.  Notwithstanding
 132 network propagation delays, a synchronized NTP client will keep the
 133 kernel clock accurate to within 50ms or so.
 134
 135 Why should one care? Simply put, oscillators used to generate CPU
 136 ticks aren't super accurate. They work pretty well, but a 0.1% error
 137 wouldn't be out of the question. That's a minute and a half's worth of
 138 error in 1 day. The error changes constantly, due to temperature
 139 variation, and a host of other physical factors.
 140
 141 It's far too expensive to use system calls for timing, so we're left
 142 with the problem of continuously adjusting our view of the CPU tick
 143 register's clocks_per_second parameter.
 144
 145 The clock rate adjustment algorithm measures the number of cpu ticks
 146 and the "gold standard" reference time across an interval of
 147 approximately 16 seconds. We calculate clocks_per_second for the
 148 interval: use rdtsc (on x86_64) and a system call to get the latest
 149 cpu tick count and the kernel's latest nanosecond timestamp. We
 150 subtract the previous interval end values, and use exponential
 151 smoothing to merge the new clock rate sample into the clocks_per_second
 152 parameter.
 153
 154 As of this writing, we maintain the clock rate by way of the following
 155 first-order difference equation:
 156
 157
 158 ```
 159    clocks_per_second(t) = clocks_per_second(t-1) * K + sample_cps(t)*(1-K)
 160    where K = e**(-1.0/3.75);
 161 ```
 162
 163 With 16-second samples, 3.75 intervals span one minute, so this yields a
 164 per-observation "half-life" (strictly, an e-folding time) of 1 minute.
 165 Empirically, the clock rate converges within 5 minutes and appears to stay
 166 in near-perfect agreement with the kernel clock despite NTP time adjustments.
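
The smoothing step can be sketched as follows. This is a standalone illustration, not the code in time.c: the names `smooth_cps` and `smooth_n` are hypothetical, and K is written as a rounded constant (e**(-1.0/3.75) is approximately 0.765928) to avoid depending on the math library.

```c
#include <assert.h>

/* K = e**(-1.0/3.75), rounded to six places. */
#define TOY_K 0.765928

/* Merge one new clock-rate sample into the running estimate. */
static double
smooth_cps (double clocks_per_second, double sample_cps)
{
  return clocks_per_second * TOY_K + sample_cps * (1.0 - TOY_K);
}

/*
 * Feed n identical samples into an estimate that starts at 'initial'.
 * The remaining error shrinks by a factor of K per sample.
 */
static double
smooth_n (double initial, double sample_cps, int n)
{
  double cps = initial;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    cps = smooth_cps (cps, sample_cps);
  return cps;
}
```

With an estimate that starts 0.1% high, 19 samples (about 5 minutes at 16 seconds each) leave roughly 0.6% of the initial error, which is consistent with the convergence time quoted above.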
 167
 168 See ./src/vppinfra/time.c:clib_time_verify_frequency(...) to look at
 169 the rate adjustment algorithm. The code rejects frequency samples
 170 corresponding to the sort of adjustment which might occur if someone
 171 changes the gold standard kernel clock by several seconds.
 172
 173
 174 Format
 175 ------
 176
 177 Vppinfra format is roughly equivalent to printf.
 178
 179 Format has a few properties worth mentioning. Format's first argument is
 180 a (u8 \*) vector to which it appends the result of the current format
 181 operation. Chaining calls is very easy:
 182
 183 ```c
 184     u8 * result;
 185
 186     result = format (0, "junk = %d, ", junk);
 187     result = format (result, "more junk = %d\n", more_junk);
 188 ```
 189
 190 As previously noted, NULL pointers are perfectly proper 0-length
 191 vectors. Format returns a (u8 \*) vector, **not** a C-string. If you
 192 wish to print a (u8 \*) vector, use the "%v" format string. If you need
 193 a (u8 \*) vector which is also a proper C-string, either of these
 194 schemes may be used:
 195
 196 ```c
 197     vec_add1 (result, 0);
 198     /* or */
 199     result = format (result, "<whatever>%c", 0);
 200 ```
 201
 202 Remember to vec\_free() the result if appropriate. Be careful not to
 203 pass format an uninitialized (u8 \*).
 204
 205 Format implements a particularly handy user-format scheme via the "%U"
 206 format specification. For example:
 207
 208 ```c
 209     u8 * format_junk (u8 * s, va_list *va)
 210     {
 211       char *junk = va_arg (*va, char *);
 212       s = format (s, "%s", junk);
 213       return s;
 214     }
 215
 216     result = format (0, "junk = %U", format_junk, "This is some junk");
 217 ```
 218
 219 format\_junk() can invoke other user-format functions if desired. The
 220 programmer shoulders responsibility for argument type-checking. It is
 221 typical for user format functions to blow up spectacularly if the
 222 va\_arg(va, type) macros don't match the caller's idea of reality.
 223
 224 Unformat
 225 --------
 226
 227 Vppinfra unformat is vaguely related to scanf, but considerably more
 228 general.
 229
 230 A typical use case involves initializing an unformat\_input\_t from
 231 either a C-string or a (u8 \*) vector, then parsing via unformat() as
 232 follows:
 233
 234 ```c
 235     unformat_input_t input;
 236
 237     unformat_init_string (&input, "<some-C-string>");
 238     /* or */
 239     unformat_init_vector (&input, <u8-vector>);
 240 ```
 241
 242 Then loop parsing individual elements:
 243
 244 ```c
 245     while (unformat_check_input (&input) != UNFORMAT_END_OF_INPUT)
 246     {
 247       if (unformat (&input, "value1 %d", &value1))
 248         ;/* unformat sets value1 */
 249       else if (unformat (&input, "value2 %d", &value2))
 250         ;/* unformat sets value2 */
 251       else
 252         return clib_error_return (0, "unknown input '%U'",
 253                                   format_unformat_error, &input);
 254     }
 255 ```
 256
 257 As with format, unformat implements a user-unformat function capability
 258 via a "%U" user unformat function scheme. Generally, one can trivially
 259 transform `format (s, "foo %d", foo)` into `unformat (input, "foo %d", &foo)`.
 260
 261 Unformat implements a couple of handy non-scanf-like format specifiers:
 262
 263 ```c
 264     unformat (input, "enable %=", &enable, 1 /* defaults to 1 */);
 265     unformat (input, "bitzero %|", &mask, (1<<0));
 266     unformat (input, "bitone %|", &mask, (1<<1));
 267     <etc>
 268 ```
 269
 270 The phrase "enable %=" means "set the supplied variable to the default
 271 value" if unformat parses the "enable" keyword all by itself. If
 272 unformat parses "enable 123", it sets the supplied variable to 123.
 273
 274 We could clean up a number of hand-rolled "verbose" + "verbose %d"
 275 argument parsing codes using "%=".
 276
 277 The phrase "bitzero %|" means "set the specified bit in the supplied
 278 bitmask" if unformat parses "bitzero". Although it looks like it could
 279 be fairly handy, it's very lightly used in the code base.
 280
 281 `%_` toggles whether or not to skip input white space.
 282
 283 On a transition from skip to no-skip in the middle of a format string, unformat skips input white space. For example, the following:
 284
 285 ```c
 286 char *fmt = "%_%d.%d%_->%_%d.%d%_";
 287 unformat (input, fmt, &one, &two, &three, &four);
 288 ```
 289 matches input "1.2 -> 3.4".
 290 Without this, the space after -> does not get skipped.
 291
 292
 294
 295 ### How to parse a single input line
 296
 297 Debug CLI command functions MUST NOT accidentally consume input
 298 belonging to other debug CLI commands. Otherwise, it's impossible to
 299 script a set of debug CLI commands which "work fine" when issued one
 300 at a time.
 301
 302 This bit of code is NOT correct:
 303
 304 ```c
 305   /* Eats script input NOT belonging to it, and chokes! */
 306   while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
 307     {
 308       if (unformat (input, ...))
 309         ;
 310       else if (unformat (input, ...))
 311         ;
 312       else
 313         return clib_error_return (0, "parse error: '%U'",
 314                                      format_unformat_error, input);
 316     }
 317 ```
 318
 319 When executed as part of a script, such a function will return "parse
 320 error: '<next-command-text>'" every time, unless it happens to be the
 321 last command in the script.
 322
 323 Instead, use "unformat_line_input" to consume the rest of a line's
 324 worth of input - everything past the path specified in the
 325 VLIB_CLI_COMMAND declaration.
 326
 327 For example, unformat_line_input with "my_command" set up as shown
 328 below and user input "my path is clear" will produce an
 329 unformat_input_t that contains "is clear".
 330
 331 ```c
 332     VLIB_CLI_COMMAND (...) = {
 333         .path = "my path",
 334     };
 335 ```
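
The effect can be pictured with a standalone sketch. It is not how unformat_line_input is implemented; `toy_take_line` is a hypothetical helper operating on plain C strings. The point is only that a command's parser receives a private copy of the rest of its own line, while the script cursor moves on to the next command.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the remainder of the current line from *cursor
 * into 'line' (NUL-terminated, truncated to fit), then advance *cursor past
 * the newline. Returns the number of bytes copied.
 */
static size_t
toy_take_line (const char **cursor, char *line, size_t line_size)
{
  size_t n, copied;

  assert (line_size > 0);
  n = strcspn (*cursor, "\n");
  copied = n < line_size - 1 ? n : line_size - 1;
  memcpy (line, *cursor, copied);
  line[copied] = '\0';
  *cursor += n;
  if (**cursor == '\n')
    (*cursor)++;
  return copied;
}
```

Given the script text " is clear\nmy other command\n" (what remains after "my path" has been matched), the helper yields " is clear" for the current command and leaves the cursor at "my other command".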
 336
 337 Here's a bit of code which shows the required mechanics, in full:
 338
 339 ```c
 340     static clib_error_t *
 341     my_command_fn (vlib_main_t * vm,
 342                    unformat_input_t * input,
 343                    vlib_cli_command_t * cmd)
 344     {
 345       unformat_input_t _line_input, *line_input = &_line_input;
 346       u32 this, that;
 347       clib_error_t *error = 0;
 348
 349       if (!unformat_user (input, unformat_line_input, line_input))
 350         return 0;
 351
 352       /*
 353        * Here, UNFORMAT_END_OF_INPUT is at the end of the line we consumed,
 354        * not at the end of the script...
 355        */
 356       while (unformat_check_input (line_input) != UNFORMAT_END_OF_INPUT)
 357         {
 358            if (unformat (line_input, "this %u", &this))
 359              ;
 360            else if (unformat (line_input, "that %u", &that))
 361              ;
 362            else
 363              {
 364                error = clib_error_return (0, "parse error: '%U'",
 365                                      format_unformat_error, line_input);
 366                goto done;
 367              }
 368           }
 369
 370       /* <do something based on "this" and "that", etc.> */
 371
 372     done:
 373       unformat_free (line_input);
 374       return error;
 375     }
 376    /* *INDENT-OFF* */
 377    VLIB_CLI_COMMAND (my_command, static) = {
 378      .path = "my path",
 379      .function = my_command_fn,
 380    };
 381    /* *INDENT-ON* */
 382
 383 ```
 384
 385
 386 Vppinfra errors and warnings
 387 ----------------------------
 388
 389 Many functions within the vpp dataplane have return-values of type
 390 clib\_error\_t \*. Clib\_error\_t's are arbitrary strings with a bit of
 391 metadata \[fatal, warning\] and are easy to announce. Returning a NULL
 392 clib\_error\_t \* indicates "A-OK, no error."
 393
 394 Clib\_warning(format-args) is a handy way to add debugging
 395 output; clib warnings prepend function:line info to unambiguously locate
 396 the message source. Clib\_unix\_warning() adds perror()-style Linux
 397 system-call information. In production images, clib\_warnings result in
 398 syslog entries.
 399
 400 Serialization
 401 -------------
 402
 403 Vppinfra serialization support allows the programmer to easily serialize
 404 and unserialize complex data structures.
 405
 406 The underlying primitive serialize/unserialize functions use network
 407 byte-order, so there are no structural issues serializing on a
 408 little-endian host and unserializing on a big-endian host.
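
A minimal sketch of why a fixed byte order removes the endianness problem. The function names are hypothetical and this is not the vppinfra serializer; it only shows a 32-bit value written most-significant byte first and rebuilt with shifts, which gives the same result on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value most-significant byte first (network byte order). */
static void
toy_serialize_u32 (uint8_t *out, uint32_t x)
{
  assert (out != NULL);
  out[0] = (uint8_t) (x >> 24);
  out[1] = (uint8_t) (x >> 16);
  out[2] = (uint8_t) (x >> 8);
  out[3] = (uint8_t) x;
}

/* Rebuild the value from bytes; independent of host endianness. */
static uint32_t
toy_unserialize_u32 (const uint8_t *in)
{
  assert (in != NULL);
  return ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16)
    | ((uint32_t) in[2] << 8) | (uint32_t) in[3];
}
```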