vppinfra: write up clib_time_t

diff --git a/docs/gettingstarted/developers/infrastructure.md b/docs/gettingstarted/developers/infrastructure.md

index 688c421..12f96d5 100644
--- a/docs/gettingstarted/developers/infrastructure.md
+++ b/docs/gettingstarted/developers/infrastructure.md
@@ -48,13 +48,16 @@ macro! It's smart about NULL pointers.\]
  
  Typically, the user header is not present. User headers allow for other
  data structures to be built atop vppinfra vectors. Users may specify the
-alignment for data elements via the [vec]()\*\_aligned macros.
+alignment for the first data element of a vector via the \[vec\]()\*\_aligned
+macros.
  
  
-Vectors elements can be any C type e.g. (int, double, struct bar). This
+Vector elements can be any C type e.g. (int, double, struct bar). This
  is also true for data types built atop vectors (e.g. heap, pool, etc.).
-Many macros have \_a variants supporting alignment of vector data and
-\_h variants supporting non-zero-length vector headers. The \_ha
-variants support both.
+Many macros have \_a variants supporting alignment of vector elements
+and \_h variants supporting non-zero-length vector headers. The \_ha
+variants support both. Additionally, cache-line alignment within a
+vector element structure can be specified using the
+\[CLIB_CACHE_LINE_ALIGN_MARK\]() macro.
  
  Inconsistent usage of header and/or alignment related macro variants
  will cause delayed, confusing failures.
@@ -111,6 +114,63 @@ key\_pointer. It is usually a bad mistake to pass the address of a
  vector element as the second argument to hash\_set\_mem. It is perfectly
  fine to memorize constant string addresses in the text segment.
  
+Timekeeping
+-----------
+
+Vppinfra includes high-precision, low-cost timing services. The
+datatype clib_time_t and associated functions reside in
+./src/vppinfra/time.\[ch\]. Call clib_time_init (clib_time_t \*cp) to
+initialize the clib_time_t object.
+
+clib_time_init(...) can establish the hardware clock frequency in
+several different ways. Ultimately, vppinfra timekeeping treats the
+operating system's clock as the closest thing to a gold standard it
+has handy.
+
+When properly configured, NTP maintains kernel clock synchronization
+with a highly accurate off-premises reference clock.  Notwithstanding
+network propagation delays, a synchronized NTP client will keep the
+kernel clock accurate to within 50ms or so.
+
+Why should one care? Simply put, oscillators used to generate CPU
+ticks aren't super accurate. They work pretty well, but a 0.1% error
+wouldn't be out of the question. That's about a minute and a half's
+worth of error (86.4 seconds) in one day. The error changes constantly
+due to temperature variation and a host of other physical factors.
+
+It's far too expensive to use system calls for timing, so we're left
+with the problem of continuously adjusting our view of the CPU tick
+register's clocks_per_second parameter.
+
+The clock rate adjustment algorithm measures the number of CPU ticks
+and the "gold standard" reference time across an interval of
+approximately 16 seconds. To calculate clocks_per_second for the
+interval, we use rdtsc (on x86_64) and a system call to get the latest
+CPU tick count and the kernel's latest nanosecond timestamp. We
+subtract the previous interval end values, and use exponential
+smoothing to merge the new clock rate sample into the clocks_per_second
+parameter.
+
+As of this writing, we maintain the clock rate by way of the following
+first-order difference equation:
+
+
+```
+   clocks_per_second(t) = clocks_per_second(t-1) * K + sample_cps(t)*(1-K)
+   where K = e**(-1.0/3.75);
+```
+
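+With samples taken roughly 16 seconds apart, 3.75 observations span
+about 1 minute, so this yields a "half-life" (strictly speaking, an
+exponential time constant) of about 1 minute. Empirically, the clock
+rate converges within 5 minutes, and appears to maintain near-perfect
+agreement with the kernel clock in the face of ongoing NTP time
+adjustments.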
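
The smoothing update above can be sketched in standalone C. This is a minimal illustration, not the code in time.c: the type clock_rate_est_t, the function clock_rate_update, and the rounded constant SMOOTHING_K are assumptions made for the example, and the caller is assumed to supply the current tick count and the reference time in seconds.

```c
#include <assert.h>

/* Approximately e**(-1.0/3.75), rounded so the sketch needs no libm. */
#define SMOOTHING_K 0.765928

/* Hypothetical estimator state; not the layout of clib_time_t. */
typedef struct
{
  double clocks_per_second;      /* smoothed clock-rate estimate */
  unsigned long long last_ticks; /* tick count at the previous interval end */
  double last_ref_seconds;       /* reference time at the previous interval end */
} clock_rate_est_t;

/* Merge one interval's measurement into the smoothed estimate. */
double
clock_rate_update (clock_rate_est_t *e, unsigned long long now_ticks,
                   double now_ref_seconds)
{
  double dt = now_ref_seconds - e->last_ref_seconds;
  double sample_cps;

  assert (dt > 0.0); /* the reference clock must have advanced */

  /* Clock rate observed over this interval alone */
  sample_cps = (double) (now_ticks - e->last_ticks) / dt;

  /* clocks_per_second(t) = clocks_per_second(t-1) * K + sample_cps(t) * (1-K) */
  e->clocks_per_second =
    e->clocks_per_second * SMOOTHING_K + sample_cps * (1.0 - SMOOTHING_K);

  /* Current values become the next interval's starting point */
  e->last_ticks = now_ticks;
  e->last_ref_seconds = now_ref_seconds;
  return e->clocks_per_second;
}
```

Each new sample contributes only (1-K), roughly 23%, of its difference from the running estimate, so a single unusual interval shifts clocks_per_second only modestly.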
+
+See ./src/vppinfra/time.c:clib_time_verify_frequency(...) to look at
+the rate adjustment algorithm. The code rejects frequency samples
+corresponding to the sort of adjustment which might occur if someone
+changes the gold standard kernel clock by several seconds.
+
+
  Format
  ------
  
@@ -135,8 +195,8 @@ schemes may be used:
  
  ```c
      vec_add1 (result, 0)
-    or 
-    result = format (result, "<whatever>%c", 0); 
+    or
+    result = format (result, "<whatever>%c", 0);
  ```
  
  Remember to vec\_free() the result if appropriate. Be careful not to
@@ -158,8 +218,8 @@ format specification. For example:
  
  format\_junk() can invoke other user-format functions if desired. The
  programmer shoulders responsibility for argument type-checking. It is
-typical for user format functions to blow up if the va\_arg(va,
-type) macros don't match the caller's idea of reality.
+typical for user format functions to blow up spectacularly if the
+va\_arg(va, type) macros don't match the caller's idea of reality.
  
  Unformat
  --------
@@ -182,149 +242,153 @@ follows:
  Then loop parsing individual elements:
  
  ```c
-    while (unformat_check_input (&input) != UNFORMAT_END_OF_INPUT) 
+    while (unformat_check_input (&input) != UNFORMAT_END_OF_INPUT)
      {
        if (unformat (&input, "value1 %d", &value1))
          ;/* unformat sets value1 */
        else if (unformat (&input, "value2 %d", &value2))
          ;/* unformat sets value2 */
        else
-        return clib_error_return (0, "unknown input '%U'", 
+        return clib_error_return (0, "unknown input '%U'",
                                    format_unformat_error, input);
      }
  ```
  
  As with format, unformat implements a user-unformat function capability
-via a "%U" user unformat function scheme.
+via a "%U" user unformat function scheme. Generally, one can trivially
+transform "format (s, "foo %d", foo)" -> "unformat (input, "foo %d", &foo)".
  
  
-Vppinfra errors and warnings
-----------------------------
+Unformat implements a couple of handy non-scanf-like format specifiers:
  
  
-Many functions within the vpp dataplane have return-values of type
-clib\_error\_t \*. Clib\_error\_t's are arbitrary strings with a bit of
-metadata \[fatal, warning\] and are easy to announce. Returning a NULL
-clib\_error\_t \* indicates "A-OK, no error."
+```c
+    unformat (input, "enable %=", &enable, 1 /* defaults to 1 */);
+    unformat (input, "bitzero %|", &mask, (1<<0));
+    unformat (input, "bitone %|", &mask, (1<<1));
+    <etc>
+```
  
  
-Clib\_warning(format-args) is a handy way to add debugging
-output; clib warnings prepend function:line info to unambiguously locate
-the message source. Clib\_unix\_warning() adds perror()-style Linux
-system-call information. In production images, clib\_warnings result in
-syslog entries.
+The phrase "enable %=" means "set the supplied variable to the default
+value" if unformat parses the "enable" keyword all by itself. If
+unformat parses "enable 123", it sets the supplied variable to 123.
  
  
-Serialization
--------------
+We could clean up a number of hand-rolled "verbose" + "verbose %d"
+argument-parsing code paths using "%=".
  
  
-Vppinfra serialization support allows the programmer to easily serialize
-and unserialize complex data structures.
+The phrase "bitzero %|" means "set the specified bit in the supplied
+bitmask" if unformat parses "bitzero". Although it looks like it could
+be fairly handy, it's very lightly used in the code base.
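
The behavior of the two specifiers, as described above, can be imitated in plain C. The sketch below does not use vppinfra: keyword_matches, parse_keyword_or_default and parse_keyword_set_bits are hypothetical helpers that operate on an ordinary C string, and they only mirror the semantics stated in the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if input is exactly keyword, or keyword followed by a space. */
int
keyword_matches (const char *input, const char *keyword)
{
  size_t n;

  assert (input != NULL && keyword != NULL);
  n = strlen (keyword);
  return strncmp (input, keyword, n) == 0
    && (input[n] == '\0' || input[n] == ' ');
}

/*
 * Imitates "keyword %=": a bare keyword stores default_value,
 * "keyword <number>" stores the number. Returns 1 on a match.
 */
int
parse_keyword_or_default (const char *input, const char *keyword,
                          int *value, int default_value)
{
  int parsed;

  if (!keyword_matches (input, keyword))
    return 0;
  if (sscanf (input + strlen (keyword), "%d", &parsed) == 1)
    *value = parsed;
  else
    *value = default_value;
  return 1;
}

/* Imitates "keyword %|": a matching keyword ORs bits into the mask. */
int
parse_keyword_set_bits (const char *input, const char *keyword,
                        unsigned *mask, unsigned bits)
{
  if (!keyword_matches (input, keyword))
    return 0;
  *mask |= bits;
  return 1;
}
```

With these helpers, the input "enable" leaves the variable at the supplied default, "enable 123" stores 123, and successive "bitzero" and "bitone" inputs accumulate bits in the same mask.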
  
  
-The underlying primitive serialize/unserialize functions use network
-byte-order, so there are no structural issues serializing on a
-little-endian host and unserializing on a big-endian host.
-
-Event-logger, graphical event log viewer
-----------------------------------------
-
-The vppinfra event logger provides very lightweight (sub-100ns)
-precisely time-stamped event-logging services. See
-./src/vppinfra/{elog.c, elog.h}
+### How to parse a single input line
  
  
-Serialization support makes it easy to save and ultimately to combine a
-set of event logs. In a distributed system running NTP over a local LAN,
-we find that event logs collected from multiple system elements can be
-combined with a temporal uncertainty no worse than 50us.
+Debug CLI command functions MUST NOT accidentally consume input
+belonging to other debug CLI commands. Otherwise, it's impossible to
+script a set of debug CLI commands which "work fine" when issued one
+at a time.
  
  
-A typical event definition and logging call looks like this:
+This bit of code is NOT correct:
  
  ```c
-    ELOG_TYPE_DECLARE (e) = 
+  /* Eats script input NOT belonging to it, and chokes! */
+  while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
      {
-      .format = "tx-msg: stream %d local seq %d attempt %d",
-      .format_args = "i4i4i4",
-    };
-    struct { u32 stream_id, local_sequence, retry_count; } * ed;
-    ed = ELOG_DATA (m->elog_main, e);
-    ed->stream_id = stream_id;
-    ed->local_sequence = local_sequence;
-    ed->retry_count = retry_count;
+      if (unformat (input, ...))
+       ;
+      else if (unformat (input, ...))
+       ;
+      else
+        return clib_error_return (0, "parse error: '%U'",
+                                    format_unformat_error, input);
+    }
  ```
  
-The ELOG\_DATA macro returns a pointer to 20 bytes worth of arbitrary
-event data, to be formatted (offline, not at runtime) as described by
-format\_args. Aside from obvious integer formats, the CLIB event logger
-provides a couple of interesting additions. The "t4" format
-pretty-prints enumerated values:
+When executed as part of a script, such a function will return "parse
+error: '<next-command-text>'" every time, unless it happens to be the
+last command in the script.
+
+Instead, use "unformat_line_input" to consume the rest of a line's
+worth of input - everything past the path specified in the
+VLIB_CLI_COMMAND declaration.
+
+For example, unformat_line_input with "my_command" set up as shown
+below and user input "my path is clear" will produce an
+unformat_input_t that contains "is clear".
  
  ```c
-    ELOG_TYPE_DECLARE (e) = 
-    {
-      .format = "get_or_create: %s",
-      .format_args = "t4",
-      .n_enum_strings = 2,
-      .enum_strings = { "old", "new", },
+    VLIB_CLI_COMMAND (...) = {
+        .path = "my path",
      };
  ```
  
-The "t" format specifier indicates that the corresponding datum is an
-index in the event's set of enumerated strings, as shown in the previous
-event type definition.
-
-The “T” format specifier indicates that the corresponding datum is an
-index in the event log’s string heap. This allows the programmer to emit
-arbitrary formatted strings. One often combines this facility with a
-hash table to keep the event-log string heap from growing arbitrarily
-large.
-
-Noting the 20-octet limit per-log-entry data field, the event log
-formatter supports arbitrary combinations of these data types. As in:
-the ".format" field may contain one or more instances of the following:
-
--   i1 - 8-bit unsigned integer
--   i2 - 16-bit unsigned integer
--   i4 - 32-bit unsigned integer
--   i8 - 64-bit unsigned integer
--   f4 - float
--   f8 - double
--   s - NULL-terminated string - be careful
--   sN - N-byte character array
--   t1,2,4 - per-event enumeration ID
--   T4 - Event-log string table offset
-
-The vpp engine event log is thread-safe, and is shared by all threads.
-Take care not to serialize the computation. Although the event-logger is
-about as fast as practicable, it's not appropriate for per-packet use in
-hard-core data plane code. It's most appropriate for capturing rare
-events - link up-down events, specific control-plane events and so
-forth.
-
-The vpp engine has several debug CLI commands for manipulating its event
-log:
+Here's a bit of code which shows the required mechanics, in full:
  
  
-```
-    vpp# event-logger clear
-    vpp# event-logger save <filename> # for security, writes into /tmp/<filename>.
-                                      # <filename> must not contain '.' or '/' characters
-    vpp# show event-logger [all] [<nnn>] # display the event log
-                                       # by default, the last 250 entries
-```
+```c
+    static clib_error_t *
+    my_command_fn (vlib_main_t * vm,
+                   unformat_input_t * input,
+                   vlib_cli_command_t * cmd)
+    {
+      unformat_input_t _line_input, *line_input = &_line_input;
+      u32 this, that;
+      clib_error_t *error = 0;
+
+      if (!unformat_user (input, unformat_line_input, line_input))
+        return 0;
+
+      /*
+       * Here, UNFORMAT_END_OF_INPUT is at the end of the line we consumed,
+       * not at the end of the script...
+       */
+      while (unformat_check_input (line_input) != UNFORMAT_END_OF_INPUT)
+        {
+           if (unformat (line_input, "this %u", &this))
+             ;
+           else if (unformat (line_input, "that %u", &that))
+             ;
+           else
+             {
+               error = clib_error_return (0, "parse error: '%U'",
+                                    format_unformat_error, line_input);
+               goto done;
+             }
+        }
+
+      /* <do something based on "this" and "that", etc.> */
+
+    done:
+      unformat_free (line_input);
+      return error;
+    }
+   /* *INDENT-OFF* */
+   VLIB_CLI_COMMAND (my_command, static) = {
+     .path = "my path",
+     .function = my_command_fn,
+   };
+   /* *INDENT-ON* */
  
  
-The event log defaults to 128K entries. The command-line argument "...
-vlib { elog-events nnn } ..." configures the size of the event log.
+```
  
  
-As described above, the vpp engine event log is thread-safe and shared.
-To avoid confusing non-appearance of events logged by worker threads,
-make sure to code vlib\_global\_main.elog\_main - instead of
-vm->elog\_main. The latter form is correct in the main thread, but
-will almost certainly produce bad results in worker threads.
  
  
-G2 graphical event viewer
--------------------------
+Vppinfra errors and warnings
+----------------------------
  
  
-The g2 graphical event viewer can display serialized vppinfra event logs
-directly, or via the c2cpel tool.
+Many functions within the vpp dataplane have return-values of type
+clib\_error\_t \*. Clib\_error\_t's are arbitrary strings with a bit of
+metadata \[fatal, warning\] and are easy to announce. Returning a NULL
+clib\_error\_t \* indicates "A-OK, no error."
  
  
-<div class="admonition note">
+Clib\_warning(format-args) is a handy way to add debugging
+output; clib warnings prepend function:line info to unambiguously locate
+the message source. Clib\_unix\_warning() adds perror()-style Linux
+system-call information. In production images, clib\_warnings result in
+syslog entries.
  
  
-Todo: please convert wiki page and figures
+Serialization
+-------------
  
  
-</div>
+Vppinfra serialization support allows the programmer to easily serialize
+and unserialize complex data structures.
  
  
+The underlying primitive serialize/unserialize functions use network
+byte-order, so there are no structural issues serializing on a
+little-endian host and unserializing on a big-endian host.