src/plugins/acl/acl_hash_lookup_doc.md

   1 ACL plugin constant-time lookup design    {#acl_hash_lookup}
   2 ======================================
   3
   4 The initial implementation of ACL plugin performs a trivial for() cycle,
   5 going through the assigned ACLs on a per-packet basis. This is not very
   6 efficient, even if for very short ACLs due to its simplicity it can beat
   7 more advanced methods.
   8
   9 However, to cover the case of longer ACLs with acceptable performance,
  10 we need to have a better way of matching. This write-up proposes
  11 a mechanism to make a lookup from O(M) where M is number of entries
  12 to O(N) where N is number of different mask combinations.
  13
  14 Preparation of ACL(s)
  15 ---------------------
  16
  17 The ACL plugin will maintain a global list of "mask types", i.e. the specific
  18 configurations of "do not care" bits within the ACEs.
  19 Upon the creation of a new ACL, a pass will be made through all the
  20 ACEs, to assign and possibly allocate the "mask type number".
  21
  22 Each ACL has a structure *hash_acl_info_t* representing the "hash-based"
  23 parts of information related to that ACL, primarily the array of
  24 *hash_ace_info_t* structures - each of the members of that array
  25 corresponding to one of the rules (ACEs) in the original ACL,
  26 for this they have a pair of *(acl_index, ace_index)* to keep track,
  27 predominantly for debugging.
  28
  29 Why do we need a whole separate structure, and are not adding new fields
  30 to the existing rule structure? First, encapsulation, to minimize
  31 the pollution of the main ACL code with the hash-based lookup artifacts.
  32 Second, one rule may correspond to more than one "hash-based" ACE.
  33 In fact, most of the rules do correspond to two of those. Why ?
  34
  35 Consider that the current ACL lookup logic is that if a packet
  36 is not the initial fragment, and there is an L4 entry acting on the packet,
  37 the comparison will be made only on the L4 protocol field value rather
  38 than on the protocol and port values. This behavior is governed by
  39 *l4_match_nonfirst_fragment* flag in the *acl_main*, and is needed to
  40 maintain the compatibility with the existing software switch implementation.
  41
  42 While for the sequential check in *single_acl_match_5tuple()*
  43 it is very easy to implement by just breaking out at the right moment,
  44 in case of hash-based matching this cost us two checks:
  45 one on full 5-tuple and the flag *pkt.is_nonfirst_fragment* being zero,
  46 the second on 3-tuple and the flag *pkt.is_nonfirst_fragment* being one,
  47 with the second check triggered by the *acl_main.l4_match_nonfirst_fragment*
  48 setting being the default 1. This dictates the necessity of having a "match"
  49 field in a given *hash_ace_info_t* element, which would reflect the value
  50 we are supposed to match after applying the mask.
  51
  52 There can be other circumstances when it might be beneficial to expand
  53 the given rule in the original ACL into multiple - for example, as an
  54 optimization within the port range handling for small port ranges
  55 (this is not done as of the time of writing).
  56
  57 Assigning ACLs to an interface
  58 ------------------------------
  59
  60 Once the ACL list is assigned to an interface, or, rather, a new ACL
  61 is added to the list of the existing ACLs applied to the interface,
  62 we need to update the bihash accelerating the lookup.
  63
  64 All the entries for the lookups are stored within a single *48_8* bihash,
  65 which captures the 5-tuple from the packet as well as the miscellaneous
  66 per-packet information flags, e.g. *l4_valid*, *is_non_first_fragment*,
  67 and so on. To facilitate the use of the single bihash by all the interfaces,
  68 the *is_ip6*, *is_input*, *sw_if_index* are part of the key,
  69 as well as *mask_type_index* - the latter being necessary because
  70 there can be entries with the same value but different masks, e.g.:
  71 `permit ::/0, permit::/128`.
  72
  73 At the moment of an ACL being applied to an interface, we need to
  74 walk the list of *hash_ace_info_t* entries corresponding to that ACL,
  75 and update the bihash with the keys corresponding to the match
  76 values in these entries.
  77
  78 The value of the hash match contains the index into a per-*sw_if_index* vector
  79 of *applied_ace_hash_entry_t* elements, as well as a couple of flags:
  80 *shadowed* (optimization: if this flag on a matched entry is zero, means
  81 we can stop the lookup early and declare a match - see below),
  82 and *need_portrange_check* - meaning that what matched was a superset
  83 of the actual match, and we need to perform an extra check.
  84
  85 Also, upon insertion, we must keep in mind there can be
  86 multiple *applied_ace_hash_entry_t* for the same key and must keep
  87 a list of those. This is necessary to incrementally apply/unapply
  88 the ACLs as part of the ACL vector: say, two ACLs have
  89 "permit 2001:db8::1/128 any" - we should be able to retain the entry
  90 for the second ACL even if we have deleted the first one.
  91 Also, in case there are two entries with the same key but
  92 different port ranges, say 0..42 and 142..65535 - we need
  93 to be able to sequentially match on those if we decide not
  94 to expand them into individual port-specific entries.
  95
  96 Per-packet lookup
  97 -----------------
  98
  99 The simple single-packet lookup is defined in
 100 *multi_acl_match_get_applied_ace_index*, which returns the index
 101 of the applied hash ACE if there was a match, or ~0 if there wasn't.
 102
 103 The future optimized per-packet lookup may be batched in three phases:
 104
 105 1. Prepare the keys in the per-worker vector by doing logical AND of
 106    original 5-tuple record with the elements of the mask vector.
 107 2. Lookup the keys in the bihash in a batch manner, collecting the
 108    result with lowest u64 (acl index within vector, ACE index) from
 109    the hash lookup value, and performing the list walk if necessary
 110    (for portranges).
 111 3. Take the action from the ACL record as defined by (ACL#, ACE#) from the
 112    resulting lookup winner, or, if no match found, then perform default deny.
 113
 114 Shadowed/independent/redundant ACEs
 115 ------------------------------------
 116
 117 During the phase of combining multiple ACLs into one rulebase, when they
 118 are applied to interface, we also can perform several optimizations.
 119
 120 If a given ACE is a strict subset of another ACE located up in the linear
 121 search order, we can ignore this ACE completely - because by definition
 122 it will never match. We will call such an ACE *redundant*. Here is an example:
 123
 124 ```
 125 permit 2001:db8:1::/48 2001:db8:2::/48   (B)
 126 deny 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A)
 127 ```
 128
 129 A bit more formally, we can define this relationship of an ACE A to ACE B as:
 130
 131 ```
 132 redundant(aceA, aceB) := (contains(protoB, protoA) && contains(srcB, srcA)
 133                           && contains(dstB, dstA) && is_after(A, B))
 134 ```
 135
 136 Here as "contains" we define an operation operating on the sets defined by
 137 the protocol, (srcIP, srcPortDefinition) and (dstIP, dstPortDefinition)
 138 respectively, and returning true if all the elements represented by
 139 the second argument are represented by the first argument. The "is_after"
 140 is true if A is located below B in the ruleset.
 141
 142 If a given ACE does not intersect at all with any other ACE
 143 in front of it, we can mark it as such.
 144
 145 Then during the sequence of the lookups the successful hit on this ACE means
 146 we do not need to look up other mask combinations - thus potentially
 147 significantly speeding up the match process. Here is an example,
 148 assuming we have the following ACL:
 149
 150 ```
 151 permit 2001:db8:1::/48 2001:db8:2::/48    (B)
 152 deny 2001:db8:3::/48 2001:db8:2:1::/64    (A)
 153 ```
 154
 155 In this case if we match the second entry, we do not need to check whether
 156 we have matched the first one - the source addresses are completely
 157 different. We call such an ACE *independent* from another.
 158
 159 We can define this as
 160
 161 ```
 162 independent(aceA, aceB) := (!intersect(protoA, protoB) ||
 163                             !intersect(srcA, srcB) ||
 164                             !intersect(dstA, dstB))
 165 ```
 166
 167 where intersect is defined as operation returning true if there are
 168 elements belonging to the sets of both arguments.
 169
 170 If the entry A is neither redundant nor independent from B, and is below
 171 B in the ruleset, we call such an entry *shadowed* by B, here is an example:
 172
 173 ```
 174 deny tcp 2001:db8:1::/48 2001:db8:2::/48         (B)
 175 permit 2001:d8b:1:1::/64 2001:db8:2:1::/64    (A)
 176 ```
 177
 178 This means the earlier rule "carves out" a subset of A, thus leaving
 179 a "shadow". (Evidently, the action needs to be different for the shadow
 180 to have an effect, but for for the terminology sake we do not care).
 181
 182 The more formal definition:
 183
 184 ```
 185 shadowed(aceA, aceB) := !redundant(aceA, aceB) &&
 186                         !independent(aceA, aceB) &&
 187                         is_after(aceA, aceB)
 188 ```
 189
 190 Using this terminology, any ruleset can be represented as
 191 a DAG (Directed Acyclic Graph), with the bottom being the implicit
 192 "deny any", pointing to the set of rules shadowing it or the ones
 193 it is redundant for.
 194
 195 These rules may in turn be shadowing each other. There is no cycles in
 196 this graph because of the natural order of the rules - the rule located
 197 closer to the end of the ruleset can never shadow or make redundant a rule
 198 higher up.
 199
 200 The optimization that enables can allow for is to skip matching certain
 201 masks on a per-lookup basis - if a given rule has matched,
 202 the only adjustments that can happen is the match with one of
 203 the shadowing rules.
 204
 205 Also, another avenue for the optimization can be starting the lookup process
 206 with the mask type that maximizes the chances of the independent ACE match,
 207 thus resulting in an ACE lookup being a single hash table hit.
 208
 209
 210 Plumbing
 211 --------
 212
 213 All the new routines are located in a separate file,
 214 so we can cleanly experiment with a different approach if this
 215 does not fit all of the use cases.
 216
 217 The constant-time lookup within the data path has the API with
 218 the same signature as:
 219
 220 ```
 221 u8
 222 multi_acl_match_5tuple (u32 sw_if_index, fa_5tuple_t * pkt_5tuple, int is_l2,
 223                        int is_ip6, int is_input, u32 * acl_match_p,
 224                        u32 * rule_match_p, u32 * trace_bitmap)
 225 ```
 226
 227 There should be a new upper-level function with the same signature, which
 228 will make a decision whether to use a linear lookup, or to use the
 229 constant-time lookup implemented by this work, or to add some other
 230 optimizations (e.g. by keeping the cache of the last N lookups).
 231
 232 The calls to the routine doing preparatory work should happen
 233 in `acl_add_list()` after creating the linear-lookup structures,
 234 and the routine doing the preparatory work populating the hashtable
 235 should be called from `acl_interface_add_del_inout_acl()` or its callees.
 236
 237 The initial implementation will be geared towards looking up a single
 238 match at a time, with the subsequent optimizations possible to make
 239 the lookup for more than one packet.
 240