src/plugins/acl/acl_hash_lookup_doc.rst

   1 ACL plugin constant-time lookup
   2 ===============================
   3
   4 The initial implementation of ACL plugin performs a trivial for() cycle,
   5 going through the assigned ACLs on a per-packet basis. This is not very
   6 efficient, even if for very short ACLs due to its simplicity it can beat
   7 more advanced methods.
   8
   9 However, to cover the case of longer ACLs with acceptable performance,
  10 we need to have a better way of matching. This write-up proposes a
  11 mechanism to make a lookup from O(M) where M is number of entries to
  12 O(N) where N is number of different mask combinations.
  13
  14 Preparation of ACL(s)
  15 ---------------------
  16
  17 The ACL plugin will maintain a global list of “mask types”, i.e. the
  18 specific configurations of “do not care” bits within the ACEs. Upon the
  19 creation of a new ACL, a pass will be made through all the ACEs, to
  20 assign and possibly allocate the “mask type number”.
  21
  22 Each ACL has a structure *hash_acl_info_t* representing the “hash-based”
  23 parts of information related to that ACL, primarily the array of
  24 *hash_ace_info_t* structures - each of the members of that array
  25 corresponding to one of the rules (ACEs) in the original ACL, for this
  26 they have a pair of *(acl_index, ace_index)* to keep track,
  27 predominantly for debugging.
  28
  29 Why do we need a whole separate structure, and are not adding new fields
  30 to the existing rule structure? First, encapsulation, to minimize the
  31 pollution of the main ACL code with the hash-based lookup artifacts.
  32 Second, one rule may correspond to more than one “hash-based” ACE. In
  33 fact, most of the rules do correspond to two of those. Why ?
  34
  35 Consider that the current ACL lookup logic is that if a packet is not
  36 the initial fragment, and there is an L4 entry acting on the packet, the
  37 comparison will be made only on the L4 protocol field value rather than
  38 on the protocol and port values. This behavior is governed by
  39 *l4_match_nonfirst_fragment* flag in the *acl_main*, and is needed to
  40 maintain the compatibility with the existing software switch
  41 implementation.
  42
  43 While for the sequential check in *single_acl_match_5tuple()* it is very
  44 easy to implement by just breaking out at the right moment, in case of
  45 hash-based matching this cost us two checks: one on full 5-tuple and the
  46 flag *pkt.is_nonfirst_fragment* being zero, the second on 3-tuple and
  47 the flag *pkt.is_nonfirst_fragment* being one, with the second check
  48 triggered by the *acl_main.l4_match_nonfirst_fragment* setting being the
  49 default 1. This dictates the necessity of having a “match” field in a
  50 given *hash_ace_info_t* element, which would reflect the value we are
  51 supposed to match after applying the mask.
  52
  53 There can be other circumstances when it might be beneficial to expand
  54 the given rule in the original ACL into multiple - for example, as an
  55 optimization within the port range handling for small port ranges (this
  56 is not done as of the time of writing).
  57
  58 Assigning ACLs to an interface
  59 ------------------------------
  60
  61 Once the ACL list is assigned to an interface, or, rather, a new ACL is
  62 added to the list of the existing ACLs applied to the interface, we need
  63 to update the bihash accelerating the lookup.
  64
  65 All the entries for the lookups are stored within a single *48_8*
  66 bihash, which captures the 5-tuple from the packet as well as the
  67 miscellaneous per-packet information flags, e.g. *l4_valid*,
  68 *is_non_first_fragment*, and so on. To facilitate the use of the single
  69 bihash by all the interfaces, the *is_ip6*, *is_input*, *sw_if_index*
  70 are part of the key, as well as *mask_type_index* - the latter being
  71 necessary because there can be entries with the same value but different
  72 masks, e.g.: ``permit ::/0, permit::/128``.
  73
  74 At the moment of an ACL being applied to an interface, we need to walk
  75 the list of *hash_ace_info_t* entries corresponding to that ACL, and
  76 update the bihash with the keys corresponding to the match values in
  77 these entries.
  78
  79 The value of the hash match contains the index into a per-*sw_if_index*
  80 vector of *applied_ace_hash_entry_t* elements, as well as a couple of
  81 flags: *shadowed* (optimization: if this flag on a matched entry is
  82 zero, means we can stop the lookup early and declare a match - see
  83 below), and *need_portrange_check* - meaning that what matched was a
  84 superset of the actual match, and we need to perform an extra check.
  85
  86 Also, upon insertion, we must keep in mind there can be multiple
  87 *applied_ace_hash_entry_t* for the same key and must keep a list of
  88 those. This is necessary to incrementally apply/unapply the ACLs as part
  89 of the ACL vector: say, two ACLs have “permit 2001:db8::1/128 any” - we
  90 should be able to retain the entry for the second ACL even if we have
  91 deleted the first one. Also, in case there are two entries with the same
  92 key but different port ranges, say 0..42 and 142..65535 - we need to be
  93 able to sequentially match on those if we decide not to expand them into
  94 individual port-specific entries.
  95
  96 Per-packet lookup
  97 -----------------
  98
  99 The simple single-packet lookup is defined in
 100 *multi_acl_match_get_applied_ace_index*, which returns the index of the
 101 applied hash ACE if there was a match, or ~0 if there wasn’t.
 102
 103 The future optimized per-packet lookup may be batched in three phases:
 104
 105 1. Prepare the keys in the per-worker vector by doing logical AND of
 106    original 5-tuple record with the elements of the mask vector.
 107 2. Lookup the keys in the bihash in a batch manner, collecting the
 108    result with lowest u64 (acl index within vector, ACE index) from the
 109    hash lookup value, and performing the list walk if necessary (for
 110    portranges).
 111 3. Take the action from the ACL record as defined by (ACL#, ACE#) from
 112    the resulting lookup winner, or, if no match found, then perform
 113    default deny.
 114
 115 Shadowed/independent/redundant ACEs
 116 -----------------------------------
 117
 118 During the phase of combining multiple ACLs into one rulebase, when they
 119 are applied to interface, we also can perform several optimizations.
 120
 121 If a given ACE is a strict subset of another ACE located up in the
 122 linear search order, we can ignore this ACE completely - because by
 123 definition it will never match. We will call such an ACE *redundant*.
 124 Here is an example:
 125
 126 ::
 127
 128    permit 2001:db8:1::/48 2001:db8:2::/48   (B)
 129    deny 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A)
 130
 131 A bit more formally, we can define this relationship of an ACE A to ACE
 132 B as:
 133
 134 ::
 135
 136    redundant(aceA, aceB) := (contains(protoB, protoA) && contains(srcB, srcA)
 137                              && contains(dstB, dstA) && is_after(A, B))
 138
 139 Here as “contains” we define an operation operating on the sets defined
 140 by the protocol, (srcIP, srcPortDefinition) and (dstIP,
 141 dstPortDefinition) respectively, and returning true if all the elements
 142 represented by the second argument are represented by the first
 143 argument. The “is_after” is true if A is located below B in the ruleset.
 144
 145 If a given ACE does not intersect at all with any other ACE in front of
 146 it, we can mark it as such.
 147
 148 Then during the sequence of the lookups the successful hit on this ACE
 149 means we do not need to look up other mask combinations - thus
 150 potentially significantly speeding up the match process. Here is an
 151 example, assuming we have the following ACL:
 152
 153 ::
 154
 155    permit 2001:db8:1::/48 2001:db8:2::/48    (B)
 156    deny 2001:db8:3::/48 2001:db8:2:1::/64    (A)
 157
 158 In this case if we match the second entry, we do not need to check
 159 whether we have matched the first one - the source addresses are
 160 completely different. We call such an ACE *independent* from another.
 161
 162 We can define this as
 163
 164 ::
 165
 166    independent(aceA, aceB) := (!intersect(protoA, protoB) ||
 167                                !intersect(srcA, srcB) ||
 168                                !intersect(dstA, dstB))
 169
 170 where intersect is defined as operation returning true if there are
 171 elements belonging to the sets of both arguments.
 172
 173 If the entry A is neither redundant nor independent from B, and is below
 174 B in the ruleset, we call such an entry *shadowed* by B, here is an
 175 example:
 176
 177 ::
 178
 179    deny tcp 2001:db8:1::/48 2001:db8:2::/48         (B)
 180    permit 2001:d8b:1:1::/64 2001:db8:2:1::/64    (A)
 181
 182 This means the earlier rule “carves out” a subset of A, thus leaving a
 183 “shadow”. (Evidently, the action needs to be different for the shadow to
 184 have an effect, but for for the terminology sake we do not care).
 185
 186 The more formal definition:
 187
 188 ::
 189
 190    shadowed(aceA, aceB) := !redundant(aceA, aceB) &&
 191                            !independent(aceA, aceB) &&
 192                            is_after(aceA, aceB)
 193
 194 Using this terminology, any ruleset can be represented as a DAG
 195 (Directed Acyclic Graph), with the bottom being the implicit “deny any”,
 196 pointing to the set of rules shadowing it or the ones it is redundant
 197 for.
 198
 199 These rules may in turn be shadowing each other. There is no cycles in
 200 this graph because of the natural order of the rules - the rule located
 201 closer to the end of the ruleset can never shadow or make redundant a
 202 rule higher up.
 203
 204 The optimization that enables can allow for is to skip matching certain
 205 masks on a per-lookup basis - if a given rule has matched, the only
 206 adjustments that can happen is the match with one of the shadowing
 207 rules.
 208
 209 Also, another avenue for the optimization can be starting the lookup
 210 process with the mask type that maximizes the chances of the independent
 211 ACE match, thus resulting in an ACE lookup being a single hash table
 212 hit.
 213
 214 Plumbing
 215 --------
 216
 217 All the new routines are located in a separate file, so we can cleanly
 218 experiment with a different approach if this does not fit all of the use
 219 cases.
 220
 221 The constant-time lookup within the data path has the API with the same
 222 signature as:
 223
 224 ::
 225
 226    u8
 227    multi_acl_match_5tuple (u32 sw_if_index, fa_5tuple_t * pkt_5tuple, int is_l2,
 228                           int is_ip6, int is_input, u32 * acl_match_p,
 229                           u32 * rule_match_p, u32 * trace_bitmap)
 230
 231 There should be a new upper-level function with the same signature,
 232 which will make a decision whether to use a linear lookup, or to use the
 233 constant-time lookup implemented by this work, or to add some other
 234 optimizations (e.g. by keeping the cache of the last N lookups).
 235
 236 The calls to the routine doing preparatory work should happen in
 237 ``acl_add_list()`` after creating the linear-lookup structures, and the
 238 routine doing the preparatory work populating the hashtable should be
 239 called from ``acl_interface_add_del_inout_acl()`` or its callees.
 240
 241 The initial implementation will be geared towards looking up a single
 242 match at a time, with the subsequent optimizations possible to make the
 243 lookup for more than one packet.