From bb912f2e25b5205f0705c4b8a5bd325aed078754 Mon Sep 17 00:00:00 2001 From: Klement Sekera Date: Tue, 25 Jan 2022 17:32:38 +0000 Subject: [PATCH] ip: reassembly: add documentation Type: docs Signed-off-by: Klement Sekera Change-Id: I23008cde47d8b7a531346eab02902e2ced18742a --- docs/developer/corefeatures/index.rst | 1 + docs/developer/corefeatures/reassembly.rst | 1 + src/vnet/ip/reass/reassembly.rst | 221 +++++++++++++++++++++++++++++ 3 files changed, 223 insertions(+) create mode 120000 docs/developer/corefeatures/reassembly.rst create mode 100644 src/vnet/ip/reass/reassembly.rst diff --git a/docs/developer/corefeatures/index.rst b/docs/developer/corefeatures/index.rst index adc086e3c98..1012387a17a 100644 --- a/docs/developer/corefeatures/index.rst +++ b/docs/developer/corefeatures/index.rst @@ -12,6 +12,7 @@ Core Features punt ipsec bfd_doc + reassembly ipfix_doc span_doc mtu diff --git a/docs/developer/corefeatures/reassembly.rst b/docs/developer/corefeatures/reassembly.rst new file mode 120000 index 00000000000..24762318b31 --- /dev/null +++ b/docs/developer/corefeatures/reassembly.rst @@ -0,0 +1 @@ +../../../src/vnet/ip/reass/reassembly.rst \ No newline at end of file diff --git a/src/vnet/ip/reass/reassembly.rst b/src/vnet/ip/reass/reassembly.rst new file mode 100644 index 00000000000..d6861ed8a05 --- /dev/null +++ b/src/vnet/ip/reass/reassembly.rst @@ -0,0 +1,221 @@ +.. _reassembly: + +IP Reassembly +============= + +Some VPP functions need access to whole packet and/or stream +classification based on L4 headers. Reassembly functionality allows +both former and latter. + +Full reassembly vs shallow (virtual) reassembly +----------------------------------------------- + +There are two kinds of reassembly available in VPP: + +1. Full reassembly changes a stream of packet fragments into one +packet containing all data reassembled with fragment bits cleared +and fragment header stripped (in case of ip6). Note that resulting +packet may come out of reassembly as a buffer chain. Because it's +impractical to parse headers which are split over multiple vnet +buffers, vnet_buffer_chain_linearize() is called after reassembly so +that L2/L3/L4 headers can be found in first buffer. Full reassembly +is costly and shouldn't be used unless necessary. Full reassembly is by +default enabled for both ipv4 and ipv6 traffic for "forus" traffic +- that is packets aimed at VPP addresses. This can be disabled via API +if desired, in which case "forus" fragments are dropped. + +2. Shallow (virtual) reassembly allows various classifying and/or +translating features to work with fragments without having to +understand fragmentation. It works by extracting L4 data and adding +them to vnet_buffer for each packet/fragment passing throught SVR +nodes. This operation is performed for both fragments and regular +packets, allowing consuming code to treat all packets in same way. SVR +caches incoming packet fragments (buffers) until first fragment is +seen. Then it extracts L4 data from that first fragment, fills it for +any cached fragments and transmits them in the same order as they were +received. From that point on, any other passing fragments get L4 data +populated in vnet_buffer based on reassembly context. + +Multi-worker behaviour +^^^^^^^^^^^^^^^^^^^^^^ + +Both reassembly types deal with fragments arriving on different workers +via handoff mechanism. All reassembly contexts are stored in pools. +Bihash mapping 5-tuple key to a value containing pool index and thread +index is used for lookups. When a lookup finds an existing reasembly on +a different thread, it hands off the fragment to that thread. If lookup +fails, a new reassembly context is created and current worker becomes +owner of that context. Further fragments received on other worker +threads are then handed off owner worker thread. + +Full reassembly also remembers thread index where first fragment (as in +fragment with fragment offset 0) was seen and uses handoff mechanism to +send the reassembled packet out on that thread even if pool owner is +a different thread. This then requires an additional handoff to free +reassembly context as only pool owner can do that in a thread-safe way. + +Limits +^^^^^^ + +Because reassembly could be an attack vector, there is a configurable +limit on the number of concurrent reassemblies and also maximum +fragments per packet. + +Custom applications +^^^^^^^^^^^^^^^^^^^ + +Both reassembly features allow to be used by custom applicatind which +are not part of VPP source tree. Be it patches or 3rd party plugins, +they can build their own graph paths by using "-custom*" versions of +nodes. Reassembly then reads next_index and error_next_index for each +buffer from vnet_buffer, allowing custom application to steer +both reassembled packets and any packets which are considered an error +in a way the custom application requires. + +Full reassembly +--------------- + +Configuration +^^^^^^^^^^^^^ + +Configuration is via API (``ip_reassembly_enable_disable``) or CLI: + +``set interface reassembly [on|off|ip4|ip6]`` + +here ``on`` means both ip4 and ip6. + +A show command is provided to see reassembly contexts: + +For ip4: + +``show ip4-full-reassembly [details]`` + +For ip6: + +``show ip6-full-reassembly [details]`` + +Global full reassembly parameters can be modified using API +``ip_reassembly_set`` and retrieved using ``ip_reassembly_get``. + +Defaults +"""""""" + +For defaults values, see #defines in + +`ip4_full_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip4_full_reass.c>`_ + +========================================= ========================================== +#define description +----------------------------------------- ------------------------------------------ +IP4_REASS_TIMEOUT_DEFAULT_MS timeout in milliseconds +IP4_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions +IP4_REASS_MAX_REASSEMBLIES_DEFAULT maximum number of concurrent reassemblies +IP4_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT maximum number of fragments per reassembly +========================================= ========================================== + +and + +`ip6_full_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip6_full_reass.c>`_ + +========================================= ========================================== +#define description +----------------------------------------- ------------------------------------------ +IP6_REASS_TIMEOUT_DEFAULT_MS timeout in milliseconds +IP6_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions +IP6_REASS_MAX_REASSEMBLIES_DEFAULT maximum number of concurrent reassemblies +IP6_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT maximum number of fragments per reassembly +========================================= ========================================== + +Finished/expired contexts +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Reassembly contexts are freed either when reassembly is finished - when +all data has been received or in case of timeout. There is a process +walking all reassemblies, freeing any expired ones. + +Shallow (virtual) reassembly +---------------------------- + +Configuration +^^^^^^^^^^^^^ + +Configuration is via API (``ip_reassembly_enable_disable``) only as +there is no value in turning SVR on by hand without a feature consuming +buffer metadata. SVR is designed to be turned on by a feature requiring +it in a programmatic way. + +A show command is provided to see reassembly contexts: + +For ip4: + +``show ip4-sv-reassembly [details]`` + +For ip6: + +``show ip6-sv-reassembly [details]`` + +Global shallow reassembly parameters can be modified using API +``ip_reassembly_set`` and retrieved using ``ip_reassembly_get``. + +Defaults +"""""""" + +For defaults values, see #defines in + +`ip4_sv_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip4_sv_reass.c>`_ + +============================================ ========================================== +#define description +-------------------------------------------- ------------------------------------------ +IP4_SV_REASS_TIMEOUT_DEFAULT_MS timeout in milliseconds +IP4_SV_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions +IP4_SV_REASS_MAX_REASSEMBLIES_DEFAULT maximum number of concurrent reassemblies +IP4_SV_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT maximum number of fragments per reassembly +============================================ ========================================== + +and + +`ip6_sv_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip6_sv_reass.c>`_ + +============================================ ========================================== +#define description +-------------------------------------------- ------------------------------------------ +IP6_SV_REASS_TIMEOUT_DEFAULT_MS timeout in milliseconds +IP6_SV_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions +IP6_SV_REASS_MAX_REASSEMBLIES_DEFAULT maximum number of concurrent reassemblies +IP6_SV_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT maximum number of fragments per reassembly +============================================ ========================================== + +Expiring contexts +^^^^^^^^^^^^^^^^^ + +There is no way of knowing when a reassembly is finished without +performing (an almost) full reassembly, so contexts in SVR cannot be +freed in the same way as in full reassembly. Instead a different +approach is taken. Least recently used (LRU) list is maintained where +reassembly contexts are ordered based on last update. The oldest +context is then freed whenever SVR hits limit on number of concurrent +reassembly contexts. There is also a process reaping expired sessions +similar as in full reassembly. + +Truncated packets +^^^^^^^^^^^^^^^^^ + +When SVR detects that a packet has been truncated in a way where L4 +headers are not available, it will mark it as such in vnet_buffer, +allowing downstream features to handle such packets as they deem fit. + +Fast path/slow path +^^^^^^^^^^^^^^^^^^^ + +SVR runs is implemented fast path/slow path way. By default, it assumes +that any passing traffic doesn't contain fragments, processing buffers +in a dual-loop. If it sees a fragment, it then jumps to single-loop +processing. + +Feature enabled by other features/reference counting +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +SVR feature is enabled by some other features, like NAT, when those +features are enabled. For this to work, it implements a reference +counted API for enabling/disabling SVR. -- 2.16.6