..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2017 Intel Corporation.

Generic Receive Offload Library
===============================

Generic Receive Offload (GRO) is a widely used software-based offloading
technique that reduces per-packet processing overhead. By reassembling
small packets into larger ones, GRO lets applications process fewer,
larger packets. To benefit DPDK-based applications such as Open vSwitch,
DPDK provides its own GRO implementation. In DPDK, GRO is implemented
as a standalone library, and applications call it explicitly to
reassemble packets.

Overview
--------

The GRO library defines several GRO types, one per packet type. Each GRO
type processes one kind of packet. For example, TCP/IPv4 GRO processes
TCP/IPv4 packets.

Each GRO type has a reassembly function, which defines its own algorithm
and table structure for reassembling packets. Input packets are assigned
to the corresponding GRO function according to ``MBUF->packet_type``.

The GRO library doesn't check whether input packets have correct
checksums and doesn't recalculate checksums for merged packets. When IP
fragmentation is possible (i.e., DF==0), the GRO library assumes the
packets are complete (i.e., MF==0 && frag_off==0). Additionally, it
complies with RFC 6864 when processing the IPv4 ID field.
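
As a minimal sketch of the two header conditions mentioned above, the
following helpers test the 16-bit IPv4 flags and fragment offset field.
The macro and function names are illustrative assumptions and are not
part of the GRO library; the field is assumed to have been converted to
host byte order already.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Layout of the IPv4 "flags + fragment offset" field (RFC 791),
    * in host byte order. Names are hypothetical. */
   #define SKETCH_IPV4_DF_FLAG     0x4000u /* Don't Fragment */
   #define SKETCH_IPV4_MF_FLAG     0x2000u /* More Fragments */
   #define SKETCH_IPV4_OFFSET_MASK 0x1fffu /* offset in 8-byte units */

   /* True when the packet is not a fragment: MF == 0 and frag_off == 0. */
   static bool
   sketch_ipv4_is_complete(uint16_t frag_field)
   {
       return (frag_field &
               (SKETCH_IPV4_MF_FLAG | SKETCH_IPV4_OFFSET_MASK)) == 0;
   }

   /* True when DF == 1, the case in which the IPv4 ID is ignored. */
   static bool
   sketch_ipv4_df_is_set(uint16_t frag_field)
   {
       return (frag_field & SKETCH_IPV4_DF_FLAG) != 0;
   }

A packet for which ``sketch_ipv4_is_complete()`` returns false is a
fragment, which is exactly the kind of input the library assumes it does
not receive.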

Currently, the GRO library supports TCP/IPv4 packets and VxLAN packets
that contain an outer IPv4 header and an inner TCP/IPv4 packet.

Two Sets of API
---------------

For different usage scenarios, the GRO library provides two sets of API.
One is the lightweight mode API, which enables applications to merge a
small number of packets rapidly; the other is the heavyweight mode API,
which gives applications fine-grained control and supports merging a
large number of packets.

Lightweight Mode API
~~~~~~~~~~~~~~~~~~~~

The lightweight mode has only one function, ``rte_gro_reassemble_burst()``,
which processes N packets at a time. Merging packets with the lightweight
mode API is simple: calling ``rte_gro_reassemble_burst()`` is enough. The
GROed packets are returned to the application as soon as the function
finishes.

In ``rte_gro_reassemble_burst()``, the table structures of the different
GRO types are allocated on the stack. This design simplifies application
code. However, because the stack size is limited, the number of packets
that ``rte_gro_reassemble_burst()`` processes in one invocation should be
less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.
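
One way an application could respect this limit is to split a large
array into chunks before each call. The sketch below is self-contained
and does not call the GRO library: ``SKETCH_MAX_BURST`` is a placeholder
for ``RTE_GRO_MAX_BURST_ITEM_NUM`` (its value here is an assumption),
``burst_fn`` stands in for a burst reassembly call that leaves the
surviving packets at the front of the array and returns their count, and
``fake_burst()`` is a stand-in used only to exercise the helper.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Placeholder for RTE_GRO_MAX_BURST_ITEM_NUM; the value is assumed. */
   #define SKETCH_MAX_BURST 128u

   /* Hypothetical callback: compacts survivors to the front of pkts and
    * returns how many remain. */
   typedef uint16_t (*burst_fn)(void **pkts, uint16_t nb_pkts);

   /* Feed pkts to the callback in chunks no larger than the limit and
    * keep the output contiguous at the front of the array. */
   static size_t
   reassemble_in_chunks(void **pkts, size_t nb_pkts, burst_fn reassemble)
   {
       size_t in = 0, out = 0;

       while (in < nb_pkts) {
           size_t left = nb_pkts - in;
           uint16_t chunk = (uint16_t)(left < SKETCH_MAX_BURST ?
                                       left : SKETCH_MAX_BURST);
           uint16_t kept = reassemble(&pkts[in], chunk);

           /* Slide this chunk's survivors down next to earlier output. */
           for (uint16_t i = 0; i < kept; i++)
               pkts[out + i] = pkts[in + i];
           out += kept;
           in += chunk;
       }
       return out;
   }

   /* Stand-in for the real call: pretends every second packet was merged
    * into its predecessor. */
   static uint16_t
   fake_burst(void **pkts, uint16_t nb_pkts)
   {
       uint16_t kept = 0;

       for (uint16_t i = 0; i < nb_pkts; i += 2)
           pkts[kept++] = pkts[i];
       return kept;
   }

With this approach, packets that fall on either side of a chunk boundary
are never merged with each other, which is one reason to prefer the
heavyweight mode for large numbers of packets.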

Heavyweight Mode API
~~~~~~~~~~~~~~~~~~~~

Compared with the lightweight mode, the heavyweight mode API is
relatively complex to use. First, applications create a GRO context
with ``rte_gro_ctx_create()``, which allocates the table structures on
the heap and stores pointers to them in the GRO context. Second,
applications use ``rte_gro_reassemble()`` to merge packets. If input
packets have invalid parameters, ``rte_gro_reassemble()`` returns them
to the application. For example, packets of unsupported GRO types and
TCP SYN packets are returned. Otherwise, the input packets are either
merged with existing packets in the tables or inserted into the tables.
Finally, applications use ``rte_gro_timeout_flush()`` to flush packets
from the tables when they want to retrieve the GROed packets.
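
The flush step can be pictured with a toy table. The sketch below is not
the library's implementation: it assumes, as the function name
``rte_gro_timeout_flush()`` suggests, that each stored packet carries an
insertion timestamp and is handed back once it has waited at least a
given timeout. All type and function names are hypothetical.

.. code-block:: c

   #include <stdint.h>

   #define TOY_TABLE_SIZE 8

   /* One slot of a toy reassembly table (hypothetical layout). */
   struct toy_entry {
       int      in_use;     /* non-zero when the slot holds a packet */
       int      pkt_id;     /* stands in for a packet pointer */
       uint64_t start_time; /* time at which the packet was inserted */
   };

   /* Move every packet that has waited at least `timeout` into `out`,
    * up to `max_out` packets, and free its slot. Returns the number of
    * packets flushed. Assumes now >= start_time for every entry. */
   static uint16_t
   toy_timeout_flush(struct toy_entry *tbl, uint64_t now, uint64_t timeout,
                     int *out, uint16_t max_out)
   {
       uint16_t n = 0;

       for (int i = 0; i < TOY_TABLE_SIZE && n < max_out; i++) {
           if (tbl[i].in_use && now - tbl[i].start_time >= timeout) {
               out[n++] = tbl[i].pkt_id;
               tbl[i].in_use = 0;
           }
       }
       return n;
   }

In this toy model a timeout of 0 flushes every stored packet, and a
larger timeout leaves recent packets in the table so that later arrivals
still have a chance to merge with them.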

Note that update and lookup operations on the GRO context are not
thread safe. If different processes or threads need to access the same
context object simultaneously, an external synchronization mechanism
must be used.

Reassembly Algorithm
--------------------

The reassembly algorithm is used to reassemble packets. In the GRO
library, different GRO types can use different algorithms. This section
introduces the algorithm used by TCP/IPv4 GRO and VxLAN GRO.

Challenges
~~~~~~~~~~

The reassembly algorithm determines the efficiency of GRO. There are two
challenges in the algorithm design:

- a high-cost algorithm or implementation would cause packet drops in a
  high-speed network.

- packet reordering makes it hard to merge packets. For example, Linux
  GRO fails to merge packets when it encounters packet reordering.

These two challenges require the algorithm to be:

- lightweight enough to scale to fast networking speeds

- capable of handling packet reordering

DPDK GRO uses a key-based algorithm to address the two challenges.

Key-based Reassembly Algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:numref:`figure_gro-key-algorithm` illustrates the procedure of the
key-based algorithm. Packets are classified into "flows" by some header
fields (called the "key"). To process an input packet, the algorithm
first searches for a matching "flow" (i.e., one with the same key value),
then checks all packets in that "flow" and tries to find a "neighbor"
for the input packet. If a "neighbor" is found, the two packets are
merged. If no "neighbor" is found, the packet is stored in its "flow".
If no matching "flow" is found, a new "flow" is inserted and the packet
is stored in it.

.. note::
        Packets in the same "flow" that can't be merged are always the
        result of packet reordering.
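
The procedure above can be sketched with a toy table. This is a
simplified illustration rather than the library's code: the key is
reduced to a single integer, a packet is reduced to a sequence number
and a payload length, and every name (``toy_table``, ``toy_process()``
and so on) is hypothetical.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define TOY_MAX_FLOWS 4
   #define TOY_MAX_ITEMS 4

   struct toy_item { uint32_t seq; uint32_t len; };

   struct toy_flow {
       uint32_t key;                        /* stands in for the header key */
       int nb_items;
       struct toy_item items[TOY_MAX_ITEMS];
   };

   struct toy_table {
       int nb_flows;
       struct toy_flow flows[TOY_MAX_FLOWS];
   };

   enum toy_result { TOY_MERGED, TOY_STORED, TOY_NEW_FLOW, TOY_FULL };

   static enum toy_result
   toy_process(struct toy_table *tbl, uint32_t key, uint32_t seq,
               uint32_t len)
   {
       struct toy_flow *flow = NULL;

       /* Step 1: search for a flow with the same key. */
       for (int i = 0; i < tbl->nb_flows; i++) {
           if (tbl->flows[i].key == key) {
               flow = &tbl->flows[i];
               break;
           }
       }
       if (flow == NULL) {
           /* No matching flow: insert a new one holding this packet. */
           if (tbl->nb_flows == TOY_MAX_FLOWS)
               return TOY_FULL;
           flow = &tbl->flows[tbl->nb_flows++];
           flow->key = key;
           flow->nb_items = 1;
           flow->items[0] = (struct toy_item){ seq, len };
           return TOY_NEW_FLOW;
       }

       /* Step 2: search the flow for a neighbor and merge with it. */
       for (int i = 0; i < flow->nb_items; i++) {
           struct toy_item *it = &flow->items[i];

           if (it->seq + it->len == seq) {   /* packet follows the item */
               it->len += len;
               return TOY_MERGED;
           }
           if (seq + len == it->seq) {       /* packet precedes the item */
               it->seq = seq;
               it->len += len;
               return TOY_MERGED;
           }
       }

       /* Step 3: no neighbor yet, keep the packet for a later merge. */
       if (flow->nb_items == TOY_MAX_ITEMS)
           return TOY_FULL;
       flow->items[flow->nb_items++] = (struct toy_item){ seq, len };
       return TOY_STORED;
   }

Feeding the sequence numbers 1000, 1200 and 1100 (each 100 bytes long)
for one key yields a new flow, a stored out-of-order packet, and then a
merge. The toy does not coalesce two stored items that become adjacent
after a merge.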

The key-based algorithm has two characteristics:

- classifying packets into "flows" is a simple way to accelerate packet
  aggregation (addresses challenge 1).

- storing out-of-order packets makes it possible to merge them later
  (addresses challenge 2).

.. _figure_gro-key-algorithm:

.. figure:: img/gro-key-algorithm.*
   :align: center

   Key-based Reassembly Algorithm

TCP/IPv4 GRO
------------

The table structure used by TCP/IPv4 GRO contains two arrays: a flow
array and an item array. The flow array keeps flow information, and the
item array keeps packet information.

Header fields used to define a TCP/IPv4 flow include:

- source and destination: Ethernet and IP address, TCP port

- TCP acknowledgment number
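
As an illustration of the field list above, a flow key and its
comparison could look like the following. The structure name, field
names and layout are assumptions made for this sketch and do not
necessarily match the library's internal definitions.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical key built from the header fields listed above. */
   struct toy_tcp4_key {
       uint8_t  eth_src[6];
       uint8_t  eth_dst[6];
       uint32_t ip_src;
       uint32_t ip_dst;
       uint16_t tcp_src_port;
       uint16_t tcp_dst_port;
       uint32_t tcp_ack;      /* TCP acknowledgment number */
   };

   /* Two packets belong to the same flow only if every field matches.
    * Fields are compared one by one so that padding bytes in the
    * structure never influence the result. */
   static bool
   toy_tcp4_same_flow(const struct toy_tcp4_key *a,
                      const struct toy_tcp4_key *b)
   {
       return memcmp(a->eth_src, b->eth_src, sizeof(a->eth_src)) == 0 &&
              memcmp(a->eth_dst, b->eth_dst, sizeof(a->eth_dst)) == 0 &&
              a->ip_src == b->ip_src &&
              a->ip_dst == b->ip_dst &&
              a->tcp_src_port == b->tcp_src_port &&
              a->tcp_dst_port == b->tcp_dst_port &&
              a->tcp_ack == b->tcp_ack;
   }

Because the acknowledgment number is part of the key, two segments of
the same connection that carry different acknowledgment numbers land in
different flows and are never merged with each other.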

TCP/IPv4 packets whose FIN, SYN, RST, URG, PSH, ECE or CWR bit is set
won't be processed.
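
The flag rule above amounts to a single mask test on the TCP flags byte.
The sketch below uses the standard TCP flag bit positions; the macro and
function names are hypothetical, and the test only encodes the rule as
stated here, not any additional checks the library may perform.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Bit positions in the TCP header flags byte. */
   #define TOY_TCP_FIN 0x01u
   #define TOY_TCP_SYN 0x02u
   #define TOY_TCP_RST 0x04u
   #define TOY_TCP_PSH 0x08u
   #define TOY_TCP_ACK 0x10u
   #define TOY_TCP_URG 0x20u
   #define TOY_TCP_ECE 0x40u
   #define TOY_TCP_CWR 0x80u

   /* Any of these bits excludes the packet from reassembly. */
   #define TOY_TCP_NO_GRO_MASK \
       (TOY_TCP_FIN | TOY_TCP_SYN | TOY_TCP_RST | TOY_TCP_PSH | \
        TOY_TCP_URG | TOY_TCP_ECE | TOY_TCP_CWR)

   /* True when none of the excluding flag bits is set. */
   static bool
   toy_tcp_flags_allow_gro(uint8_t tcp_flags)
   {
       return (tcp_flags & TOY_TCP_NO_GRO_MASK) == 0;
   }

A plain ACK segment passes this test, whereas a segment with ACK and PSH
set does not.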

Header fields that decide whether two packets are neighbors include:

- TCP sequence number

- IPv4 ID. For packets whose DF bit is 0, the IPv4 ID must increase by 1
  from one packet to the next.
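
A neighbor test built from these two fields might look like the sketch
below. It is a simplified illustration under stated assumptions: the
``toy_seg`` structure and ``toy_tcp4_neighbor()`` are hypothetical, all
values are in host byte order, and checks such as TCP options or a
maximum merged length are left out.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical summary of one stored or incoming TCP/IPv4 packet. */
   struct toy_seg {
       uint32_t seq;   /* TCP sequence number of the first payload byte */
       uint16_t len;   /* TCP payload length in bytes */
       uint16_t ip_id; /* IPv4 ID */
       bool     df;    /* IPv4 DF bit */
   };

   /* Returns 1 if pkt directly follows item, -1 if pkt directly precedes
    * item, and 0 if the two are not neighbors. */
   static int
   toy_tcp4_neighbor(const struct toy_seg *item, const struct toy_seg *pkt)
   {
       /* Packets with different DF bits are never merged. */
       if (item->df != pkt->df)
           return 0;

       /* pkt after item: sequence numbers are contiguous and, when
        * DF == 0, the IPv4 ID grows by exactly 1 (modulo 2^16). */
       if (item->seq + item->len == pkt->seq &&
           (item->df || (uint16_t)(item->ip_id + 1) == pkt->ip_id))
           return 1;

       /* pkt before item: the same checks in the other direction. */
       if (pkt->seq + pkt->len == item->seq &&
           (item->df || (uint16_t)(pkt->ip_id + 1) == item->ip_id))
           return -1;

       return 0;
   }

The sequence number arithmetic is unsigned 32-bit and the IPv4 ID
comparison is truncated to 16 bits, so both checks keep working when the
counters wrap around.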

VxLAN GRO
---------

The table structure used by VxLAN GRO, which processes VxLAN packets
with an outer IPv4 header and an inner TCP/IPv4 packet, is similar to
that of TCP/IPv4 GRO. The difference is that the header fields used to
define a VxLAN flow include:

- outer source and destination: Ethernet and IP address, UDP port

- VxLAN header (VNI and flag)

- inner source and destination: Ethernet and IP address, TCP port

Header fields that decide whether packets are neighbors include:

- outer IPv4 ID. For packets whose DF bit in the outer IPv4 header is 0,
  the outer IPv4 ID must increase by 1 from one packet to the next.

- inner TCP sequence number

- inner IPv4 ID. For packets whose DF bit in the inner IPv4 header is 0,
  the inner IPv4 ID must increase by 1 from one packet to the next.

.. note::
        The library complies with RFC 6864 when processing the IPv4 ID
        field. Specifically, it checks the IPv4 ID fields of packets whose
        DF bit is 0 and ignores the IPv4 ID fields of packets whose DF bit
        is 1. Additionally, packets with different DF bit values can't be
        merged.