doc/guides/sample_app_ug/server_node_efd.rst

   1 ..  SPDX-License-Identifier: BSD-3-Clause
   2     Copyright(c) 2016-2017 Intel Corporation.
   3
   4 Server-Node EFD Sample Application
   5 ==================================
   6
   7 This sample application demonstrates the use of EFD library as a flow-level
   8 load balancer, for more information about the EFD Library please refer to the
   9 DPDK programmer's guide.
  10
  11 This sample application is a variant of the
  12 :ref:`client-server sample application <multi_process_app>`
  13 where a specific target node is specified for every and each flow
  14 (not in a round-robin fashion as the original load balancing sample application).
  15
  16 Overview
  17 --------
  18
  19 The architecture of the EFD flow-based load balancer sample application is
  20 presented in the following figure.
  21
  22 .. _figure_efd_sample_app_overview:
  23
  24 .. figure:: img/server_node_efd.*
  25
  26    Using EFD as a Flow-Level Load Balancer
  27
  28 As shown in :numref:`figure_efd_sample_app_overview`,
  29 the sample application consists of a front-end node (server)
  30 using the EFD library to create a load-balancing table for flows,
  31 for each flow a target backend worker node is specified. The EFD table does not
  32 store the flow key (unlike a regular hash table), and hence, it can
  33 individually load-balance millions of flows (number of targets * maximum number
  34 of flows fit in a flow table per target) while still fitting in CPU cache.
  35
  36 It should be noted that although they are referred to as nodes, the frontend
  37 server and worker nodes are processes running on the same platform.
  38
  39 Front-end Server
  40 ~~~~~~~~~~~~~~~~
  41
  42 Upon initializing, the frontend server node (process) creates a flow
  43 distributor table (based on the EFD library) which is populated with flow
  44 information and its intended target node.
  45
  46 The sample application assigns a specific target node_id (process) for each of
  47 the IP destination addresses as follows:
  48
  49 .. code-block:: c
  50
  51     node_id = i % num_nodes; /* Target node id is generated */
  52     ip_dst = rte_cpu_to_be_32(i); /* Specific ip destination address is
  53                                      assigned to this target node */
  54
  55 then the pair of <key,target> is inserted into the flow distribution table.
  56
  57 The main loop of the server process receives a burst of packets, then for
  58 each packet, a flow key (IP destination address) is extracted. The flow
  59 distributor table is looked up and the target node id is returned.  Packets are
  60 then enqueued to the specified target node id.
  61
  62 It should be noted that flow distributor table is not a membership test table.
  63 I.e. if the key has already been inserted the target node id will be correct,
  64 but for new keys the flow distributor table will return a value (which can be
  65 valid).
  66
  67 Backend Worker Nodes
  68 ~~~~~~~~~~~~~~~~~~~~
  69
  70 Upon initializing, the worker node (process) creates a flow table (a regular
  71 hash table that stores the key default size 1M flows) which is populated with
  72 only the flow information that is serviced at this node. This flow key is
  73 essential to point out new keys that have not been inserted before.
  74
  75 The worker node's main loop is simply receiving packets then doing a hash table
  76 lookup. If a match occurs then statistics are updated for flows serviced by
  77 this node. If no match is found in the local hash table then this indicates
  78 that this is a new flow, which is dropped.
  79
  80
  81 Compiling the Application
  82 -------------------------
  83
  84 To compile the sample application see :doc:`compiling`.
  85
  86 The application is located in the ``server_node_efd`` sub-directory.
  87
  88 Running the Application
  89 -----------------------
  90
  91 The application has two binaries to be run: the front-end server
  92 and the back-end node.
  93
  94 The frontend server (server) has the following command line options::
  95
  96     ./server [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS
  97
  98 Where,
  99
 100 * ``-p PORTMASK:`` Hexadecimal bitmask of ports to configure
 101 * ``-n NUM_NODES:`` Number of back-end nodes that will be used
 102 * ``-f NUM_FLOWS:`` Number of flows to be added in the EFD table (1 million, by default)
 103
 104 The back-end node (node) has the following command line options::
 105
 106     ./node [EAL options] -- -n NODE_ID
 107
 108 Where,
 109
 110 * ``-n NODE_ID:`` Node ID, which cannot be equal or higher than NUM_MODES
 111
 112
 113 First, the server app must be launched, with the number of nodes that will be run.
 114 Once it has been started, the node instances can be run, with different NODE_ID.
 115 These instances have to be run as secondary processes, with ``--proc-type=secondary``
 116 in the EAL options, which will attach to the primary process memory, and therefore,
 117 they can access the queues created by the primary process to distribute packets.
 118
 119 To successfully run the application, the command line used to start the
 120 application has to be in sync with the traffic flows configured on the traffic
 121 generator side.
 122
 123 For examples of application command lines and traffic generator flows, please
 124 refer to the DPDK Test Report. For more details on how to set up and run the
 125 sample applications provided with DPDK package, please refer to the
 126 :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` and
 127 :ref:`DPDK Getting Started Guide for FreeBSD <freebsd_gsg>`.
 128
 129
 130 Explanation
 131 -----------
 132
 133 As described in previous sections, there are two processes in this example.
 134
 135 The first process, the front-end server, creates and populates the EFD table,
 136 which is used to distribute packets to nodes, which the number of flows
 137 specified in the command line (1 million, by default).
 138
 139
 140 .. code-block:: c
 141
 142     static void
 143     create_efd_table(void)
 144     {
 145         uint8_t socket_id = rte_socket_id();
 146
 147         /* create table */
 148         efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t),
 149                         1 << socket_id, socket_id);
 150
 151         if (efd_table == NULL)
 152             rte_exit(EXIT_FAILURE, "Problem creating the flow table\n");
 153     }
 154
 155     static void
 156     populate_efd_table(void)
 157     {
 158         unsigned int i;
 159         int32_t ret;
 160         uint32_t ip_dst;
 161         uint8_t socket_id = rte_socket_id();
 162         uint64_t node_id;
 163
 164         /* Add flows in table */
 165         for (i = 0; i < num_flows; i++) {
 166             node_id = i % num_nodes;
 167
 168             ip_dst = rte_cpu_to_be_32(i);
 169             ret = rte_efd_update(efd_table, socket_id,
 170                             (void *)&ip_dst, (efd_value_t)node_id);
 171             if (ret < 0)
 172                 rte_exit(EXIT_FAILURE, "Unable to add entry %u in "
 173                                     "EFD table\n", i);
 174         }
 175
 176         printf("EFD table: Adding 0x%x keys\n", num_flows);
 177     }
 178
 179 After initialization, packets are received from the enabled ports, and the IPv4
 180 address from the packets is used as a key to look up in the EFD table,
 181 which tells the node where the packet has to be distributed.
 182
 183 .. code-block:: c
 184
 185     static void
 186     process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[],
 187             uint16_t rx_count, unsigned int socket_id)
 188     {
 189         uint16_t i;
 190         uint8_t node;
 191         efd_value_t data[EFD_BURST_MAX];
 192         const void *key_ptrs[EFD_BURST_MAX];
 193
 194         struct ipv4_hdr *ipv4_hdr;
 195         uint32_t ipv4_dst_ip[EFD_BURST_MAX];
 196
 197         for (i = 0; i < rx_count; i++) {
 198             /* Handle IPv4 header.*/
 199             ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct ipv4_hdr *,
 200                     sizeof(struct ether_hdr));
 201             ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
 202             key_ptrs[i] = (void *)&ipv4_dst_ip[i];
 203         }
 204
 205         rte_efd_lookup_bulk(efd_table, socket_id, rx_count,
 206                     (const void **) key_ptrs, data);
 207         for (i = 0; i < rx_count; i++) {
 208             node = (uint8_t) ((uintptr_t)data[i]);
 209
 210             if (node >= num_nodes) {
 211                 /*
 212                  * Node is out of range, which means that
 213                  * flow has not been inserted
 214                  */
 215                 flow_dist_stats.drop++;
 216                 rte_pktmbuf_free(pkts[i]);
 217             } else {
 218                 flow_dist_stats.distributed++;
 219                 enqueue_rx_packet(node, pkts[i]);
 220             }
 221         }
 222
 223         for (i = 0; i < num_nodes; i++)
 224             flush_rx_queue(i);
 225     }
 226
 227 The burst of packets received is enqueued in temporary buffers (per node),
 228 and enqueued in the shared ring between the server and the node.
 229 After this, a new burst of packets is received and this process is
 230 repeated infinitely.
 231
 232 .. code-block:: c
 233
 234     static void
 235     flush_rx_queue(uint16_t node)
 236     {
 237         uint16_t j;
 238         struct node *cl;
 239
 240         if (cl_rx_buf[node].count == 0)
 241             return;
 242
 243         cl = &nodes[node];
 244         if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
 245                 cl_rx_buf[node].count, NULL) != cl_rx_buf[node].count){
 246             for (j = 0; j < cl_rx_buf[node].count; j++)
 247                 rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
 248             cl->stats.rx_drop += cl_rx_buf[node].count;
 249         } else
 250             cl->stats.rx += cl_rx_buf[node].count;
 251
 252         cl_rx_buf[node].count = 0;
 253     }
 254
 255 The second process, the back-end node, receives the packets from the shared
 256 ring with the server and send them out, if they belong to the node.
 257
 258 At initialization, it attaches to the server process memory, to have
 259 access to the shared ring, parameters and statistics.
 260
 261 .. code-block:: c
 262
 263     rx_ring = rte_ring_lookup(get_rx_queue_name(node_id));
 264     if (rx_ring == NULL)
 265         rte_exit(EXIT_FAILURE, "Cannot get RX ring - "
 266                 "is server process running?\n");
 267
 268     mp = rte_mempool_lookup(PKTMBUF_POOL_NAME);
 269     if (mp == NULL)
 270         rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n");
 271
 272     mz = rte_memzone_lookup(MZ_SHARED_INFO);
 273     if (mz == NULL)
 274         rte_exit(EXIT_FAILURE, "Cannot get port info structure\n");
 275     info = mz->addr;
 276     tx_stats = &(info->tx_stats[node_id]);
 277     filter_stats = &(info->filter_stats[node_id]);
 278
 279 Then, the hash table that contains the flows that will be handled
 280 by the node is created and populated.
 281
 282 .. code-block:: c
 283
 284     static struct rte_hash *
 285     create_hash_table(const struct shared_info *info)
 286     {
 287         uint32_t num_flows_node = info->num_flows / info->num_nodes;
 288         char name[RTE_HASH_NAMESIZE];
 289         struct rte_hash *h;
 290
 291         /* create table */
 292         struct rte_hash_parameters hash_params = {
 293             .entries = num_flows_node * 2, /* table load = 50% */
 294             .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */
 295             .socket_id = rte_socket_id(),
 296             .hash_func_init_val = 0,
 297         };
 298
 299         snprintf(name, sizeof(name), "hash_table_%d", node_id);
 300         hash_params.name = name;
 301         h = rte_hash_create(&hash_params);
 302
 303         if (h == NULL)
 304             rte_exit(EXIT_FAILURE,
 305                     "Problem creating the hash table for node %d\n",
 306                     node_id);
 307         return h;
 308     }
 309
 310     static void
 311     populate_hash_table(const struct rte_hash *h, const struct shared_info *info)
 312     {
 313         unsigned int i;
 314         int32_t ret;
 315         uint32_t ip_dst;
 316         uint32_t num_flows_node = 0;
 317         uint64_t target_node;
 318
 319         /* Add flows in table */
 320         for (i = 0; i < info->num_flows; i++) {
 321             target_node = i % info->num_nodes;
 322             if (target_node != node_id)
 323                 continue;
 324
 325             ip_dst = rte_cpu_to_be_32(i);
 326
 327             ret = rte_hash_add_key(h, (void *) &ip_dst);
 328             if (ret < 0)
 329                 rte_exit(EXIT_FAILURE, "Unable to add entry %u "
 330                         "in hash table\n", i);
 331             else
 332                 num_flows_node++;
 333
 334         }
 335
 336         printf("Hash table: Adding 0x%x keys\n", num_flows_node);
 337     }
 338
 339 After initialization, packets are dequeued from the shared ring
 340 (from the server) and, like in the server process,
 341 the IPv4 address from the packets is used as a key to look up in the hash table.
 342 If there is a hit, packet is stored in a buffer, to be eventually transmitted
 343 in one of the enabled ports. If key is not there, packet is dropped, since the
 344 flow is not handled by the node.
 345
 346 .. code-block:: c
 347
 348     static inline void
 349     handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets)
 350     {
 351         struct ipv4_hdr *ipv4_hdr;
 352         uint32_t ipv4_dst_ip[PKT_READ_SIZE];
 353         const void *key_ptrs[PKT_READ_SIZE];
 354         unsigned int i;
 355         int32_t positions[PKT_READ_SIZE] = {0};
 356
 357         for (i = 0; i < num_packets; i++) {
 358             /* Handle IPv4 header.*/
 359             ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct ipv4_hdr *,
 360                     sizeof(struct ether_hdr));
 361             ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
 362             key_ptrs[i] = &ipv4_dst_ip[i];
 363         }
 364         /* Check if packets belongs to any flows handled by this node */
 365         rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions);
 366
 367         for (i = 0; i < num_packets; i++) {
 368             if (likely(positions[i] >= 0)) {
 369                 filter_stats->passed++;
 370                 transmit_packet(bufs[i]);
 371             } else {
 372                 filter_stats->drop++;
 373                 /* Drop packet, as flow is not handled by this node */
 374                 rte_pktmbuf_free(bufs[i]);
 375             }
 376         }
 377     }
 378
 379 Finally, note that both processes updates statistics, such as transmitted, received
 380 and dropped packets, which are shown and refreshed by the server app.
 381
 382 .. code-block:: c
 383
 384     static void
 385     do_stats_display(void)
 386     {
 387         unsigned int i, j;
 388         const char clr[] = {27, '[', '2', 'J', '\0'};
 389         const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
 390         uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS];
 391         uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES];
 392
 393         /* to get TX stats, we need to do some summing calculations */
 394         memset(port_tx, 0, sizeof(port_tx));
 395         memset(port_tx_drop, 0, sizeof(port_tx_drop));
 396         memset(node_tx, 0, sizeof(node_tx));
 397         memset(node_tx_drop, 0, sizeof(node_tx_drop));
 398
 399         for (i = 0; i < num_nodes; i++) {
 400             const struct tx_stats *tx = &info->tx_stats[i];
 401
 402             for (j = 0; j < info->num_ports; j++) {
 403                 const uint64_t tx_val = tx->tx[info->id[j]];
 404                 const uint64_t drop_val = tx->tx_drop[info->id[j]];
 405
 406                 port_tx[j] += tx_val;
 407                 port_tx_drop[j] += drop_val;
 408                 node_tx[i] += tx_val;
 409                 node_tx_drop[i] += drop_val;
 410             }
 411         }
 412
 413         /* Clear screen and move to top left */
 414         printf("%s%s", clr, topLeft);
 415
 416         printf("PORTS\n");
 417         printf("-----\n");
 418         for (i = 0; i < info->num_ports; i++)
 419             printf("Port %u: '%s'\t", (unsigned int)info->id[i],
 420                     get_printable_mac_addr(info->id[i]));
 421         printf("\n\n");
 422         for (i = 0; i < info->num_ports; i++) {
 423             printf("Port %u - rx: %9"PRIu64"\t"
 424                     "tx: %9"PRIu64"\n",
 425                     (unsigned int)info->id[i], info->rx_stats.rx[i],
 426                     port_tx[i]);
 427         }
 428
 429         printf("\nSERVER\n");
 430         printf("-----\n");
 431         printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n",
 432                 flow_dist_stats.distributed, flow_dist_stats.drop);
 433
 434         printf("\nNODES\n");
 435         printf("-------\n");
 436         for (i = 0; i < num_nodes; i++) {
 437             const unsigned long long rx = nodes[i].stats.rx;
 438             const unsigned long long rx_drop = nodes[i].stats.rx_drop;
 439             const struct filter_stats *filter = &info->filter_stats[i];
 440
 441             printf("Node %2u - rx: %9llu, rx_drop: %9llu\n"
 442                     "            tx: %9"PRIu64", tx_drop: %9"PRIu64"\n"
 443                     "            filter_passed: %9"PRIu64", "
 444                     "filter_drop: %9"PRIu64"\n",
 445                     i, rx, rx_drop, node_tx[i], node_tx_drop[i],
 446                     filter->passed, filter->drop);
 447         }
 448
 449         printf("\n");
 450     }