..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

.. _multi_process_app:

Multi-process Sample Application
================================

This chapter describes the example applications for multi-processing that are included in the DPDK.

Example Applications
--------------------

Building the Sample Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The multi-process example applications are built in the same way as the other sample applications,
as documented in the *DPDK Getting Started Guide*.

To compile the sample applications, see :doc:`compiling`.

The applications are located in the ``multi_process`` sub-directory.

.. note::

    If only one specific multi-process application needs to be built,
    the final make command can be run in that application's directory
    rather than in the top-level multi-process directory.

Basic Multi-process Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``examples/simple_mp`` folder in the DPDK release contains a basic example application that demonstrates how
two DPDK processes can work together, using queues and memory pools to share information.

Running the Application
^^^^^^^^^^^^^^^^^^^^^^^

To run the application, start one copy of the simple_mp binary in one terminal,
passing at least two cores in the coremask/corelist, as follows:

.. code-block:: console

    ./build/simple_mp -l 0-1 -n 4 --proc-type=primary

For the first DPDK process run, the proc-type flag can be omitted or set to auto,
since a DPDK process defaults to being a primary instance,
meaning that it has control over the hugepage shared memory regions.
The process should start successfully and display a command prompt as follows:

.. code-block:: console

    $ ./build/simple_mp -l 0-1 -n 4 --proc-type=primary
    EAL: coremask set to 3
    EAL: Detected lcore 0 on socket 0
    EAL: Detected lcore 1 on socket 0
    EAL: Detected lcore 2 on socket 0
    EAL: Detected lcore 3 on socket 0
    ...

    EAL: Requesting 2 pages of size 1073741824
    EAL: Requesting 768 pages of size 2097152
    EAL: Ask a virtual area of 0x40000000 bytes
    EAL: Virtual area found at 0x7ff200000000 (size = 0x40000000)
    ...

    EAL: check igb_uio module
    EAL: check module finished
    EAL: Master core 0 is ready (tid=54e41820)
    EAL: Core 1 is ready (tid=53b32700)

    Starting core 1

    simple_mp >

To run the secondary process, which communicates with the primary process,
run the same binary again, setting at least two cores in the coremask/corelist:

.. code-block:: console

    ./build/simple_mp -l 2-3 -n 4 --proc-type=secondary

When running a secondary process such as the one shown above, the proc-type parameter can again be specified as auto.
However, omitting the parameter altogether causes the process to try to start as a primary rather than a secondary process.

Once the process type is specified correctly,
the process starts up, displaying status messages largely similar to those of the primary instance as it initializes.
Once again, a command prompt is presented.

Once both processes are running, messages can be sent between them using the send command.
At any stage, either process can be terminated using the quit command.

.. code-block:: console

   EAL: Master core 10 is ready (tid=b5f89820)           EAL: Master core 8 is ready (tid=864a3820)
   EAL: Core 11 is ready (tid=84ffe700)                  EAL: Core 9 is ready (tid=85995700)
   Starting core 11                                      Starting core 9
   simple_mp > send hello_secondary                      simple_mp > core 9: Received 'hello_secondary'
   simple_mp > core 11: Received 'hello_primary'         simple_mp > send hello_primary
   simple_mp > quit                                      simple_mp > quit

.. note::

    If the primary instance is terminated, the secondary instance must also be shut down and restarted after the primary.
    This is necessary because the primary instance clears and resets the shared memory regions on startup,
    invalidating the secondary process's pointers.
    The secondary process can be stopped and restarted without affecting the primary process.

How the Application Works
^^^^^^^^^^^^^^^^^^^^^^^^^

The core of this example application is based on using two queues and a single memory pool in shared memory.
These three objects are created at startup by the primary process,
since the secondary process cannot reserve memory zones and therefore cannot create objects in memory.
The secondary process then uses lookup functions to attach to these objects as it starts up.

.. code-block:: c

    if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
        /* Primary: create both rings and the message pool. */
        send_ring = rte_ring_create(_PRI_2_SEC, ring_size, SOCKET0, flags);
        recv_ring = rte_ring_create(_SEC_2_PRI, ring_size, SOCKET0, flags);
        message_pool = rte_mempool_create(_MSG_POOL, pool_size,
                string_size, pool_cache, priv_data_sz,
                NULL, NULL, NULL, NULL, SOCKET0, flags);
    } else {
        /* Secondary: attach by name; the ring roles are swapped. */
        recv_ring = rte_ring_lookup(_PRI_2_SEC);
        send_ring = rte_ring_lookup(_SEC_2_PRI);
        message_pool = rte_mempool_lookup(_MSG_POOL);
    }

Note, however, that the named ring structure used as send_ring in the primary process is the recv_ring in the secondary process.

Once the rings and memory pool are available in both the primary and secondary processes,
the application dedicates two threads to sending and receiving messages respectively.
The receive thread dequeues any messages on the receive ring, prints them,
and frees the buffer space used by the messages back to the memory pool.
The send thread uses the command-prompt library to interactively request user input for messages to send.
Once the user issues a send command, a buffer is allocated from the memory pool, filled in with the message contents,
and then enqueued on the appropriate rte_ring.
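
The allocate, enqueue, dequeue and free cycle described above can be pictured with a small stand-alone sketch.
It does not use the DPDK API: ``toy_ring``, ``toy_pool`` and every function name below are hypothetical stand-ins
for ``rte_ring`` and ``rte_mempool``, and everything runs in a single process without shared memory or locking,
so it only illustrates the buffer life cycle under those assumptions.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    #define RING_SIZE 8   /* must be a power of two */
    #define POOL_SIZE 4
    #define MSG_LEN   64

    /* Hypothetical stand-in for an rte_ring: a FIFO of pointers. */
    struct toy_ring {
        void *slots[RING_SIZE];
        unsigned head;   /* next slot to dequeue */
        unsigned tail;   /* next slot to enqueue */
    };

    /* Hypothetical stand-in for an rte_mempool: fixed buffers plus a free list. */
    struct toy_pool {
        char bufs[POOL_SIZE][MSG_LEN];
        void *free_list[POOL_SIZE];
        unsigned nb_free;
    };

    void toy_pool_init(struct toy_pool *p)
    {
        for (unsigned i = 0; i < POOL_SIZE; i++)
            p->free_list[i] = p->bufs[i];
        p->nb_free = POOL_SIZE;
    }

    void *toy_pool_get(struct toy_pool *p)
    {
        return p->nb_free ? p->free_list[--p->nb_free] : NULL;
    }

    void toy_pool_put(struct toy_pool *p, void *buf)
    {
        assert(p->nb_free < POOL_SIZE);
        p->free_list[p->nb_free++] = buf;
    }

    int toy_ring_enqueue(struct toy_ring *r, void *obj)
    {
        if (r->tail - r->head == RING_SIZE)
            return -1;   /* ring is full */
        r->slots[r->tail++ % RING_SIZE] = obj;
        return 0;
    }

    int toy_ring_dequeue(struct toy_ring *r, void **obj)
    {
        if (r->tail == r->head)
            return -1;   /* ring is empty */
        *obj = r->slots[r->head++ % RING_SIZE];
        return 0;
    }

    /* Send side: allocate a buffer, copy the message in, enqueue the pointer. */
    int toy_send(struct toy_ring *r, struct toy_pool *p, const char *text)
    {
        char *msg = toy_pool_get(p);

        if (msg == NULL)
            return -1;
        strncpy(msg, text, MSG_LEN - 1);
        msg[MSG_LEN - 1] = '\0';
        if (toy_ring_enqueue(r, msg) < 0) {
            toy_pool_put(p, msg);   /* do not leak the buffer if the ring is full */
            return -1;
        }
        return 0;
    }

    /* Receive side: dequeue a pointer, copy the text out, free the buffer. */
    int toy_recv(struct toy_ring *r, struct toy_pool *p, char *out, size_t out_len)
    {
        void *msg;

        assert(out_len > 0);
        if (toy_ring_dequeue(r, &msg) < 0)
            return -1;
        strncpy(out, msg, out_len - 1);
        out[out_len - 1] = '\0';
        toy_pool_put(p, msg);
        return 0;
    }

A caller would zero-initialize one ``toy_ring`` per direction, call ``toy_pool_init()`` once,
and then pair each ``toy_send()`` on one side with a ``toy_recv()`` on the other.
Only the pointer travels through the ring; the message bytes stay in the pool buffer until the receiver frees it.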

Symmetric Multi-process Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The second example of DPDK multi-process support demonstrates how a set of processes can run in parallel,
with each process performing the same set of packet-processing operations.
(Since each process is identical in functionality to the others,
we refer to this as symmetric multi-processing, to differentiate it from asymmetric multi-processing,
such as the client-server mode of operation seen in the next example,
where different processes perform different tasks, yet co-operate to form a packet-processing system.)
The following diagram shows the data-flow through the application, using two processes.

.. _figure_sym_multi_proc_app:

.. figure:: img/sym_multi_proc_app.*

   Example Data Flow in a Symmetric Multi-process Application


As the diagram shows, each process reads packets from each of the network ports in use.
Receive Side Scaling (RSS) is used to distribute incoming packets on each port to different hardware RX queues.
Each process reads a different RX queue on each port and so does not contend with any other process for that queue access.
Similarly, each process writes outgoing packets to a different TX queue on each port.

Running the Application
^^^^^^^^^^^^^^^^^^^^^^^

As with the simple_mp example, the first instance of the symmetric_mp process must be run as the primary instance,
though with a number of other application-specific parameters also provided after the EAL arguments.
These additional parameters are:

*   ``-p <portmask>``, where portmask is a hexadecimal bitmask of the ports on the system that are to be used.
    For example: ``-p 3`` to use ports 0 and 1 only.

*   ``--num-procs <N>``, where N is the total number of symmetric_mp instances that will be run side-by-side to perform packet processing.
    This parameter is used to configure the appropriate number of receive queues on each network port.

*   ``--proc-id <n>``, where n is a numeric value in the range 0 <= n < N (number of processes, specified above).
    This identifies which symmetric_mp instance is being run, so that each process can read a unique receive queue on each network port.

The secondary symmetric_mp instances must also have these parameters specified,
and the first two must be the same as those passed to the primary instance; otherwise, errors result.
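
As an illustration of how these parameters can be interpreted, the following stand-alone sketch decodes a hexadecimal
portmask into a list of port numbers and checks the proc-id range.
The function names, the 32-port limit and the choice to reject an empty mask are assumptions made for the sketch;
they are not taken from the symmetric_mp sources.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define MAX_PORTS 32   /* assumed limit for this sketch */

    /*
     * Decode a hexadecimal portmask string (for example "3") into a list of
     * port numbers. Returns the number of ports found, or -1 if the string is
     * empty, is not valid hexadecimal, selects no port, or exceeds MAX_PORTS bits.
     */
    int decode_portmask(const char *arg, uint16_t ports[MAX_PORTS])
    {
        char *end = NULL;
        unsigned long long mask;
        int count = 0;

        assert(arg != NULL && ports != NULL);
        if (*arg == '\0')
            return -1;
        mask = strtoull(arg, &end, 16);
        if (*end != '\0' || mask == 0 || mask > 0xFFFFFFFFULL)
            return -1;

        for (uint16_t bit = 0; bit < MAX_PORTS; bit++)
            if (mask & (1ULL << bit))
                ports[count++] = bit;
        return count;
    }

    /* The proc-id must satisfy 0 <= proc_id < num_procs. */
    int proc_args_valid(int num_procs, int proc_id)
    {
        return num_procs > 0 && proc_id >= 0 && proc_id < num_procs;
    }

With ``-p 3`` the mask has bits 0 and 1 set, so the decoded list is ports 0 and 1, matching the example in the parameter list.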

For example, to run a set of four symmetric_mp instances, running on lcores 1-4,
all performing level-2 forwarding of packets between ports 0 and 1,
the following commands can be used (assuming they are run as root):

.. code-block:: console

    # ./build/symmetric_mp -l 1 -n 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=0
    # ./build/symmetric_mp -l 2 -n 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1
    # ./build/symmetric_mp -l 3 -n 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=2
    # ./build/symmetric_mp -l 4 -n 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=3

.. note::

    In the above example, the process type can be explicitly specified as primary or secondary, rather than auto.
    When using auto, the first process run creates all the memory structures needed for all processes,
    irrespective of whether it has a proc-id of 0, 1, 2 or 3.

.. note::

    For the symmetric multi-process example, since all processes work in the same manner,
    once the hugepage shared memory and the network ports are initialized,
    it is not necessary to restart all processes if the primary instance dies.
    Instead, that process can be restarted as a secondary,
    by explicitly setting the proc-type to secondary on the command line.
    (All subsequent instances launched will also need this explicitly specified,
    as auto-detection will detect no primary processes running and therefore attempt to re-initialize shared memory.)

How the Application Works
^^^^^^^^^^^^^^^^^^^^^^^^^

The initialization calls in both the primary and secondary instances are the same for the most part:
each calls rte_eal_init(), then the 1 G and 10 G driver initialization, and then the rte_pci_probe() function.
Thereafter, the initialization done depends on whether the process is configured as a primary or secondary instance.

In the primary instance, a memory pool is created for the packet mbufs and the network ports to be used are initialized.
The number of RX and TX queues per port is determined by the num-procs parameter passed on the command line.
The structures for the initialized network ports are stored in shared memory and
are therefore accessible by the secondary process as it initializes.

.. code-block:: c

    if (num_ports & 1)
        rte_exit(EXIT_FAILURE, "Application must use an even number of ports\n");

    for (i = 0; i < num_ports; i++) {
        if (proc_type == RTE_PROC_PRIMARY)
            if (smp_port_init(ports[i], mp, (uint16_t)num_procs) < 0)
                rte_exit(EXIT_FAILURE, "Error initializing ports\n");
    }

In the secondary instance, rather than initializing the network ports, the port information exported by the primary process is used,
giving the secondary process access to the hardware and software rings for each network port.
Similarly, the memory pool of mbufs is accessed by doing a lookup for it by name:

.. code-block:: c

    mp = (proc_type == RTE_PROC_SECONDARY) ?
            rte_mempool_lookup(_SMP_MBUF_POOL) :
            rte_mempool_create(_SMP_MBUF_POOL, NB_MBUFS, MBUF_SIZE, ... );

Once this initialization is complete, the main loop of each process, both primary and secondary,
is exactly the same: each process reads from each port using the queue corresponding to its proc-id parameter,
and writes to the corresponding transmit queue on the output port.
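
The queue selection rule can be made concrete with a short stand-alone sketch.
``struct queue_ref`` and both functions are hypothetical names introduced here, not part of the application;
the sketch only assumes, as the text states, that a process uses the queue index equal to its proc-id on every port.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* One (port, queue) pair used by a process. */
    struct queue_ref {
        uint16_t port;
        uint16_t queue;
    };

    /* Fill refs[] with one (port, queue) pair per port for this process. */
    void assign_queues(const uint16_t *ports, unsigned num_ports,
                       uint16_t proc_id, uint16_t num_procs,
                       struct queue_ref *refs)
    {
        assert(proc_id < num_procs);
        for (unsigned i = 0; i < num_ports; i++) {
            refs[i].port = ports[i];
            refs[i].queue = proc_id;   /* queue index equals proc-id on every port */
        }
    }

    /* Return 1 if two assignments share any (port, queue) pair, else 0. */
    int queues_overlap(const struct queue_ref *a, const struct queue_ref *b,
                       unsigned n)
    {
        for (unsigned i = 0; i < n; i++)
            for (unsigned j = 0; j < n; j++)
                if (a[i].port == b[j].port && a[i].queue == b[j].queue)
                    return 1;
        return 0;
    }

Because two processes never share a proc-id, their assignments never overlap,
which is why no locking is needed on the queues.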

Client-Server Multi-process Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The third example multi-process application included with the DPDK shows how one can
use a client-server type multi-process design to do packet processing.
In this example, a single server process performs the packet reception from the ports being used and
distributes these packets using round-robin ordering among a set of client processes,
which perform the actual packet processing.
In this case, the client applications just perform level-2 forwarding of packets by sending each packet out on a different network port.

The following diagram shows the data-flow through the application, using two client processes.

.. _figure_client_svr_sym_multi_proc_app:

.. figure:: img/client_svr_sym_multi_proc_app.*

   Example Data Flow in a Client-Server Symmetric Multi-process Application


Running the Application
^^^^^^^^^^^^^^^^^^^^^^^

The server process must be run initially as the primary process to set up all memory structures for use by the clients.
In addition to the EAL parameters, the application-specific parameters are:

*   ``-p <portmask>``, where portmask is a hexadecimal bitmask of the ports on the system that are to be used.
    For example: ``-p 3`` to use ports 0 and 1 only.

*   ``-n <num-clients>``, where the num-clients parameter is the number of client processes that will process the packets received
    by the server application.

.. note::

    In the server process, a single thread, the master thread (that is, the lowest-numbered lcore in the coremask/corelist), performs all packet I/O.
    If a coremask/corelist is specified with more than a single lcore bit set in it,
    an additional lcore will be used for a thread to periodically print packet count statistics.

Since the server application stores configuration data in shared memory, including the network ports to be used,
the only application parameter needed by a client process is its client instance ID.
Therefore, to run a server application on lcore 1 (with lcore 2 printing statistics) along with two client processes running on lcores 3 and 4,
the following commands could be used:

.. code-block:: console

    # ./mp_server/build/mp_server -l 1-2 -n 4 -- -p 3 -n 2
    # ./mp_client/build/mp_client -l 3 -n 4 --proc-type=auto -- -n 0
    # ./mp_client/build/mp_client -l 4 -n 4 --proc-type=auto -- -n 1

.. note::

    If the server application dies and needs to be restarted, all client applications also need to be restarted,
    as there is no support in the server application for it to run as a secondary process.
    Any client processes that need restarting can be restarted without affecting the server process.

How the Application Works
^^^^^^^^^^^^^^^^^^^^^^^^^

The server process performs the network port and data structure initialization much as the symmetric multi-process application does when run as primary.
One additional enhancement in this sample application is that the server process stores its port configuration data in a memory zone in hugepage shared memory.
This eliminates the need for the client processes to have the portmask parameter passed into them on the command line,
as is done for the symmetric multi-process application, and therefore eliminates mismatched parameters as a potential source of errors.
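
The idea of publishing configuration under a well-known name can be sketched without DPDK as follows.
The registry below is a toy, single-process stand-in for named memory zones:
``zone_reserve()``, ``zone_lookup()``, ``struct port_info`` and the zone name used in the usage note are all hypothetical,
and a real memory zone lives in hugepage shared memory rather than in a static array.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MAX_ZONES     8
    #define ZONE_NAME_LEN 32
    #define MAX_PORTS     16

    /* Hypothetical port configuration published by the server. */
    struct port_info {
        uint16_t num_ports;
        uint16_t id[MAX_PORTS];
    };

    /* Toy registry entry standing in for a named memory zone. */
    struct zone {
        char name[ZONE_NAME_LEN];
        struct port_info data;
        int used;
    };

    static struct zone zones[MAX_ZONES];

    /* Server side: reserve a zone under a unique name; NULL if taken or full. */
    struct port_info *zone_reserve(const char *name)
    {
        struct zone *free_slot = NULL;

        assert(strlen(name) < ZONE_NAME_LEN);
        for (int i = 0; i < MAX_ZONES; i++) {
            if (zones[i].used && strcmp(zones[i].name, name) == 0)
                return NULL;   /* name already in use */
            if (!zones[i].used && free_slot == NULL)
                free_slot = &zones[i];
        }
        if (free_slot == NULL)
            return NULL;       /* registry is full */
        strcpy(free_slot->name, name);
        free_slot->used = 1;
        return &free_slot->data;
    }

    /* Client side: find an existing zone by name; NULL if it does not exist. */
    const struct port_info *zone_lookup(const char *name)
    {
        for (int i = 0; i < MAX_ZONES; i++)
            if (zones[i].used && strcmp(zones[i].name, name) == 0)
                return &zones[i].data;
        return NULL;
    }

In this sketch the server would call ``zone_reserve("port_info_zone")`` once and fill in the port list,
and each client would call ``zone_lookup("port_info_zone")`` to read the same data,
so the port list never has to be repeated on a client command line.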

In the same way that the server process is designed to be run as a primary process instance only,
the client processes are designed to be run as secondary instances only.
They have no code to attempt to create shared memory objects.
Instead, handles to all needed rings and memory pools are obtained via calls to rte_ring_lookup() and rte_mempool_lookup().
The network ports for use by the processes are obtained by loading the network port drivers and probing the PCI bus,
which, as in the symmetric multi-process example,
automatically gives access to the network ports using the settings already configured by the primary/server process.

Once all applications are initialized, the server operates by reading packets from each network port in turn and
distributing those packets to the client queues (software rings, one for each client process) in round-robin order.
On the client side, the packets are read from the rings in bursts that are as large as possible, then routed out to a different network port.
The routing used is very simple: all packets received on the first NIC port are transmitted back out on the second port and vice versa.
Similarly, packets are routed between the 3rd and 4th network ports and so on.
The sending of packets is done by writing the packets directly to the network ports; they are not transferred back via the server process.
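
Both rules in the paragraph above are small enough to sketch in a few lines.
The names ``rr_state``, ``rr_pick_client()`` and ``paired_port()`` are hypothetical,
and the pairing function assumes that the ports in use are numbered contiguously from 0,
so that each pair differs only in the lowest bit of the port number.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical server-side state: which client receives the next packet. */
    struct rr_state {
        unsigned next_client;
        unsigned num_clients;
    };

    /* Return the client for the next packet and advance in round-robin order. */
    unsigned rr_pick_client(struct rr_state *st)
    {
        unsigned chosen;

        assert(st->num_clients > 0);
        chosen = st->next_client;
        st->next_client = (st->next_client + 1) % st->num_clients;
        return chosen;
    }

    /* Ports are paired 0<->1, 2<->3 and so on: flip the lowest bit. */
    uint16_t paired_port(uint16_t rx_port)
    {
        return (uint16_t)(rx_port ^ 1);
    }

With three clients, successive calls to ``rr_pick_client()`` select clients 0, 1, 2 and then 0 again.
A packet received on port 2 is sent out on port 3, and a packet received on port 3 is sent out on port 2.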

In both the server and the client processes, outgoing packets are buffered before being sent,
so as to allow the sending of multiple packets in a single burst to improve efficiency.
For example, the client process buffers packets to send
until either the buffer is full or no further packets are received from the server.
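
The two flush conditions can be illustrated with a minimal stand-alone sketch.
The buffer size, the structure and the function names are assumptions made for the example,
and ``tx_flush()`` only updates counters where a real client would hand the burst to the network port.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    #define TX_BURST 4   /* hypothetical buffer capacity */

    struct tx_buffer {
        void *pkts[TX_BURST];
        unsigned count;         /* packets currently buffered */
        unsigned bursts_sent;   /* statistics kept for the sketch */
        unsigned pkts_sent;
    };

    /* Stand-in for a burst transmit call: it only updates the counters. */
    void tx_flush(struct tx_buffer *buf)
    {
        if (buf->count == 0)
            return;
        buf->bursts_sent++;
        buf->pkts_sent += buf->count;
        buf->count = 0;
    }

    /* Buffer one packet and flush as soon as the buffer is full. */
    void tx_enqueue(struct tx_buffer *buf, void *pkt)
    {
        assert(buf->count < TX_BURST);
        buf->pkts[buf->count++] = pkt;
        if (buf->count == TX_BURST)
            tx_flush(buf);
    }

    /* Handle one batch from the server, then flush whatever is left over. */
    void process_batch(struct tx_buffer *buf, void **pkts, unsigned n)
    {
        for (unsigned i = 0; i < n; i++)
            tx_enqueue(buf, pkts[i]);
        tx_flush(buf);   /* no further packets: send what is buffered */
    }

With a capacity of four, a batch of six packets leaves in two bursts:
one of four packets when the buffer fills, and one of two packets when the batch ends.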