-The distributor application consists of three types of threads: a receive
-thread (lcore_rx()), a set of worker threads(lcore_worker())
-and a transmit thread(lcore_tx()). How these threads work together is shown
-in :numref:`figure_dist_app` below. The main() function launches threads of these three types.
-Each thread has a while loop which will be doing processing and which is
-terminated only upon SIGINT or ctrl+C. The receive and transmit threads
-communicate using a software ring (rte_ring structure).
-
-The receive thread receives the packets using rte_eth_rx_burst() and gives
-them to the distributor (using rte_distributor_process() API) which will
-be called in context of the receive thread itself. The distributor distributes
-the packets to workers threads based on the tagging of the packet -
-indicated by the hash field in the mbuf. For IP traffic, this field is
-automatically filled by the NIC with the "usr" hash value for the packet,
-which works as a per-flow tag.
+The distributor application consists of four types of threads: a receive
+thread (``lcore_rx()``), a distributor thread (``lcore_dist()``), a set of
+worker threads (``lcore_worker()``), and a transmit thread (``lcore_tx()``).
+How these threads work together is shown in :numref:`figure_dist_app` below.
+The ``main()`` function launches threads of these four types. Each thread
+runs a ``while`` loop that does its processing and terminates only on
+SIGINT (for example, Ctrl+C).
+
+The receive thread receives packets using ``rte_eth_rx_burst()`` and
+enqueues them to an ``rte_ring``. The distributor thread dequeues the packets
+from the ring and assigns them to workers using the ``rte_distributor_process()`` API.
+This assignment is based on the tag (or flow ID) of the packet, indicated by
+the hash field in the mbuf. For IP traffic, the NIC automatically fills this
+field with the "usr" hash value for the packet, which works as a per-flow
+tag. The distributor thread communicates with the worker threads using a
+cache-line swapping mechanism, passing up to 8 mbuf pointers at a time
+(one cache line) to each worker.