dpdk: apply dual loop unrolling in DPDK TX 68/21968/3
author Lijian.Zhang <Lijian.Zhang@arm.com>
Thu, 11 Jul 2019 08:44:22 +0000 (16:44 +0800)
committer Damjan Marion <dmarion@me.com>
Wed, 11 Sep 2019 19:20:27 +0000 (19:20 +0000)
commit fe2523d1a42c66ee3ddd594fad1cf5ac91c66c54
tree 82998c6aa17601640f0258513ef1efb698c0721f
parent 8a1dea4ce6fd0684aef6d0b0843a90658775129d
dpdk: apply dual loop unrolling in DPDK TX

Issuing too many prefetches within an unrolled loop creates a bottleneck
and degrades performance on CPUs that have fewer cache line fill
buffers, e.g. Arm Cortex-A72.
Apply dual loop unrolling and tune the prefetches manually to remove the
hot spot on the prefetch instructions and improve throughput.
On Cortex-A72 this brings about 1% throughput improvement and saves 8%
of the clocks in the target node.
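
For illustration only, the general shape of a dual loop with manually
placed prefetches might look like the sketch below. It is not the code
from device.c: pkt_t, PKT_FLAG_VALID, process_one and process_dual are
hypothetical names, the per-packet work is a stand-in, and
__builtin_prefetch assumes a GCC- or Clang-compatible compiler. The
point it tries to show is that each iteration handles two packets and
issues only two prefetches, for the pair handled next, so fewer cache
line fills are outstanding at once than in a wider unrolling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor; NOT vlib_buffer_t or rte_mbuf. */
typedef struct
{
  uint32_t length;
  uint32_t flags;
} pkt_t;

#define PKT_FLAG_VALID 1u

/* Stand-in for the real per-packet TX work: mark the packet and
   return its length so the caller can accumulate a byte count. */
static inline uint64_t
process_one (pkt_t *p)
{
  p->flags |= PKT_FLAG_VALID;
  return p->length;
}

/* Dual loop: two packets per iteration, prefetching only the next pair. */
static uint64_t
process_dual (pkt_t **pkts, size_t n_left)
{
  uint64_t total = 0;

  assert (pkts != NULL || n_left == 0);

  /* Need 4 packets in hand: 2 to process now, 2 to prefetch. */
  while (n_left >= 4)
    {
      /* Prefetch for write (rw = 1) with high temporal locality (3). */
      __builtin_prefetch (pkts[2], 1, 3);
      __builtin_prefetch (pkts[3], 1, 3);

      total += process_one (pkts[0]);
      total += process_one (pkts[1]);

      pkts += 2;
      n_left -= 2;
    }

  /* Single loop for the remaining packets, no prefetch. */
  while (n_left > 0)
    {
      total += process_one (pkts[0]);
      pkts += 1;
      n_left -= 1;
    }

  return total;
}
```

Calling process_dual on an array of five packet pointers would run the
dual loop once and leave three packets for the single loop. Whether two
packets per iteration is the right width depends on the CPU; the commit
reports its measurements for Cortex-A72 only.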

Type: feature

Change-Id: If3a64a04a77e90cd0240bc4d1186dbb09dac7df0
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
src/plugins/dpdk/device/device.c