ip: apply dual loop unrolling in ip4_rewrite 69/21969/3
author Lijian.Zhang <Lijian.Zhang@arm.com>
Tue, 9 Jul 2019 09:54:32 +0000 (17:54 +0800)
committer Damjan Marion <dmarion@me.com>
Wed, 11 Sep 2019 19:20:27 +0000 (19:20 +0000)
commit 840f64b4b2d6063adebb8c7b31c9357aaaf8dd5e
tree 34af2303319245845dd701953d51681850d9d84b
parent fe2523d1a42c66ee3ddd594fad1cf5ac91c66c54
ip: apply dual loop unrolling in ip4_rewrite

Too many prefetches in an unrolled loop create a bottleneck and degrade
performance on CPUs that have fewer cache line fill buffers, e.g., Arm
Cortex-A72.
Apply dual loop unrolling and tune the prefetches manually to remove the
hot spot at the prefetch instructions and improve throughput.
This yields about a 7% throughput improvement and cuts the clocks spent
in the ip4_rewrite nodes by 28% on Cortex-A72 CPUs.
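
The dual-loop pattern described above can be sketched roughly as follows.
This is an illustration only, not the code from this change: toy_pkt_t,
toy_rewrite_one and toy_rewrite_dual are hypothetical names, the per-packet
work is a placeholder, and __builtin_prefetch assumes a GCC- or
Clang-compatible compiler. The point is that each iteration handles two
packets and issues only two prefetches, for the next pair, so fewer cache
line fills are outstanding at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet header; not a VPP type. */
typedef struct
{
  uint8_t ttl;
  uint32_t rewritten;
} toy_pkt_t;

/* Placeholder per-packet work: decrement the TTL and mark the packet. */
static inline void
toy_rewrite_one (toy_pkt_t *p)
{
  p->ttl -= 1;
  p->rewritten = 1;
}

/* Process n packets two at a time, prefetching only the next pair.
   Returns the number of packets processed. */
static size_t
toy_rewrite_dual (toy_pkt_t **pkts, size_t n)
{
  size_t i = 0;
  assert (pkts != NULL || n == 0);

  /* Dual loop: needs 4 packets left (2 to process, 2 to prefetch). */
  while (n - i >= 4)
    {
      /* Second argument 1 = prefetch for write, third 3 = high locality. */
      __builtin_prefetch (pkts[i + 2], 1, 3);
      __builtin_prefetch (pkts[i + 3], 1, 3);

      toy_rewrite_one (pkts[i]);
      toy_rewrite_one (pkts[i + 1]);
      i += 2;
    }

  /* Single loop: drain the remaining packets without prefetching. */
  while (i < n)
    {
      toy_rewrite_one (pkts[i]);
      i += 1;
    }

  return i;
}
```

Calling toy_rewrite_dual on an array of 7 packet pointers runs the dual
loop twice and the single loop three times. How far ahead to prefetch, and
which cache lines to touch, is the part that has to be tuned per CPU.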

Type: feature

Change-Id: I0d35ef19faccbd7a5a4647f50bc369bfcb01a20d
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
src/vnet/ip/ip4_forward.c