ip: apply dual loop unrolling in ip4_input 70/21970/4
author	Lijian.Zhang <Lijian.Zhang@arm.com>
	Mon, 8 Jul 2019 02:33:34 +0000 (10:33 +0800)
committer	Damjan Marion <dmarion@me.com>
	Wed, 11 Sep 2019 19:20:27 +0000 (19:20 +0000)
commit	86b1871ba212064ceb985be4a6b655ebfe2e32f9
tree	71d0e9bb6e98a76f79628fdd72f91312b470e30d
parent	840f64b4b2d6063adebb8c7b31c9357aaaf8dd5e
ip: apply dual loop unrolling in ip4_input

Too many prefetches within an unrolled loop create a bottleneck and
degrade performance on CPUs that have fewer cache line fill buffers,
e.g., Arm Cortex-A72.
Apply dual loop unrolling and tune the prefetches manually to remove
the hot spot at the prefetch instructions.
This saves about 11.5% of cycles in the ip4_input node on Cortex-A72
CPUs.
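The dual-loop pattern described above can be sketched in plain C as follows. This is a minimal, hypothetical illustration and not the code in src/vnet/ip/ip4_input.c. `pkt_t`, `process_one` and `process_dual` are invented names. `__builtin_prefetch` is the GCC/Clang builtin, not VPP's own prefetch macros. Each iteration handles two packets and issues only two prefetches, for the two packets the next iteration will touch. This keeps the number of prefetches in flight per iteration small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet buffer (not VPP's vlib_buffer_t). */
typedef struct
{
  uint8_t data[64];
  uint32_t next_index;
} pkt_t;

/* Hypothetical per-packet work: send IPv4 (version nibble 4) to next 1,
   everything else to next 0. */
static inline void
process_one (pkt_t *p)
{
  p->next_index = ((p->data[0] >> 4) == 4) ? 1u : 0u;
}

/* Dual loop: process two packets per iteration and prefetch only the
   two packets needed by the next iteration. */
void
process_dual (pkt_t **pkts, size_t n)
{
  size_t i = 0;

  assert (pkts != NULL || n == 0);

  /* Run the dual loop while pkts[i + 2] and pkts[i + 3] exist to prefetch. */
  while (i + 4 <= n)
    {
      __builtin_prefetch (pkts[i + 2]->data, 0 /* read */, 3);
      __builtin_prefetch (pkts[i + 3]->data, 0 /* read */, 3);

      process_one (pkts[i]);
      process_one (pkts[i + 1]);
      i += 2;
    }

  /* Single loop for the remaining packets (no prefetch needed). */
  while (i < n)
    {
      process_one (pkts[i]);
      i++;
    }
}
```

Only two prefetches are issued per iteration. A deeper, quad-loop style window would raise the number of outstanding line fills at once, and that is the pressure the commit message identifies on cores with few fill buffers.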

Type: feature

Change-Id: I1ac9eb21061a804af2a414b420217fbcda3689c9
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
src/vnet/ip/ip4_input.c