l2: performance enhancement in l2input
Short Load/Stores combined with prefetching in the beginning of the loop
place too much pressure on AGUs and memory accesses.
The patch interleaves load/store operations with computational operations
to alleviate the pain point.
vlib_get_buffers is also leveraged.
Redefine u8 dst_and_src[12] instead of dst[6] and src[6] in struct
l2input_trace_t in order to merge two copys into one.
Type: improvement
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Change-Id: I7d3df7732c476069235e3019c68f0f53bca9637e