Optimize xxx_zero_byte_mask NEON function
Optimize zero byte mask NEON functions below with less intrinsics,
and get their outputs consistent with functions in vector_sse42.h
always_inline u32 u64x2_zero_byte_mask (u64x2 input)
always_inline u32 u32x4_zero_byte_mask (u32x4 input)
always_inline u32 u16x8_zero_byte_mask (u16x8 input)
always_inline u32 u8x16_zero_byte_mask (u8x16 input)
always_inline u32 i64x2_zero_byte_mask (i64x2 input)
always_inline u32 i32x4_zero_byte_mask (i32x4 input)
always_inline u32 i16x8_zero_byte_mask (i16x8 input)
always_inline u32 i8x16_zero_byte_mask (i8x16 input)
Change-Id: I7f485915baeb37fa2dd484699b8769e0136f6574
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>