u8x16_compare_byte_mask - optimize to use 128-bit registers as suggested by Nintin