Three separate implementations, which differ in performance by nearly
a factor of two. Most of the difference comes from swapping the
src/dst MAC addresses with an AVX2 vector shuffle instruction.
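
For illustration only, a minimal sketch of the shuffle-based swap next
to a scalar reference. This is not the code from this change: the
function names (swap_mac, swap_mac_shuffle, swap_mac_scalar) are
hypothetical, the 128-bit form of the byte shuffle is an assumption
(the intrinsic is formally SSSE3 and is emitted as VEX-encoded vpshufb
when built for an AVX2 target), and the GCC/Clang target attribute and
__builtin_cpu_supports are used only so the sketch builds without
extra compiler flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

/* Scalar reference: swap the 6-byte dst and src MAC fields in place. */
static void swap_mac_scalar(uint8_t *eth)
{
    uint8_t tmp[6];
    memcpy(tmp, eth, 6);      /* save dst MAC */
    memcpy(eth, eth + 6, 6);  /* src MAC -> dst slot */
    memcpy(eth + 6, tmp, 6);  /* saved dst MAC -> src slot */
}

#ifdef HAVE_X86
/* Hypothetical vector version: one load, one byte shuffle, one store.
 * Reads and writes 16 bytes, so the buffer must hold at least 16 bytes;
 * bytes 12..15 (ethertype plus two payload bytes) are written back
 * unchanged. */
__attribute__((target("avx2")))
static void swap_mac_shuffle(uint8_t *eth)
{
    /* Output byte i is taken from input byte mask[i]. */
    const __m128i mask = _mm_setr_epi8(6, 7, 8, 9, 10, 11,
                                       0, 1, 2, 3, 4, 5,
                                       12, 13, 14, 15);
    __m128i hdr = _mm_loadu_si128((const __m128i *)eth);
    hdr = _mm_shuffle_epi8(hdr, mask);
    _mm_storeu_si128((__m128i *)eth, hdr);
}
#endif

/* Pick the vector path when the CPU reports AVX2, else fall back. */
static void swap_mac(uint8_t *eth)
{
    assert(eth != NULL);
#ifdef HAVE_X86
    if (__builtin_cpu_supports("avx2")) {
        swap_mac_shuffle(eth);
        return;
    }
#endif
    swap_mac_scalar(eth);
}
```

Calling swap_mac() on a buffer that starts with an Ethernet header
exchanges bytes 0..5 with bytes 6..11 and leaves the rest untouched.
The vector path replaces three small copies with a single
load/shuffle/store sequence, which is the kind of saving the
paragraph above refers to.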

Change-Id: Ieb36546d6074e4ac720d452a99d013c698135c57
Signed-off-by: Dave Barach <dave@barachs.net>