Is rcx aligned to 32 bytes?
movdqa xmm, m128 requires 16-byte alignment, but vmovdqa ymm, m256 requires 32-byte alignment. If you just port the code to AVX2 without increasing the alignment, the aligned load raises a general-protection fault whenever the address is not a multiple of 32.
Either increase the alignment to 32 bytes or use vmovdqu to sidestep the alignment requirement. In contrast to legacy SSE instructions, memory operands of AVX instructions generally do not have alignment requirements (vmovdqa is one of the few exceptions). It is still a good idea to align your input data if possible, because memory accesses that cross a cache line incur an extra penalty.
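If the buffer is set up from C rather than in assembly, the alignment question can be checked and fixed before the pointer ever reaches rcx. The following is only a minimal sketch, assuming a C11 toolchain that provides aligned_alloc; the helper names is_aligned and alloc_ymm_buffer are hypothetical and are not part of the original code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if p is a multiple of `alignment`.
   `alignment` must be a nonzero power of two (16 for movdqa, 32 for vmovdqa ymm). */
static int is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: allocates a zeroed, 32-byte-aligned buffer of at least
   `bytes` bytes. C11 aligned_alloc expects the size to be a multiple of the
   alignment, so the request is rounded up to the next multiple of 32.
   Release the result with free(). Returns NULL on failure. */
static void *alloc_ymm_buffer(size_t bytes) {
    size_t rounded = (bytes + 31) & ~(size_t)31;
    if (rounded == 0) {
        rounded = 32;
    }
    void *p = aligned_alloc(32, rounded);
    if (p != NULL) {
        memset(p, 0, rounded);
    }
    return p;
}
```

With a buffer from alloc_ymm_buffer, is_aligned(buf, 32) holds, while (char *)buf + 16 is aligned for movdqa only: it passes the 16-byte check but fails the 32-byte one, which is exactly the case where a straight port to vmovdqa faults.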