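I want to shift an entire 256-bit AVX2 register (an __m256i) to the right. The shift amount is arbitrary (1 to 63 bits) and is not a compile-time constant. At the moment I can shift my AVX2 vector to the left with the function below. (The carry-out is interesting because it lets several __m256i vectors be shifted as if they were one long vector.)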
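inline __m256i sl256i(__m256i* a, int count)
{
    __m256i carryOut;
    // Bits that leave the top of each 64-bit element.
    __m256i innerCarry = _mm256_srli_epi64 (*a, 64 - count);
    // Rotate the elements up by one: element i receives element i-1.
    __m256i rotate = _mm256_permute4x64_epi64 (innerCarry, 0x93);
    // Keep the carries for elements 1..3 and zero element 0.
    innerCarry = _mm256_blend_epi32 (_mm256_setzero_si256 (), rotate, 0xFC);
    *a = _mm256_slli_epi64 (*a, count);
    *a = _mm256_or_si256 (*a, innerCarry);
    // What is left is the carry out of element 3, now in element 0.
    carryOut = _mm256_xor_si256 (innerCarry, rotate);
    return carryOut;
}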
What I want:
inline __m256i sr256i(__m256i* a, int count) { ... }
Usage:
sr256i(a, 1);
sr256i(a, 2);
sr256i(a, 60);
Output (example):
[...] 00000000 00000001 00000000 00000000 (default)
[...] 00000000 00000000 10000000 00000000
[...] 00000000 00000000 00100000 00000000
[...] 00000000 00000000 00000000 00000000
This function was taken from this post, but I can't figure out how to adapt it to perform a right shift. As far as I know, the processor works on two distinct underlying __m128i vectors (two lanes at the same time), which is why the carried bits have to be moved across the lanes after the bit shift and before the final OR.
Any ideas? There are many existing posts, but none of them has a working AVX2 right shift.
You can shift right using one of the following. This one is probably the most relevant:
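As a rough sketch of how the left-shift routine from the question might be mirrored for a right shift: the two 64-bit shifts swap direction, the permute rotates the elements down instead of up (0x39 instead of 0x93), and the blend zeroes the top element instead of the bottom one (0x3F instead of 0xFC). The AVX2_FN macro and the sr256_limbs wrapper are hypothetical helpers added only so the routine can be compiled and checked with plain integers. The sketch also assumes the documented variable-count forms _mm256_sll_epi64 / _mm256_srl_epi64 rather than passing a runtime count to the immediate forms.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical convenience macro: lets GCC/Clang build this without -mavx2. */
#if defined(__GNUC__) || defined(__clang__)
#define AVX2_FN __attribute__((target("avx2")))
#else
#define AVX2_FN
#endif

/* Shift the 256-bit value in *a right by count bits (1..63).
   Returns the bits shifted out at the bottom, positioned in the top
   64-bit element so they can be ORed into the next vector of a chain. */
AVX2_FN static inline __m256i sr256i(__m256i *a, int count)
{
    /* Bits that leave the low end of each 64-bit element. */
    __m256i innerCarry = _mm256_sll_epi64(*a, _mm_cvtsi32_si128(64 - count));
    /* Rotate elements down by one: element i receives element i+1. */
    __m256i rotate = _mm256_permute4x64_epi64(innerCarry, 0x39);
    /* Keep the carries for elements 0..2 and zero element 3. */
    innerCarry = _mm256_blend_epi32(_mm256_setzero_si256(), rotate, 0x3F);
    *a = _mm256_srl_epi64(*a, _mm_cvtsi32_si128(count));
    *a = _mm256_or_si256(*a, innerCarry);
    /* What is left is the carry out of element 0, now in element 3. */
    return _mm256_xor_si256(innerCarry, rotate);
}

/* Hypothetical wrapper for checking with plain integers.
   limbs[0] holds the least significant 64 bits. */
AVX2_FN static void sr256_limbs(uint64_t limbs[4], int count, uint64_t carry[4])
{
    assert(count >= 1 && count <= 63);
    __m256i v = _mm256_loadu_si256((const __m256i *)limbs);
    __m256i c = sr256i(&v, count);
    _mm256_storeu_si256((__m256i *)limbs, v);
    _mm256_storeu_si256((__m256i *)carry, c);
}
```

When chaining several vectors, the returned carry would be ORed into the next less significant vector after that vector has been shifted, in the same way the left-shift carry is used in the other direction.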