How do I right shift an AVX2 register?

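I want to shift an entire 256-bit AVX2 register (__m256i) to the right. The shift count is arbitrary (1 to 63 bits) and is not a compile-time constant. Currently, I can shift my AVX2 vector to the left with this function (the carry is interesting because it lets several __m256i vectors be shifted as one):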

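#include <immintrin.h>

// Shifts the 256-bit value in *a left by count bits (1 to 63) and returns the
// bits shifted out of the top, right-aligned in the lowest 64-bit element.
inline __m256i sl256i(__m256i* a, int count)
{
        __m256i carryOut;
        // Bits that leave the top of each 64-bit element.
        __m256i innerCarry = _mm256_srli_epi64 (*a, 64 - count);
        // Rotate the elements up by one position (element 3 wraps to element 0).
        __m256i rotate     = _mm256_permute4x64_epi64 (innerCarry, 0x93);
        // Zero element 0 so that only the carries between elements remain.
        innerCarry = _mm256_blend_epi32 (_mm256_setzero_si256 (), rotate, 0xFC);
        *a = _mm256_slli_epi64 (*a, count);
        *a = _mm256_or_si256 (*a, innerCarry);
        // The XOR leaves only element 0: the carry out of the whole vector.
        carryOut   = _mm256_xor_si256 (innerCarry, rotate);
        return carryOut;
}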

I would like:

inline __m256i sr256i(__m256i* a, int count) { ... }

Usage:

sr256i(a, 1);
sr256i(a, 2);
sr256i(a, 60);

Output (example):

[...] 00000000 00000001 00000000 00000000 (default)
[...] 00000000 00000000 10000000 00000000
[...] 00000000 00000000 00100000 00000000
[...] 00000000 00000000 00000000 00000000
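
To pin down the behaviour I am after without any intrinsics, here is a scalar reference model that could be used to check a SIMD version. It is only a sketch: the name shr256_ref and the layout (four 64-bit limbs, limbs[0] least significant) are my own assumptions, and the count is assumed to be in the range 1 to 63.

```c
#include <stdint.h>

/* Scalar reference model (hypothetical helper, not from the original post).
   Shifts a 256-bit value right by count bits, with count assumed in 1..63.
   limbs[0] is assumed to hold the least significant 64 bits.
   Returns the bits shifted out of the bottom, left-aligned in a 64-bit word,
   so they can be ORed into the top limb of the next lower 256-bit value. */
static uint64_t shr256_ref(uint64_t limbs[4], unsigned count)
{
    uint64_t carry_out = limbs[0] << (64 - count);

    /* Each limb receives the low bits of the limb above it. */
    for (int i = 0; i < 3; i++)
        limbs[i] = (limbs[i] >> count) | (limbs[i + 1] << (64 - count));

    limbs[3] >>= count;  /* nothing shifts into the top limb */
    return carry_out;
}
```

The returned carry plays the same role as carryOut in sl256i, only in the opposite direction.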

The sl256i function above was taken from this post, but I can't figure out how to adapt it to perform a right shift. As far as I know, the processor works on two distinct underlying __m128i vectors (two 128-bit lanes at the same time), which is why the carried bits have to be moved across the lanes after the bit shift, before performing the final OR.

Any ideas? There are many existing posts on this, but none of them has a working AVX2 right shift.
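
One possible adaptation is to mirror sl256i with every direction reversed: shift left by 64 - count to capture the bits leaving the bottom of each 64-bit element, rotate the elements down instead of up (0x39 instead of 0x93), and zero the top element instead of the bottom one (0x3F instead of 0xFC). The code below is a sketch under a few assumptions, not a definitive implementation: count is in 1 to 63, AVX2_FN and shr256_limbs are hypothetical helpers added for illustration, and the target attribute behind AVX2_FN is specific to GCC and Clang. It uses _mm256_srl_epi64 and _mm256_sll_epi64, which take the count in an XMM register and therefore accept a run-time value.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical convenience macro: lets GCC/Clang compile these functions
   even when the file is not built with -mavx2. Empty elsewhere. */
#if defined(__GNUC__) && !defined(__AVX2__)
#define AVX2_FN __attribute__((target("avx2")))
#else
#define AVX2_FN
#endif

/* Shifts the 256-bit value in *a right by count bits (assumed 1..63).
   Returns the bits shifted out of the bottom, left-aligned in the highest
   64-bit element, ready to be ORed into the next lower vector. */
AVX2_FN static inline __m256i sr256i(__m256i *a, int count)
{
    __m128i n  = _mm_cvtsi32_si128(count);       /* run-time shift count */
    __m128i nc = _mm_cvtsi32_si128(64 - count);  /* complementary count  */

    /* Bits that leave the bottom of each 64-bit element. */
    __m256i innerCarry = _mm256_sll_epi64(*a, nc);
    /* Rotate the elements down by one: dst[i] = src[(i + 1) % 4]. */
    __m256i rotate = _mm256_permute4x64_epi64(innerCarry, 0x39);
    /* Zero element 3, which holds the wrapped-around carry of element 0. */
    innerCarry = _mm256_blend_epi32(_mm256_setzero_si256(), rotate, 0x3F);

    *a = _mm256_srl_epi64(*a, n);
    *a = _mm256_or_si256(*a, innerCarry);
    /* The XOR leaves only element 3: the carry out of the whole vector. */
    return _mm256_xor_si256(innerCarry, rotate);
}

/* Hypothetical wrapper over plain memory; limbs[0] is least significant.
   Returns the carry word (the top element of the carry vector). */
AVX2_FN static uint64_t shr256_limbs(uint64_t limbs[4], int count)
{
    uint64_t carry[4];
    __m256i v = _mm256_loadu_si256((const __m256i *)limbs);
    __m256i c = sr256i(&v, count);
    _mm256_storeu_si256((__m256i *)limbs, v);
    _mm256_storeu_si256((__m256i *)carry, c);
    return carry[3];
}
```

To shift several __m256i vectors as one, the idea would be to call sr256i on each vector and OR the carry returned for a more significant vector into the next less significant one after that one has been shifted.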