NEON vs Intel SSE - equivalence of certain operations

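I'm having trouble working out the NEON equivalents of a couple of Intel SSE operations. It seems that NEON cannot operate on an entire Q register at once (as a single 128-bit value). I haven't found anything in the arm_neon.h header or in the NEON intrinsics reference.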

What I want to do is the following:

// Intel SSE 
// shift the entire 128 bit value with 2 bytes to the right; this is done 
// without sign extension by shifting in zeros 
__m128i val = _mm_srli_si128(vector_of_8_s16, 2); 
// insert the least significant 16 bits of "some_16_bit_val" 
// the whole thing in this case, into the selected 16 bit 
// integer of vector "val"(the 16 bit element with index 7 in this case) 
val = _mm_insert_epi16(val, some_16_bit_val, 7);

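I looked at the shift operations NEON provides but could not find an equivalent way of doing the above (I don't have much experience with NEON). Is it possible to do the above (I assume it is and I just don't know how)? Any pointers are greatly appreciated.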

2011-08-26 celavek

You need the VEXT instruction. Your example would look like this:

int16x8_t val = vextq_s16(vector_of_8_s16, another_vector_s16, 1);

After this, bits 0-111 of val will contain bits 16-127 of vector_of_8_s16, and bits 112-127 of val will contain bits 0-15 of another_vector_s16.
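To make the lane movement concrete, here is a minimal sketch that pairs the VEXT approach with a scalar model of the SSE sequence from the question. The function names shift_insert and shift_insert_ref are made up for this illustration, and filling the second vector with vdupq_n_s16 is just one assumed way to supply the inserted value (only its lane 0 ends up in the result). The NEON path is compiled only when the compiler defines __ARM_NEON; elsewhere the sketch falls back to the scalar model.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar model of _mm_srli_si128(v, 2) followed by _mm_insert_epi16(v, x, 7):
   every 16-bit lane moves down by one, and lane 7 receives x. */
static void shift_insert_ref(const int16_t in[8], int16_t x, int16_t out[8])
{
    for (int i = 0; i < 7; ++i)
        out[i] = in[i + 1];
    out[7] = x;
}

/* Hypothetical helper: same operation using VEXT when NEON is available. */
static void shift_insert(const int16_t in[8], int16_t x, int16_t out[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int16x8_t v    = vld1q_s16(in);    /* load the 8 source lanes */
    int16x8_t fill = vdupq_n_s16(x);   /* lane 0 of this vector becomes lane 7 */
    /* Result lanes: v[1..7] followed by fill[0]. */
    vst1q_s16(out, vextq_s16(v, fill, 1));
#else
    shift_insert_ref(in, x, out);      /* no NEON: use the scalar model */
#endif
}

/* Compares both versions on one sample input. */
static void self_check(void)
{
    const int16_t in[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    int16_t a[8], b[8];
    shift_insert(in, 99, a);
    shift_insert_ref(in, 99, b);
    for (int i = 0; i < 8; ++i)
        assert(a[i] == b[i]);
}
```

Calling self_check() on an ARM target compares the VEXT result against the scalar model lane by lane. An alternative with the same effect would be vextq_s16(v, v, 1) followed by vsetq_lane_s16(x, result, 7).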

2011-08-27 14:53:31

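I have actually already implemented it this way. Could you provide an example so I can verify my approach? – celavek

Deleted my answer about vtbl and vtbx. vext is the way to go! –

@celavek: I provided an example, but the way to validate an approach is to test it, not to compare it with an example. It either works or it doesn't. –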
