SSE2 intrinsics - comparing unsigned integers

Date: 2023-12-02


Question

I'm interested in identifying overflowing values when adding unsigned 8-bit integers, and clamping the result to 0xFF:

__m128i m1 = _mm_loadu_si128(/* 16 8-bit unsigned integers */);
__m128i m2 = _mm_loadu_si128(/* 16 8-bit unsigned integers */);

__m128i m3 = _mm_adds_epu8(m1, m2);

I would like to perform a "less than" comparison on these unsigned integers, similar to what _mm_cmplt_epi8 does for signed ones:

__m128i mask = _mm_cmplt_epi8 (m3, m1);
m1 = _mm_or_si128(m3, mask);

If an "epu8" equivalent were available, mask would hold 0xFF where m3[i] < m1[i] (overflow!) and 0x00 otherwise, and we could clamp using the "or": m1 would hold the addition result where it is valid, and 0xFF where it overflowed.
The problem is that _mm_cmplt_epi8 performs a signed comparison. For instance, if m1[i] = 0x70 and m2[i] = 0x10, then m3[i] = 0x80 and mask[i] = 0xFF (0x80 is -128 when read as signed, so it compares below 0x70), which is obviously not what I need.
Using VS2012.
I would appreciate another approach for performing this. Thanks!
Recommended answer
One way of implementing compares for unsigned 8-bit vectors is to exploit _mm_max_epu8, which returns the element-wise maximum of unsigned 8-bit integers. Compare the (unsigned) maximum of two elements for equality with one of the source elements: max(a, b) == a holds exactly when a >= b. This translates to 2 instructions for >= or <=, and 3 instructions for > or < (the extra one inverts the result).
Example code:
#include <stdio.h>
#include <emmintrin.h>    // SSE2

#define _mm_cmpge_epu8(a, b) \
        _mm_cmpeq_epi8(_mm_max_epu8(a, b), a)

#define _mm_cmple_epu8(a, b) _mm_cmpge_epu8(b, a)

#define _mm_cmpgt_epu8(a, b) \
        _mm_xor_si128(_mm_cmple_epu8(a, b), _mm_set1_epi8(-1))

#define _mm_cmplt_epu8(a, b) _mm_cmpgt_epu8(b, a)

int main(void)
{
    __m128i va = _mm_setr_epi8(0,   0,   1,   1,   1, 127, 127, 127, 128, 128, 128, 254, 254, 254, 255, 255);
    __m128i vb = _mm_setr_epi8(0, 255,   0,   1, 255,   0, 127, 255,   0, 128, 255,   0, 254, 255,   0, 255);

    __m128i v_ge = _mm_cmpge_epu8(va, vb);
    __m128i v_le = _mm_cmple_epu8(va, vb);
    __m128i v_gt = _mm_cmpgt_epu8(va, vb);
    __m128i v_lt = _mm_cmplt_epu8(va, vb);

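    // NOTE: %vhhu is a non-standard vector printf extension; not every C
    // library supports it (elsewhere, store to a byte array and print each element).
    printf("va   = %4vhhu\n", va);
    printf("vb   = %4vhhu\n", vb);
    printf("v_ge = %4vhhu\n", v_ge);
    printf("v_le = %4vhhu\n", v_le);
    printf("v_gt = %4vhhu\n", v_gt);
    printf("v_lt = %4vhhu\n", v_lt);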

    return 0;
}

Compile and run:
$ gcc -Wall _mm_cmplt_epu8.c && ./a.out 
va   =    0    0    1    1    1  127  127  127  128  128  128  254  254  254  255  255
vb   =    0  255    0    1  255    0  127  255    0  128  255    0  254  255    0  255
v_ge =  255    0  255  255    0  255  255    0  255  255    0  255  255    0  255  255
v_le =  255  255    0  255  255    0  255  255    0  255  255    0  255  255    0  255
v_gt =    0    0  255    0    0  255    0    0  255    0    0  255    0    0  255    0
v_lt =    0  255    0    0  255    0    0  255    0    0  255    0    0  255    0    0

