Is using double faster than float?

Date: 2023-12-02

Question

Double values store higher precision and are double the size of a float, but are Intel CPUs optimized for floats?

That is, are double operations just as fast or faster than float operations for +, -, *, and /?

Does the answer change for 64-bit architectures?

Recommended answer

There isn't a single "Intel CPU", especially in terms of which operations are optimized relative to others, but most of them, at the CPU level (specifically within the FPU), are such that the answer to your question:

are double operations just as fast or faster than float operations for +, -, *, and /?

is "yes" -- within the CPU, except for division and sqrt which are somewhat slower for double than for float. (Assuming your compiler uses SSE2 for scalar FP math, like all x86-64 compilers do, and some 32-bit compilers depending on options. Legacy x87 doesn't have different widths in registers, only in memory (it converts on load/store), so historically even sqrt and division were just as slow for double).
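One way to see which of the two evaluation models a given build uses is the standard C99 macro `FLT_EVAL_METHOD` from `<float.h>`. The sketch below only reports the macro; the helper names `fp_eval_method`, `fp_eval_method_name`, and `print_fp_eval_method` are made up for this illustration, and the "typical for SSE2 / x87" remarks describe common compiler behavior rather than a guarantee.

```c
#include <assert.h>
#include <float.h>   /* FLT_EVAL_METHOD (C99) */
#include <stdio.h>

/* Hypothetical helper: the evaluation method chosen for this build. */
int fp_eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}

/* Hypothetical helper: a human-readable description of a method value. */
const char *fp_eval_method_name(int m)
{
    switch (m) {
    case 0:  return "each type evaluated in its own precision (typical for SSE2)";
    case 1:  return "float and double both evaluated as double";
    case 2:  return "everything evaluated as long double (typical for x87)";
    default: return "indeterminate or implementation-defined";
    }
}

void print_fp_eval_method(void)
{
    int m = fp_eval_method();
    const char *name = fp_eval_method_name(m);
    assert(name != NULL);  /* every branch above returns a string literal */
    printf("FLT_EVAL_METHOD = %d: %s\n", m, name);
}
```

Calling `print_fp_eval_method()` from `main` prints the value for the current compiler and flags. A value of 0 is the case where `float` and `double` arithmetic really are distinct operations, which is the case the division and sqrt timings above refer to.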

For example, Haswell has a divsd throughput of one per 8 to 14 cycles (data-dependent), but a divss (scalar single) throughput of one per 7 cycles. x87 fdiv is 8 to 18 cycle throughput. (Numbers from https://agner.org/optimize/. Latency correlates with throughput for division, but is higher than the throughput numbers.)

The float versions of many library functions like logf(float) and sinf(float) will also be faster than log(double) and sin(double), because they have many fewer bits of precision to get right. They can use polynomial approximations with fewer terms to get full precision for float vs. double.
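The "fewer terms" point can be sketched with a toy sine approximation. This is not how any real libm implements `sinf` or `sin`: production libraries use range reduction and tuned minimax coefficients, whereas this sketch assumes the input is already reduced to roughly [-pi/4, pi/4] and uses plain Taylor coefficients. The names `sin_poly_f32`, `sin_poly_f64`, and `SIN_COEF` are hypothetical. On that interval, 5 terms bring the truncation error below float precision, while double needs about 9.

```c
#include <assert.h>

/* Taylor coefficients of sin(x)/x in powers of x^2:
 * 1, -1/3!, 1/5!, -1/7!, ..., 1/17!                                   */
static const double SIN_COEF[9] = {
     1.0,
    -1.0 / 6.0,
     1.0 / 120.0,
    -1.0 / 5040.0,
     1.0 / 362880.0,
    -1.0 / 39916800.0,
     1.0 / 6227020800.0,
    -1.0 / 1307674368000.0,
     1.0 / 355687428096000.0
};

/* Single precision: 5 terms (up to x^9), evaluated with Horner's rule. */
float sin_poly_f32(float x)
{
    assert(x >= -0.8f && x <= 0.8f);  /* assumes range reduction was done */
    const float x2 = x * x;
    float p = (float)SIN_COEF[4];
    for (int i = 3; i >= 0; --i)
        p = p * x2 + (float)SIN_COEF[i];
    return x * p;
}

/* Double precision: 9 terms (up to x^17), same scheme, more work. */
double sin_poly_f64(double x)
{
    assert(x >= -0.8 && x <= 0.8);    /* assumes range reduction was done */
    const double x2 = x * x;
    double p = SIN_COEF[8];
    for (int i = 7; i >= 0; --i)
        p = p * x2 + SIN_COEF[i];
    return x * p;
}
```

The float version does 4 multiply-add steps in its loop and the double version does 8, which is the kind of difference the paragraph above describes.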

However, taking up twice the memory for each number clearly implies heavier load on the cache(s) and more memory bandwidth to fill and spill those cache lines from/to RAM; the time you care about performance of a floating-point operation is when you're doing a lot of such operations, so the memory and cache considerations are crucial.
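To put rough numbers on the cache argument, the following sketch counts how many elements fit in one cache line and how many lines an array occupies. The 64-byte line size is an assumption (common on x86 but not universal), the array is assumed to start on a line boundary, and the function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, as on most current x86 CPUs. */
#define CACHE_LINE_BYTES 64u

/* How many elements of the given size fit in one cache line. */
size_t elems_per_cache_line(size_t elem_size)
{
    assert(elem_size > 0 && elem_size <= CACHE_LINE_BYTES);
    return CACHE_LINE_BYTES / elem_size;
}

/* How many cache lines an array of n elements occupies, assuming the
 * array starts on a cache-line boundary.                              */
size_t cache_lines_touched(size_t n, size_t elem_size)
{
    size_t bytes = n * elem_size;
    return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

With 4-byte `float` and 8-byte `double`, `elems_per_cache_line` gives 16 and 8, and a million-element array occupies 62,500 lines as `float` versus 125,000 as `double`, so a streaming pass over the `double` array moves twice as much data.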

@Richard's answer points out that there are also other ways to perform FP operations (the SSE / SSE2 instructions; good old MMX was integer-only), especially suitable for simple ops on lots of data ("SIMD", single instruction / multiple data) where each vector register can pack 4 single-precision floats or only 2 double-precision ones, so this effect will be even more marked.
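The 4-versus-2 packing can be made concrete with SSE/SSE2 intrinsics. This is a minimal sketch, assuming a compiler that defines `__SSE2__` and ships `<emmintrin.h>`; without that macro it falls back to the scalar loop. The function names `add_f32` and `add_f64` are made up for the illustration, and in practice an optimizing compiler will often auto-vectorize the plain loop on its own.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */
#endif

/* out[i] = a[i] + b[i], 4 floats per 128-bit register when SSE2 is on. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i)                     /* scalar tail (or fallback) */
        out[i] = a[i] + b[i];
}

/* Same operation on doubles: only 2 elements per 128-bit register. */
void add_f64(const double *a, const double *b, double *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);  /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

For the same element count, the `double` loop issues twice as many vector instructions as the `float` loop, on top of moving twice as many bytes.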

In the end, you do have to benchmark, but my prediction is that for reasonable (i.e., large;-) benchmarks, you'll find an advantage in sticking with single precision (assuming of course that you don't need the extra bits of precision!-).
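A starting point for such a benchmark might look like the sketch below. It is deliberately crude: it times a simple multiply-add pass over a heap array with `clock()`, and the names `bench_result`, `bench_f32`, and `bench_f64` are invented here. The update `x*0.5 + 0.5` keeps every element exactly 1.0, so the checksum is predictable and gives the compiler a reason to keep the work. A serious measurement would also need warm-up runs, repeated trials, and a check of the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical result type: CPU seconds spent and a checksum of the data.
 * seconds is -1.0 if the allocation failed.                              */
typedef struct { double seconds; double checksum; } bench_result;

bench_result bench_f32(size_t n, int reps)
{
    bench_result r = { -1.0, 0.0 };
    assert(n > 0 && reps > 0);
    float *x = malloc(n * sizeof *x);
    if (x == NULL)
        return r;
    for (size_t i = 0; i < n; ++i)
        x[i] = 1.0f;

    clock_t t0 = clock();
    for (int k = 0; k < reps; ++k)
        for (size_t i = 0; i < n; ++i)
            x[i] = x[i] * 0.5f + 0.5f;   /* stays exactly 1.0 */
    clock_t t1 = clock();

    for (size_t i = 0; i < n; ++i)
        r.checksum += x[i];
    free(x);
    r.seconds = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return r;
}

bench_result bench_f64(size_t n, int reps)
{
    bench_result r = { -1.0, 0.0 };
    assert(n > 0 && reps > 0);
    double *x = malloc(n * sizeof *x);
    if (x == NULL)
        return r;
    for (size_t i = 0; i < n; ++i)
        x[i] = 1.0;

    clock_t t0 = clock();
    for (int k = 0; k < reps; ++k)
        for (size_t i = 0; i < n; ++i)
            x[i] = x[i] * 0.5 + 0.5;     /* stays exactly 1.0 */
    clock_t t1 = clock();

    for (size_t i = 0; i < n; ++i)
        r.checksum += x[i];
    free(x);
    r.seconds = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return r;
}
```

Comparing `bench_f32(n, reps).seconds` with `bench_f64(n, reps).seconds` for an `n` well beyond the last-level cache size is where the memory-bandwidth effect described earlier would be expected to show up; for small `n` the two may be indistinguishable.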
