std::lower_bound 对于 std::vector 比 std::map::find 慢

时间：2023-05-08

本文介绍了std::lower_bound 对于 std::vector 比 std::map::find 慢的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着跟版网的小编来一起学习吧！

问题描述

限时送ChatGPT账号..

我编写了一个类来充当顺序容器的包装器 (std::vector/std::queue/std::list) 具有 std::map 的接口，以提高使用少量小对象时的性能.考虑到已经存在的算法，编码非常简单.这段代码显然是从我的完整代码中高度修剪的，但显示了问题.

I wrote a class to act as a wrapper around a sequential container (std::vector/std::queue/std::list) to have the interface of a std::map, for performance when using small numbers of small objects. The coding was all remarkably simple given the algorithms that already existed. This code is obviously highly trimmed from my full code, but shows the problem.

template <class key_, 
          class mapped_, 
          class traits_ = std::less<key_>,
          class undertype_ = std::vector<std::pair<key_,mapped_> >
         >
class associative
{
public:
    typedef traits_ key_compare;
    typedef key_ key_type;
    typedef mapped_ mapped_type;
    typedef std::pair<const key_type, mapped_type> value_type;
    typedef typename undertype_::allocator_type allocator_type;
    typedef typename allocator_type::template rebind<value_type>::other value_allocator_type;
    typedef typename undertype_::const_iterator const_iterator;

    class value_compare {
        key_compare pred_;
    public:
        inline value_compare(key_compare pred=key_compare()) : pred_(pred) {}
        inline bool operator()(const value_type& left, const value_type& right) const {return pred_(left.first,right.first);}
        inline bool operator()(const value_type& left, const key_type& right) const {return pred_(left.first,right);}
        inline bool operator()(const key_type& left, const value_type& right) const {return pred_(left,right.first);}
        inline bool operator()(const key_type& left, const key_type& right) const {return pred_(left,right);}
        inline key_compare key_comp( ) const {return pred_;}
    };
    class iterator  {
    public:       
        typedef typename value_allocator_type::difference_type difference_type;
        typedef typename value_allocator_type::value_type value_type;
        typedef typename value_allocator_type::reference reference;
        typedef typename value_allocator_type::pointer pointer;
        typedef std::bidirectional_iterator_tag iterator_category;
        inline iterator(const typename undertype_::iterator& rhs) : data(rhs) {}
    inline reference operator*() const { return reinterpret_cast<reference>(*data);}
        inline pointer operator->() const {return reinterpret_cast<pointer>(structure_dereference_operator(data));}
        operator const_iterator&() const {return data;}
    protected:
        typename undertype_::iterator data;
    };

    template<class input_iterator>
    inline associative(input_iterator first, input_iterator last) : internal_(first, last), comp_() 
    {if (std::is_sorted(internal_.begin(), internal_.end())==false) std::sort(internal_.begin(), internal_.end(), comp_);}

inline iterator find(const key_type& key) {
    iterator i = std::lower_bound(internal_.begin(), internal_.end(), key, comp_);
    return (comp_(key,*i) ? internal_.end() : i);
}

protected:
    undertype_ internal_;
    value_compare comp_;
};

SSCCE 在 http://ideone.com/Ufn7r，完整代码在 http://ideone.com/MQr0Z(注意:IdeOne 的结果时间非常不稳定，可能是由于服务器负载，并且不清楚显示有问题的结果)

SSCCE at http://ideone.com/Ufn7r, full code at http://ideone.com/MQr0Z (note: resulting times as IdeOne are highly erratic, probably due to server load, and do not clearly show the results in question)

我使用 std::string 和 4 到 128 字节的 POD 进行了测试，使用 MSVC10 测试了 8 到 2000 个元素.

I tested with std::string, and PODs from 4 to 128 bytes, ranging from 8 to 2000 elements with MSVC10.

我期望 (1) 从一系列小对象创建，(2) 随机插入/擦除少量小对象，以及 (3) 查找所有对象具有更高的性能.令人惊讶的是，向量在从所有测试的范围中创建明显更快，并且根据大小高达约 2048 字节(512 个 4 字节对象，或 128 个 16 字节对象，等等).然而，最令人震惊的是，使用 std::lower_bound 的 std::vector 比 std::map::find 慢> 适用于所有 POD.对于 4 字节和 8 字节 POD，差异很小，但对于 128 字节 POD，std::vector 慢了 36%！然而，对于 std::string，std::vector 平均快了 6%.

I expected higher performance for (1) creating from a range for small objects, (2) random insertion/erasure for small numbers of small objects, and (3) lookup for all objects. Surprisingly, the vector was significantly faster for creating from a range for all tests, and faster for random erasure depending on size up to about 2048 bytes (512 4-byte objects, or 128 16-byte objects, etc). However, most shocking of all, was that the std::vector using std::lower_bound was slower than the std::map::find for all PODs. The difference was miniscule for 4 and 8-byte PODs, and but for 128-byte PODs, std::vector was up to 36% slower! However, for std::string, the std::vector was 6% faster on average.

我觉得 std::lower_bound 在已排序的 std::vector 上应该优于 std::map 由于更好的缓存位置/更小的内存大小，并且由于map 可能不完全平衡，或者在最坏的情况它应该匹配 std::map，但我一生都想不出 std::map 应该更快的任何理由.我唯一的想法是谓词以某种方式减慢了它的速度，但我不知道如何.所以问题是:std::lower_bound 在排序的 std::vector 上怎么会被 std::map(在 MSVC10 中)?

I feel like std::lower_bound on a sorted std::vector should have outperformed std::map due to better cache locality/smaller memory size, and since the map can be imperfectly balanced, or in the worst case it should match std::map, but can't for the life of me think of any reason that std::map should be faster. My only thought is the predicate is somehow slowing it down, but I can't figure out how. So the question is: How could it be that std::lower_bound on a sorted std::vector be outperformed by a std::map (in MSVC10)?

我已经确认 std::vector<std::pair<4BYTEPOD,4BYTEPOD>> 上的 std::lower_bound 使用较少的 comparisons 平均比 std::map<4BYTEPOD,4BYTEPOD>::find(0-0.25)，但我的实现仍然慢了 26%.

I've confirmed that std::lower_bound on std::vector<std::pair<4BYTEPOD,4BYTEPOD>> uses fewer comparisons on average than std::map<4BYTEPOD,4BYTEPOD>::find (by 0-0.25), but my implementation is still up to 26% slower.

[POST-ANSWER-EDIT] 我在 http://ideone.com/41iKt 做了一个 SSCCE删除所有不需要的绒毛，并清楚地表明在排序的 vector 上的 find 比 map 慢约 15%.

[POST-ANSWER-EDIT] I made a SSCCE at http://ideone.com/41iKt that removes all unneeded fluff, and clearly shows that find on the sorted vector is slower than that of the map, by ~15%.

推荐答案

这是一个更有趣的问题！在讨论到目前为止的发现之前，让我指出 associative::find() 函数的行为与 std::map::find() 不同:如果键未找到前者返回下限而后者返回end().为了解决这个问题，associative::find() 需要改变成这样:

This is a somewhat more interesting nut to crack! Before discussing my findings so far, let me point out that the associative::find() function behaves differently to std::map::find(): if the key isn't found the former returns the lower bound while the latter returns end(). To fix this, associative::find() needs to be changed to become something like this:

auto rc = std::lower_bound(this->internal_.begin(), this->internal_.end(), key, this->comp_);
return rc != this->internal_.end() && !this->comp_(key, rc->first)? rc: this->internal_.end();

既然我们更可能将苹果与苹果进行比较(我现在还没有验证逻辑是否真的正确)，让我们继续研究性能.我不太相信用于测试性能的方法是否真的站得住脚，但我现在坚持使用它，我绝对可以提高 associative 容器的性能.我认为我没有完全发现代码中的所有性能问题，但至少取得了一些进展.最重要的是注意到associative中使用的比较函数非常糟糕，因为它不断复制.这使这个容器处于某种劣势.如果您现在正在检查比较器，您可能看不到它，因为看起来这个比较器正在通过引用传递！这个问题实际上相当微妙:底层容器的 value_type 为 std::pair<key_type,mapped_type> 但比较器采用 std::pair<key_type const，mapped_type> 作为参数！解决这个问题似乎给关联容器带来了相当多的性能提升.

Now that we are more likely to compare apples to apples (I haven't verified if the logic is really correct now), let's go on to investigate the performance. I'm not quite convinced that the approach used to test performance really hold water but I'm sticking with it for now and I could definitely improve the performance of the associative container. I don't think I have quite found all performance issues in the code but, at least, made some progress. The biggest is to noticed that the comparison function used in the associative is pretty bad because it keeps making copies. This is putting this container somewhat at a disadvantage. If you are checking the comparator now you probably don't see it because it looks as if this comparator is passing by reference! The issue is actually rather subtle: the underlying container has a value_type of std::pair<key_type, mapped_type> but the comparator is taking std::pair<key_type const, mapped_type> as argument! Fixing this seems to give the associative container quite a bit of a performance boost.

为了实现一个比较器类，它没有机会完全匹配参数，我使用一个简单的帮助器来检测类型是否为 std::pair:

To implement a comparator class which has not opportunity to fail matching the arguments exactly I'm using a simple helper to detect if a type is a std::pair<L, R>:

template <typename>               struct is_pair                  { enum { value = false }; };
template <typename F, typename S> struct is_pair<std::pair<F, S>> { enum { value = true }; };

...然后我用这个替换了比较器，稍微复杂一点，一个:

... and then I replaced the comparator with this, slightly more complicated, one:

class value_compare {
    key_compare pred_;
public:
    inline value_compare(key_compare pred=key_compare()) : pred_(pred) {}
    template <typename L, typename R>
    inline typename std::enable_if<is_pair<L>::value && is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left.first,right.first);
    }
    template <typename L, typename R>
    inline typename std::enable_if<is_pair<L>::value && !is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left.first,right);
    }
    template <typename L, typename R>
    inline typename std::enable_if<!is_pair<L>::value && is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left,right.first);
    }
    template <typename L, typename R>
    inline typename std::enable_if<!is_pair<L>::value && !is_pair<R>::value, bool>::type
    operator()(L const& left, R const& right) const {
        return pred_(left,right);
    }
    inline key_compare key_comp( ) const {return pred_;}
};

这通常会使两种方法更加接近.鉴于我希望 std::vector 和 lower_bound() 方法应该比使用 std::map 我觉得调查还没有结束.


This generally gets the two approaches quite a bit closer together. Given that I would expect that the std::vector<T> with lower_bound() approach should be a lot better than using std::map<K, T> I feel the investigation isn't over, yet.
附录:
重新思考这个练习，我发现为什么我对谓词类的实现感到不舒服:它方式复杂！这可以通过不使用 std::enable_if 进行更改来更简单地完成:这很好地将代码简化为更易于阅读的内容.关键是要拿到钥匙:
Rethinking the exercise a bit more I spotted why I felt uncomfortable with the implementation of the predicate class: it is way to complex! This can be done a lot simpler by not using std::enable_if for a change: this nicely reduces the code to something which is much easier to read. The key is to get the key:
template <typename Key>
Key const& get_key(Key const& value)                  { return value; }
template <typename Key,  typename Value>
Key const& get_key(std::pair<Key, Value> const& pair) { return pair.first; }

通过这个实现从一个值或一对值中获取键"，谓词对象可以只定义一个非常简单的函数调用运算符:
With this implementation to get hold of a "key" from either a value or a pair of values, the predicate object can just define one very simple function call operator:
template <typename L, typename R>
bool operator()(L const& l, R const& r)
{
    return this->pred_(get_key<key_type>(l), get_key<key_type>(r));
}

这也有一个小技巧:需要将预期的 key_type 传递给 get_key() 函数.没有这个，在 key_type 本身就是一个 std::pair 对象的情况下，谓词将不起作用.
There is a slight trick in this as well, though: the expected key_type needs to be passed to the get_key() function. Without this the predicate wouldn't work in cases where the key_type is itself a std::pair<F, S> of objects.

                        这篇关于std::lower_bound 对于 std::vector 比 std::map::find 慢的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持跟版网！



上一篇：push_back 是如何在 STL 向量中实现的? 
下一篇：给定起点和终点以及距离，计算沿线的点 

 
相关文章

     
    
向量<无符号字符>vs 二进制数据的字符串
使用布尔值向量是否比动态位集慢?
对象向量上的 C++ remove_if
转换向量&lt;int&gt;串起来
在 C++ 中切片向量
shrink_to_fit 是将 `std::vector` 的容量减少到其大小的正确方法吗?
将 C++ 向量读取和写入文件
在 C++ 中检查向量的所有元素是否相等
对结构向量进行排序
“对"释放 std::vector 对象的方法