listのsizeについて - y

Effective STLによるとlist::sizeはリストのサイズに比例した時間がかかることがあるそうです(実装次第らしい)。
実際にg++で試したところ、たしかにそのようになっていました。

もしそうだとリスト(lst)のサイズが大きい時に、
lst.size() < 3
などの呼び出しが非常に遅くなるのではないかと思いました。

そこでこれを改善するために、次のようなものを用意すればいいのでは、という提案

#include <iostream>

template<class T>
class size_tmp{
private:
  const T &target_;

  // 代入等を禁止
  size_tmp(const size_tmp &);
  size_tmp operator = (const size_tmp &);
  size_tmp(const T &t) : target_(t) {}
  template<class S> friend size_tmp<S> size(const S &); // size 関数を通してのみ作れる
public:
  const T &target() const{ return target_; }
  operator std::size_t () { return target_.size(); }
};

template<class T>
bool operator < (const size_tmp<T> &lhs, std::size_t rhs){
  for(typename T::const_iterator it = lhs.target().begin(); rhs > 0 && it != lhs.target().end(); ++it, --rhs);
  return rhs != 0;
}

template<class T>
size_tmp<T> size(const T &t){ return size_tmp<T>(t); }

// 以下ベンチマーク

#define ORIGINAL 0
#define MY_IMPL  1

#include <list>
#include <vector>

#include <boost/timer.hpp>
#include <boost/format.hpp>
#include <boost/assert.hpp>

template<class T>
void bench(const T &lst, const char *str, int which, unsigned int bound, unsigned int iter = 10000){
  const unsigned int expected = (lst.size() < bound ? iter : 0);

  {
    boost::timer t;
    int s = 0;

    if(which == ORIGINAL) for(int i = 0; i < iter; i++) { if(lst.size() < bound) s++; }
    else                  for(int i = 0; i < iter; i++) { if(size(lst)  < bound) s++; }

    BOOST_ASSERT( s == expected );

    std::cout << boost::format("%s %.3f") % str % t.elapsed() << std::endl;
  }
}

int main(){
#ifndef USE_VECTOR
  typedef std::list<int> Container;
#else
  typedef std::vector<int> Container;
#endif

  Container lst1;
  Container lst2;

  const unsigned int small = 1000;   // 小さいサイズ
  const unsigned int large = 100000; // 大きいサイズ

  // 適当な要素を入れたリストを作る
  srand( time(NULL) );
  for(int i = 0; i < small; i++) lst1.push_back( rand() % 1000 );
  for(int i = 0; i < large; i++) lst2.push_back( rand() % 1000 );

  // 普通にサイズを取得できる
  std::cout << "Small size: " << size(lst1) << std::endl; BOOST_ASSERT(size(lst1) == small);
  std::cout << "Large size: " << size(lst2) << std::endl; BOOST_ASSERT(size(lst2) == large);

  std::cout << std::endl;

  // 普通のメンバ関数 size を使う
  std::cout << "== Original implimentation (compare to small number) ==" << std::endl;
  bench(lst1, "Small list: ", ORIGINAL, 200);
  bench(lst2, "Large list: ", ORIGINAL, 200);

  std::cout << std::endl;

  std::cout << "== Original implimentation (compare to large number) ==" << std::endl;
  bench(lst1, "Small list: ", ORIGINAL, 1000000);
  bench(lst2, "Large list: ", ORIGINAL, 1000000);

  std::cout << std::endl;

  // 今回定義した size 関数を使う
  std::cout << "== My implimentation (compare to small number) ==" << std::endl;
  bench(lst1, "Small list: ", MY_IMPL, 200);
  bench(lst2, "Large list: ", MY_IMPL, 200);

  std::cout << std::endl;

  std::cout << "== My implimentation (compare to large number) ==" << std::endl;
  bench(lst1, "Small list: ", MY_IMPL, 1000000);
  bench(lst2, "Large list: ", MY_IMPL, 1000000);

  return 0;
}

(ベンチマークの部分で、boostを使ってますが、メインとなる部分は普通のC++)
コアとなる考え方は、size関数でなんかオブジェクトつくっといて、それと数値の比較は特殊な処理を行う、というもの。
特殊な処理はIteratorを使ってサイズ徐々にを計算していき、結果をわかり次第返すというもの。
これはサンプルなので、operator < しか実装してません。
ベンチマークとしては、複数回それぞれのsizeを呼び出して、時間を測っています。
実際にこのプログラムを動かすと(最適化オプションは-O2)、

Small size: 1000
Large size: 100000

== Original implimentation (compare to small number) ==
Small list: 0.020
Large list: 1.580

== Original implimentation (compare to large number) ==
Small list: 0.010
Large list: 1.560

== My implimentation (compare to small number) ==
Small list: 0.000
Large list: 0.010

== My implimentation (compare to large number) ==
Small list: 0.020
Large list: 1.610

という感じになります。
Original implimentationは、list::sizeを使った物、My implimentationは、今回作ったsize関数を使った物です。
差があるのは、大きいリストを使って、小さい値と比較したものです。
list::sizeを使うと1.56秒かかっているのに対して、size関数を使うと0.01秒となっています。
これは大分早くなっているので有用かもしれません。

最後にいうのもあれですが、listのサイズをそんなに頻繁に計算することがあるのかというのは自分でも疑問です。

参考

上の実装は、Iteratorを備えたコンテナならなんでも使えます。
ただしlist以外のものは定数時間でサイズを計算できるはずなので、すごく遅くなります。
vectorで試したところ(-DUSE_VECTORを付けてコンパイル)こうなりました。

Small size: 1000
Large size: 100000

== Original implimentation (compare to small number) ==
Small list: 0.000
Large list: 0.000

== Original implimentation (compare to large number) ==
Small list: 0.000
Large list: 0.000

== My implimentation (compare to small number) ==
Small list: 0.000
Large list: 0.000

== My implimentation (compare to large number) ==
Small list: 0.020
Large list: 0.730

まあ当然ですが、大きいvectorに対しては今回作ったsize(lst)では相当遅くなってますね。