<Ruby> SetとArrayのinclude?の違い

updated_at: 2023-04-30

概要

先日社内のPRコードレビューを受けて、Array#include?よりSet#include?の方が高速だということを知った
その時にソースを追ってみたまとめ

要約

  • Array#include?は計算量がO(n)
  • Set#include?は計算量がO(1)


Array#include?

VALUE
rb_ary_includes(VALUE ary, VALUE item)
{
  long i;
  VALUE e;
  for (i=0; i<RARRAY_LEN(ary); i++) {
    e = RARRAY_AREF(ary, i);
    if (rb_equal(e, item)) {
      return Qtrue;
    }
  }
  return Qfalse;
}

https://github.com/ruby/ruby/blob/master/array.c#L5417-L5430

/**
 * This function is an optimised version of calling `#==`. It checks equality
 * between two objects by first doing a fast identity check using using C's
 * `==` (same as `BasicObject#equal?`). If that check fails, it calls `#==`
 * dynamically.  This optimisation actually affects semantics, because when
 * `#==` returns false for the same object obj, `rb_equal(obj, obj)` would
 * still return true.  This happens for `Float::NAN`, where `Float::NAN ==
 * Float::NAN` is `false`, but `rb_equal(Float::NAN, Float::NAN)` is `true`.
 *
 * @param[in] lhs     Comparison LHS.
 * @param[in] rhs     Comparison RHS.
 * @retval   RUBY_Qtrue  They are the same.
 * @retval   RUBY_Qfalse They are different.
 */
VALUE rb_equal(VALUE lhs, VALUE rhs);

https://github.com/ruby/ruby/blob/master/include/ruby/ruby.h#L164-L178
説明によるとオブジェクト同士の比較ということ

 describe "rb_equal" do
  it "returns true if the arguments are the same exact object" do
   s = "hello"
   @o.rb_equal(s, s).should be_true
  end
  it "calls == to check equality and coerces to true/false" do
   m = mock("string")
   m.should_receive(:==).and_return(8)
   @o.rb_equal(m, "hello").should be_true
   m2 = mock("string")
   m2.should_receive(:==).and_return(nil)
   @o.rb_equal(m2, "hello").should be_false
  end
 end

https://github.com/ruby/ruby/blob/master/spec/ruby/optional/capi/object_spec.rb#L764-L779
念の為テストも確認して同値比較メソッドであることを確認

つまりArray#include?は1つずつ同値比較をしているので計算量がO(n)

Set#include?

  # Returns true if the set contains the given object.
  #
  # Note that <code>include?</code> and <code>member?</code> do not test member
  # equality using <code>==</code> as do other Enumerables.
  #
  # See also Enumerable#include?
  def include?(o)
    @hash[o]
  end
  alias member? include?

https://github.com/ruby/ruby/blob/master/lib/set.rb#L397-L406

hashでのアクセスにより O(1)の計算量を実現していることが分かった
ちなみにSetはイニシャライズでhashとして扱われている

 def initialize(enum = nil, &block) # :yields: o
  @hash ||= Hash.new(false)
  enum.nil? and return
  if block
   do_with_enum(enum) { |o| add(block[o]) }
  else
   merge(enum)
  end
 end

https://github.com/ruby/ruby/blob/master/lib/set.rb#L245-L255