An interesting Ruby hash semantics gotcha

Thought this might amuse or perplex some Rubyists (or be useful to know - it’s been the source of a couple of hard-to-track-down bugs in the past).

>> {{} => true}[{}]
=> nil

>> {{} => true, {} => true}
=> {{}=>true, {}=>true}

but yet,

>> {} == {}
true

What’s going on here?

Ruby’s Hashes behave very strangely when you try to use a Hash itself as a key of a Hash.

This acts as a subtle gotcha when you try to memoize a function which takes hash arguments - and so a tricky-to-address bug in libraries like this: http://raa.ruby-lang.org/project/memoize/

Why?

Ruby calls Object#hash on each key of a Hash, using that numeric hash (small h) to allocate that object to a bucket of the underlying hash table data structure. Equality, when it comes to Hash lookups and unique keys of a Hash, will only happen if the keys generate the same numeric hash as a result of their hash methods.

For most ruby data structures, x.hash == y.hash is implied by x == y, and everything works fine.
But, not for Hashes themselves!

(NB. this also affects data structures like Arrays which themselves contain a Hash, since Array#hash must call hash recursively on its contents).

(Interestingly, for things like 1.0 == 1, x.hash == y.hash also fails. Note, x.hash == y.hash is always implied by x.eql?(y), but this equality isn’t a desperately useful one, and seems to have been constructed artificially as an equality for use with Hash which is consistent with .hash)

Why might it have been implemented this way?

Hashes are insensitive to the order of their keys - so, for example, we have:
{:a => true, :b => true} == {:b => true, :a => true}

When you’re actually being given two concrete objects to compare, you can just check that each key from the one has an equal corresponding value in the other, and vice versa.

But, when you’re asked to generate a numeric hash which is constant for the whole equivalence class, you’d have to do something to ensure the hashing isn’t order-sensitive. Like ordering the key/value pairs by their individual hashes before feeding into the hash function.

Some attempts at a fix in the form of a monkey-patched Hash#hash:

(yes, that’s pronounced ‘Hash hash hash’)

  1. Sort key/value pairs by the numeric hash of the pair first:
    class Hash
     def hash
       sort_by {|pair| pair.hash}.hash
     end
    
     def eql?(other)
       self == other
     end
    end
  2. Use an XOR of the hashes of the key/value pairs (XOR is order-insensitive, and should preserve entropy in the bits of the hash)
    class Hash
     def hash
       inject(0) {|hash,pair| hash ^ pair.hash}
     end
    
     def eql?(other)
       self == other
     end
    end

These then fix, eg:

>> {}.hash == {}.hash
=> true

>> {{} => true}[{}]
=> true

>> {{} => true, {} => false}
=> {{}=>false}

(Note, overriding eql? is required to make the last two work - it seems the Hash implementation uses eql? to do the equality comparison that follows the more approximate hash comparison)

Now, I’m sure there’s a reason Matz didn’t do it this way - perhaps a performance reason, perhaps a gotcha that I haven’t noticed with my approach. Perhaps it’ll be fixed in 1.9.
But at any rate, it’s useful to be aware of the issue.

2 Responses to “An interesting Ruby hash semantics gotcha”


  1. 1 Anthony Green

    I love Ruby code examples. I like the whooshing sound they make as they fly by over my head. You lost me at Object#hash.

    From my naive non-software enginneering perspective the first three code examples behave exactly as I would expect:

    I thought {} was synonymous with Hash.new

    Therefore

    {{} => true}[{}]

    generates 2 different hashes with different object_ids and so will return nil

    and its the same with

    {{}=>true, {}=>true}

    the two hashes have two different object_ids so are different keys

    where as

    {} == {}

    compares just the contents of both hashes and returns true.

    There is no === operator for Hashes which I would expect to assert that both objects have the same object_id and the same contents.

  2. 2 matthew

    I thought {} was synonymous with Hash.new

    Therefore

    {{} => true}[{}]

    generates 2 different hashes with different object_ids and so will return nil

    It is Hash.new yes - {{} => true} is a Hash which has one key/value pair - the key is {} (an empty Hash) and the value true.

    [] is then the lookup method for that Hash - so we’re looking up the value corresponding to the key {}. We expect to get true, but it doesn’t find the value, due to {}.hash not being fixed for different instances of an empty hash.

    h = Hash.new
    empty_hash_as_key = {}
    another_empty_hash = {}
    h[empty_hash_as_key] = true

    h[another_empty_hash]
    => nil

    -Matt

Leave a Reply

You must login to post a comment.