Difference between Python and Ruby when it comes to hashes with default values

Having worked with Python for a while, I am trying to pick up Ruby, especially for some of my work with Logstash. While trying out a small program in Ruby, I was stumped by a peculiar trait of Ruby hashes with default values. It cost me an hour of my life that I am not going to get back. :(

In Python, dictionaries with default values are not part of the core language; they need to be imported from the standard library. For some reason I still don’t know, you cannot set a simple default value: you need to supply a function object that creates the value for you. :/

So if you want to create a default dictionary in Python, you need to do something like this:

>>> from collections import defaultdict

>>> d = defaultdict(lambda: 5)

Now when you access any non-existent key in this dictionary, it gets magically populated with that key and the default value. See below.

>>> if d["one"]: print("not empty")
... 
not empty
>>>
>>> print(dict(d))
{'one': 5}
>>>

Ruby hashes support default values in the core language. So you can do something like:

>> d = Hash.new(5)
=> {}

>> puts d["a"]
5
=> nil

>> d
=> {}

Wait! See the difference? Evaluating a non-existent key in Ruby returns the default value but, unlike Python, it doesn’t store it in the hash!
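
If you do want the Python-like behaviour, Hash.new also accepts a block that receives the hash and the missing key, and assigning inside that block stores the value. Here is a minimal sketch comparing the two forms (the variable names plain and storing are just for illustration):

```ruby
# Plain default: returned on a miss, but never stored in the hash.
plain = Hash.new(5)
plain["a"]        # returns 5
p plain.keys      # => []

# Block form: the block runs on a miss. Assigning inside it stores the key,
# which is roughly what Python's defaultdict does.
storing = Hash.new { |hash, key| hash[key] = 5 }
storing["a"]      # returns 5 and stores it
p storing.keys    # => ["a"]
```

The block form is also a fresh evaluation on every miss, so it plays the same role as the function object you pass to defaultdict.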

After I drilled down to this quirk, I looked around and found some interesting articles.

This article from 2008 by David Black was particularly interesting. He points out in this article how this Hash behaviour breaks a very popular Ruby idiom.

Normally, the idiomatic Ruby way to initialize or return a variable goes like this:

>> d = Hash.new
=> {}

>> d["name"] ||= "Skye"
=> "Skye"

>> d
=> {"name"=>"Skye"}

However, if you use a Hash with a default value, this idiom breaks, because evaluating a non-existent key returns the default value. You would intuitively expect the ||= operator to at least set the missing key to the default value! But that doesn’t happen, due to the peculiar way Ruby treats that operator. Ruby evaluates A ||= B not as A = A || B, but as A || A = B. So as long as the default is truthy, the assignment never runs and nothing is ever stored for the missing key.

>> d = Hash.new("May")
=> {}

>> d["name"] ||= "Skye"
=> "May"

>> d
=> {}
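
Writing the expansion out by hand makes it easier to see why nothing gets stored. This is only a small sketch; the key? guard at the end is one possible workaround and is not taken from the articles mentioned here.

```ruby
d = Hash.new("May")

# d["name"] ||= "Skye" behaves like the line below: the left side is already
# truthy ("May", the default), so the assignment on the right never runs.
result = d["name"] || (d["name"] = "Skye")
p result          # => "May"
p d.key?("name")  # => false

# One way to assign only when the key is really absent is to test for the key
# itself instead of relying on the truthiness of the lookup.
d["name"] = "Skye" unless d.key?("name")
p d["name"]       # => "Skye"
p d.key?("name")  # => true
```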

A gotcha lurking in an otherwise beautiful language.

Another nice ref: http://www.rubyinside.com/what-rubys-double-pipe-or-equals-really-does-5488.html
