Ruby: Why to use symbols as Hash keys ( and why not )

May 24th, 2012 by

I have often read that symbols make better hash keys than strings. I wanted to know why, and how large the performance impact is, so I wrote a simple test to measure it. This post also explains the technical background and points out a potential security problem.
My test scenario is simple: create a small hash and look up a key in it. The keys are of four different kinds: a short string, a short symbol, a long string and a long symbol. For the measurement I use Benchmark, from Ruby's standard library. Here is the code:

require "benchmark"

precomputed_string = "Very long string value"*1000 
precomputed_symbol = precomputed_string.to_sym
MAP = {
  "key1" => true,
  :key2 => true,
  precomputed_string => true,
  precomputed_symbol => true
}
Benchmark.bm(20) do |x|
  x.report("string") do
    10000000.times { MAP["key1"] }
  end
  x.report("symbol") do
    10000000.times { MAP[:key2] }
  end
  x.report("long string/100") do
    100000.times { MAP[precomputed_string] }
  end
  x.report("long symbol") do
    10000000.times { MAP[precomputed_symbol] }
  end
end

Please note that the long string key uses 100 times fewer iterations, because it would otherwise take too long. Here is the result from my machine:


string                4.360000   0.000000   4.360000 (  4.365123)
symbol                2.870000   0.000000   2.870000 (  2.868708)
long string/100       8.460000   0.000000   8.460000 (  8.471581)
long symbol           2.890000   0.000000   2.890000 (  2.884652)

As you can see, even for a short key a symbol is faster than a string. For a long symbol key the time does not grow, so the lookup speed for symbols does not depend on key length. For string keys the situation is different: the long string lookup takes about twice as long as the short one, despite running 100 times fewer iterations.
Why is that? The reason lies in the hash implementation. A Hash uses a hashing function for the lookup (I agree it is a little confusing that Ruby calls its map a Hash). A symbol has this value “precomputed”, while for a string it has to be computed again over the whole string. For a symbol the hash value is simply based on its object_id, which never changes, whereas every string instance is a different object (strings are mutable in Ruby, unlike in Java), so the hash has to be computed from the content to find out whether two strings hash to the same value. A short demonstration of the object_id difference:


"test".object_id
"test".object_id
:test.object_id
:test.object_id
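
The same point can be sketched by comparing identity and hash value side by side. The helper name compare_keys is hypothetical and only serves this illustration; .dup is used so that the two strings are guaranteed to be separate objects regardless of frozen-string settings.

```ruby
# Hypothetical helper: reports how Ruby sees two candidate hash keys.
def compare_keys(x, y)
  {
    :same_object => x.equal?(y),       # identity, i.e. the same object_id
    :same_hash   => x.hash == y.hash,  # value returned by the hashing function
    :same_key    => x.eql?(y)          # what Hash uses to confirm a match
  }
end

# .dup guarantees two separate String objects with equal content.
p compare_keys("test".dup, "test".dup)
# The same symbol always refers to one object.
p compare_keys(:test, :test)
```

For the two strings only the identity check is false: their hashes match, but each hash had to be computed from the content. For the symbol all three checks are true.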

So should you always use symbols? There is one disadvantage. To keep a symbol's value the same for the whole lifetime of a Ruby process, unused symbols are not removed by the garbage collector. Here is code that demonstrates it:


# for string keys (run as its own script, so each variant is measured in a fresh process)
def test val
  map = {}
  1000.times do |i|
    value = val*(i+1)
    map[value] = true
  end
  return nil
end

100.times do |i|
  test "test#{i}"
  GC.start
end
puts `cat /proc/#{$$}/status | grep 'VmSize:'`

# for symbol keys (run as its own script, so each variant is measured in a fresh process)
def test val
  map = {}
  1000.times do |i|
    value = val*(i+1)
    map[value.to_sym] = true
  end
end

100.times do |i|
  test "test#{i}"
  GC.start
end
puts `cat /proc/#{$$}/status | grep 'VmSize:'`

My results:


String: VmSize:	   24856 kB
Symbol: VmSize:	  343324 kB
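
The /proc lookup above works only on Linux. A more portable way to watch the symbol table is Symbol.all_symbols, sketched below. The helper intern_symbols and the prefix "symbol_table_demo" are hypothetical names for this illustration. Note also that, as far as I know, Ruby 2.2 and later can garbage collect dynamically created symbols once nothing references them, so the unbounded growth measured above applies to older interpreters.

```ruby
# Hypothetical helper: interns `count` symbols built from `prefix`
# and returns them, which also keeps them referenced (alive).
def intern_symbols(prefix, count)
  Array.new(count) { |i| "#{prefix}_#{i}".to_sym }
end

before  = Symbol.all_symbols.size
created = intern_symbols("symbol_table_demo", 1000)
after   = Symbol.all_symbols.size

puts "symbols before: #{before}"
puts "symbols after:  #{after}"
puts "created:        #{created.size}"
```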

So it is a trade-off between memory and speed. For long-running tasks it is very important to have control over what gets stored as symbols. Consider this code snippet from a long-running server:


#get option value
VALUE_TO_DB_MAP = { :external => 1, :internal => 2, :both => 3 }
def update params
  db_value = VALUE_TO_DB_MAP[params[:option1].to_sym]
end

Now consider what happens if an attacker sends an unfriendly long string there. Every distinct string is converted to a new symbol that is never freed, so the attacker can easily cause a DoS from a single machine.
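One way to avoid the problem is to never call to_sym on untrusted input and to key the lookup table by strings instead. The following is only a sketch under that assumption; the names OPTION_TO_DB_VALUE and db_value_for are hypothetical and not part of the original server code.

```ruby
# String keys: untrusted input is compared, never interned as a symbol.
OPTION_TO_DB_VALUE = { "external" => 1, "internal" => 2, "both" => 3 }.freeze

# Returns the database value for a known option, or nil for anything else.
def db_value_for(params)
  OPTION_TO_DB_VALUE[params[:option1].to_s]
end

p db_value_for({ :option1 => "internal" })       # known option
p db_value_for({ :option1 => "x" * 1_000_000 })  # hostile input, no symbol created
```

The lookup is a little slower than with symbol keys, but an unknown value simply yields nil and the string can be garbage collected after the request.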
I welcome any questions or suggestions in your comments.


2 Responses to “Ruby: Why to use symbols as Hash keys ( and why not )”

  1. Nikos

    Hey Josef, great post! Regarding your comment on DOS how will this be possible? Do you mean that this would happen because of a long symbol stored in memory? In that case are you sure garbage collection will not handle this between requests?

    • Anonymous

      Nikos: yes, it is stored in memory the first time it occurs. If you look at my code, I run garbage collection explicitly (GC.start after each call of test). The only way the server side can prevent this is to start a new Ruby process for each request and exit it afterwards (not to mention the correct solution, of course: do not convert unknown strings to symbols).