Ruby 2.4 got released this Christmas with a lot of exciting features. One of the most underrated features in ruby 2.4 is hash table improvements. Before going into details about implementation, let’s first check the benchmark to know how this change gonna affect your ruby application. Some benchmarks are:
Getting keys and values of a hash
h = {}
10000.times do |i|
h[i] = nil
end
# Get all hash values
Benchmark.measure { 50000.times { h.values } }
# Get all hash keys
Benchmark.measure { 50000.times { h.keys } }
Output
Ruby 2.3.3
=> #<Benchmark::Tms:0x00000002a0f990 @label="", @real=2.8023432340005456, @cstime=0.0, @cutime=0.0, @stime=0.13000000000000012, @utime=2.6799999999999997, @total=2.8099999999999996> #<Benchmark::Tms:0x00000002963398 @label="", @real=2.767410662000657, @cstime=0.0, @cutime=0.0, @stime=0.029999999999999805, @utime=2.729999999999997, @total=2.7599999999999967>ruby 2.4.0
#<Benchmark::Tms:0x0000000226d700 @label="", @real=0.8854832770002758, @cstime=0.0, @cutime=0.0, @stime=0.08999999999999997, @utime=0.7999999999999998, @total=0.8899999999999998> #<Benchmark::Tms:0x000000022b1018 @label="", @real=0.8476084579997405, @cstime=0.0, @cutime=0.0, @stime=0.06999999999999995, @utime=0.7799999999999994, @total=0.8499999999999993>Yeah, the above two operations executed ~ 3 times faster on my laptop. Though these numbers can vary with your machine and processor, the performance improvements will be significant on all modern processors. Not all operations became 3 times faster , average perfomence improvement is more than 50% If you are a ruby developer and excited to know what are the new features in ruby 2.4, then this feature gonna make your application faster and you don’t have to change anything in the code for that. Because these changes are backward compatible. If you are curious to know what happened behind the scenes of this performance boost, then continue reading.
Hash Table
In computing, hash table (hash map) is a data structure that is used to implement an associative array, a structure that can map keys to values. Hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. WikipediaIn other words, it is a data structure that allows you to store key value pair and helps to fetch specific data using the key in an efficient way. Unlike arrays, you don’t have to iterate through all elements to find a given element in the hash. If you are new to this data structure, check this cs50 video for a better understanding.
hash = {key1: value1, key2: value2}That is cool right! Now, let’s see how this is made possible in ruby.
Pre 2.4
Let’s first check how ruby implemented Hash in pre 2.4
sample_hash = {a: 10, b: 20, c: 30, d: 40, e: 50}
Let’s take keys :c and :d
Step 1:
First thing ruby does is it will take the hash of the key using the internal hash function.
2.3.1 :075 > :c.hash => 2782 2.3.1 :076 > :d.hash => 2432Step 2: After getting the hash value, result ?it takes modulo of 11 to get figure in which bin ruby can store the given pair
2.3.1 :073 > :c.hash % 11 => 10 2.3.1 :074 > :d.hash % 11 => 1This means we can put :c => 30 in 10’th bin and :d in 1st bin Step 3 What if there are multiple modulo operations that give the result? This is called hash collision. To resolve this, ruby uses a separate chaining approach i.e, it will make a doubly linked list and add it to the existing value in the bin. Step 4 What if the hash is too large ?? Linked list will start growing and will make the hash slower. So, ruby will allocate more bins and do an operation called Rehashing to utilise newly added bins. Improvements in 2.0 In ruby 2.0, Ruby eliminated the need for extra data structures for smaller hashes and uses linear search for better performance. Improvements in 2.2 In 2.2.0, Ruby has used bin array sizes that correspond to powers of two (16, 32, 64, 128, …).
Changes in 2.4
[caption id="attachment_1629" align="aligncenter" width="300"]
sample_hash = {a: 10, b: 20, c: 30, d: 40, e: 50}
Here, ruby will create two internal arrays, entries and bins as shown below
entries = [[2297,a,10], [451,b,20], [2782,c,30], [2432,d,40],[1896,e,50]]
each record in entries array will contain a hash, key, and value respectively
Default bin size in ruby is 16 so Bins array for the above hash will be somewhat like this
bins = [
3,
nil,
nil,
nil,
1,
nil,
nil,
nil,
5,
0,
nil,
nil,
nil,
nil,
2,
nil
]
Now what if we want to fetch an element from hash, say ‘c’
sample_hash[:c]
Step 1:
Find the hash using ruby’s internal hash function
:c.hash
2782
Step 2
Find the location in bin array by using find bin method
find_bin(2782)Step 3 Find the location entries array using bin array
bins[14] => 2Step 4. Find the entry using the key we got
entries[2] => [2782,c,30]Now we have value for the key ‘c’ without iterating through all the records
The article proves that the benchmark presented at the beginning is misleading and suggests invalid conclusions. Since in 2.4 Hash stores list of its entries in sequential array, so the `hash.values` operation is immediate — just need to copy the array. So there is significant performance gain comparing to 2.3, but it has nothing to do with performance of usual hash operations: insert, find, delete. Results of the presented benchmark cannot be used to predict performance gain of other hash operations.
Thanks for your inputs , 3 times performance improvement for accessing keys and values are from my laptop , it can vary with the processor you have . My point was to show that some hash operations became upto 3 times time faster , not all operations became 3 times faster . I will update the post with other benchmarks to avoid confusions .