Hash tables can be considered a staple of the algorithm cookbooks. This basic ingredient is frequently used to cook up associative arrays, and much emphasis has been given to measuring the performance of various collision resolution techniques.
There is certainly a jungle of literature and blog content about hash table performance comparisons out in the wild, and the different comparisons yield many conflicting results. I do not doubt that most of the results are valid, but many comparisons feel like tasting apple slices and contrasting them with orange juice.
The main reason sound comparisons of practical hash table implementations are hard to come by is that the performance test space has more than one dimension, and many good-versus-bad statements are based on a one-dimensional slice of that space.
So what are the dimensions? Let’s be precise: each dimension represents a discrete variable. Possible dimensions are the basic operations (insert, lookup, update, delete), the order of insert, lookup, and delete operations, the number of elements in the table (what mostly matters here is actually a rescaled variable, the load factor, i.e. the number of elements divided by the number of buckets), the memory required, and possibly a few more.
Well, I have neither the intention nor the time to comprehensively spec out the entire test space. Here I’m interested in an overview of hash table performance in several popular programming languages. My simple test suite is this:
Insert 10 million integer keys, both in order and in pseudo-random order, and probe with 10 million keys generated via a 32-bit Galois linear feedback shift register (GLFSR). I focus on timing and maximum memory usage.
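To make the setup concrete, here is a minimal C++ sketch of the test loop. The GLFSR feedback polynomial is my assumption (the post doesn’t name one), and std::unordered_map merely stands in for whichever hash table is under test; the original C++ runs used the pre-C++11 hash_map.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <unordered_map>

// 32-bit Galois LFSR. The taps below (x^32 + x^22 + x^2 + x + 1,
// mask 0x80200003) give a maximal-length sequence and are a common
// choice; the post doesn't state which polynomial was actually used.
static std::uint32_t glfsr_next(std::uint32_t s) {
    std::uint32_t lsb = s & 1u;
    s >>= 1;
    if (lsb) s ^= 0x80200003u;
    return s;
}

int main() {
    const std::uint32_t n = 10000000u;

    // No pre-sizing, so the insert timings include rehash/growth cost.
    std::unordered_map<std::uint32_t, std::uint32_t> table;

    // Insert phase, "in order" variant; the pseudo-random variant
    // instead draws the keys from the GLFSR sequence.
    auto t0 = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < n; ++i) table[i] = i;
    auto t1 = std::chrono::steady_clock::now();

    // Probe phase: 10 million GLFSR keys, which mostly fall
    // outside the 0..n-1 range and therefore miss.
    std::uint32_t state = 1u;
    std::size_t hits = 0;
    for (std::uint32_t i = 0; i < n; ++i) {
        state = glfsr_next(state);
        hits += table.count(state);
    }
    auto t2 = std::chrono::steady_clock::now();

    std::chrono::duration<double> ins = t1 - t0, lkp = t2 - t1;
    std::printf("insert: %.2f s, lookup: %.2f s, hits: %zu\n",
                ins.count(), lkp.count(), hits);
    return 0;
}
```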
Disclaimer:
Of course, I also run into the apples-versus-oranges problem, since I compare compiled, interpreted, and virtual-machine-based languages. Interpreted languages can have significant timing overheads, so adjusted or difference measurements have to be taken into account. I’ve also observed some “unfairness” in the Java execution: it uses multiple cores, which is cheating!
I took the maximum memory usage only lightheartedly and didn’t correct for virtual machine or interpreter overhead. Still, there may be some value in the measurements, and some surprises!
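For reference, one portable way to read a process’s peak memory after a run looks like the sketch below. This is an assumption on my part: the post doesn’t say how maximum memory was actually measured.

```cpp
#include <sys/resource.h>
#include <cstdio>

// Query the process's peak resident set size. Note the unit quirk:
// ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
static void print_peak_rss() {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        std::printf("peak RSS: %ld\n", (long)ru.ru_maxrss);
}
```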
All tests were performed on a 2.4 GHz Core 2 Duo with 4 GB of RAM; no memory swapping occurred! The clang++ compiler, version 2.9, with the -Os flag was used for the C++ implementations; it yielded slightly better results than gcc 4.6 with the same optimization flag.
Insert 10 million integer keys in order, probe 10 million keys via GLFSR (this leads mostly to misses, since a maximal-length GLFSR presumably sweeps nearly the full 32-bit range while only 10 million keys are stored)
Insert and lookup times are measured in seconds; numbers in parentheses are not adjusted for interpreter overhead; memory is in MB:
| implementation | insert: mean (s) | insert: stdev (s) | lookup, misses (s) | max memory (MB) |
| --- | --- | --- | --- | --- |
| F# Mono 2.0 | 0.58 | 0.33 | 1.9 | 710 |
| C++ hash_map | 2.1 | 0.54 | 5 | 1000 |
| C++ google::dense_hash_map | 2.1 | 0.3 | 500 | 890 |
| C++ google::sparse_hash_map | 3.6 | 1.2 | 2600 | 170 |
| Python 2.6 | 4.4 (9) | 0.1 | 6 (11) | 1160 |
| C++ std::map | 4.7 | 0.6 | 14 | 930 |
| Haskell ghc 7.0.2 -O3 | 5.5 | 1.1 | 6 | 1500-2500 ** |
| Perl 5.10 | 9.5 (12.4) | 2 | 1.5 (7.2) | 1250 |
| Java 1.6 Hashtable | 9.5 | 0.6 | 3 | 1200* |
| JRuby 1.6 | 10 (12) | 0.6 | 11.3 (17) | 1200* |
| Ruby 1.8.7 | 19.4 (23) | 1 | 10.8 (50) | 1200 |
* Maximum memory is mostly determined by the -Xmx1200m parameter; Java also uses two cores to perform the operations.
** Data.HashTable in Haskell doesn’t have a clear method; deferred garbage collection may lead to large memory fluctuations.
Insert 10 million integer keys in pseudo-random order and probe 10 million keys via GLFSR (this leads to no misses, since the probes follow the same GLFSR sequence as the inserts)
Insert and lookup times are measured in seconds; numbers in parentheses are not adjusted for interpreter overhead; memory is in MB:
| implementation | insert: mean (s) | insert: stdev (s) | lookup, no miss (s) | max memory (MB) |
| --- | --- | --- | --- | --- |
| F# Mono 2.0 | 1.9 | 0.01 | 1.8 | 710 |
| C++ google::dense_hash_map | 3.9 | 0.3 | 2 | 900 |
| Haskell ghc 7.0.2 -O3 | 5.5 | 1.0 | 3 | 1500-2500 ** |
| C++ hash_map | 8.6 | 1.3 | 2 | 500 |
| Perl 5.10 | 10.2 (16.6) | 1.5 | 1.5 (7.4) | 1250 |
| C++ google::sparse_hash_map | 10.4 | 1.6 | 4 | 170 |
| Java 1.6 Hashtable | 12.5 | 0.3 | 5.3 | 1200* |
| Python 2.6 | 15 (20) | 1.8 | 9 (14) | 1250 |
| C++ std::map | 20 | 2.1 | 13 | 470 |
| JRuby 1.6 | 21 (26) | 0.05 | 7.8 (14) | 1500* |
| Ruby 1.8.7 | 37 (92) | 1.8 | 3.6 (72) | 1200 |
* Maximum memory is mostly determined by the -Xmx1200m parameter; Java also uses two cores to perform the operations.
** Data.HashTable in Haskell doesn’t have a clear method; deferred garbage collection may lead to large memory fluctuations.
Conclusion:
Quite frankly, I’m shocked at the Mono Dictionary implementation, or even at the Mono virtual machine. I still think there may be something wrong with the measurements: it outperforms everything else by wide margins, and its memory consumption isn’t too bad either!
Perl has a good reputation for associative arrays; it seems justified when focusing on the lookup operation.
I’m also surprised at the Google hash table implementations: the sparse table has an excellent memory footprint, but lookup suffers drastically when the probed keys are missing from the table!
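For anyone reproducing the Google results, dense_hash_map has one API quirk worth knowing. A minimal sketch (the header path is the one used by the sparsehash releases of that era):

```cpp
#include <google/dense_hash_map>

int main() {
    google::dense_hash_map<int, int> table;
    // dense_hash_map requires a designated "empty key" before first
    // use; it must be a value that never occurs as a real key.
    table.set_empty_key(-1);
    for (int i = 0; i < 1000000; ++i)
        table[i] = i;
    return 0;
}
```

sparse_hash_map needs no empty key and stores its buckets in a sparse representation, which is consistent with its tiny 170 MB footprint and slower operations in the tables above.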
Python and Java are quite good too; however, Java clings to its allocated memory, so it eats as much as one allows it.
Overall, I have to say that the various Ruby hash table implementations do not perform well in this test suite. I also tried to run MacRuby 0.10, which would indirectly give me some clues about the performance of the underlying Cocoa dictionary data structure, but it blew up on me. I speculate that this is due to a garbage collection problem.
Finally, I have to say that I need to pay closer attention to the Mono platform; it seems to perform really well!
I would also have expected the results of Haskell and F# to be similar, but my guess is that there are implementation issues with Haskell’s deferred garbage collection. Haskell’s Data.HashTable doesn’t offer a clear method on its HashTable implementation, so memory needs to be reallocated for each iteration of the test loop.
Please feel free to do your own experiments on various platforms; I appreciate your feedback!
Comments:
The Haskell Data.HashTable implementation is quite bad. Gregory Collins has written several new hash table implementations that beat Data.HashTable by a wide margin. They haven’t been released yet, though.
You might also want to try Data.HashMap from the unordered-containers package. It’s the fastest persistent map implementation we have at the moment.