Patrick Smacchia [MVP C#]

Sponsors

The Lounge

Wicked Cool Jobs

News

  • NDepend v3 is fully integrated in Visual Studio, and is now available for download! Software dependencies visualization, 82 .NET software metrics, continuous rule validations, assembly version diff, declarative code queries and more ! http://ndepend.com

Advertisement

Images in this post missing? We recently lost them in a site migration. We're working to restore these as you read this. Should you need an image in an emergency, please contact us at imagehelp@codebetter.com
Comparing Hash Table Implementations Performances

 
One of my preferred programming paradigm is hash table. I consider almost like magic being able to test if a collection contains a particular object, independently from the size of the collection (hence the term constant time often used to characterize hash tables). Personally, I use hash tables intensively and it is the cornerstone of all VisualNDepend and Code Query Language (CQL) implementation performances. Performances are here critical since the goal of NDepend is to let .NET developers knows anything about their code bases in real-time, even on millions lines of code application that spawns on multiple Visual Studio solutions (anything about their code bases include architecture/dependencies/layering facts, 82 code metric values, changes between 2 versions, state mutability facts, encapsulation facts...).

 

I won’t go through all the details of hashtable here (load factor/collision/bucket/hash computation/hash on immutable state/caveat etc…). Jeff Atwood did it very well in his post Hashtables, Pigeonholes, and Birthday . Frankly, if you are not sure about hash table concepts or if you don’t use them on a daily basis, go read this post and understand why you are about to use hash tables extensively from now. Also, if you don’t believe me that hash tables are so central for good programming, then believe Microsoft engineers that provided the method Objet.GetHashCode() meaning that the hash concept applies for all .NET objects of all time.

 

I was curious about comparing the performances of the different implementations of hash table .NET developers. My motivation was to see if it was worth switching from the .NET framework hash table implementations to some other ones.

 

Basically there are the 2 classes in the .NET framework System.Collections.Generic.Dictionary<K,V> , System.Collections.Generic.HashSet<K> (I don't count the non-generic System.Collection.HashTable). There are also the implementations available in the 3 libraries C5 (by IT University of Copenhagen), PowerCollections  (by Wintellect) and Mono (by Novell). With the exception that PowerCollections doesn’t seem to provide its own Hash Dictionary.

 

Find below the detailed results. The conclusion is that the .NET Framework implementations are a bit more cheaper than the Mono ones, while the C5 and PowerCollections implementations are behind. Basically .NET implementations are between 1.4 to 1.7 more performant on insertion than the Mono ones. Both are equivalent when it comes to test containment in HashSet<T> and the Mono implementation of Dictionary<K,V>.TryGetValue(...) is a tiny bit better than the .NET one. One has to consider that the big motivation for using hash tables is when there are a lot of containment tests/retrieval, since these operations are done in a constant time. Thus, the Mono flaw on insertion is not that serious and this is a good news for all of us that don't have access to the class System.Collections.Generic.HashSet<K> because we are still running on .NET v2.0. We can just use the Mono HashSet<T> implementation.

 

Concerning memory , if you look at results you'll see that Mono collection seems up to  2 times cheaper that .NET ones but this results are biased. Indeed, some arrays objects are maintained and I cannot distinguish who reference what. I tried to do the memory tests separately but internally some arrays are maintained by the CLR ans it is still not possible to infer a clear result.

 

Here is the C# source file used to do the tests. Notice that I prefixed Mono collections with Mono. to avoid collision with .NET Framework collections. All hash table tested have some string generic parameters, and we also did the test with int32 values, to get also a glimpse at how it behaves with value type. If you have feedback on these testing methodology please put them in comment. I know that the biggest concerns is the fact that a single insertion in an hash table can sometime takes a lot of time because internally it sometime triggers a re-arrangement of keys proportional to the number of elements. The only way to shorten these unwanted effects was to do tests for several values.

 

The results for both performance and memory profiling were obtained with the great JetBrain dotTrace profiler. Concerning memory profiling tool I have to say that usually I use the Scitech .NET Memory Profiler which has some awesome real-time profiling capabilities without almost any overhead. I choosed dotTrace here because the result presentation is nicer and consistent with the performance result I wanted to present.

Concerning performance profiling Red-Gate recently released .NET Performance Profiler v4.1 that has some great visualization features but honestly, I haven’t found the time yet to dig in it and see if I could prefer it over my good friend dotTrace.

 

 

 

Hash Table: Performance Profiling Results

 

Performance for 1M strings.

 

 

Performance for 100K strings.

 

 

Performance for 10K strings.

 

 

Performance for 1000 int32.


 

Performance for 10K int32.

 

 

 

 

Hash Table: Memory Profiling

 

Memory for 100 elements

 

 

Memory for 1000 elements

 

 

Memory for 10M elements

 

 

Memory for 100M elements

 

 

Memory for 1000M elements

 

 

 

 

 


Posted Wed, Oct 29 2008 8:56 PM by Patrick Smacchia

[Advertisement]

Comments

Reflective Perspective - Chris Alcock » The Morning Brew #212 wrote Reflective Perspective - Chris Alcock &raquo; The Morning Brew #212
on Thu, Oct 30 2008 4:31 AM

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #212

gOODiDEA wrote Interesting Finds: 2008.10.30~2008.10.31
on Fri, Oct 31 2008 12:57 AM

Web网页栅格系统研究(1):960的秘密-(2):蛋糕的切法-(3):粒度问题12StepsToFasterWebPagesWithVisualRoundTrip...

Community Blogs wrote Interesting Finds: 2008.10.30~2008.10.31
on Fri, Oct 31 2008 12:59 AM

Web 网页栅格系统研究(1):960的秘密 - (2):蛋糕的切法 - (3):粒度问题 12 Steps To Faster Web Pages With Visual Round Trip Analyzer

Patrick Smacchia [MVP C#] wrote Lessons learned from a real-world focus on performance
on Mon, Dec 1 2008 4:32 AM

I am glad to announce that we just released a new version of NDepend where the analysis phase duration

Community Blogs wrote Lessons learned from a real-world focus on performance
on Mon, Dec 1 2008 4:55 AM

I am glad to announce that we just released a new version of NDepend where the analysis phase duration

Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas wrote Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas
on Tue, Dec 2 2008 10:16 AM

Pingback from  Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas

Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas wrote Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas
on Wed, Dec 3 2008 11:03 AM

Pingback from  Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas

Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas wrote Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas
on Thu, Dec 4 2008 11:09 AM

Pingback from  Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas

Patrick Smacchia [MVP C#] wrote Micro- optimization tips to increase performance
on Sun, Apr 19 2009 8:29 AM

In a previous post Lesson learned from a real-world focus on performance , I explained how we increased

Thomas Wang wrote re: Comparing Hash Table Implementations Performances
on Thu, Mar 18 2010 3:30 PM

One thing that I seldom see covered is the observation on hash table load factors.  Load factor does not appear to be linearly correlated with performance.  I measured hash table insertion performance on stock hardware varying the load factor.  I see 5% slow-down using a load factor of 4, and only a 60% slow-down using a load factor of 16...

Add a Comment

(required)  
(optional)
(required)  
Remember Me?
Devlicio.us