CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Raymond Lewallen

Framework Design, Agile Coach, President Oklahoma City Developers Group, Microsoft MVP C#, TDD, Continuous Integration, Patterns and Practices, Domain Driven Design, Speaker, VB.Net, C# and Sql Server

DataRowCollection, Red-Black Tree and a simple explanation.

Last week I posted about the DataRowCollection and the arraylist that supports it was replaced in Beta2. At that time I didn't know what the implementation was going to be. Turns out, an internal implementation of a red-black tree is the current method used to replace the arraylist. I was also told this could change at any time.

For those of you who don't know, a red-black tree is a type of binary tree with a height of no more than 2log(n+1). Standard operations take O log(n) time. Each node of the tree contains fields for color, key, parent, left, right and rank. The biggest rank of a tree occurs at the root and goes down to zero, so the rank is the height of a node. Two consecutive nodes differ in rank by either zero or one. Every leaf node is black (also a NULL pointer). If a node is red, both its children must be black. Most imporantly, to ensure the balance of the tree, each path from the root node to a leaf contains the exact number of black nodes.

A red-black tree has great performance, and will not suffer the linear performance degredation that a standard binary tree will encounter, due to the number of elements in the standard binary tree. Red-black trees are always balanced, so searching, inserting and deleting all are guaranteed to be O log(n). Re-balancing a tree can be a bit costly, but that still occurs in O log(n) time.

So what does this mean? It means the performance of deletions in a DataRowCollection will be substantially faster, if the DataWorks team continues to use their internal implementation of the red-black tree instead of an arraylist.


Comments

Raymond Lewallen said:

Panos,

I asked them about why they chose the red-black tree over a hashtable or sortedlist, but didn't get an answer, and was only told that the red-black tree was currently supporting the DataRowCollection and that it could very well change.

I almost wanted to implement my own red-black tree to how the GC would deal with it. Even though you get a big increase in initial performance, I am curious if the number of garbage collections is going to undo any performance gain you may have received, like you said, due to the number of nodes created.
# December 9, 2004 11:54 AM

TrackBack said:

# March 19, 2005 2:18 PM

pragya said:

Why we do n't use B Tree in place of RB Trees

# July 30, 2007 2:07 AM

Leave a Comment

(required)  
(optional)
(required)  

Enter the numbers above:
Add

About Raymond Lewallen

Working primarily in the public sector during his career, Raymond has designed and built several high profile enterprise level applications for all levels of the government. Raymond now works as a solutions architect for EMC. Raymond is an agile coach, Microsoft MVP C# and also president of the Oklahoma City Developers Group and Oklahoma Agile Developers Group. Raymond spends a lot of his time learning and teaching such things as Test Driven Development, Domain Driven Design, Design Patterns and Extreme Programming practices and principles, to name a few. Raymond is also an advocate of Alt.Net. Raymond is primarily a framework guy, so don't ask him anything about UI :) Check out Devlicio.us!