Raymond Lewallen

Sponsors

The Lounge

Advertisement

Images in this post missing? We recently lost them in a site migration. We're working to restore these as you read this. Should you need an image in an emergency, please contact us at imagehelp@codebetter.com
DataRowCollection, Red-Black Tree and a simple explanation.
Last week I posted about the DataRowCollection and the arraylist that supports it was replaced in Beta2. At that time I didn't know what the implementation was going to be. Turns out, an internal implementation of a red-black tree is the current method used to replace the arraylist. I was also told this could change at any time.

For those of you who don't know, a red-black tree is a type of binary tree with a height of no more than 2log(n+1). Standard operations take O log(n) time. Each node of the tree contains fields for color, key, parent, left, right and rank. The biggest rank of a tree occurs at the root and goes down to zero, so the rank is the height of a node. Two consecutive nodes differ in rank by either zero or one. Every leaf node is black (also a NULL pointer). If a node is red, both its children must be black. Most imporantly, to ensure the balance of the tree, each path from the root node to a leaf contains the exact number of black nodes.

A red-black tree has great performance, and will not suffer the linear performance degredation that a standard binary tree will encounter, due to the number of elements in the standard binary tree. Red-black trees are always balanced, so searching, inserting and deleting all are guaranteed to be O log(n). Re-balancing a tree can be a bit costly, but that still occurs in O log(n) time.

So what does this mean? It means the performance of deletions in a DataRowCollection will be substantially faster, if the DataWorks team continues to use their internal implementation of the red-black tree instead of an arraylist.

Posted 12-09-2004 8:44 AM by Raymond Lewallen

[Advertisement]

Comments

Panos Theofanopoulos wrote re: DataRowCollection, Red-Black Tree and a simple explanation.
on 12-09-2004 11:43 AM
Do you know why did they chooce red-black tree vs SortedList or something similar ?
The benchmarks of red-black tree for loading a large Dataset (100K rows) will without doubt outperform the SortedList

But since red-black tree has created an additional 100K objects (nodes) vs just two in the case of the SortedList, the overall application performance will suffer

too much pressure to GC, for objects that probably elevate to generation 2

btw the BinarySearch of ArrayList (or searching a SortedList ) is also a O log(n)operation
Raymond Lewallen wrote re: DataRowCollection, Red-Black Tree and a simple explanation.
on 12-09-2004 11:54 AM
Panos,

I asked them about why they chose the red-black tree over a hashtable or sortedlist, but didn't get an answer, and was only told that the red-black tree was currently supporting the DataRowCollection and that it could very well change.

I almost wanted to implement my own red-black tree to how the GC would deal with it. Even though you get a big increase in initial performance, I am curious if the number of garbage collections is going to undo any performance gain you may have received, like you said, due to the number of nodes created.
TrackBack wrote re: What is the maximum size of a DataSet/DataTable
on 03-19-2005 2:18 PM
pragya wrote re: DataRowCollection, Red-Black Tree and a simple explanation.
on 07-30-2007 2:07 AM

Why we do n't use B Tree in place of RB Trees

Add a Comment

(required)  
(optional)
(required)  
Remember Me?