Speed up your lousy data reads
I worked on this almost two years ago. .NET was still fairly unknown then, and we were probably the first team in Bangalore working with it.
Problem: Your query result set (perhaps a DataSet) has 50000 rows with 5 or more columns. You can't optimize the query/SP any further, and on top of all this, you are using Remoting/WebServices to get this data from a server.
Internals: Each cell has an address (I presume that you are using a DataSet, as I was; I will write about collections later). These addresses are stored in hash tables. Of course, all the elements of the DataSet are linked, either directly or indirectly. But the hashing algorithm used by MS is very generic and does not depend on the number of addresses. So every time the hash limit is reached (4 buckets of 5 addresses each initially), the addresses are rehashed into a larger table (the next prime number after 2 x the previous bucket size, so 11, then 23 and so on). For the above case, you (the CLR) may have to rehash around 16-18 times. This takes a lot of time, and you may face timeouts and white screens on deserialization.
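To get a feel for the rehash count, here is a small sketch that takes the growth rule above at face value. It is written in Python purely as a language-neutral illustration, not as a description of the actual CLR implementation. The starting size of 5, the "grow when the entries no longer fit" trigger, and the function names are all assumptions made for the example.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for the table sizes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime_after(n: int) -> int:
    """Smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def growth_steps(entries: int, initial_size: int = 5) -> list:
    """Table sizes visited while growing until `entries` fit.

    Assumption: the table grows whenever the entries no longer fit, and
    each growth jumps to the next prime after 2 x the current size.
    """
    sizes = []
    size = initial_size
    while size < entries:
        size = next_prime_after(2 * size)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    cells = 50_000 * 5  # rows x columns from the problem statement
    steps = growth_steps(cells)
    print(len(steps), "rehashes; first sizes:", steps[:4])
```

Under these assumptions, 250000 cells need 16 growth steps, which is in line with the 16-18 estimate, and every one of those steps has to move all the entries stored so far.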
Solution: Use batches. Add an identity column to the tables and pick a batch size that suits your result set, say 10000 rows each. Fetch these batches one by one and either merge them (a bad solution, since you are not improving the rehashing times here) or, better, create a new DataTable and add rows as they come (another bad solution, but a little faster). So what do we do next? Staying with this approach, use asynchronous remoting (if you are not using web services): run a for loop that calls the fetch method for every batch at once, wait for the server to serve you, then create a DataTable and add the rows. No white screens this time around.
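The batch-and-fetch-asynchronously idea can be sketched as follows. This is Python with a thread pool standing in for asynchronous remoting, so treat it as an illustration of the control flow only. `fetch_batch` is a hypothetical placeholder for the remote call, and the batch size, worker count, and row shape are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

TOTAL_ROWS = 50_000
BATCH_SIZE = 10_000


def fetch_batch(first_id: int, last_id: int) -> list:
    """Hypothetical stand-in for the remote call.

    A real version would ask the server for the rows whose identity
    column lies between first_id and last_id (inclusive).
    """
    return [(row_id, f"value-{row_id}") for row_id in range(first_id, last_id + 1)]


def batch_ranges(total_rows: int, batch_size: int) -> list:
    """Inclusive (first_id, last_id) pairs covering identity values 1..total_rows."""
    return [
        (start, min(start + batch_size - 1, total_rows))
        for start in range(1, total_rows + 1, batch_size)
    ]


def fetch_all(total_rows: int = TOTAL_ROWS, batch_size: int = BATCH_SIZE) -> list:
    """Issue every batch request up front, then collect results in submission order."""
    table = []  # plays the role of the new DataTable
    with ThreadPoolExecutor(max_workers=5) as pool:
        futures = [
            pool.submit(fetch_batch, first, last)
            for first, last in batch_ranges(total_rows, batch_size)
        ]
        for future in futures:  # submission order keeps the rows sorted by id
            table.extend(future.result())
    return table
```

All requests are in flight before the first result is awaited, so the client spends its waiting time on the slowest batch rather than on the sum of all of them.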
While these solutions may reduce your fetch times to 50%, you should know that they are not robust solutions. For me, the overhead of a DataSet, which keeps all those addresses that my custom programs do not need, is unaffordable. Controls can be bound to collections, so I create a custom class for each result set (many code-generating "robots" can be found for this nowadays; back then I had to write them all by hand), define a collection of these classes, and remote that instead. This speeds things up a lot and makes you more responsible for, and so more in control of, things.
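As a rough picture of "a custom class per result set plus a collection", here is a minimal Python sketch. The `OrderRow` type and its fields are invented for the example. In .NET the equivalent would be a small serializable class and a typed collection of it.

```python
from typing import List, NamedTuple


class OrderRow(NamedTuple):
    """Hypothetical typed row; the field names are invented for this example."""
    order_id: int
    customer: str
    amount: float


def to_orders(raw_rows: List[dict]) -> List[OrderRow]:
    """Convert generic name -> value rows into a typed collection."""
    return [
        OrderRow(int(r["order_id"]), str(r["customer"]), float(r["amount"]))
        for r in raw_rows
    ]


raw = [
    {"order_id": 1, "customer": "acme", "amount": 12.5},
    {"order_id": 2, "customer": "globex", "amount": 7.0},
]
orders = to_orders(raw)
total = sum(o.amount for o in orders)  # fields are plain typed attributes
```

The collection carries only the values themselves, with none of the per-cell bookkeeping a general-purpose container needs.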
Oh yes, I forgot to mention that you can improve times even when using DataSets. All you have to do is write a custom serializer. You will need some help for this and here it is. I would still suggest collections, custom classes and asynchronous calls, though; DataSets are a load anyway.
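One way a custom serializer can save work is by writing the schema once and the values as bare arrays, instead of describing every row in full. The sketch below shows that idea in Python, with JSON standing in for the wire format. It is an assumption about what such a serializer might do, not the DataSet's real format, and the column names and sample data are made up.

```python
import json


def serialize_verbose(columns: list, rows: list) -> bytes:
    """Row-by-row form: every row repeats every column name."""
    return json.dumps([dict(zip(columns, row)) for row in rows]).encode("utf-8")


def serialize_compact(columns: list, rows: list) -> bytes:
    """Schema-once form: column names appear a single time, rows are bare arrays."""
    return json.dumps({"columns": columns, "rows": rows}).encode("utf-8")


def deserialize_compact(payload: bytes):
    """Inverse of serialize_compact; returns (columns, rows)."""
    doc = json.loads(payload.decode("utf-8"))
    return doc["columns"], [tuple(row) for row in doc["rows"]]


columns = ["id", "name", "city", "amount", "status"]
rows = [(i, f"name-{i}", "Bangalore", i * 1.5, "open") for i in range(1, 1001)]

verbose = serialize_verbose(columns, rows)
compact = serialize_compact(columns, rows)
```

Comparing `len(verbose)` with `len(compact)` shows how much of the payload was repeated column names, and the same reasoning applies to anything else that gets restated per row.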
Posted Wed, Jan 5 2005 10:50 PM by rsakalley