LINQ To SQL – Query Tuning Appears To Break Down In *More Advanced* Scenarios

In the last post:


LINQ To SQL Query Tuning for Performance Optimization


Frans pointed me to a post on the MSDN Forums that discusses the chattiness that can occur in LINQ To SQL when you attempt to prefetch more complex object graphs.


Take the example in the forum post that discusses grabbing the Customers from the Northwind Database and prefetching their Orders and OrderDetails at the same time:


 



 


which you could code like the following:


 

using (NorthwindDataContext context = new NorthwindDataContext())
{
DataLoadOptions options
= new DataLoadOptions();
options.LoadWith
<Customer>(c => c.Orders);
options.LoadWith
<Order>(o => o.Order_Details);

context.LoadOptions = options;

IEnumerable<Customer> customers = context.Customers.ToList<Customer>();
}


 


Opening up SQL Server Profiler shows a rather ugly set of queries ( 40+ queries in total ) that breakdown as:




  1. 1 Query to First Retrieve all the Customers in Northwind


  2. 40+ Queries - 1 Query for each Customer to get its Orders and OrderDetails.

 


Again, you have that startling number of database roundtrips in LINQ To SQL that typically you want to avoid if possible to improve performance.


Since Frans mentioned it :), I wrote similar code to do the same thing using LLBLGen Pro and Self-Servicing Entities:


 

CustomerCollection customers = new CustomerCollection();
IPrefetchPath prefetchPath
= new PrefetchPath((int)EntityType.CustomerEntity);
prefetchPath.Add(CustomerEntity.PrefetchPathOrders).SubPath.Add(OrderEntity.PrefetchPathOrderDetails);
customers.GetMulti(
null, prefetchPath);

 


which resulted in only 3 queries ( roundtrips ) that first grabbed all the Customers, then all the Orders, and then all the OrderDetails. This is more inline with what I was expecting from LINQ To SQL.


I couldn’t find a way to tweak LINQ To SQL to reduce the number of queries, but maybe Scott Guthrie can come up with a more practical solution in his next installment of LINQ To SQL. This is just something to keep in mind when using the current version of LINQ To SQL. Always review the database communication with SQL Server Profiler to determine what is actually being generated.


by David Hayden


 

This entry was posted in Uncategorized. Bookmark the permalink. Follow any comments here with the RSS feed for this post.

6 Responses to LINQ To SQL – Query Tuning Appears To Break Down In *More Advanced* Scenarios

  1. David Hayden says:

    Jim,

    Unfortunately the LoadOptions doesn’t *help* with multiple 1:m associations as we discovered above and Scott confirms.

    Have you found a way to have LoadOptions help in those cases?

  2. Jim Wooley says:

    As Scott mentions, the recommendation at this point is to use the LoadOptions (formerly DataShape, formerly Including). If MSFT isn’t working on a best practices, I’m including a discussion of it in my upcoming LinqInAction book at least (blatent plug…)

  3. David Hayden says:

    Thanks for the clarification, Scott.

    This type of information needs to be in a LINQ To SQL Best Practices Document so developers are aware of when LINQ To SQL may generate less-than-ideal SQL. If a query like above is one of the more active queries in an application, LINQ To SQL wouldn’t be an ideal candidate and it would be nice to know that upfront.

    A lot of developers, including myself, are looking for guidance on how to use LINQ To SQL in n-layer / n-tier applications as well as when it is efficient and when not. I hope Microsoft is working on a document and real-world samples to help developers, because with the go-live licenses we are building applications now :)

    Thanks.

  4. FransBouma says:

    Scott: isn’t that sloppy coding? I mean: the LoadOptions advertise as if they will do a proper job in fetching the graph. Because that’s what the developer is interested in.

    However, it won’t do any efficient fetching at all. Not only is a join in an 1:n relation pretty inefficient (as every row contains a duplicate), it’s also limiting the graph fetching, as you can’t fetch multi-path graphs efficiently.

    If I fetch 500 customers and their 100 latest orders and the order lines for these orders + the employee rows related to the orders, I’m in for a lot of fun, while it’s so unnecessary as the database also supports subqueries which often result in the same execution plan PLUS in path fetching they are more efficient in almost all cases.

    Furthermore, you can optimize the queries for each node pretty good with a threshold which you use to choose whether you would do:
    1) select * from child where fkToParent in (@c1, @c2, …, @cn)
    or
    2) select * from child where fkToParent in (select pk from parent where )

    How the implementation is now is very very limited. While a limited implementation could be OK, the thing with this is that it is advertised as if it is able to fetch a graph efficiently (I asked you about this in a reply to your linqdatasource article, and you said one could efficiently fetch a graph using loadoptions, but IMHO that’s thus not the case in many situations) but it can’t do that in a truckload of situations. The WORST part of this is is that this problem will likely be popping up in production, where the dataset is big and due to that performance problems will arise. The developer won’t notice these things very likely because the dataset the developer works with is often small enough to be performant even with a lot of queries.

    (although the simple customer-order-orderdetails graph was pretty slow on my northwind db. )

  5. ScottGu says:

    In cases where the relationship between two entities is (n:1), LINQ to SQL does a JOIN as opposed to separate queries since these tend to be safe from a data explosion perspective.

    In the case of a 1:n associations, LINQ to SQL only supports joining-in one 1:n association per query.

    So for example, if you wrote code like below:

    DataLoadOptions options = new DataLoadOptions();

    options.LoadWith (c => c.Category);
    options.LoadWith
    (c => c.OrderDetails);
    options.LoadWith(o => o.Order);
    db.LoadOptions = options;

    IEnumerable products = db.Products.ToList ();

    You’d find that only 1 query is executed against the SQL database for everything (even though you are bringing back all of the Products and their associated Category and OrderDetail information). This is because the product->category relationship is (n:1) and the OrderDetail->Order relationship is (n:1) and so both are automatically joined in.

    The reason the relationship in your blog post above is generating multiple queries is that you have two (1:n) relationship (Customers->Orders) and (Orders->OrderDetails). If you just had one (1:n) relationship (Customer->Orders) or (Orders->OrderDetails) LINQ to SQL would optimize and grab it in one query (using a JOIN).

    Hope this helps,

    Scott

  6. Chris Moseley says:

    This is interesting. I actually put this question to Scott Guthrie a few weeks ago as a comment on his blog. One of the first things I tried to do with Linq to Sql as an experiment in beta 1 was convert a small project from LLBLGen. In my LLBLGen projects I’m pretty big on using prefetch paths so I came across this problem almost strait away. The fact that I can’t seem to get a decent answer from anyone at Microsoft about it has been bothering me a little ever since. Scott’s answer was “I think you can call the LoadWith() statement multiple times. So to setup relationships across multiple tables just call it multiple times with each one.”. I tried this, and as already stated here, it doesn’t work. Scott did mention that he hadn’t had much time to play with this though so maybe when he does he’ll be able to apply some pressure to his team and get some sort fix implemented in time for release. He should as this is a pretty severe limitation to an ORM.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>