Previously: Architecting LINQ To SQL Applications, part 6
Managing entity lifetimes is an important topic, because many of the problems people hit when using an ORM for the first time come from not understanding how the ORM manages objects loaded from the Db, or objects that are to be inserted into the Db. Over the next few installments we will also begin to talk about multi-tier scenarios, where many of the issues people have come from working against the ORM rather than with it.
What are Entity and Value types?
An Entity is a type which has an identity that remains unique and consistent throughout its lifetime. It is unique in the sense that it must always be possible to distinguish one entity from another. It is consistent in that even if its attributes change, the entity still retains its identity. Consider a Customer type. As an entity, we must be able to distinguish one customer from another, so we need to define Customer identity through a member. We avoid using natural members like Name because they are not individually unique and may change over time. Instead we base the identity of our entity on a surrogate member. In this case we might define an Id member for our Customer that assigns each customer a unique value within our organization.
Within the Db a primary key field distinguishes an entity, which is represented by a row in a table. LINQ To SQL piggybacks on this to provide support for an Entity: a type that is mapped, via a Table attribute or an external mapping, to a table that has a primary key. So we map our Customer class to a Customer table and our Id member to an Id primary key field on that table. Again, natural keys are allowed, but should be avoided in favour of surrogates to ensure that our entity remains unique and identifiable throughout its lifetime.
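As a rough illustration, an attribute-based mapping for the Customer entity might look like the sketch below. The table name, the Name column and the choice of a Db-generated integer key are assumptions made for the example, not taken from a real schema.

```csharp
using System.Data.Linq.Mapping;

// Hypothetical Customer entity mapped to a Customer table.
[Table(Name = "Customer")]
public class Customer
{
    // Surrogate identity: maps to the Id primary key column.
    // IsDbGenerated assumes the Db assigns the value (e.g. an identity column).
    [Column(IsPrimaryKey = true, IsDbGenerated = true)]
    public int Id { get; set; }

    // Natural attribute: may change over time, so it is not used for identity.
    [Column]
    public string Name { get; set; }
}
```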
The counterpart to an Entity type is a Value type. A Value type takes its identity from the value of its attributes. Two values that have the same attributes compare the same. For example, two postcodes that have the value "ABC 123" compare the same, so we consider them to have the same identity. A value type might have just the one attribute, as with our postcode example, but might also have multiple attributes. An example of a value type with multiple attributes would be a Money type that has both an Amount and a Currency. We want two instances of Money for GBP 123.74 to compare as equal.
Within the Db a Value is represented as one or more columns within the row of an entity. LINQ To SQL only supports primitives as Values (string, int, etc.) and has no support for mapping user-defined value types. Thus there is no direct support in LINQ To SQL for types like Money. Hopefully this is a limitation that will be addressed in future versions.
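To make value semantics concrete, here is a minimal sketch of what such a Money type could look like in C#. The type and its members are hypothetical and written only to show equality based on attributes. It is plain C#, not something LINQ To SQL can map.

```csharp
using System;

// Hypothetical Money value type: its identity is the pair (Amount, Currency).
public readonly struct Money : IEquatable<Money>
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency ?? throw new ArgumentNullException(nameof(currency));
    }

    // Two Money values are equal when all of their attributes are equal.
    public bool Equals(Money other) =>
        Amount == other.Amount &&
        string.Equals(Currency, other.Currency, StringComparison.Ordinal);

    public override bool Equals(object obj) => obj is Money other && Equals(other);

    // Hash code built from the same attributes that define equality.
    public override int GetHashCode()
    {
        unchecked
        {
            int currencyHash = Currency == null ? 0 : Currency.GetHashCode();
            return (Amount.GetHashCode() * 397) ^ currencyHash;
        }
    }

    public static bool operator ==(Money left, Money right) => left.Equals(right);
    public static bool operator !=(Money left, Money right) => !left.Equals(right);
}
```

With this definition, two separately constructed instances of GBP 123.74 compare as equal, while GBP 123.74 and USD 123.74 do not.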
Entity Lifetime
A DataContext is a unit of work. It tracks changes and submits updates to the Db when flushed. Because of this you should keep it around only long enough to do the work, and no longer. When working with a DataContext we need to distinguish, to borrow Hibernate terminology, between transient and persistent entities.
Persistent entities have been loaded by LINQ To SQL, and the DataContext holds a reference to them in its identity map. The DataContext tracks changes to a persistent entity against its state at the time it was loaded. Future requests for that entity will return the instance in the cache. Changes to that entity can be submitted to the Db. Lazy loading of associations uses the same DataContext that we originally loaded the entity with.
An entity that is not in the identity map of the DataContext is a transient entity. Its lifetime is at most that of the running application. To ensure that the entity persists we need to add it to the matching table on the DataContext using InsertOnSubmit and flush it using SubmitChanges on the DataContext, causing the entity to become persistent.
MyContext context = new MyContext();
//persistent object: loaded by the context, so it is in the identity map
MyObject myOldObject = context.MyObjects.Where(m => m.Name == "Old").Single();
//changes are tracked
myOldObject.Name = "Changed Name";
//new entity; transient
MyObject myNewObject = new MyObject();
//no need to track; no Db row to update yet
myNewObject.Name = "New Name";
//register the new entity for insertion; it is still transient at this point
context.MyObjects.InsertOnSubmit(myNewObject);
//make new object persistent and flush changes to old objects
context.SubmitChanges();
This is not always intuitive:
MyContext anotherContext = new MyContext();
//new entity; transient
MyObject myNewObject = new MyObject();
myNewObject.Name = "New Name";
//new entity is still transient until we actually submit
anotherContext.MyObjects.InsertOnSubmit(myNewObject);
//not persistent yet: the query runs against the Db, so the new entity won't be found
var results = anotherContext.MyObjects.Where(m => m.Name == "New Name").ToList();
It is important to note that although an identity map is sometimes called a first-level cache, its purpose is not to optimize retrieval of entities from the Db. An ORM such as LINQ To SQL cannot know the results of a query in advance, so it cannot determine whether two calls to a query will return the same result set, even if it is the same query, because other users may have updated the Db in between. For this reason it must always bring the result set back, and then check for the existence of those entities within the identity map. If the map contains the item, it must return that instance instead, because you might have made changes to the entity and we want to preserve them throughout the unit of work. So the cache is not about optimization but about ensuring you do not lose changes during your unit of work. It is possible that a query that requests an entity by primary key could be satisfied from the map directly, where the entity is already loaded, but you should not rely on this optimization.
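The behaviour described above can be sketched with a toy identity map. This is not LINQ To SQL's implementation, only a simplified model of the pattern. The IdentityMap class, its Resolve method, and the shape of MyObject (an Id and a Name) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity with a surrogate key, standing in for a mapped class.
public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Much-simplified identity map: one instance per primary key per unit of work.
public sealed class IdentityMap<TKey, TEntity> where TEntity : class
{
    private readonly Dictionary<TKey, TEntity> _entities =
        new Dictionary<TKey, TEntity>();

    // Called for every row a query brings back from the Db. If the map
    // already holds an instance for this key, that instance wins and the
    // freshly loaded copy is discarded, so in-memory changes made during
    // the unit of work are preserved.
    public TEntity Resolve(TKey key, TEntity freshlyLoaded)
    {
        TEntity existing;
        if (_entities.TryGetValue(key, out existing))
        {
            return existing;
        }

        _entities.Add(key, freshlyLoaded);
        return freshlyLoaded;
    }
}
```

Note that Resolve is only reached after the row has been fetched: the query still goes to the Db, and the map decides which instance the caller sees.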
Because we hold the object within the identity map for the lifetime of the unit of work there is a danger of concurrency errors, where another user updates the Db while we have the object. For this reason the identity map stores the original version of our object as well as the current one. This allows LINQ To SQL to compare the original against the state of the Db when it issues an update or delete query, and to raise an optimistic concurrency violation error if the row has changed. Of course, if you use a timestamp column and set UpdateCheck.Never on the mappings of the other columns, so that LINQ To SQL does not check those fields when looking for concurrency errors, you can optimize this feature as well. However, the optimal SQL issued by LINQ To SQL during an update still depends on knowing what has changed.
LINQ To SQL supports a Refresh method on its DataContext to force a reload of an entity or attributes of an entity. The purpose here is to allow you to resolve optimistic concurrency errors, or to reload an object during a unit of work if you are aware that the Db has been changed by a mechanism outside of the purview of LINQ To SQL. Refresh just goes back to the Db to find the latest version of your object. Note that Refresh targets a specific object in the map, or a set of objects, rather than the whole map.
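A minimal sketch of how these pieces can fit together is shown below. The helper class and method names are hypothetical, and the choice to keep our own changes and retry once is an assumption for the example; a real application would pick the RefreshMode and retry policy that suits its data.

```csharp
using System.Data.Linq;

// Hypothetical helpers around a DataContext that already has pending changes.
public static class ConflictHandling
{
    // Submits changes; on an optimistic concurrency violation, refreshes the
    // conflicting objects from the Db and tries once more.
    public static void SubmitKeepingOurChanges(DataContext context)
    {
        try
        {
            // Collect every conflict rather than stopping at the first one.
            context.SubmitChanges(ConflictMode.ContinueOnConflict);
        }
        catch (ChangeConflictException)
        {
            // KeepChanges: keep the values we changed, and take the Db's
            // current values for the members we did not change.
            context.ChangeConflicts.ResolveAll(RefreshMode.KeepChanges);

            // May throw again if the Db changes once more in the meantime.
            context.SubmitChanges(ConflictMode.FailOnFirstConflict);
        }
    }

    // Reloads one tracked entity after the Db was changed outside of
    // LINQ To SQL, discarding whatever values we currently hold in memory.
    public static void ReloadFromDb(DataContext context, object entity)
    {
        context.Refresh(RefreshMode.OverwriteCurrentValues, entity);
    }
}
```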
You can disable use of the identity map by setting ObjectTrackingEnabled to false. The purpose here is to optimize when you are loading read-only collections i.e. you never want to submit changes on these objects back to the Db. Remember though that another DataContext will consider these to be transient objects, so avoid assigning references loaded this way into entities loaded via DataContexts which are tracking.
Attaching Entities
Sometimes we have an entity that is in the backing store but not in the identity map of our DataContext. To make our DataContext aware that this is a persistent and not a transient entity we need to Attach it to our DataContext. This puts it in our cache. However, our context cannot know whether the entity is the same as the representation on the backing store, so it must assume that it is, and change tracking will consider your object to be in the ‘original’ state. If your object has changed and you try to save those changes, you will get optimistic concurrency errors. This is because when LINQ To SQL compares your ‘original’ to the Db they do not match, which fools it into thinking another user has changed the row since we loaded it.
To avoid this you must either tell the DataContext what the original state in the Db was, or set your columns as UpdateCheck.Never. If you have a Timestamp column, you can rely on that to do the right thing for you. Where you cannot use a timestamp, people sometimes suggest a strategy of UpdateCheck.Never on every column, but the danger is that we can overwrite genuine changes by another user.
Otherwise we need to either provide the original, by maintaining the original state for any objects we may choose to detach, or adopt a load and replay strategy for detached objects where we load the current representation from the Db and then write our changes over it.
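Both approaches can be sketched as follows. The helper class, method names and parameters are hypothetical. The sketch also assumes the detached instances are no longer associated with another live DataContext (for example, they have been deserialized), since otherwise Attach may refuse them.

```csharp
using System;
using System.Data.Linq;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helpers for saving changes made to detached entities.
public static class DetachedUpdates
{
    // Option 1: we kept the original state alongside the detached object,
    // so the context can work out what changed by comparing the two.
    public static void UpdateWithOriginal<T>(
        DataContext context, T current, T original) where T : class
    {
        context.GetTable<T>().Attach(current, original);
        context.SubmitChanges();
    }

    // Option 2: load and replay. Load the current representation from the
    // Db, write our changes over it, and let normal change tracking apply.
    public static void LoadAndReplay<T>(
        DataContext context,
        Expression<Func<T, bool>> findByKey,
        Action<T> replayChanges) where T : class
    {
        T persistent = context.GetTable<T>().Single(findByKey);
        replayChanges(persistent);
        context.SubmitChanges();
    }
}
```

With the article's placeholder types, load and replay might be called as `DetachedUpdates.LoadAndReplay<MyObject>(context, m => m.Id == detached.Id, m => m.Name = detached.Name)`, assuming MyObject has an Id key. Bear in mind that load and replay overwrites whatever is in the Db at that moment, so it does not detect changes made by another user while the object was detached.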
Think carefully before heading down the detached objects route, as it multiplies the complexity of what you are doing. This issue most often rears its head in multi-tier scenarios. We will talk about how to handle those in a future blog post, but for now recognize that the unit of work implies that the framework will not help you track changes outside of that context.
Managing DataContexts
Do not try to work with two different contexts at the same time. Entities that are persistent for one context look transient to the other, because the second context did not load them and so does not have them in its identity map.
Do not try to access an object graph loaded via LINQ To SQL outside of its DataContext if it has lazy loaded properties. This is because LINQ To SQL will access the original DataContext to load the entities. Trying to lazy load within another context falls foul of our earlier rule not to mix our contexts.
Finally, assume that a DataContext is not thread safe: work with a DataContext on only one thread, and do not pass entities retrieved via a DataContext on one thread to another thread.
When working with a web application consider creating a DataContext per HTTP request, using it to retrieve and then submit any changes required by the session. For a client-side application consider using a DataContext for each application transaction.
While a DataContext is disposable, only dispose of it when you have finished your request or application transaction and are finished with the persistent entities that it loaded.
Exercise caution around caching entities that were loaded via a DataContext. This is because when you access those elements they may still refer to the DataContext if they contain a lazy loaded association. If you want to use LINQ To SQL to load cached data, make sure you load objects that are not coupled to the DataContext, by not using EntitySet<> and EntityRef<> for associations, and disabling deferred loading on the context that you use to load them. This can be appropriate for reference data, in which case, you can also disable change tracking.
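A minimal sketch of loading cacheable reference data along these lines is shown below. The ReferenceData class and the createContext factory parameter are hypothetical, and T is assumed to be a mapped entity type that does not use EntitySet<> or EntityRef<> for its associations.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Linq;
using System.Linq;

// Hypothetical loader for read-only reference data that is safe to cache.
public static class ReferenceData
{
    public static List<T> Load<T>(Func<DataContext> createContext)
        where T : class
    {
        // A short-lived context used for this one read only.
        using (DataContext context = createContext())
        {
            // Both options must be set before the first query is executed.
            context.DeferredLoadingEnabled = false; // no lazy loading via the context
            context.ObjectTrackingEnabled = false;  // read-only: no change tracking

            // ToList forces the query to run while the context is still alive.
            return context.GetTable<T>().ToList();
        }
    }
}
```

The returned list holds no pending deferred loads, so the context can be disposed as soon as the method returns.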
Is LINQ To SQL deficient here?
I read a fair number of opinions that are surprised by the behavior of LINQ To SQL. However, having used ORMs for a number of years, I find LINQ To SQL conforms to my expectations of how an ORM should behave. Indeed, read something as old now as Martin Fowler's Patterns of Enterprise Application Architecture and you will find exactly this pattern of behavior for an ORM discussed. WORM (Wilson O/R Mapper, now open source BTW) and NHibernate both behave in a similar fashion. So some of this surprise seems to be based on expectations that don't come from experience of using ORM tools. On a recent .NET Rocks there was an opinion expressed that LINQ To SQL was somehow only fit to be a RAD tool because of multi-tier issues. I can't agree with this opinion at all. LINQ To SQL has a similar feature set to WORM, on which I have built distributed enterprise applications. Its limitations relate to the diversity of mappings that it supports (no table per concrete class in an inheritance hierarchy and no value types, for example) and its lack of support for multiple Db vendors, not to some perceived issues around the unit of work and identity map patterns which it implements.
Posted 03-09-2008 7:55 PM by Ian Cooper