CodeBetter.Com

Ian Cooper [MVP]

  • Alt.Net UK September

    We are pleased to announce that the Alt.Net UK Conference will be returning in September, and that there will be room for more attendees this time!

    The plan is to host the event at Conway Hall in London, and our thinking is to follow the same sort of schedule as we did in February:
    - Evening planning session on Friday 12th September, followed by a trip to a bar to socialise.
    - The Open Spaces sessions all day on Saturday 13th September

    This time we are thinking of starting off Saturday with a Park Bench to get the juices flowing.

    We are very open to listening to feedback from the community if you think that there are ways that we can improve on the conference experience.

    We would especially like to have more testers, technical authors and usability folk attend to foster cross-pollination of ideas.

    User registration will start from Friday 11th July at 07:00 UK time so the early birds will get the worm!

    The following social hubs have also been set up: Upcoming, Facebook, FriendFeed and LinkedIn. Don't forget that you can subscribe to the AltNetUk News River.

    Finally, we are currently looking for sponsorship, so if you know of an organisation that would be interested in being associated with the conference in return for a little lucre, we would love to hear from you or them! (The conference is non-profit.)

    Conchango and redgate have generously agreed to be launch sponsors - but more is needed, especially as we have to pay for Conway Hall this time.

    Ian Cooper, Ben Hall and Alan Dean

    PS Thanks to David Laribee for his help in getting us set up on alt.net

  • TDD and Hard to Test Areas, Part1

     

    TDD and Hard-To-Test Areas

    I wanted to talk about the issues that people hit when they begin working with TDD, the same issues that tend to make them abandon TDD after an initial experiment. Those are the 'hard-to-test' areas: the things production code needs to do that presentations and introductory books just don't seem to explain well. In this post we will start with a quick review of TDD, and then get into why people fail when they start trying to use it. Next time around we will look more closely at solutions.

    Review

    Clean Code Now

    TDD is an approach to development in which we write our tests before writing production code. The benefits of this are:

    • Tests help us improve quality: Tests give us prompt feedback. We receive immediate confirmation that our code behaves as expected. The cheapest point to fix a defect is at the point you create it.
    • Tests help us spend less time in the debugger. When something breaks our tests are often granular enough to show us what has gone wrong, without requiring us to debug. If they don’t then we probably don’t have granular enough or well-authored tests. Debugging eats time, so anything that helps us stay out of the debugger helps us deliver for a lower cost.
    • Tests help us produce clean code: We don’t add speculative functionality, only code for which we have a test.
    • Tests help us deliver good design: Our test proves not just our code, but our design, because the act of writing a test forces us to make decisions about the design of the SUT.
    • Tests help us keep a good design: Our tests allow us to refactor – changing the implementation to remove code smells, while confirming that our code continues to work. This allows us to do incremental re-architecture, keeping the design lean and fit while we add new features.
    • Tests help to document our system: If you want to know how the SUT should behave, examples are an effective means of communicating that information. Tests provide those examples.

    Automation lowers the cost of running these tests. We pay the cost of writing a test once, but because we can then re-run it at marginal cost, our tests help us keep those benefits throughout the system lifetime. Automated tests are ‘the gift that keeps on giving’. Software spends more of its life in maintenance than in development, so reducing the cost of maintenance lowers the cost of software.

    The Steps

    The steps in TDD are often described as Red-Green-Refactor:

    Red: Write a failing test (there are no tests-for-tests, so this checks your test for you)

    Green: Make it pass

    Refactor: Clear up any smells in the implementation resulting from the code we just added.
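
    As a rough illustration of one pass through that cycle, here is a minimal sketch. The Basket class and the test name are hypothetical, and a hand-rolled check stands in for an xUnit framework purely so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical SUT, not from the original post: a basket that totals its item prices.
public class Basket
{
    private readonly List<decimal> prices = new List<decimal>();

    public void Add(decimal price)
    {
        prices.Add(price);
    }

    // Red: imagine this method started life as "return 0m;", so the test below failed.
    // Green: summing the prices is the simplest change that makes it pass.
    public decimal Total()
    {
        decimal total = 0m;
        foreach (decimal price in prices)
        {
            total += price;
        }
        return total;
    }
}

public static class BasketTests
{
    // Written before Total() was implemented; watching it fail first is the
    // only check we have that the test itself works.
    public static void Total_of_two_items_is_their_sum()
    {
        var basket = new Basket();
        basket.Add(2.50m);
        basket.Add(1.25m);

        if (basket.Total() != 3.75m)
        {
            throw new Exception("Expected 3.75 but was " + basket.Total());
        }
    }
}
```

    With the test green, the refactor step is free to tidy Total() (for example, replacing the loop with a LINQ Sum) while the test confirms the behaviour has not changed.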

    Where to find out more

    Kent Beck’s book Test-Driven Development, By Example remains the classic text for learning the basics of TDD.

    Quick Definitions

    System Under Test (SUT) – Whatever we are testing, this may differ depending on the level of the test. For a unit test this might be a class or method on that class. For acceptance tests this may be a slice of the application.

    Depended Upon Component (DOC) – Something that the SUT depends on, a class or component.

    What do we mean by hard-to-test?

    The Wall

    When we start using TDD we rapidly hit a wall of hard-to-test areas. Perhaps the simple red-green-refactor cycle begins to get bogged down when we start working with infrastructure layer code that talks to the Db or an external web service. Perhaps we don’t know how to drive our UI through an xUnit framework. Or perhaps we have a legacy codebase, and putting even the smallest part under test quickly becomes a marathon instead of a series of short sprints.

    TDD newbies often find that it all gets a bit sticky and, faced with schedule pressure, drop TDD. Having dropped it, they lose faith in its ability to deliver for them and still meet the schedule. We are all the same: under pressure we fall back on what we know. Hit a few difficulties in TDD and developers stop writing tests.

    The common thread among hard-to-test areas is that they break the rhythm of our rapid test and check-in cycle, and that their tests are expensive and time-consuming to write. The tests are often fragile, failing erratically and proving difficult to maintain.

    The Database

    • Slow Tests: Database tests run slowly, up to 50 times more slowly than normal tests. This breaks the cycle of TDD. Developers tend to skip running all the tests because it takes too long.
    • Shared Fixture Bugs: A database is an example of a shared fixture. A shared fixture shares state across multiple tests. The danger here is that Test A and Test B pass in isolation, but Test A fails unexpectedly when run after Test B, because Test B changed the state of the fixture. These kinds of bugs are expensive to track down and fix. You end up with a binary search pattern to try and resolve shared fixture issues: trying out combinations of tests to see which combinations fail. Because that is so time consuming, developers tend to ignore or delete these tests when they fail.
    • Obscure Tests: To avoid shared fixture issues people sometimes try to start with a clean database. In the setup for their test they populate the Db with any values they need, and in the teardown clean them out. These tests become obscure, because the setup and teardown code adds a lot of noise, distracting from what is really under test. This makes tests hard to read, as they are less granular, and makes the cause of a failure harder to find. The Db setup and teardown code is another point of failure. Remember that the only test we have for our tests themselves is to write a failing test. Once you get too much complexity in your test itself it can become difficult to know if your test is functioning correctly. It also makes them harder to write. You spend a lot of time writing setup and teardown code, which shifts your focus away from the code you are trying to bring under test, breaking the TDD rhythm.
    • Conditional Logic: Database tests also tend to end up with conditional logic – we are not really sure what we are going to get back, so we have to insert a conditional check to see what we got back. Our tests should not contain conditional logic. We should be able to predict the behavior of our tests. Among other issues, we test our tests by making them fail first. Introducing too many paths creates the risk that the errors are in our test not in the SUT.
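
    The shared fixture problem above can be sketched without a real database. In this illustrative example a static list stands in for a shared table; all of the names are hypothetical, and the tests return a bool rather than using an xUnit framework so that the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a shared database table: static state that outlives each test.
public static class SharedCustomerTable
{
    public static readonly List<string> Rows = new List<string> { "Alice" };
}

public static class SharedFixtureTests
{
    // Test A assumes the table still holds exactly its seed row.
    public static bool TestA_finds_one_customer()
    {
        return SharedCustomerTable.Rows.Count == 1;
    }

    // Test B inserts a row and never cleans up after itself.
    public static bool TestB_can_add_customer()
    {
        SharedCustomerTable.Rows.Add("Bob");
        return SharedCustomerTable.Rows.Contains("Bob");
    }
}
```

    Run A then B and both pass; run B then A and Test A fails, even though nothing in Test A or the code it exercises has changed. That order dependence is what makes these failures so expensive to track down.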

    The UI

    • Not xUnit strength: xUnit tools are great at driving an API, but are less good at driving a UI. This tends to be because a UI runs in a framework that the test runner would need to emulate, or interact with. Testing a WinForms app needs the message pump, testing a Web Forms app needs the ASP.NET pipeline. Solutions like NUnitAsp have proved less effective at testing UIs than scripting tools like Watir or Selenium, often lacking support for features like JavaScript on pages.
    • Slow Tests: UI tests tend to be slow tests because they are end-to-end, touching the entire stack down to the Db.
    • Fragile Tests: UI tests tend to be fragile, because they often fall foul of attempts to refactor our UI. So changing the order and position of fields on the UI, or the type of control used will often break our tests. This makes UI tests expensive to maintain.

    The Usual Suspects

    We can identify a list of the usual suspects, who cause issues for successful unit testing.

    • Communicating Across a Network
    • Touching the File System
    • Requiring the Environment to be configured
    • Making an out-of-process call (includes talking to the Db)
    • UI

    Where to find out more

    xUnit Patterns: Gerard Meszaros' site and book are essential reading if you want to understand the patterns involved in test-driven development.

    Working with Legacy Code: Michael Feathers' book is the definitive guide to test-first development in scenarios where you are working with legacy code that has no tests.

    Next time around we will look at how we solve these issues.

     

    Posted Jul 07 2008, 04:10 PM by Ian Cooper with 19 comment(s)
  • Showing some support for LINQ to SQL

    While I have finished my series on LINQ to SQL, I wanted to talk about some of the reaction. In his summary post of 30 June, Roger Jennings mentions his concern that, because the SQL Server Data Programmability group, who are bringing us Entity Framework v1, now owns LINQ to SQL, we will not see the kind of development I asked for in the last post of my Architecting LINQ to SQL series. Indeed, Matt Warren's comment in this post, that the provider model for LINQ to SQL was disabled before release, is troubling for what it implies about internal politics over data access strategies in Redmond, and might confirm concerns I had a long time ago. Looking at the Data Platform team's blogs and site, LINQ to SQL seems almost forgotten.

    I would like to see MS give this product the support it deserves. I would like to see a commitment from the Data Platform team to stop talking LINQ to SQL down as a RAD tool and start talking up its advantages for use in OO approaches to software development. For example, I would like to see the Data Platform team talking about LINQ to SQL's support for POCO strategies, lazy loading, etc., and acknowledging to customers who request those features that, if they want them, they should consider LINQ to SQL. Right now their only response is to repeat that those features will be in Version 2 of the EF. Well, an MS ORM supports those features today; you should point that out to your customers, and give advice on how to achieve it. As of now, in my opinion, LINQ to SQL is their best development tool for an OO approach to development, and they need to reflect its strengths in the advice they give, not just focus on its weaknesses when compared to EF.

    Martin Fowler posted some time ago about different schools of software development. Perhaps one solution for the conflict over the future of these tools is for MS to accept that it has (at least) two audiences, and to build one product for the OO folks and one for the data-first folks. In that case LINQ to SQL would be the better starting point for supporting the OO folks, because it already contains so much of what they need.

    People feel sorry for the Entity Framework team for the criticism in the open letter. For my part I feel sympathy for the LINQ to SQL team, who fell foul of product strategy decisions with ObjectSpaces and seem to have done so again. Considering how well they understood the OO approaches we wanted, unlike the EF team, the team are unsung for their efforts. The Alt.Net community in particular should give them wider support for having, unlike the Entity Framework team, recognized the needs of those developers taking an OO approach to development. It does not give us everything, but credit where it is due.

    Today, while I would not recommend using the Entity Framework I would recommend looking at LINQ to SQL. Everything needs evaluation for your own needs, but unlike EF, LINQ to SQL is a better contender for OO approaches today.

    Sasha points out, when looking at how LINQ to SQL is surprising people who had been misinformed as to what it offered and how we should use it, that the noise in the blogosphere from Entity Framework supporters seems to have drowned out the value of LINQ to SQL as an ORM. Indeed I believe the Data Team's own pitching of LINQ to SQL as a RAD tool underestimates the product.

    As a community, as people begin to realize the surprising power of LINQ to SQL, I would like to see us dispel many of the myths that seem to have grown up around that product. I would like to see us put pressure on the Data Platform team to provide the support for LINQ to SQL that we want going forward. Community reaction is everything, and if the LINQ to SQL community remains silent in the face of the more vocal, but probably less numerous, EF community, we won't get the product we deserve.


    Posted Jul 02 2008, 07:57 AM by Ian Cooper with 24 comment(s)
  • Architecting Linq to SQL, part 10

     Previously: Architecting Linq to SQL, part 9

    End of the line

    This is intended to be the last part in this series and I wanted to take the opportunity to talk about a number of related if diverse topics. I would like to look at what I would like to see in the next version, and talk about when and where I intend to use Linq to SQL and other ORMs such as NHibernate.

    I will try to get some of the code that goes with this series up onto Google Code over the coming months, schedules permitting. Roger Jennings requested that I give more than a trivial example of how to do messaging for n-tier scenarios. I'm flattered by Roger's confidence, though I feel that Greg or Udi would be better placed to do an introductory piece on messaging. But if there is demand I will give it a try.

    What would I like to see in the next version?

    The following are, I think, the priorities:

    Support for Value Types. In a fine grained object model we may have classes that are not entities (an entity has a distinct identity independent of its state). A common example would be Money, which has a value and a currency. We do not want to map these to rows in a table, but to columns. Right now with Linq to SQL we have to represent money as two fields, amount and currency, in an entity. We would like to represent them as one type, which can be mapped independently. As an aside, a lot of systems overuse primitive types directly. Often we have something that is not a string or an int even if we can represent it as such. Our systems become clearer if we can wrap these primitives in a type with a name appropriate to the domain, such as ShippingReference. This only works if we can map value types easily. This is fairly straightforward if we assume that the column names used by these value types remain the same on any entity that stores them.
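
    To make the idea concrete, here is a minimal sketch of the kind of value type we would like to be able to map. It illustrates the domain side only: the Money and OrderLine classes are hypothetical, and nothing here is an existing LINQ to SQL mapping feature.

```csharp
using System;

// Hypothetical value type: its identity is its state, so two Money instances
// with the same amount and currency are interchangeable.
public sealed class Money : IEquatable<Money>
{
    public decimal Amount { get; private set; }
    public string Currency { get; private set; }

    public Money(decimal amount, string currency)
    {
        if (currency == null) throw new ArgumentNullException("currency");
        Amount = amount;
        Currency = currency;
    }

    // Value types are immutable: arithmetic returns a new instance.
    public Money Add(Money other)
    {
        if (other.Currency != Currency)
        {
            throw new InvalidOperationException("Cannot add different currencies.");
        }
        return new Money(Amount + other.Amount, Currency);
    }

    public bool Equals(Money other)
    {
        return !ReferenceEquals(other, null)
            && Amount == other.Amount
            && Currency == other.Currency;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Money);
    }

    public override int GetHashCode()
    {
        return Amount.GetHashCode() ^ Currency.GetHashCode();
    }
}

// The entity as we would like to write it: one Money property that a mapper
// would spread across an amount column and a currency column.
public class OrderLine
{
    public int Id { get; set; }
    public Money Price { get; set; }
}
```

    Without value type mapping, OrderLine has to expose separate amount and currency properties, and the rule that you cannot add pounds to dollars has nowhere natural to live.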

    Support for changing loading options. As of today we can only alter the default loading behavior of a DataContext before we use it. This assumes that we can determine what we want to eager load once for a context. The reality is that we may want to set this before we run any query. So it must be possible for us to set eager loading options each time we run a query. An alternative would be something more akin to the ability in Hibernate's HQL to add a fetch to the query expression, so that we can tell the query to load the relationship eagerly. We also want support for eager loading multiple child associations, not just the one we have now.

    Support for ordered relationship types. Right now an association is treated as a set, an unordered collection. However, our children are often ordered, or held in a map where we have both a key and a value. The key could be a primitive type or another entity or value type. While this type of mapping is less common in the relational world, within our domain we often want to use ordered mappings, and support for mapping these to relational tables gives us increased flexibility when mapping domain to Db.

    Support for table per sub-class mapping.  Sometimes we do not want to allow fields on sub-classes to be nullable. Unfortunately this is a requirement of table-per-class-hierarchy mapping strategies. Allowing table per sub-class, using a shared key strategy would allow us to avoid this issue. Table per-subclass with shared key avoids some of the performance issues from moving away from a single table, which might be incurred if we took a union approach to combining data from multiple tables to support subclassing.

    There are also some things I would like to see, though I am less optimistic that they will happen:

    Expose the provider model. Allow LINQ to SQL to target multiple Dbs. The provider model exists but was never exposed at RTM. I suspect the resources were not allocated because the Entity Framework became the way to work if you had a non-SQL Server back end. Given that EF is not being positioned as an ORM, let's open the provider model up so we can take LINQ to SQL forward.

    Include an explicit in-memory provider. This will make TDD a breeze.  Once we have an in-memory provider it would be easy to swap out the Db for unit testing purposes.
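
    Until such a provider exists, one way to approximate the swap is to hide the table behind a small seam of our own. The following is a minimal sketch under that assumption: IRepository, InMemoryRepository and CustomerFinder are hypothetical names and not part of LINQ to SQL. The in-memory version relies on LINQ to Objects via AsQueryable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical seam: production code would implement this over a DataContext table.
public interface IRepository<T>
{
    IQueryable<T> Query();
    void Add(T item);
}

// Test double backed by a List<T>; LINQ to Objects evaluates the queries.
public class InMemoryRepository<T> : IRepository<T>
{
    private readonly List<T> items = new List<T>();

    public IQueryable<T> Query()
    {
        return items.AsQueryable();
    }

    public void Add(T item)
    {
        items.Add(item);
    }
}

// Code under test depends only on the seam, never on the Db.
public class CustomerFinder
{
    private readonly IRepository<Customer> customers;

    public CustomerFinder(IRepository<Customer> customers)
    {
        this.customers = customers;
    }

    public Customer FindByName(string name)
    {
        return customers.Query().SingleOrDefault(c => c.Name == name);
    }
}
```

    One caveat with this approach: a query that runs in memory is not guaranteed to behave identically once it is translated to SQL (string comparison rules are a common difference), so it speeds up unit tests but does not replace a smaller set of tests against the real Db.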

    Support for second-level caching. MS now has a second level caching technology in Velocity. It would be nice to see support for working with a second level cache within LINQ to SQL (the first level cache is the identity map).

    What I would be cautious about in the next version 

    There are also some things that I would be disappointed to see a disproportionate amount of effort expended on:

    Support for serialized entities. I hope I have managed to explain why serializing an entity across tiers is a bad architectural style. Instead of corrupting LINQ to SQL with support for this practice, I would like to see an emphasis from the patterns & practices team on dissuading people from approaching n-tier design in this style. We do not want to pollute entities with change tracking or serialize a DataContext.

    More advanced designer options. I appreciate that some folks like designers, but I think that they may be a red herring here. If you work domain-first then you might as well use attributes to mark up your domain model, or hand code your XML mapping file. If you are going to work in a data-first approach, I would push extending SQLMetal with those capabilities instead of a designer.

    In the data-first case the design is done in the RDBMS, not in the domain model, so by the time we get to LINQ to SQL we are just generating our entity model from our Db. All the designer gives us is the ability to select a sub-set of tables to generate. A fairly simplistic UI, such as a dropdown list to add tables, could configure the options for a SQLMetal call. Flashy drag and drop layout seems a little bit wasteful. Even better if the property based approach is just a wrapper around a SQLMetal call that makes the command you have configured available. That allows folks to use the command they have created through the designer in their build scripts to call SQLMetal. This would free up resources for the new functionality people want from their data-first designer, such as file per entity, updating an existing set of files for changes, etc.

    I understand this may not be popular, but command line tools are cheaper to author and can deliver a lot more bang for your buck if your team has limited resources. In addition a designer can blind you to an over-complex approach to mapping. If you cannot easily map by hand, if you require that designer, then I believe that you may have lost your way.

    I understand that these suggestions will be unpopular with some people, but both of them represent dead ends to me, that do not provide us with the ability to write better software. Of course your mileage may vary.

    Linq to SQL over Entity Framework for your ORM

    For my part, and that is of course based on my school of software development, LINQ to SQL is a better ORM than the Entity Framework. That may come as no shock to the EF team, who have a bigger vision for their product than ORM. For me, LINQ to SQL gets a lot right: support for persistence ignorance, a single mapping file that is authorable by hand, lazy loading as a default strategy. If MS intends to provide an offering in the ORM space, as opposed to whatever space the EF is defining, and thus fulfill the vision that Anders gave us of simplifying the development experience by having data access as part of the language, then to me LINQ to SQL is the best MS contender for the crown. Given the resources, LINQ to SQL could become a great tool. I hope that MS continues to allocate a fair share of resources to it.

    LINQ to SQL vs. NHibernate

    To be honest, I have to say that my next project will use NHibernate for its persistence technology instead of LINQ to SQL. Why? It is a large project, with a significant number of entities, and we want to support fine-grained object models and table per sub-class mapping strategies. We also want the insurance of being able to eager fetch on a query-by-query basis and to have a 2nd level cache. It is an old adage but 'there is no silver bullet'. I'm picking one tool out of the kit; it does not mean the others are not valuable.

    At the same time LINQ to SQL still forms part of our strategy, because we believe it to be simpler to approach for many projects. So we also have used, and will continue to use, LINQ to SQL. If anything, LINQ to SQL replaces WORM for us, which we used for a number of projects where we had good table to entity affinity. Ironically perhaps, WORM was an implementation of the proposed interface for ObjectSpaces. ObjectSpaces was the MS ORM for .NET 2.0 that never saw the light of day. ObjectSpaces became LINQ to SQL, and Matt has the full story here, so it seems a natural inheritor. Let us hope it does not meet the ObjectSpaces fate of being sidelined for a more grandiose vision of data access.

    A valid question might be to ask why I want to improve LINQ to SQL, and why I do not just tell everyone to use NHibernate. Some of this is a recognition of the market: many people will not use a non-MS ORM, and LINQ to SQL is a solid ORM. Pragmatically, we are likely to get more .NET developers who know LINQ to SQL available in the market place than NHibernate developers. But I also believe that with the expressiveness of LINQ, MS have a real chance to move the ORM market forward in the .NET space. LINQ to SQL is like ASP.NET MVC: it is a welcome acknowledgement from MS of what developers want, and we should commend them when they do get it right.

    I will be posting a series on NHibernate going forward, so that you can make your own judgements on which to use and when.

     

  • Alt.Net events in London

    Danny over at the Alt.Net yahoo group asked about alt.net events in London. I thought I would publicize more widely for folks that do not know what is on offer.

    I run the London .NET user group. We have covered alt.net topics for the last 5 or so years, and have always had strong alt.net community involvement. For example we ran the XP game for everyone back in 2005. You can find our, much in need of work when I get some time, site here.

    I am also involved in running the altnetconf in London with Ben and Alan, with lots of help from Michelle at Conchango. We held an open spaces conference last January and are in the planning stages for a new one in September. We do have a google group that you can find here. I don't spend as much time there as I should.

    Sebastian Lambla is running an alt.net beers get together. I think he is looking to add more technical content to his existing social scene. If you are looking for a more regular and explicit alt.net get together, that is worth trying (sorry Seb, I will try to be in the country for the next one). Seb usually advertises meetings on the above list and you can also pick it up via our newsriver. Thanks to Dave Verwer for that (Dave is a Ruby guy but I believe he is also working on an ASP.NET MVC book with Jeffrey Palermo).

    Skills Matter hosts some 'Open Source .NET Evenings'. London .Net User group will be hosted there a few times this year. You can find their listing on their website.

    So we are pretty well-covered; in fact there are too many events to attend all of them.

    In addition there is a wider agile community to attend such as the Extreme Tuesday Club.

    To be honest I am strongly opposed to a user group self-identifying as alt.net, as the point is to reach out to the wider audience. I would say that a lot of UK groups tend to run alt.net topics, including the NxtGenUG and VBUG guys. I know that because I have spoken at them. If you are interested in alt.net and want to see more events, I would encourage you to offer to speak, for that is the limitation a lot of groups have on alt.net topics: speakers. I feel sure all would welcome you and not turn you away for presenting the alt.net viewpoint. Given how much is already available, I would definitely question anyone setting up an explicit alt.net group in the London area. Presumably they would be focusing on alternatives to things happening at existing groups and be running sessions like 'Waterfall: Return to Righteousness' and 'Procedural Programming: Putting Data Stores Back On Top'.



    Posted Jul 01 2008, 12:52 PM by Ian Cooper with 3 comment(s)
  • Architecting LINQ To SQL part 9

    Previously: Architecting LINQ To SQL part 8


    LINQ To SQL with N-Tier: Why is there pain?

    People tend to experience pain using LINQ in N-Tier scenarios because they are trying to pass entities between layers. You can find some examples of this complaint, not to pick on anyone but just to avoid repeating it, here and here.

    First up is that you will have trouble serializing EntitySet and EntityRef. As you will recall from Part 5, we need to use these collection types to support lazy loading. The problem is that they will not serialize with the XmlSerializer because they contain circular references. They will however serialize using WCF’s DataContractSerializer.

    Second, remember that we said in Part 7 that you should always access entities within the scope of a DataContext. The DataContext provides the unit of work and change tracking, so objects outside the scope of their context must be attached if we are to tell LINQ to SQL that they are persistent and not transient. You need to provide LINQ with information as to the original state of the persistent entity, so as to avoid spurious concurrency errors. When we access a lazy loaded collection, the proxy for that collection uses the DataContext it was loaded with to retrieve the persistent child items. If the DataContext has been disposed you will get an exception.

    DataContext itself is not serializable and contains non-remotable resources like a Db connection. This means you cannot gain change tracking and lazy loading by simply serializing your DataContext too. This tends to throw people who come from a DataSet background who have become used to serializing the context with the data. Indeed this ability to work in a disconnected fashion is trumpeted as a feature of the DataSet. As a result, this set of users tries to use LINQ to SQL in the same way and then becomes frustrated when they can’t.

    It is important to recognize that ORMs represent a different architectural style to DataSets. Persistence Ignorance means that we want our domain model to be unaware of persistence related concerns, such as the relational model, Db connections, SQL statements, and object change tracking. As a result, these types of architecture don’t support this model of interaction. Those of us who believe in persistence ignorance think that this is a good thing, because clean separation of concerns makes our software more amenable to incremental re-architecture which keeps our software fit and healthy as it changes.

    Prefer Messaging
    So if LINQ to SQL follows this school of development what is the answer to working in a multi-tier environment? Simply it is to prefer messaging when communicating across tiers.
    The usual alternatives to messaging are shared database integration and remote procedure calls.

    The former is not usually raised as a mechanism within LINQ to SQL discussion, but having been the recipient of legacy systems that use shared database integration I would advise strongly against it. Problems occur when multiple systems update one Db. To support multiple applications you need to express the Db schema in a generic way. Each application has to transform this schema when it loads its entities to match its own domain model. These model differences (systems tend to comprehend the entities in different ways) lead to subtle inconsistencies. Whenever we change how we interact with the Db for one application, we have to test all applications that share that Db, increasing the cost of making changes. It is not just schema changes that are at issue here; how we update the Db, and the values we write, are just as important a concern. Just because both of my systems have a Customer does not mean that the semantics of dealing with a Customer are the same to both of them. For the order fulfilment and complaints departments a customer may be very different. Domain models are application specific and not shareable.

    Remote procedure calls are problematic because they exchange type. Exchanging type locks us into a relationship between client and server that becomes burdensome because when we change the server we have to change all the clients, or if we need a change for one client, we have to change the server and all other clients. This increases the total cost of ownership of the solution.

    When we talk about messaging as a solution there are a number of important things to remember:

    We should share schema not type. We want to send a message, not transfer an instance of a type. The sender may compose the message from one or more types and the receiver may consume the message into a type. But it’s a message not an entity.

    Do not reference your domain model across the tiers. If you find yourself creating references to an assembly from one application in another, in an effort to send and receive entities, you have lost your way.

    Exchange state, not behaviour. What we care about is passing a message that has some state, not the behaviours associated with an entity.

    Assume you do not own the ‘other’ tier. Remember that when we talked about distribution one reason was to expose our services to a range of clients. If we make assumptions about the client, we limit our consumers.

    Servers decide what to do with a message, clients decide what to do with replies, if there are any. With no transfer of behaviour each side decides what to do with the message, if anything.
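
    As a minimal sketch of what these guidelines mean in code, consider the following. The Customer entity, the CustomerSummaryMessage class and the mapper are all hypothetical names chosen for illustration; the point is only that the message carries state, lives apart from the domain model, and is composed by the sender.

```csharp
using System;

// Domain entity: stays on the server tier and carries behaviour.
public class Customer
{
    public int Id { get; private set; }
    public string Name { get; private set; }
    public decimal CreditLimit { get; private set; }

    public Customer(int id, string name, decimal creditLimit)
    {
        Id = id;
        Name = name;
        CreditLimit = creditLimit;
    }

    public bool CanOrder(decimal orderValue)
    {
        return orderValue <= CreditLimit;
    }
}

// Message: state only, no behaviour. A client needs only this shape (the
// schema), never a reference to the assembly that contains Customer.
public class CustomerSummaryMessage
{
    public int CustomerId { get; set; }
    public string DisplayName { get; set; }
}

public static class CustomerMessages
{
    // The sender composes the message from the entity; the entity itself
    // never crosses the wire.
    public static CustomerSummaryMessage ToSummary(Customer customer)
    {
        return new CustomerSummaryMessage
        {
            CustomerId = customer.Id,
            DisplayName = customer.Name
        };
    }
}
```

    Because the client sees only CustomerSummaryMessage, the server is free to rename properties on Customer or change how CanOrder works without breaking anyone, as long as the mapper keeps producing the same message shape.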


    Remember how we said the first law of distribution was don’t? If you balk at the idea of messages because ‘in the real world we need to get this code done in a reasonable timescale’, then your real issue may be that you should not be opting for a multi-tier scenario at all. If you genuinely need n tiers, you can usually afford the price of messaging.


    Insulate the Domain
    When you use messages you want to insulate your domain from the message that goes over the wire. This is because you want to be able to change your domain but continue to support existing clients who use the current message format. Remember that you may not own your clients, so you cannot assume that they will change just because you change. So you need to separate your domain from the message.

    In most cases you will replace the RPC need to serialize an entity with a Document Message. When populating the document message it is tempting to serialize your entities directly over the wire as the message body. Instead, populate the message from your entity.

    The following trivial service represents this pattern, showing how we use 'replay' to load the entity affected by the message, update it, and then submit the changes:

        public class RegionService : IRegionService
        {
            // Load the entity and project it into a message; the entity never crosses the wire.
            public RegionMessage GetRegion(int regionId)
            {
                using (NorthwindDataContext session = new NorthwindDataContext())
                {
                    var query =
                        from r in session.Regions
                        where r.RegionID == regionId
                        select new RegionMessage { Id = r.RegionID, Description = r.RegionDescription };

                    return query.Single();
                }
            }

            // Replay: reload the entity the message refers to, apply the changes, then submit.
            public void SendRegion(RegionMessage regionMessage)
            {
                using (NorthwindDataContext session = new NorthwindDataContext())
                {
                    Region region =
                        (from r in session.Regions
                         where r.RegionID == regionMessage.Id
                         select r).Single();

                    region.RegionDescription = regionMessage.Description;

                    session.SubmitChanges();
                }
            }
        }
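
    The service above uses a RegionMessage type without showing it. As a rough sketch only, the message and the translation to and from the entity might look something like the following. The Region class here is a plain stand-in for the mapped entity, and RegionMapper with its ToMessage and Apply methods is a hypothetical helper invented for illustration, not part of the Northwind sample. In a WCF service the message would typically also be decorated as a data contract.

```csharp
using System;

// Stand-in for the LINQ To SQL entity (assumption: the real one is mapped to the Region table).
public class Region
{
    public int RegionID { get; set; }
    public string RegionDescription { get; set; }
}

// The message carries state only: no behaviour and no reference to the domain assembly.
public class RegionMessage
{
    public int Id { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper that keeps the entity/message translation in one place.
public static class RegionMapper
{
    // Populate a message from the entity rather than serializing the entity itself.
    public static RegionMessage ToMessage(Region region)
    {
        if (region == null) throw new ArgumentNullException("region");
        return new RegionMessage { Id = region.RegionID, Description = region.RegionDescription };
    }

    // Replay the state carried by the message onto an entity we have already loaded.
    public static void Apply(RegionMessage message, Region region)
    {
        if (message == null) throw new ArgumentNullException("message");
        if (region == null) throw new ArgumentNullException("region");
        if (message.Id != region.RegionID)
            throw new ArgumentException("Message does not refer to this region.", "message");
        region.RegionDescription = message.Description;
    }
}
```

    Because only the mapper knows both shapes, the entity can gain or rename members without changing what existing clients see on the wire.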
     

  • The criticism of the Entity Framework is not just around Domain Driven Design

    Some of the responses to the open letter on the Entity Framework have suggested that the criticism comes from practitioners of Domain Driven Design, as outlined by Eric Evans, who find that the model proposed for the Entity Framework does not gel with their objectives. This is a mischaracterization of what is happening here. Eric has a number of insights in Domain Driven Design related to ideas like ubiquitous language, but many of the ideas in conflict have existed in the OO community for a long time outside of that context.

    As an example, take the idea of persistence ignorance, or separation of concerns. In September 2000, in response to the complexities of the Enterprise Java Bean (EJB) model, Martin Fowler, Rebecca Parsons and Josh MacKenzie coined the term POJO, or Plain Old Java Object, to highlight the benefits of coding business logic into regular Java objects rather than beans. EJBs were an ambitious attempt to support persistence, transactions, events, RPC etc. in a single component. While there was an initial rush to adoption, they quickly proved a millstone around the Java community's neck because of their complexity. Technologies like Spring and Hibernate emerged as less complex ways of delivering enterprise-class solutions, and many of them were later incorporated into drastically revised EJB visions.

    The .NET community had a huge amount to gain from this experience. Ports of Hibernate and Spring offered the community the chance to avoid the mistakes of the past. However, seemingly unaware of the lessons of history, the EF team embarked on a project to produce a complex framework, of which persistence is just one aspect, reminiscent of the EJB initiative. So the reaction against the EF comes from its failure to learn the lessons around complexity and ambition that another community has already struggled with.

    So the warnings issued around the EF are trying to prevent .NET developers straying into the same path of pain that their Java brethren experienced with complex non-PI frameworks.

    As another example, let us consider the notion of designing your system by focusing on objects instead of data. This one is pretty old. At OOPSLA in '89 Kent Beck and Ward Cunningham presented a paper on using CRC cards as a technique for teaching object-oriented thinking. They wanted to give procedural programmers, who were used to designing systems by thinking of processes, data flows and data stores, an equivalent tri-partite model for the OO world. Their answer was class names, responsibilities, and collaborators. Even this far back Kent and Ward urged their audience to "[create] objects not to meet mythical future needs, but only under the demands of the moment". So even then (and perhaps even earlier, this is just an example I was aware of) building an OO system by modelling classes, their responsibilities, and collaborators, rather than processes, data flows, and data stores, was being espoused as good OO practice. This 'domain' focus is not a new 'fad' but pretty much how you do OO design as opposed to procedural design. It would not seem sensible to do OO by modelling processes, data flows and data stores and then generating an OO design from them, but this seems to be the approach that the EF takes. This makes it overly complex to adopt an OO approach using the Entity Framework.

    So these concerns are not because of some new 'faddish' method favoured by the alt.net community.

     

  • DDD7 Call for speakers

    We have announced the date for DDD7 as Saturday 22nd November 2008, at Microsoft's campus in Reading (UK). That is a long time out, but we wanted to begin the call for speakers now, to give speakers plenty of time to get their sessions submitted. We will still go to voting much closer to the event, so you have time, but for those of you who have been undecided about committing to speaking in the past, you should have plenty of time.

    I'm hoping that we will see a good representation from the Alt.Net community in the submissions for this year's event, so start thinking now. 

  • When does design happen in agile?

    Someone at work, new to agile approaches, asked me about when agile projects do design. I thought I would share my answer in case others were still wondering, and to try and explain where experience may have led us to activities that are not textbook.

    The short answer is ‘all the time’ and ‘just-in-time’. The long version is ‘all the time’ in that we are always thinking about design during story gathering, planning etc., which is why you want as many people on the team involved as possible. When we say ‘just in time’ we mean that we leave it to the ‘last responsible moment’. Of course the hard part is figuring out when the ‘last responsible moment’ is. I like Alistair Cockburn's model that the objective of any activity in software development is to provide just enough for the next activity to take place. So we would tend to do just enough design to let us begin development.

    Being more concrete: 

    • Agile projects do design through TDD, so called emergent design or incremental re-architecture.
    • XP supplements TDD with pair programming, so at least two people design everything.
    • XP recognizes that some design needs to be all team, and has an optional CRC path, in each iteration, for contentious design choices or complex problems.
    • Crystal does not mandate pair-programming, and scales to larger teams, so it has more ceremony. It’s not uncommon for a crystal project to model the core domain model up front.
    • Most agile teams recognize that there is a nebulous conception or inception phase when you gather user stories, organize and understand them, do just-enough modelling,  set up build servers etc. It is the great unspoken step of many agile methodologies.
    • We care more about the domain model when modeling than about the infrastructure. Early on I would never tend to model the infrastructure.
    • Any ‘up-front’ modeling, before coding, is high-level, not low i.e. we care about classes, roles, responsibilities, and collaborators, but not attributes, methods etc. I tend to favour Rebecca Wirfs-Brock’s Responsibility Driven Design for this, and would consider this her Exploratory Design phase.
    • The conversation is more important than the artifacts the conversation produces. This is why CRC cards are great: the artifact is less important than the design discussion among the team and running the scenarios. You may produce a UML sketch of the system at the end, but it’s just an aide-mémoire to the discussions, not a blueprint. It is also why design needs to be ‘all-play’. We don’t want to punt documents over the wall to developers, we want them to be involved in the activity of design from the beginning. That way the documents don’t inform, they just record the discussion. That’s what a document is after all.
    • CRC cards also work well because they have feedback - running through scenarios. We tend to divide up the cards and throw a ball between participants as the system calls on their classes to act. That means you get some testing around your design as you do it.
    • Remember that you are looking for ENUF, just enough design to make the next step, development, possible. If you end up throwing away too much of your initial design when you get to coding you have created waste, which we are trying to avoid, so don’t push this exploratory design step too far. Just do enough to answer the key questions.
    • Be proportional. If you have ten user stories you may not need much of a conception phase. If you have two hundred you may need more.

    The N-Tier with LINQ post is almost finished, so it should be out in the next few days. There is really only one post in the series after that, then we will move on to looking at hard-to-test areas in TDD.
  • Architecting LINQ To SQL Applications, part 8

    Previously: Architecting LINQ To SQL Applications, part 7 
     
    Tiers
    A layer, such as we discussed in Part 2, is not a tier. A layer is a logical unit of division; a tier is a physical unit of division. The two need not be the same. Software may be layered within a single process running on one machine or across a number of processes running on different machines. Layering is about separation of concerns. Physical distribution is about sharing and scalability. So for example an RDBMS tends to live on a server, because we want to share data among many users in a scalable fashion. Some rules of thumb are:

    • Presentation Layer: This is the most obvious. For a rich client we need to run our presentation layer on the client. Remember that we might consider a browser application to be a rich client if it uses JavaScript and DHTML to provide its UI i.e. Ajax. For a thin client application the presentation runs on the server.
    • Application and Domain Layer: This may run in the same place as the presentation layer, and this is the most responsive option. An application server runs the application and domain logic on a server. It allows sharing of the functionality exposed by the domain across multiple clients or to other applications. Running an application server can also lower cost of upgrades in a rich-client environment, because you update the domain logic in a smaller number of places.
    • Infrastructure Services: Because the application and domain layer depend on the infrastructure services they tend to run with the application and domain layer or on their own server.

    (As an aside, layers do not constrain packaging decisions either, i.e. how to divide your application into assemblies, and packaging decisions are again separate from tiers. I do not intend to cover packaging decisions here. One of the best discussions is in Robert Martin's Agile Principles, Patterns, and Practices in C#. The NDepend tool is invaluable in helping you make sensible packaging decisions.)

    The Laws of Distribution

    The first law of distributing your application into tiers is don’t.

    Distribution is a complexity multiplier. Increased complexity equals increased cost. Software is an economic proposition to the buyer: the benefit they receive has to offset the cost of purchase and ownership. The benefits of distribution come from sharing, scalability, and quality of service, so those are the only criteria that should steer us down that path. Always make tier decisions with them in mind. David Hayden has talked about this issue on CodeBetter before.

    Scalability

    Distribution will make your application more complicated, and crossing process boundaries will make your application slower (calls across process or machine boundaries are orders of magnitude slower). So why would we distribute our application in order to scale it? Many application servers seem to do little beyond passing requests between the UI and the Db, doing little or no processing in between. What are people using a server for here? Connection pooling is the major driver. If our data access happens from within the same process, then we can share connections from a pool, reducing the cost of creating and destroying connections.
     
    However, for ASP.NET applications there is little to be gained here. Indeed, a lot of people separate their domain layer into an application server in this context without needing to. We gain the benefits of connection pooling obtained from running in a single process, by virtue of being hosted within ASP.NET. There is no need to introduce a separate application server for connection pooling here. For smart clients we can gain benefits from pooling by doing data access from a server tier, because otherwise every client will have its own connection to the Db. So at a certain point a Windows Forms application may gain a performance benefit from using an application server, but it's worth bearing in mind the need to trade this benefit against the performance costs of distribution to begin with.

    The connection pooling example extends to anything where we have expensive creation. We may find it is more efficient to create once on the server and then allow multiple clients to share that resource. Indeed, if materialization of our objects from the Db were to be an issue, then caching the objects so created and returning the cached copy from our server would relieve pressure on the Db, by allowing clients to retrieve the copy instead of re-issuing the request. However, we would need to make decisions about the importance of the liveness of our data and how our cache would be refreshed if it became invalid.
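
    To make the liveness trade-off concrete, here is a minimal sketch of a server-side cache that reloads an item once it is older than a tolerated age. ExpiringCache, its constructor parameters and the Invalidate method are all names made up for this illustration; the load delegate stands in for whatever query materializes the object, and the clock is passed in only so that expiry can be exercised without waiting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical server-side cache: materialize once, share among clients, reload when stale.
public class ExpiringCache<TKey, TValue>
{
    private class Entry
    {
        public TValue Value;
        public DateTime LoadedAt;
    }

    private readonly Dictionary<TKey, Entry> entries = new Dictionary<TKey, Entry>();
    private readonly object gate = new object();
    private readonly Func<TKey, TValue> load;   // e.g. a query against the Db
    private readonly TimeSpan maxAge;           // how stale we tolerate the data being
    private readonly Func<DateTime> clock;      // injectable so that expiry can be tested

    public ExpiringCache(Func<TKey, TValue> load, TimeSpan maxAge, Func<DateTime> clock)
    {
        this.load = load;
        this.maxAge = maxAge;
        this.clock = clock;
    }

    public TValue Get(TKey key)
    {
        lock (gate)
        {
            DateTime now = clock();
            Entry entry;
            if (entries.TryGetValue(key, out entry) && now - entry.LoadedAt < maxAge)
                return entry.Value;             // fresh enough: no trip to the Db

            entry = new Entry { Value = load(key), LoadedAt = now };
            entries[key] = entry;               // missing or stale: reload and replace
            return entry.Value;
        }
    }

    // Call when we know the underlying data has changed.
    public void Invalidate(TKey key)
    {
        lock (gate) { entries.Remove(key); }
    }
}
```

    The maximum age is the liveness decision in code form: a longer age relieves more pressure on the Db but lets clients see older data.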

    In addition, where we are performing intensive activity on the server we may want to improve the performance of our application by splitting the work up and then scaling out or up. In this case we put the work into a server and share the work out among a pool of servers. Where the application server does a lot of work, for example complex calculations, adding an application server allows us to increase the number of nodes where that calculation can be performed.

    Shared Services
    Sharing creates pressure for a middle tier, especially if we want a service oriented architecture where there is one authority within our enterprise for certain operations. For example in an insurance company a number of applications might require access to our rating information. By hosting this information in a service we make it possible for any application within the enterprise to obtain that information. If it was locked into a smart client (for example an Excel spreadsheet) it would be difficult for other applications to share that information. A rich client cannot share the knowledge of the domain it captures, or the results of calculations, and in this circumstance you are always forced into moving this knowledge onto the server.

    Quality of Service
    We may want to provide our application with multiple nodes to improve its fault tolerance; if we lose one node other nodes can continue to carry out work. We might want to run a service our application uses in a different security context to that which the application uses. We might want our application to use reliable messaging for an operation. All of these issues can be loosely grouped as quality of service concerns and we may want to distribute our application to avail ourselves of these services.

    The Last Responsible Moment
    Beware premature decisions here. Just because you may need to expose this knowledge at some point does not mean that you need to do so from the beginning. A well-layered application would be amenable to refactoring to a multi-tiered one when you need it to be. I would try to defer the cost of distribution until you know you need it. Bear in mind for example that a web application just uses the HTTP protocol to exchange HTML with a browser. Adding a second ‘presentation’ layer that exposes XML over HTTP to a calling application, instead of HTML to a browser, should be a straightforward change to your existing architecture. Indeed, identifying the services the application provides may be easier if we look toward what ‘services’ we provide via HTML.

    Why is there pain going multi-tier with LINQ To SQL?
    Assuming that you have worked through the options and decided you need an N-Tier application, the question becomes: what issues will you hit when you try to use LINQ To SQL in this context?

    Serialization
    It is not straightforward to serialize an entity between two tiers.

    The first issue is that XML serialization fails with circular references, so it will tend not to work where you are managing a parent-child relationship with an EntitySet-EntityRef pair. The designer does have a workaround for this for WCF, so that you don't serialize both sides of the relationship. There is a good summary of the issues here and here.

    Entity-lifetime management 

    The second issue is the DataContext. Remember how we talked about the DataContext as a unit of work in which we should manage all of our interaction with entities? Passing an entity to another tier removes it from its existing unit of work. Assuming that you intend to work with the entity on another tier and then pass it back to update the Db with any changes, bear in mind that you are converting a persistent entity to a transient one and back again. There are two ways of dealing with this round trip. The first is to attach the entity to the DataContext when you deserialize it, to let the DataContext know that it is really a persistent object. As previously discussed, to do this you need to think about how you will manage the 'prior' state by, for example, tracking the old state of the entity yourself. The second is to think of your deserialized object as a set of deltas to the current Db representation of the entity: load the entity from the Db within the current DataContext and change it to reflect the object you have deserialized.

    This surprises some people because they are used to working with DataSets, which formed a unit of work and were serialized with both the old and current state of the rows they contained, allowing the updates to be played through a DataAdapter when they were passed back to the middle tier. LINQ To SQL does not support this model 'out-of-the-box', which tends to shock people who expect the features of DataSets to extend to LINQ To SQL.

    We'll cover solutions to these issues next time, but as a teaser I would point out that the 'Replay' model above is common in web applications. There we create a DataContext and load any entities required in response to a request for a page, compose an HTML response to that request from those entities, and then dispose of our DataContext. If the user is editing, they send a message back from their browser with the modifications we want to apply to the entity. In response to that postback we load the entity from the Db again, using a fresh DataContext, and apply the edits that the user has made to it.

    Note how we are exchanging messages with the browser, in the form of HTML, instead of serializing objects. This basic pattern of message exchange forms the heart of how we should architect our n-tier applications when using LINQ To SQL.

     

  • Architecting LINQ To SQL Applications, part 7

    Previously: Architecting LINQ To SQL Applications, part 6

    The topic of managing entity lifetimes is an important one as many of the issues that people have when using an ORM for the first time relate to a lack of understanding of how an ORM manages objects loaded from the Db, or that are to be inserted into the Db. In addition over the next few installments we will begin to talk about some of the issues related to multi-tier scenarios. It is important to understand how lifetime is managed because many of the issues people have come from working against the ORM rather than with it in these circumstances.

    What are Entity and Value types

    An Entity is a type which has an identity that remains unique and consistent throughout its lifetime. It is unique in the sense that it must always be possible to distinguish one entity from another. It is consistent in that even if the attributes change, the entity will still retain its identity. Consider a Customer type. As an entity we must be able to distinguish one customer from another. We need to somehow define Customer identity through a member. We avoid using natural members like Name because they are not individually unique and may change over time. Instead we base the identity of our entity on a surrogate member. In this case we might define an Id member for our Customer that assigns them a unique value within our organization. 

    Within the Db a primary key field distinguishes an entity, which is represented by a row on a table. LINQ To SQL piggybacks on this to provide support for an Entity, via a type mapped via a Table attribute or mapping, that corresponds to a table that has a primary key. So we map our Customer class to a Customer table and our Id member to an Id primary key field on that table. Again natural keys are allowed, but should be avoided in favour of surrogates to ensure that our entity remains unique and identifiable throughout its lifetime.

    The counterpart to an Entity type is a Value type. A Value type takes its identity from the value of its attributes. Two values that have the same attributes compare as equal. For example two postcodes that have the value "ABC 123" compare as equal, so we consider them to have the same identity. A value type might have just the one attribute, as with our postcode example, but might also have multiple attributes. An example of a value type with multiple attributes would be a Money type that has both an Amount and a Currency. We want to compare two instances of Money for GBP 123.74 as equal.
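
    The difference shows up most clearly in how equality is defined. The following is a sketch only: the Customer and Money types and their members are illustrative, not types that LINQ To SQL provides or requires.

```csharp
using System;

// Entity: identity comes from a surrogate Id, not from its attributes.
public class Customer
{
    public Customer(int id, string name) { Id = id; Name = name; }

    public int Id { get; private set; }
    public string Name { get; set; }      // may change without changing who the customer is

    public override bool Equals(object obj)
    {
        Customer other = obj as Customer;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode() { return Id.GetHashCode(); }
}

// Value: identity is the combination of its attributes, so we make it immutable.
public struct Money : IEquatable<Money>
{
    private readonly decimal amount;
    private readonly string currency;

    public Money(decimal amount, string currency)
    {
        this.amount = amount;
        this.currency = currency;
    }

    public decimal Amount { get { return amount; } }
    public string Currency { get { return currency; } }

    public bool Equals(Money other)
    {
        return amount == other.amount && currency == other.currency;
    }

    public override bool Equals(object obj) { return obj is Money && Equals((Money)obj); }

    public override int GetHashCode()
    {
        return amount.GetHashCode() ^ (currency ?? "").GetHashCode();
    }
}
```

    Two Customer instances with the same Id are the same customer even if the Name differs, whereas two Money instances are interchangeable only when both Amount and Currency match.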

    Within the Db a Value is represented as one or more columns within the row of an entity. LINQ To SQL only supports primitives as Values i.e. string, int, etc. and has no support for mapping user-defined value types. Thus there is no direct support in LINQ To SQL for types like Money. Hopefully this is a limitation that will be addressed in future versions.

    Entity Lifetime

    A data context is a unit of work. It tracks changes and submits updates to the Db when flushed. Because of this you should only keep it around long enough to do the work ...but no longer. When working with a DataContext we need to distinguish between, to borrow Hibernate terminology, transient and persistent entities.

    Persistent objects have been loaded by LINQ To SQL and we have a reference to them in the DataContext’s identity map. The DataContext tracks changes to the persistent entity against its state at the time it was loaded. Future requests for that object will return the object in the cache. Changes to that entity can be submitted to the Db. Lazy loading of associations uses the same DataContext we loaded the entity with originally.

    An entity that is not in the identity map of the DataContext is a transient entity. It has a lifetime equivalent to the running application. To ensure that the entity persists we need to add it to the matching table on the DataContext using InsertOnSubmit and flush it using SubmitChanges on the DataContext, causing the entity to become persistent.

    MyContext context = new MyContext();

    //persistent object
    MyObject myOldObject = context.MyObjects.Where(m => m.Name == "Old").Single();
    //changes are tracked
    myOldObject.Name = "Changed Name";

    //new entity; transient
    MyObject myNewObject = new MyObject();
    //no need to track; no Db row to update yet
    myNewObject.Name = "New Name";
    //queue the new entity for insertion
    context.MyObjects.InsertOnSubmit(myNewObject);

    //make new object persistent and flush changes to old objects
    context.SubmitChanges();

    This is not always intuitive:

    MyContext anotherContext = new MyContext();

    //new entity; transient
    MyObject myNewObject = new MyObject();
    myNewObject.Name = "New Name";

    //new entity is still transient until we actually submit
    anotherContext.MyObjects.InsertOnSubmit(myNewObject);

    //not persistent yet, so the query goes to the Db and won't find it
    var results = anotherContext.MyObjects.Where(m => m.Name == "New Name");

    It is important to note that although an identity map is sometimes called a first-level cache, its purpose is not to optimize retrieval of entities from the Db. Because an ORM such as LINQ To SQL does not know the results of a query in advance, it cannot determine whether two calls to a query (even the same query, as other users may have updated the Db) will return the same result set. For this reason it must always bring the result set back, and then check for the existence of those entities within the identity map. If the map contains the item, it must return that instance instead, because you might have made changes to the entity and we want to preserve them throughout the unit of work. So the cache is not about optimisation but about ensuring you do not lose changes during your unit of work. It is possible that queries that request an entity by primary key could be answered from the map directly, where the entity is already loaded, but you should not rely on this optimization.
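
    A much simplified sketch of that behaviour follows. This is not how LINQ To SQL is implemented internally; IdentityMap, Resolve and the Product class are hypothetical names used only to show that the row is always fetched and the map merely decides which instance the caller sees.

```csharp
using System;
using System.Collections.Generic;

// Illustrative entity with a primary key.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical, much simplified identity map keyed on the primary key.
public class IdentityMap
{
    private readonly Dictionary<int, Product> loaded = new Dictionary<int, Product>();

    // Called for every row a query brings back. The row has already been fetched;
    // the map only decides which instance is handed to the caller.
    public Product Resolve(Product materialized)
    {
        Product existing;
        if (loaded.TryGetValue(materialized.Id, out existing))
            return existing;            // keep the instance that may hold unsaved edits

        loaded[materialized.Id] = materialized;
        return materialized;
    }
}
```

    If the same row comes back from a second query, the caller receives the first instance, edits and all, and the freshly materialized copy is discarded.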

    Because we hold the object within the identity map for the lifetime of the unit of work there is a danger of concurrency errors, where another user updates the Db while we have the object. For this reason the identity map stores the original version of our object as well as the current one. This allows LINQ To SQL to compare the original against the state of the Db when it issues an update or delete query and raise an optimistic concurrency violation error if it has changed. Of course if you use a timestamp, and set UpdateCheck.Never on a column's mapping to inform LINQ To SQL that it should not check that field when looking for concurrency errors, you can optimize this feature as well. However the optimal SQL issued by LINQ To SQL during an update still depends on knowing what has changed.
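
    The check itself can be pictured with an in-memory stand-in for a table. Everything in this sketch (FakeTable, its methods and ConcurrencyConflictException) is invented for illustration; LINQ To SQL performs the comparison in the WHERE clause of the SQL it generates and reports a failure with its own ChangeConflictException.

```csharp
using System;
using System.Collections.Generic;

// Illustrative exception type for this sketch only.
public class ConcurrencyConflictException : Exception
{
    public ConcurrencyConflictException(string message) : base(message) { }
}

// Hypothetical in-memory stand-in for a single-column table keyed by Id.
public class FakeTable
{
    private readonly Dictionary<int, string> rows = new Dictionary<int, string>();

    public void Insert(int id, string name) { rows[id] = name; }

    public string Read(int id) { return rows[id]; }

    // Similar in spirit to:
    //   UPDATE ... SET Name = @current WHERE Id = @id AND Name = @original
    // If no row matches, somebody else changed or deleted it since we loaded it.
    public void Update(int id, string original, string current)
    {
        string stored;
        if (!rows.TryGetValue(id, out stored) || stored != original)
            throw new ConcurrencyConflictException(
                "Row " + id + " was changed or deleted by someone else.");

        rows[id] = current;
    }
}
```

    This is why the original values matter: without them there is nothing to compare the stored row against, and the update would silently overwrite the other user's change.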

    LINQ To SQL supports a Refresh method on its DataContext to force a reload of an entity or attributes of an entity. The purpose here is to allow you to resolve optimistic concurrency errors or to reload an object during a unit of work if you are aware that the Db has been changed by a mechanism outside of the purview of LINQ To SQL. The Refresh just goes back to the Db to find the latest version of your object. Note that refresh targets specific objects in the map or a set of objects.

    You can disable use of the identity map by setting ObjectTrackingEnabled to false. The purpose here is to optimize when you are loading read-only collections i.e. you never want to submit changes on these objects back to the Db. Remember though that another DataContext will consider these to be transient objects, so avoid assigning references loaded this way into entities loaded via DataContexts which are tracking.

    Attaching Entities

    Sometimes we have an entity that is in the backing store but not in the identity cache of our DataContext. To make our DataContext aware that this is a persistent and not transient entity we need to Attach it to our DataContext. This puts it in our cache. However, our context cannot know if the entity is the same as the representation on the backing store, so it must assume so, and change tracking will consider your object to be in the ‘original’ state. If your object has changed and you try to save those changes, you will get optimistic concurrency errors. This is because when we compare your ‘original’ to the Db, they do not match, which fools LINQ To SQL into thinking another user has changed it since we loaded it.

    One option to avoid this is to tell the DataContext what the original state in the Db was, or to set your columns to UpdateCheck.Never. If you have a Timestamp column, you can rely on that to do the right thing for you. Sometimes people suggest an UpdateCheck.Never on every column strategy where you cannot use a timestamp, but the danger is that we overwrite genuine changes made by another user.

    Otherwise we need to either provide the original, by maintaining the original state for any objects we may choose to detach, or adopt a load and replay strategy for detached objects where we load the current representation from the Db and then write our changes over it.

    Think carefully before heading down the detached objects route as it multiplies the complexity of what you are doing. This issue most often raises its head in multi-tier scenarios. We will talk about how to handle those in a future blog post, but for now recognize that the unit of work implies that the framework will not help you track changes outside of that context.

    Managing DataContexts

    Do not try to work with two different contexts at the same time. This is because what are persistent entities for one look like transient entities to the other because it does not have them in its identity map, as it did not load them.

    Do not try to access an object graph loaded via LINQ To SQL outside of its DataContext if it has lazy loaded properties. This is because LINQ To SQL will access the original DataContext to load the entities.  Trying to lazy load within another context falls foul of our earlier rule not to mix our contexts.

    Finally, assume that a DataContext is not thread safe i.e. work with a DataContext only on one thread and do not try to pass entities retrieved via a DataContext on one thread, to another thread.

    When working with a web application consider creating a DataContext per http request, using it to retrieve and then submit any changes required by the session. For a client-side application consider using a DataContext for each application transaction.

    While a DataContext is disposable, only dispose of it when you finish your request or application/transaction and are finished with the persistent entities that it loaded.

    Exercise caution around caching entities that were loaded via a DataContext. This is because when you access those elements they may still refer to the DataContext if they contain a lazy loaded association. If you want to use LINQ To SQL to load cached data, make sure you load objects that are not coupled to the DataContext, by not using EntitySet<> and EntityRef<> for associations, and disabling deferred loading on the context that you use to load them. This can be appropriate for reference data, in which case, you can also disable change tracking.

    Is LINQ To SQL deficient here?

    I have read a fair number of opinions expressing surprise at the behavior of LINQ To SQL. However, having used ORMs for a number of years, I find that LINQ To SQL conforms to my expectations of how an ORM should behave. Indeed, read something as old now as Martin Fowler's Patterns of Enterprise Application Architecture and you will find exactly this pattern of behavior for an ORM discussed. WORM (Wilson O/R Mapper, now open source BTW) and NHibernate both behave in a similar fashion. So some of this surprise seems to be based on expectations that don't come from experience of using ORM tools. On a recent .NET Rocks there was an opinion expressed that LINQ To SQL was somehow only fit to be a RAD tool because of multi-tier issues. I can't agree with this opinion at all. LINQ To SQL has a similar feature set to WORM, on which I have built distributed enterprise applications. Its limitations relate to the diversity of mappings that it supports (table per concrete class in an inheritance hierarchy, or value types, for example) and its lack of support for multiple Db vendors, not some perceived issues around the unit of work and identity map patterns which it implements.

  • Architecting LINQ To SQL Applications, part 6

    Previous: Architecting LINQ To SQL Applications, part 5

    Mapping with XML files instead of Attributes

    Greg Young pointed out in the comments to the last post that using attributes can clutter your domain objects. It is simpler to show attributes first, so that you can relate rolling your own mappings to the designer-generated code, but I did not want to leave the story incomplete. So here is how to move those mappings into a file in order to keep your domain objects clean.

    The correspondence between the mapping file and the attributes is straightforward. Instead of attributes we just have XML elements and instead of properties on those attributes, we have attributes on our XML elements.

    First of all we need to create a text file to hold our mappings. We call it Keysafe.map. Next we need to add the XML declaration, which indicates the encoding:

    <?xml version="1.0" encoding="utf-8"?>

    Now we need to open up a Database element, which will form the root of our mapping.

    <Database Name="northwind" xmlns="http://schemas.microsoft.com/linqtosql/mapping/2007">
    </Database>

    Within the Database element we need to add a Table element for each entity we wish to map (equivalent to our [Table] attribute).

    <?xml version="1.0" encoding="utf-8"?>
    <Database Name="northwind" xmlns="http://schemas.microsoft.com/linqtosql/mapping/2007">
     <Table Name="dbo.Category">
     </Table>
    </Database>

    Because we are not using an attribute we have to tell LINQ To SQL what type our table maps to explicitly:

    <?xml version="1.0" encoding="utf-8"?>
    <Database Name="northwind" xmlns="http://schemas.microsoft.com/linqtosql/mapping/2007">
     <Table Name="dbo.Category">
      <Type Name="KeySafeDomain.Category">
      </Type>
     </Table>
    </Database>

    Then we need to map out our Columns. Again, because the mapping no longer sits on a member as an attribute, we have to name the member explicitly.

    <?xml version="1.0" encoding="utf-8"?>
    <Database Name="northwind" xmlns="http://schemas.microsoft.com/linqtosql/mapping/2007">
     <Table Name="dbo.Category">
      <Type Name="KeySafeDomain.Category">
       <Column Name="Id" Member="Id" DbType="Int NOT NULL IDENTITY" IsPrimaryKey="true" IsDbGenerated="true" UpdateCheck="Never" AutoSync="OnInsert" />
       <Column Name="Name" Member="Name" DbType="NVarChar(50) NOT NULL" CanBeNull="false" UpdateCheck="Never" />
       <Column Name="ParentId" Member="ParentId" DbType="Int" UpdateCheck="Never" />
       <Column Name="Version" Member="Version" DbType="rowversion NOT NULL" CanBeNull="false" IsDbGenerated="true" IsVersion="true" AutoSync="Always" />
      </Type>
     </Table>
    </Database>

    As before, the DbType information is there to help us generate the Db from our domain model.
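
    To make the correspondence concrete, here is roughly what the attribute form of the Table, Id and Name mappings would look like. This is a reconstruction from the XML, not the class from the earlier parts of this series, which may differ (for example by mapping to fields through Storage); the auto-properties are an assumption made to keep the sketch short.

```csharp
using System.Data.Linq.Mapping;

namespace KeySafeDomain
{
    // Attribute form of the <Table> and <Type> elements.
    [Table(Name = "dbo.Category")]
    public class Category
    {
        // Attribute form of <Column Name="Id" Member="Id" ... />.
        // The member is implied by where the attribute sits.
        [Column(Name = "Id", DbType = "Int NOT NULL IDENTITY", IsPrimaryKey = true,
                IsDbGenerated = true, UpdateCheck = UpdateCheck.Never,
                AutoSync = AutoSync.OnInsert)]
        public int Id { get; set; }

        // Attribute form of <Column Name="Name" Member="Name" ... />.
        [Column(Name = "Name", DbType = "NVarChar(50) NOT NULL", CanBeNull = false,
                UpdateCheck = UpdateCheck.Never)]
        public string Name { get; set; }
    }
}
```

    Each attribute property becomes an XML attribute of the same name, with enum values such as UpdateCheck.Never written as plain strings.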

    We also need to map out the associations between our classes. Again the conversion between the attribute based model and our XML model is straightforward.

    <Association Member="Children" Storage="children" ThisKey="ParentId" OtherKey="Id"/>

    <Association Member="Parent" Storage="parent" ThisKey="ParentId"/>

    <Association Member="Systems" Storage="systems" OtherKey="CategoryId"/>

    At this point it is just grunt work to translate our previous attribute based mappings into an XML mapping file.

    In the end our mapping looks like this:

    <?xml version="1.0" encoding="utf-8"?>

    <Database Name="northwind" xmlns="http://schemas.microsoft.com/linqtosql/mapping/2007">

    <Table Name="dbo.Category">

    <Type Name="KeySafeDomain.Category">

    <Column Name="Id" Member="Id" DbType="Int NOT NULL IDENTITY" IsPrimaryKey="true" IsDbGenerated="true" UpdateCheck="Never" AutoSync="OnInsert" />

    <Column Name="Name" Member="Name" DbType="NVarChar(50) NOT NULL" CanBeNull="false" UpdateCheck="Never" />

    <Column Name="ParentId" Member="ParentId" DbType="Int" UpdateCheck="Never" />

    <Column Name="Version" Member="Version" DbType="rowversion NOT NULL" CanBeNull="false" IsDbGenerated="true" IsVersion="true" AutoSync="Always" />

    <Association Member="Children" Storage="children" ThisKey="ParentId" OtherKey="Id"/>

    <Association Member="Parent" Storage="parent" ThisKey="ParentId"/>

    <Association Member="Systems" Storage="systems" OtherKey="CategoryId"/>

    </Type>

    </Table>

    <Table Name="dbo.ITSystem">

    <Type Name="KeySafeDomain.ITSystem">

    <Column Name="CategoryId" Member="CategoryId" DbType="Int NOT NULL" UpdateCheck="Never" />

    <Column Name="Comments" Member="Comments" DbType="NVarChar(4000)" UpdateCheck="Never" />

    <Column Name="Name" Member="Name" DbType="NVarChar(50) NOT NULL" CanBeNull="false" UpdateCheck="Never" />

    <Column Name="Id