The Don’t Repeat Yourself Principle and the Wormhole Anti-Pattern

Getting back on track with the "Maintainability" series of posts.  I'm doing this way too late at night, so the coherence might be lacking.

Don't Repeat Yourself

Don't Repeat Yourself (DRY) is a statement exhorting developers to avoid duplication in code.  Duplication isn't always the easiest thing to spot or even prevent.  From the Pragmatic Programmers:

DRY says that every piece of system knowledge should have one authoritative, unambiguous representation. Every piece of knowledge in the development of something should have a single representation. A system's knowledge is far broader than just its code. It refers to database schemas, test plans, the build system, even documentation.

Duplication is an obvious problem for maintenance, but there's a secondary meaning to the DRY Principle.  When I'm adding an all-new feature to a system with new classes, database mappings and tables, new screens, web services, etc., I want to make the change in the fewest steps possible with a minimum of repetition.  I want to tell the system what I want to happen, and I want to say it only once.  More on that second meaning later.

 

Duplication Retards Change

For the upcoming (soon, knock on wood) StructureMap 2.0 release, I got in and added support for generic templated types.  It was nasty.  It wasn't really nasty because of Generics; it was nasty because I blundered with this innocuous-looking code:

    return _pluginType.FullName;

In some spots it was useful or necessary to identify a .Net Type with a string value, and early on I fell into using the full class name as a convention.  I then promptly duplicated that simple Type.FullName logic over 70 times in the codebase.  Flash forward 3 1/2 years to the new Generics support, and I needed a way to go from a string back to a Type.  The obvious answer was to finally change to using assembly qualified names.  It took me about 6-8 hours total to make that one little change because of the stupid amount of duplication I had introduced with the FullName logic.
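A minimal sketch of the kind of centralization that would have turned that 6-8 hour change into a one-line edit.  The TypeKey class and its method names are hypothetical and are not part of StructureMap; the only framework calls used are Type.AssemblyQualifiedName and Type.GetType(string, bool).

```csharp
using System;

// Hypothetical helper (not a StructureMap class): the one and only place
// that knows how a Type is turned into a string key and back again.
public static class TypeKey
{
    // Every caller asks for a key here instead of reading Type.FullName directly.
    public static string For(Type type)
    {
        if (type == null) throw new ArgumentNullException(nameof(type));

        // Switching the convention (FullName vs. assembly qualified name)
        // means editing this single line.
        return type.AssemblyQualifiedName;
    }

    // Going from the string back to the Type is kept next to the code that
    // produced the string, so the two can never drift apart.
    public static Type Resolve(string key)
    {
        // throwOnError: true makes an unknown key fail loudly instead of returning null.
        return Type.GetType(key, throwOnError: true);
    }
}
```

With something like this in place, call sites would read TypeKey.For(_pluginType) rather than _pluginType.FullName, and the naming convention lives in exactly one method.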

Some other cases:

  • Multiple applications, or even subsystems of the same system, reading and writing to a shared database.  You almost inevitably end up with duplicated work to read, write, validate, and interpret the exact same data.  Think about a column in a database that represents the status of some sort of work item.  The logical entity represented by this row has different constraints and business rules depending upon the value in that status column.  If you have more than a single piece of code that "knows" how to interpret that status value, you have duplication, and a particularly pernicious sort of duplication because it's hard to spot by looking at any one codebase.  Just as a warning, coding in a data-centric manner can open the door to a great deal of harmful duplication.  Ask yourself, if the database structure or status field changes, how many other pieces of code have to be changed?
  • Reading and writing values from the HttpContext in ASP.Net.  This little bit of code represents a great deal of potential duplication (even if you eliminate the Magic Number Antipattern):  string something = (string) HttpContext.Current.Session["something"];  What if you want to change your state management strategy altogether?  You'll have to change every single piece of code that dipped directly into HttpContext.
  • In .Net applications, you often need to use a subclass of System.Text.Encoding when converting byte arrays to strings or vice versa.  In an application I worked with there were 67 different references to the ASCIIEncoding class.  Why do I distinctly remember this number, you might ask?  Because we needed to localize the application to a Unicode encoding, and I found out quickly that it would take considerable effort to hunt down and make all the necessary changes.  If the character conversion code had been centralized into some sort of helper class, that change would have been much easier.
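The last two cases have the same cure: put the decision behind one small piece of code.  Here's a rough sketch, assuming hypothetical names (TextCodec, IStateStore, InMemoryStateStore) that don't come from any of the applications described above.  In a real ASP.Net application a second IStateStore implementation would wrap the session; the in-memory version is only there to keep the example self-contained.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: the single place that decides which encoding the
// application uses. Moving from ASCII to a Unicode encoding is a one-line change here.
public static class TextCodec
{
    private static readonly Encoding Current = Encoding.UTF8;

    public static byte[] ToBytes(string text) => Current.GetBytes(text);

    public static string FromBytes(byte[] bytes) => Current.GetString(bytes);
}

// Hypothetical abstraction over "wherever per-user state is kept".
// Callers depend on this interface instead of dipping into HttpContext directly.
public interface IStateStore
{
    T Get<T>(string key);
    void Set<T>(string key, T value);
}

// Dictionary-backed implementation, handy for unit tests. Changing the state
// management strategy means writing another implementation, not editing every caller.
public sealed class InMemoryStateStore : IStateStore
{
    private readonly Dictionary<string, object> _values = new Dictionary<string, object>();

    public T Get<T>(string key)
    {
        // A missing key yields default(T) rather than an exception.
        return _values.TryGetValue(key, out var value) ? (T)value : default(T);
    }

    public void Set<T>(string key, T value)
    {
        _values[key] = value;
    }
}
```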

 

Stop Duplication in its Tracks

The worst case I've ever observed was a factory automation system.*  The system was originally built to pull upcoming factory build jobs from an MQSeries queue, go through a series of business rules, then determine the proper routing and push the new directions to other MQSeries queues.  Fine and dandy, until the day that the factory needed to start the basic process manually from a client application on the factory floor.  The developers decided to recreate the business rules portion of the existing code, rule by rule, and created a new implementation of the business rules for the new client.  I spent some time learning about both components, and it was very apparent that the new code was better structured, but trouble was right around the corner.  It's easy to guess what happened next.  Those particular business rules were volatile, and now you had to make functionally equivalent rule changes in two very different components.  The system became harder to maintain and extend.

The duplication was created purposely because the team felt that the original code was just too hard to reuse because the business rules and the workflow were deeply intertwined with the code that called into MQSeries.  They didn't have any test automation to catch regression bugs, and the system was hard to deploy, so modifying the existing code was quite risky.  If the original code had been much more orthogonal between business rules and the communication infrastructure, they might have been able to simply write some new glue code to interact with the existing code.  If the system had been backed up with a software ecosystem of effective build automation and comprehensive test automation coverage, the team would have been much better positioned to morph the existing code into a structure that would allow for reuse between both the automatic MQSeries mechanism and the newer manual client process.
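To make "orthogonal" a little more concrete, here is a rough sketch of the shape I have in mind.  Every name in it (BuildJob, RoutingRules, JobRouter) and the routing rule itself are invented for illustration; nothing here is taken from the actual factory system or from any MQSeries API.

```csharp
using System;

// Hypothetical types for illustration only.
public sealed class BuildJob
{
    public string ProductLine { get; set; }
    public int Quantity { get; set; }
}

// The business rules live in exactly one place and know nothing about
// queues, sockets, or screens.
public static class RoutingRules
{
    public static string RouteFor(BuildJob job)
    {
        // Invented rule, standing in for the real (volatile) ones.
        return job.Quantity > 100 ? "HIGH-VOLUME-LINE" : "STANDARD-LINE";
    }
}

// Thin glue: applies the shared rules, then hands the result to whatever
// transport the caller supplies (a queue writer, a UI callback, a test fake).
public sealed class JobRouter
{
    private readonly Action<BuildJob, string> _publish;

    public JobRouter(Action<BuildJob, string> publish)
    {
        _publish = publish ?? throw new ArgumentNullException(nameof(publish));
    }

    public string Route(BuildJob job)
    {
        string route = RoutingRules.RouteFor(job);
        _publish(job, route);
        return route;
    }
}
```

Under this arrangement the queue listener and the manual client would each construct a JobRouter with their own publish callback, and a rule change would touch RoutingRules only.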

Part of the reason duplication creeps into code is the ease of copy/paste/modify operations to create new code.  Runaway "IDE inheritance" (copy/paste/edit coding, I couldn't find a link) can lead to a system that's very difficult to maintain.  Sometimes developers do the copy/paste/modify trick because the original code isn't quite what they need in the second case.  It definitely requires some skill and experience, but in the "not quite what I need" case, I'd much rather a developer take a little time to refactor out the common pieces first before making the second set of changes.  Refactoring is perhaps more work than copy/paste in the short term, but stamping out duplication can only help in the longer run.  Refactoring is an invaluable skill that's well worth your time.

 

The Wormhole Anti-Pattern

Bill Caputo wrote a good description of the Wormhole Anti-Pattern that so commonly afflicts enterprise software systems.  Roughly stated, I would define the wormhole as all of the stages a piece of data goes through to get from the database to the screen or service interface and back again.  When the wormhole gets long and involved, your development work is going to be a struggle — hence the "Anti-Pattern" designation.

As an almost canonical example, my first official job in software was supporting a data integration between a third-party engineering application and a downstream construction application.  Between the two databases, a flat file report, two rule files, and the Tibco definitions, I counted 8 different variable names and mappings for a single piece of data along the data exchange.  The big problem was that I had to change that mapping pretty frequently, and that meant following the path through all 8 steps.  Needless to say, that code was very difficult to troubleshoot and modify.  Of course I made all of the modifications in production to support ongoing engineering projects because there wasn't any such thing as a development environment ;)  If you're a thrill seeker, nothing is more exciting than coding in the production environment while it's live.

To apply the Wormhole Anti-Pattern to your architecture efforts, think about how many steps you would have to go through to get a new element on a screen persisted in the database.  Or to add a new feature to your application.  If the thought of jumping through a lot of Xml configuration hoops or database metadata setup or the sheer number of changes gives you pause, you may be exhibiting the Wormhole Anti-Pattern.  At that point you need to start working towards eliminating or combining some of the steps to shorten your wormhole.

Just for comparison, one week we had to add some fields to a screen after it was built.  Here is the wormhole we had to go through on my current project.  I've had worse, but this is more than enough:

  • Element on the screen
  • Property on a Domain class
  • Property on at least one Data Transfer Object (DTO)
  • Mapping from DTO to Domain class in the client
  • Repeat on the server side, but differently
  • Change unit tests
  • Add new field to FitNesse tests

 

In line with the Wormhole Anti-Pattern, you might also check out the Shotgun Surgery code smell.  If you constantly make a repetitive set of changes to the same classes anytime one changes, it might be a sign that you should shorten your Wormhole by collapsing the class structure down into fewer pieces to consolidate related code into a more cohesive structure.  Your goal is to enable changes to your application to be made in fewer mechanical steps.

 

I only want to tell you this once!

Going back to the previous section on The Wormhole Anti-Pattern, the second, more proactive goal of the DRY Principle is to express changes in as few steps and places as possible.  My thinking with regard to the quality of a system architecture has changed quite a bit from my brief exposure to Ruby on Rails.

From Nico Mommaerts,

One of the selling points of Rails is that it is built with the DRY principle in mind. DRY stands for Don't Repeat Yourself, meaning that every piece of your system is described once and only once, which should make development and maintenance a lot easier since there is no need to keep multiple parts of the code in sync. Hand in hand with DRY goes 'Convention over Configuration', another one of Rails' core philosophies. Rails uses a set of code and naming conventions that when adhered to eliminates the need for configuring every single aspect of your application. Only the extraordinary stuff needs to be configured, like legacy database schemas or other resources you don't control. Using these two philosophies, DRY and 'Convention Over Configuration', Rails lets you write less code AND more features in the same time as with a typical Java or .NET application, with easier maintenance afterwards.

Even if you're never going to code in Ruby or build web applications, take a look at how Rails puts the various pieces together to eliminate repetition in code and configuration.  A good design allows for minimizing the amount of repetitious information.
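As a small .Net-flavored illustration of convention over configuration, here is a sketch that collapses the "Mapping from DTO to Domain class" step of the wormhole list above: properties with the same name and a compatible type are copied automatically, so a new field needs no new mapping code.  ConventionMapper, CustomerDto, and Customer are hypothetical names made up for this example, and this is not how Rails or any particular mapping library does it.

```csharp
using System;
using System.Reflection;

// Hypothetical convention-based mapper: "same property name" is the convention,
// so no per-field mapping code or configuration is needed.
public static class ConventionMapper
{
    public static TTarget Map<TSource, TTarget>(TSource source)
        where TTarget : class, new()
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        var target = new TTarget();
        foreach (PropertyInfo sourceProp in typeof(TSource).GetProperties())
        {
            // Skip indexers and write-only properties on the source.
            if (!sourceProp.CanRead || sourceProp.GetIndexParameters().Length > 0) continue;

            PropertyInfo targetProp = typeof(TTarget).GetProperty(sourceProp.Name);

            // No matching, writable, type-compatible property: nothing to copy.
            if (targetProp == null || !targetProp.CanWrite) continue;
            if (!targetProp.PropertyType.IsAssignableFrom(sourceProp.PropertyType)) continue;

            targetProp.SetValue(target, sourceProp.GetValue(source, null), null);
        }
        return target;
    }
}

// Hypothetical DTO and domain class. Adding a property with the same name
// to both is all it takes for the new field to flow through the mapper.
public class CustomerDto
{
    public string Name { get; set; }
    public string Region { get; set; }
}

public class Customer
{
    public string Name { get; set; }
    public string Region { get; set; }
}
```

Only the exceptional cases, such as a property that is renamed or needs conversion on the way through, would still need explicit code.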

DRY-ing out StructureMap

Seeing how Ruby on Rails works made StructureMap feel just a little shabby in some places.  Here's a specific example: one of the features in StructureMap is the ability to define configuration profiles and easily switch between them.  Typically, I like to use this feature to handle environmental differences between development, testing, and production.  There's a lot more to the functionality, but for now let's look at the configuration needed today for a single IService.

Look how ugly this is in general (couldn't get CopyAsHtml to format this for some reason), and note the information duplicated between the Profile nodes, the PluginFamily nodes, the Plugin nodes, and the Instance nodes.

<StructureMap MementoStyle="Attribute" DefaultProfile="Development">
  <Assembly Name="SomeAssembly"/>

  <Profile Name="Production">
    <Override Type="SomeAssembly.IService" DefaultKey="Production"/>
  </Profile>

  <Profile Name="Testing">
    <Override Type="SomeAssembly.IService" DefaultKey="Testing"/>
  </Profile>

  <Profile Name="Development">
    <Override Type="SomeAssembly.IService" DefaultKey="Development"/>
  </Profile>

  <PluginFamily Type="SomeAssembly.IService" Assembly="SomeAssembly">
    <Plugin Type="SomeAssembly.ConcreteService" Assembly="SomeAssembly" ConcreteKey="Concrete"/>

    <Instance Type="Concrete" Key="Production">
      <Property Name="host" Value="PROD-SERVER"/>
      <Property Name="port" Value="5050"/>
    </Instance>

    <Instance Type="Concrete" Key="Testing">
      <Property Name="host" Value="TEST-SERVER"/>
      <Property Name="port" Value="5050"/>
    </Instance>

    <Instance Type="Concrete" Key="Development">
      <Property Name="host" Value="localhost"/>
      <Property Name="port" Value="2000"/>
    </Instance>
  </PluginFamily>
</StructureMap>

A major part of my work for StructureMap 2.0 has been ease of use, and that has meant eliminating the duplication and mechanical steps in configuration.  Below is the exact equivalent of the profile in StructureMap 2.0:

 

<StructureMap MementoStyle="Attribute" DefaultProfile="Development">
  <Assembly Name="SomeAssembly"/>

  <Profile Name="Production">
    <Override Type="SomeAssembly.IService">
      <Instance PluggedType="SomeAssembly.ConcreteService,SomeAssembly" host="PROD-SERVER" port="5050"/>
    </Override>
  </Profile>

  <Profile Name="Testing">
    <Override Type="SomeAssembly.IService">
      <Instance PluggedType="SomeAssembly.ConcreteService,SomeAssembly" host="TEST-SERVER" port="5050"/>
    </Override>
  </Profile>

  <Profile Name="Development">
    <Override Type="SomeAssembly.IService">
      <Instance PluggedType="SomeAssembly.ConcreteService,SomeAssembly" host="localhost" port="2000"/>
    </Override>
  </Profile>
</StructureMap>

 

All I really did was enable a user to put all the configuration inline in the Profile node itself.  Just doing that took down the number of moving parts and centralized the semantic meaning of the profile configuration into one spot instead of being spread out throughout the Xml file.  The underlying model of StructureMap is unchanged; only the configuration code got more sophisticated to streamline the user experience.

 

 

More than the Code

Anytime you talk about improving the way you create software it's very hard to treat coding, design, process, and infrastructure as separate topics because they're all tightly intertwined.  You definitely want to apply the DRY Principle to your change management.  Here are a couple examples of what I mean:

  • Long-lived code branches.  A temporary, short-lived branch for production support or a risky change is one thing, but a long-lived branch essentially represents a whole new system.  I've seen a couple of smaller product companies jeopardize their very existence by maintaining and extending customer-specific branches of their system.  Hot fixes and newly demanded features often had to be implemented several different times on somewhat divergent versions of the same code.  Long-lived branches need to be treated as a last resort.  If there's any possible way to arrange your system to allow for customer-specific features and customizations while maintaining one version of the core code, your company will be far better off.  Microkernel designs with IoC engines (like StructureMap) can help.  Orthogonal code will help by creating plenty of seams to allow for customization.  Build and test automation makes changing code much less risky.
  • WSDL or XSD schemas for integration.  We hit this on my current project.  Our new .Net client communicates with the existing Java server platform by sending Xml messages over a stateful socket.  Quite naturally, we devolved into using XSD schemas to describe the contract of the messages.  Great, we use the XSD.exe tool in .Net to codegen DTO classes on one side, and JAXB to do the same on the Java side.  Both codebases need to have a copy of the XSDs, and that's what we did: a copy in the .Net SVN repository and another in the Java CVS repository.  Needless to say, any change in schema from either side requires the XSDs to be copied back and forth.  This situation has caused us no small amount of pain from mismatches in the Xml definitions.  One way or another, the XSD definitions shared by .Net and Java need to be locked together automatically to shut down the potential discrepancies.

 

 

The Highlander Puts it all into Perspective

Bellware thought this was an awful analogy, so I absolutely have to use it.  If you're a big fan of the cult movie Highlander (and who isn't?), this will put it all into perspective.  The main characters in the movie were all striving to be the last one standing to win the "Prize."  As Christopher Lambert and Sean Connery intone constantly throughout the movie, "there can be only one!"

The Don't Repeat Yourself Principle is "There can be only one!" (expression of any rule or functionality that could conceivably change)

Unfortunately I've been on and seen a couple of projects where the basic architecture just didn't allow for outwardly small changes to be made efficiently.  A nasty case of a Wormhole, plus clumsy or inefficient build processes, can make the simple addition of an extra piece of information from persistence to user interface turn into a living hell.  In Highlander, there is a scene where the bad guy, the "Kurgan" played by American character actor extraordinaire Clancy Brown, wins a sword duel and disembowels Sean Connery's character.  As the Kurgan twists the sword to inflict more pain he utters the line "it hurts, doesn't it!"  When a request for a small change comes across your desk and all you can think about is all of the painful and tedious work it will take to get that change done, that's what I call the "Kurgan Moment."

To wrap up, the Highlander and DRY good, Kurgan and Wormhole bad.

 

 

Apropos of nothing here, Locke is easily the best character on Lost.

About Jeremy Miller

Jeremy is the Chief Software Architect at Dovetail Software, the coolest ISV in Austin. Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy is the author of the open source StructureMap tool for Dependency Injection with .Net, StoryTeller for supercharged acceptance testing in .Net, and one of the principal developers behind FubuMVC. Jeremy's thoughts on all things software can be found at The Shade Tree Developer at http://codebetter.com/jeremymiller.
This entry was posted in Maintainability, StructureMap.
  • http://codebetter.com/blogs/jeremy.miller jmiller

    Jacob,

    You’re absolutely right, and we’ve talked about it, but moving the server code into SVN has turned into a “get around to it” project that just isn’t happening soon.

  • Jacob Atzen

    Jeremy wrote: A copy in the .Net SVN repository and another in the Java CVS repository.

    In my twisted mind keeping two different versioning systems is also a kind of violation of DRY. If you used Subversion for everything the problem would be as easy as configuring an external. I guess you should even keep your tools dry.

  • http://codebetter.com/blogs/jeremy.miller jmiller

    or just make OODB’s a practical reality

  • Yoni

    I truly believe there will come a day when adding a field will be as simple as adding a line of code which states “AddField()”.
    The database should be modified automatically in the next version update and the appropriate UI should find room and automatically include the new piece of data. All of this should be a direct result of the AddField method call… (or a similar mechanism)

    I am in the process of creating that future (though it may take a while ;-).

    Like Arnon said, sometimes you need layers in between, but sometimes you simply don’t, and for those times things should be as simple as possible.

  • http://codebetter.com/blogs/jeremy.miller jmiller

    Chuggle,

    I can’t disagree with you at all, and like Arnon said above, you do need some layering to avoid tight coupling. 4 steps sounds pretty tight to me anyway, I was thinking more about our 8-10 step wormhole.

    Coding in a configuration file isn’t something I would recommend offhand. Almost anything is more expressive than an xml file.

  • Chuggle

    Another well written article J and certainly food for thought but I’m not sure I agree 100% with this – let’s say I’ve just finished building my fantastic ASP.NET application and the business decides to add a new field (like we don’t see that coming!). Well I have to add it to both the web page & database (as that is the business requirement) – no getting away from that.
    Say I’ve used NHibernate – I’ll obviously have to add it to the hbm file and add an additional property to the domain object this new field is a part of (or create a new class if required).
    I’ll then have to amend my service that feeds the UI to include the new field
    So all in all 4 or so steps – which you may class as a wormhole
    I’m a big fan of KISS (no not the rather sad 80s rock group – rather Keep it simple stupid) and I think adding a new field across 4 well documented layers is a lot easier to understand than using configuration files (which normally confuses the hell out of newbies…..and seeing as programmers here come and go regularly it’s difficult to enforce a common skillset) – yes you type a bit more but then that’s what intellisense is for :)

    In my experience (what little there is) maintainability is often about writing code that “those that will come after you” can understand (unfortunately)

    Just my 2cents

  • http://www.rgoarchitects.com/blog Arnon Rotem-Gal-Oz

    On the flip side of the “wormhole anti-pattern” you have too few layers which will result in an architecture which is also problematic (at least for many cases)
    since if we go with this anti-pattern we will end up with binding a textbox to a dataset (or even a resultset) which is the result of a direct query to a database

    Arnon

  • http://joeydotnet.com/blog JoeyDotNet

    Great post as usual, Jeremy! I think Pragmatic Programmer should be *required* reading for every developer, bottom line. We’ve had to deal with the Wormhole anti-pattern quite a bit on a couple of our current Smart Client/WCF projects. I’ve seen adding an element to a screen pass through 7 or 8 layers before it gets to the database on some of these kinds of apps. Very tedious and is a maintenance nightmare.

    I’m working on a different project now, for which I’m probably going to be using MonoRail + ActiveRecord (as close to RoR as us .NET’ers can get). Its simplicity is very refreshing.

    BTW, agreed about Locke. :) This show just gets crazier each year.