On partitioning .NET code

A few months ago I published two articles on SimpleTalk about how to partition .NET code, the first with assemblies, the second with namespaces inside assemblies. I have repackaged this content into two white books available in the NDepend documentation.

Partitioning .NET code is a topic that has interested me for a long time. I consider it the key to preventing code structure chaos (spaghetti) and to achieving higher code maintainability and quality. Over the past years many posts about partitioning .NET code have been written on this blog, and these two white books are a compilation of that content (8+7 printed pages).

When I browse the web, it seems to me that few developers question how to partition their .NET code base into physical and logical components. It is commonly and tacitly accepted that the VS project and the .NET assembly are the right artifacts to partition code and define component boundaries. But doing so leads to major friction and productivity loss, like spending minutes instead of seconds compiling C# or VB.NET code. For those who do question how to partition their code, there is a substantial return on investment waiting right here.

This topic touches many aspects of development, so choosing a way to partition code can only be the result of several choices and compromises. In the two white books the choices exposed are based on a simple idea: have as few assemblies as needed (ideally one), each assembly being filled with several namespaces that represent the component boundaries, with the namespaces’ dependency graph kept acyclic. Doing so has proven successful, and cheap to achieve and maintain, on many occasions (for example, see the recent Hendry Luk post on refactoring the SheepAop code base). I hope this content is useful to you as well; even if you don’t end up refactoring your code this way, the time spent mulling over the ideal strategy you and your team should apply to partition code and avoid chaos certainly won’t be wasted.
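
To make this concrete, here is a minimal sketch of the kind of layout the white books advocate; the namespace and type names below are illustrative only, they are not taken from the white books:

    // One assembly (one VS project); each namespace acts as a component and
    // dependencies only flow downward, so the namespace dependency graph stays acyclic.
    namespace MyApp.Base            // low-level helpers, depends on nothing else in MyApp
    {
        public static class Guard { public static void NotNull(object o) { if (o == null) throw new System.ArgumentNullException(); } }
    }

    namespace MyApp.Domain          // depends only on MyApp.Base
    {
        public class Invoice { }
    }

    namespace MyApp.FeatureA        // depends on MyApp.Domain and MyApp.Base, never on MyApp.FeatureB
    {
        public class FeatureAController { }
    }

    namespace MyApp.FeatureB        // depends on MyApp.Domain and MyApp.Base, never on MyApp.FeatureA
    {
        public class FeatureBController { }
    }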

  • Patrick Smacchia

    Fred, your experience is representative of what I’ve seen in many real-world .NET dev shops: hundred(s) of VS projects/assemblies that end up being a huge waste of time without bringing any value.

    Btw, have you tried the Copy Local = false trick to decrease the compilation time?

  • Fred Steffen

    Hey, Ralf…

    I work at a shop with 99 assemblies (and counting). In order to make one change in our controller, I have to traverse at least 9 files in at least 6 assemblies. We have “Loose Coupling” and “Single Responsibility”, we use “Dependency Injection”, etc… And IMO, it’s been a huge waste of time. The developers are given SSDs, since even on my 8-core, 16 GB RAM work machine, if I didn’t use an SSD it’d take 5 minutes to compile. Mind you, none of these assemblies are used in other applications. And if we did want to use them, there are so many dependencies that even adding one feature of our application would mean referencing dozens of assemblies.

    I for one believe strongly in creating a physical separation for a reason, like encapsulating functionality for easy portability between applications. I prefer namespaces for logical grouping of code.

  • jboarman

    What about using a static constructor instead of the whole “submain()” concept?

  • Marc Greiner

    Hi Patrick, your explanations are crystal clear.
    Thanks for your patience and pedagogy in trying to explain this; I personally would have given up long ago.

  • Bruno Martínez

    You should forbid inlining of SubMain to make sure other assemblies aren’t loaded early:

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void SubMain() {
        // startup code that references types from other assemblies
    }
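
    For context, here is a minimal sketch of the surrounding pattern (the Main body shown is illustrative, not taken from the white book): the CLR loads a referenced assembly when a method that uses its types is JIT-compiled, so keeping such references out of Main, and preventing SubMain from being inlined into it, defers those loads until SubMain actually runs.

    using System.Runtime.CompilerServices;

    static class Program
    {
        static void Main()
        {
            // Only lightweight, dependency-free startup work belongs here
            // (for example hooking AppDomain.CurrentDomain.AssemblyResolve);
            // nothing that references types from the other assemblies.
            SubMain();  // referenced assemblies load only when SubMain is JIT-compiled
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static void SubMain()
        {
            // Code that uses types from other assemblies goes here.
        }
    }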

  • Anonymous

    You praise black boxes and separated contracts. For my part, I hold that the source code is the design.

    Have you noticed how often developers use tools like Reflector to decompile black boxes (such as the .NET Fx implementation)? Imagine a Contains() method exposed by an abstracted collection implemented in a black box. In such a situation my problem would be that I don’t know whether Contains() is an O(N) or an O(1) method. Sure, this might be detailed in documentation somewhere, or even in a performance test annotation. But the source code is the design: in such a situation I’d jump straight to the possible implementation(s) and investigate by myself.

    Such subtle implementation details that can have drastic effects at runtime are more the rule than the exception. Each line of code is a potential enemy. Calling Contains() on a HashSet or on a List can have a dramatic effect on performance. I prefer to know my enemies, and not to have them lurking in hidden implementations such as black boxes.
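
    To make that concrete, here is a small self-contained illustration (the collection size is arbitrary): List<T>.Contains walks the list linearly, while HashSet<T>.Contains does a hash lookup.

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    class ContainsCost
    {
        static void Main()
        {
            List<int> list = Enumerable.Range(0, 1000000).ToList();
            HashSet<int> set = new HashSet<int>(list);

            Stopwatch sw = Stopwatch.StartNew();
            bool inList = list.Contains(999999);   // O(N): scans the whole list
            Console.WriteLine("List<T>.Contains    : {0} in {1} ticks", inList, sw.ElapsedTicks);

            sw.Restart();
            bool inSet = set.Contains(999999);     // O(1) on average: hash lookup
            Console.WriteLine("HashSet<T>.Contains : {0} in {1} ticks", inSet, sw.ElapsedTicks);
        }
    }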

    Separation of Concerns is an excellent principle to keep business code focused on business rules. But combining SoC with black boxes is risky. To illustrate this risk, consider the famous ORM N+1 problem. The N+1 problem harms only developers who treat the ORM as a black box to which the persistence concern is delegated. Educated developers avoid the N+1 problem because they know how ORMs work, and by definition the ORM is then no longer a black box for them.
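
    To make the N+1 point concrete, here is a sketch with stand-in types; no real ORM is involved, the lazy loading is just simulated with console output.

    using System;
    using System.Collections.Generic;

    // Stand-ins for ORM-mapped entities; in a real ORM, Orders would be a
    // lazily-loaded proxy collection.
    class Order { }

    class Customer
    {
        public IList<Order> Orders
        {
            get
            {
                // In a real ORM, the first access to a lazy collection issues
                // one extra SQL query per customer: the "N" in N+1.
                Console.WriteLine("SELECT * FROM Orders WHERE CustomerId = ...");
                return new List<Order>();
            }
        }
    }

    class NPlusOneDemo
    {
        static void Main()
        {
            // The "1": a single query that loads all customers.
            Console.WriteLine("SELECT * FROM Customers");
            var customers = new List<Customer> { new Customer(), new Customer(), new Customer() };

            foreach (var customer in customers)
                Console.WriteLine("Order count: {0}", customer.Orders.Count);  // N extra queries
        }
    }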

    You said “I hardly ever suffer from semantic breaking changes” while using the black box + separated contract approach. Certainly that is related to your talent and experience. But for less experienced developers, ‘N+1-like’ problems lurking in black boxes inevitably cause pain.
    What I am proposing in the white books is not zero focus. In white book 2, I explain that independent feature implementations should be kept separated, and thanks to the Dependency Structure Matrix I even explain how this separation can be visualized (or enforced with rules). Dividing feature implementations into smaller components leads to an even narrower level of focus. Hence the level of focus for a developer is bounded by the feature, and often sub-feature, implementation size.

  • Ralf Westphal

    You’re right, there is hardly any agreement in the literature on what a component is. So you might as well define it yourself ;-)

    You did that. And I did that. Here’s my definition:

    -Binary functional unit
    -with physically separated contract.

    Why’s that? Because this to me seems to be the least common denominator of most definitions. And it’s sufficiently distinct from other concepts like class or module (whatever that is ;-).

    Most importantly, though, this definition carries with it the notion of a black box. A physical black box that is. And this black box can be developed in parallel with other black boxes. And it can be separately deployed (read: reused). (However, reuse to me is not that important. I’m more concerned with developer efficiency.)

    From your first white book I take away a sole goal: increase Visual Studio solution compilation performance.

    That’s of course a legitimate goal – but one I don’t share. In fact I haven’t come across a situation where VS was too slow for any project. So why jump through any hoops to increase speed – and lose other benefits?

    You say: “Too much focus necessarily increases entropy.” Hm… if anything I’d say, too much focus decreases entropy. Because if entropy is low, then structuredness is high.

    You say: “and not taking account of that context leads to performance and architecture aberrations. It also leads to development friction, like syntactic and semantic breaking changes discovered late on the build machine, at smoke test or production time, after the source code has been committed.”

    Strange, I never suffer from syntactic breaking changes on a build server. Syntax is enforced by contracts. And contracts are checked for every component. Remember: contracts are separate.

    And I hardly ever suffer from semantic breaking changes. But that might be a result of explicit code design. I don’t just code away using TDD.

    And I sure do not ever want to just run the full app on my machine just because I added something to a component. That would be like a worker in a car assembly line jumping into the car after putting on headlights to test drive it.

    Integration tests are not casually run by developers. Why should they? I have no need for such “control freakishness” in a streamlined software production process.

    .NET Remoting: Sure, that was not the pinnacle of remote communication. Neither was CORBA. But why? Because VS solutions were too small? No, because developers had no clear understanding of what software architecture means and what the “8 fallacies of distributed computing” meant. Remote communication will not be solved with tools but by understanding.

    You say: “Because physical artifacts are costly”

    Here again it’s compilation speed you’re concerned with. But what is your root problem?

    You say: “Imagine the mess if each of the monster assemblies were divided into hundreds of smaller ones. We’d have to deal with a .NET Fx made of thousands of assemblies.”

    This again is not about the pros and cons of developing code in many solutions. It’s about using/deploying many assemblies. But that’s a different problem altogether.

    If usage/deployment is your concern, why not merge assemblies? You can even sign them. Your namespaces are retained.
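
    For instance, an ILMerge invocation along these lines folds several assemblies into one signed file (the file names are placeholders; check the exact options against the ILMerge documentation):

    ilmerge /out:MyApp.Merged.exe /keyfile:MyApp.snk MyApp.exe FeatureA.dll FeatureB.dll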

    I’m not against that. As I said, I call it packaging.

    You say: “Loose coupling everywhere decreases productivity since abstractions and corresponding mocking impl have to be created and maintained”

    Interestingly enough I don’t even use a mocking framework anymore. I very, very rarely have to mock classes.

    And loose coupling “just happens” through my design method. It’s just enough to produce highly evolvable software.

    You say: “breaking changes management was a nightmare”

    I don’t know what the root problem of the team you visited was. I suspect they did not really have a good explicit architecture/model and their production process was ad hoc. Also, 900 assemblies for just 1 application seems too much to me; probably abstractions are missing.

    But I sure would not have told them to throw everything back into a single solution.

    If you only realize there is a breaking change when compiling/integrating, then there sure is something wrong with your understanding of the whole application. So putting everything back into a single project is just a symptomatic cure.

    You say: “Look at the pain MS goes through to avoid introducing breaking changes in the .NET Fx across versions.”

    That’s a different issue. It’s about frameworks with unknown users. No “source in one project” will help you there.

    You say: “If several applications use a shared assembly, I’d strongly advise each application have its own version.”

    I agree. The GAC is overrated.

    -Ralf

  • Anonymous

    I would use feature grouping (the second proposed partition). On page 4 of the second white book, there is a dependency matrix on the namespaces of NDepend.UI. There you can see all the features implemented through a super component. The bulk of the code lives in the multiple feature implementations, and features live side by side (feature A doesn’t call feature B, otherwise B is not a feature but a service for A).

    But, for example, CodeContexMenu or ContextSensitiveHelp are actually services shared across many features (each usage is a blue cell in their respective rows, 18) and 19) ).

    Hence the first proposed partition is fine as long as Commands and Services are shared by many different features; they then become low-level shared util/helper code.
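
    In other words, with feature grouping the layout might look like the following sketch (the names are illustrative, not the actual NDepend.UI namespaces); anything needed by several features is pushed down into a shared low-level namespace:

    // Each feature owns its Commands/Services sub-namespaces and never calls
    // another feature; shared services live in a low-level namespace instead.
    namespace MyApp.Shared.Services
    {
        public class ContextHelpService { }
    }

    namespace MyApp.FeatureA.Commands
    {
        public class DoSomethingCommand { }
    }

    namespace MyApp.FeatureA.Services
    {
        public class FeatureAService { }
    }

    namespace MyApp.FeatureB.Commands
    {
        public class DoSomethingElseCommand { }
    }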

  • Anonymous

    >The notion of component is a logical one? Please refer me to a definition of component where that’s stated. I don’t know of any. See here for example: “A software component can be deployed independently and is subject to composition by third parties.” (from here: http://de.wikipedia.org/wiki/Komponente_(Software)).

    There is no such thing as a clear definition of what a component is. The 624 pages of the book Component Software http://www.amazon.com/gp/product/0201745720 by Clemens Szyperski are not enough to give a definition of what a component is. When it comes to deployment in .NET, I prefer to use the common terms assembly, .dll and .exe.

    So, to quote the white book, I consider a component to be: an aggregate of classes and associated types; a unit of development and test; a unit of learning for developers; a unit of architecture; a unit of layering. None of these definitions implies paying the price of a physical artifact, hence I deem a component to be a logical concept. This is my own definition, and I further develop a way of partitioning code based on this clear definition.

    >You say, “I don’t see any difference in sharing a set of classes encapsulated in a namespace in a large VS project, or sharing them through a single small VS project??” But in fact there is a huge difference: focus. If you code on a small VS project containing just a single component you focus much more. You feel relieved from the noise created by all that code which is not related to your current task.

    Too much focus necessarily increases entropy. The code currently being written lives in a larger context (caller/callee, stateful/stateless, error handling through exceptions…), and not taking account of that context leads to performance and architecture aberrations. It also leads to development friction, like syntactic and semantic breaking changes discovered late on the build machine, at smoke test or production time, after the source code has been committed.

    Too much focus reminds me of 10 years ago, when the community was enthusiastic about remote objects. The developer was able to focus on code logic; communication logic was completely removed. The developer didn’t care whether each method call was remote or local. We know the result: application performance was completely ruined, since a remote call is many orders of magnitude slower than a local call.

    >Also, if component is an important concept to you for separating code into distinct units (larger than classes), why wouldn’t you underline this logical separation with a physical separation à la “form follows function” (or more like “form follows purpose”)?

    Because physical artifacts are costly. A component is logical, and it is better to use a lightweight logical concept to keep things light. The cost of an assembly creates slowdowns at compilation, at CLR loading time, and at deployment time (it is easier to manage 10 files than 900 files, as I saw last year during a consulting session). The .NET Fx is spread across 130 or 150 assemblies, with many monster assemblies like mscorlib, System, PresentationFramework, System.Core… Imagine the mess if each of the monster assemblies were divided into hundreds of smaller ones. We’d have to deal with a .NET Fx made of thousands of assemblies.

    >Compile time: I know a single VS solution (or a single project) compiles faster than multiple solutions. But why does that matter? Who’s compiling code anyway? And why?
    >I guess your premise is that every developer is always working on the whole codebase and thus needs to compile it every couple of seconds after adding some small changes to run tests. But while that might be the default nowadays for most developers, it’s by far not the best way to develop software. When I develop software, I just work on a single component at a time. That means I sit before a VS solution with 2+ projects with maybe a couple of thousand lines in them. Not more. I focus on my tasks related to that component. I’m not distracted by code I should not care about right now. The physical isolation makes very clear the logical separation of the code. My compile times are fast. And I don’t care about the whole system. Not a single bit. Integration is done on some server. I just see to it that I write flawless code according to my task list. I simply don’t need a fast machine.

    Here again, too much focus is not necessarily what I want. It is intellectually rewarding to put abstraction gates at every stage to get a super loosely coupled code base. But in practice:

    ->There is still a bunch of low-level domain and util/helper code that will be used by most of the components (hence focusing here is an illusion)

    ->Focusing increases entropy; performance and architecture aberrations appear since the developer has poor knowledge of the callers/callees of the component they are focusing on

    ->Loose coupling everywhere decreases productivity since abstractions and corresponding mocking impl have to be created and maintained

    Personally I am a big fan of loose coupling, but not everywhere. It shouldn’t be systematic. Abstraction is not a dogma.

    >Also by developing in this way all the work can nicely be distributed among any number of developers. No merge conflicts occurring when working with a repository.

    What about syntactic/semantic/performance/error-handling breaking changes? In the real-world company I visited last year, with 900 assemblies and dozens of developers working concurrently, breaking changes management was a nightmare (which was actually also the reason I was called in). By working with a few large VS projects, the bulk of breaking changes would be eliminated, or at least anticipated, before source commit. Obviously not all breaking changes, especially those resulting from merge conflicts, can be eliminated locally when there are dozens of developers working concurrently.

    >Sure, I will also use namespaces to structure my classes logically within my many VS component projects. But why stop there? Why shouldn’t I gain separate deployability for my components at the same time? How can I reuse code in a namespace in some solution, sitting in one huge assembly, in another solution?

    If one needs code sharing amongst several applications, we are talking about physical deployment and physical reusability. Using a physical artifact like an assembly is then fine.

    But taking versioning into account, binary/physical sharing is something extremely costly and painful to achieve. Look at the pain MS goes through to avoid introducing breaking changes in the .NET Fx across versions. If several applications use a shared assembly, I’d strongly advise each application have its own version. Many OSS projects use Log4Net, but Log4Net is not shared in the GAC; each OSS project comes with its own version of Log4Net.

  • glencooper (http://twitter.com/glencooper)

    Great articles – thank you for putting these together. I would be interested in your thoughts on namespaces and generic notions of “Commands” or “Services”. I tend to see a namespace such as “Commands” contain all the command classes for an application. Obviously, with a large application this can grow quite large, so they are typically organized into sub-namespaces. However, for any given area of your application you are likely to have Commands, Services, Events, etc. So would you recommend grouping them under a namespace for that area and then sub-namespaces for commands and services?

    i.e.

    MyApp.Commands.FeatureArea.DoSomething
    MyApp.Services.FeatureArea.SomeService

    OR

    MyApp.FeatureArea.Commands.DoSomething
    MyApp.FeatureArea.Services.SomeService

  • Ralf Westphal

    The notion of component is a logical one? Please refer me to a definition of component where that’s stated. I don’t know of any. See here for example: “A software component can be deployed independently and is subject to composition by third parties.” (from here: http://de.wikipedia.org/wiki/Komponente_(Software)).

    You say, “I don’t see any difference in sharing a set of classes encapsulated in a namespace in a large VS project, or sharing them through a single small VS project??” But in fact there is a huge difference: focus. If you code on a small VS project containing just a single component you focus much more. You feel relieved from the noise created by all that code which is not related to your current task.

    Also, if component is an important concept to you for separating code into distinct units (larger than classes), why wouldn’t you underline this logical separation with a physical separation à la “form follows function” (or more like “form follows purpose”)?

    Compile time: I know a single VS solution (or a single project) compiles faster than multiple solutions. But why does that matter? Who’s compiling code anyway? And why?

    I guess your premise is that every developer is always working on the whole codebase and thus needs to compile it every couple of seconds after adding some small changes to run tests. But while that might be the default nowadays for most developers, it’s by far not the best way to develop software.

    When I develop software, I just work on a single component at a time. That means I sit before a VS solution with 2+ projects with maybe a couple of thousand lines in them. Not more. I focus on my tasks related to that component. I’m not distracted by code I should not care about right now. The physical isolation makes very clear the logical separation of the code. My compile times are fast. And I don’t care about the whole system. Not a single bit. Integration is done on some server. I just see to it that I write flawless code according to my task list. I simply don’t need a fast machine.

    Also by developing in this way all the work can nicely be distributed among any number of developers. No merge conflicts occurring when working with a repository.

    Sure, I will also use namespaces to structure my classes logically within my many VS component projects. But why stop there? Why shouldn’t I gain separate deployability for my components at the same time? How can I reuse code in a namespace in some solution, sitting in one huge assembly, in another solution?

  • Anonymous

    Ralf, please read the white books, especially the first one on assemblies; it clarifies my stance on all these questions. To answer briefly:

    >Why would I want to put my whole codebase of 500,000+ lines into as few
    assemblies (aka VS projects) as possible?

    Because an assembly is a physical container while the notion of component is logical (hence more fine-grained). Using a physical artifact to implement a logical concept creates friction points, like very long compilation times or dozens of assemblies to reference just to use an API.

    >How does that help
    collaborative coding?

    I don’t see any difference in sharing a set of classes encapsulated in a namespace in a large VS project, or sharing them through a single small VS project?? (except that namespaces come with the advantage of using a hierarchy of components). Isn’t the unit of sharing the source code file?

    >How does that help to reduce compile time?

    This is described in detail in the first white book (try it, you can get compilation more than 10x faster, guaranteed). Have you ever tried setting Copy Local = false?
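
    For reference, Copy Local maps to the Private metadata on a reference in the .csproj file; a minimal fragment with placeholder names looks like this:

    <!-- In the referencing .csproj: Copy Local = false, so the referenced assembly
         is not copied into this project's output folder on every build. -->
    <ItemGroup>
      <ProjectReference Include="..\MyApp.Core\MyApp.Core.csproj">
        <Private>False</Private>
      </ProjectReference>
    </ItemGroup>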

    >(And
    why would that be important?)

    Because when VS slows down (like taking a few minutes to compile), the developer loses the productivity flow and goes on Facebook or takes a coffee break 20 times a day. A development tool should ideally never make the developer wait more than a second (realistically, more than a few seconds) for any action.

    >How does that help to support major
    principles like SoC or “loose coupling”?

    This is not related. If you want physical ‘loose coupling’, you define your abstractions in a dedicated assembly (physical in the sense that the two sides of the code run on different tiers, in different processes). If you just need logical ‘loose coupling’, then defining the abstractions in a low-level namespace is more than enough (logical in the sense that it is just there to obtain a decoupled code architecture).

    The same remark applies to Separation of Concerns: it is not related to the artifact used to define component boundaries (assembly, namespace or something more exotic).
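
    A minimal sketch of the logical variant (the names are illustrative): the abstraction lives in a low-level namespace of the same assembly, and higher-level namespaces depend only on it.

    namespace MyApp.Base.Abstractions      // low-level namespace holding the abstraction
    {
        public interface ILogger { void Log(string message); }
    }

    namespace MyApp.Infrastructure         // one implementation, swappable without touching features
    {
        public class ConsoleLogger : MyApp.Base.Abstractions.ILogger
        {
            public void Log(string message) { System.Console.WriteLine(message); }
        }
    }

    namespace MyApp.FeatureA               // depends only on the abstraction
    {
        public class FeatureAController
        {
            private readonly MyApp.Base.Abstractions.ILogger logger;
            public FeatureAController(MyApp.Base.Abstractions.ILogger logger) { this.logger = logger; }
            public void Run() { logger.Log("FeatureA running"); }
        }
    }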

  • Ralf Westphal

    Why would I want to put my whole codebase of 500,000+ lines into as few assemblies (aka VS projects) as possible? How does that help collaborative coding? How does that help to reduce compile time? (And why would that be important?) How does that help to support major principles like SoC or “loose coupling”?

  • Anonymous

    Felix, thanks for the link to Arik’s great post; our solutions are actually complementary. Building projects in parallel and using a RAM disk can be applied on top of the code refactoring I propose, to get a super fast compilation process, certainly under 3 or 4 seconds for a large VS solution. Arik also notes how Copy Local = True is harmful for performance!
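
    For example, parallel builds can be requested from the command line (the solution name is a placeholder; /m lets MSBuild build independent projects concurrently):

    msbuild MyApp.sln /m /p:Configuration=Release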

    Concerning having several web applications and several WCF services: this clearly sounds like a physical reason to separate them into different VS projects. What should happen, however, is that these projects define only the UI or communication parts. The business logic is then encapsulated in one (or at most a very few) core assemblies shared among the UI and communication parts.

  • felix

    I have found another post on this theme with other interesting ideas:
    http://blogs.microsoft.co.il/blogs/arik/archive/2011/05/17/speed-up-visual-studio-builds.aspx
    What do you think about this?

  • felix

    Interesting… but what about a solution with several web applications and several WCF services? And what about publishing, when in the dev environment everything runs on the same machine, while in the production environment everything will potentially run on different machines?