I think there is a clear distinction between re-structuring and re-factoring code. It comes from the nature of OOP code, which is made of two distinct kinds of artifacts: method bodies and fields on one hand, and method, type and namespace declarations on the other hand. I would say that working on member bodies is re-factoring, while working on declarations is re-structuring.
At the origin of this distinction (members versus declarations, re-factoring versus re-structuring) there are two different purposes:
- Members are here to define the behavior: they contain the logic that is executed by threads at runtime.
- Declarations are here to structure, to organize, to abstract and to componentize the code, to make it understandable and maintainable.
Interestingly, we can notice that what OOP has brought since C, for example, is mostly new organizational means, especially classes and abstractions. If you look at a C program, you’ll see that all the statements and constructs (do, while, if, switch, function call, ternary operator, variables…) existed before OOP became mainstream.
Why would distinguishing between re-factoring and re-structuring be useful? I have noticed several times that while re-factoring code is a complex and error-prone task, re-structuring is pretty seamless and generally doesn’t introduce regression bugs. The distinction is interesting because if you only have to re-structure some code, making sure not to touch method bodies and fields, you will likely avoid a lot of problems. When you are facing major changes, it then becomes interesting to organize the work in distinct re-structuring and re-factoring phases.
Concretely, re-structuring implies:
- creating/removing/renaming assemblies and namespaces,
- renaming types,
- moving types from one namespace/assembly to another,
- creating interfaces for some existing classes, and maybe providing associated factories,
- re-structuring the impacted test methods, without the need to change the logic of the tests.
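To illustrate the "creating interfaces" item, here is a minimal sketch, written in Python for brevity rather than in a .NET language. All names (`ReportWriter`, `TextReportWriter`, `create_report_writer`) are hypothetical. The point is that only declarations are added or changed, while the body of the existing method stays untouched.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class ReportWriter(ABC):
    """New declaration: an abstraction extracted from the existing class."""

    @abstractmethod
    def write(self, lines: Sequence[str]) -> str:
        ...


class TextReportWriter(ReportWriter):
    """Existing (hypothetical) class: it now declares the interface."""

    def write(self, lines: Sequence[str]) -> str:
        # The method body is exactly what it was before the re-structuring.
        return "\n".join(lines)


def create_report_writer() -> ReportWriter:
    """New factory: callers no longer need to name the concrete type."""
    return TextReportWriter()


writer = create_report_writer()
report = writer.write(["first line", "second line"])
```

Client code that used to instantiate `TextReportWriter` directly can now depend on `ReportWriter` only, which is what makes it possible to move the concrete class to another namespace later.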
I realized that re-structuring was something seamless when we decided to remove all dependency cycles between namespaces in the code of NDepend. It was back in 2006, when we introduced the Dependency Structure Matrix in VisualNDepend. The first thing to do was to eat our own dog food and see if it could really help us re-structure our code. To my surprise, it only took us 3 days to re-structure around 25,000 lines of code.
Levelizing is cheap
Actually, we did a specialized form of re-structuring here: we levelized the code, meaning that we assigned a level to each of our namespaces or, in other words, that we removed dependency cycles. We might also have used the term layering instead of levelizing here, to mean that we are creating layers. I had the chance to explain all this in this post, Layering and the Level metric. The following diagram shows why dependency cycles prevent computing the level metric for some entangled components (the red ones):
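One way to see why a cycle prevents computing a level is the following minimal Python sketch. It assumes a common definition of the level (0 for a component that uses nothing, otherwise 1 plus the highest level among the components it uses). The component names and the `compute_levels` helper are hypothetical, and this is not meant to reflect how NDepend computes the metric internally.

```python
from typing import Dict, Optional, Set


def compute_levels(uses: Dict[str, Set[str]]) -> Dict[str, Optional[int]]:
    """Return a level per component, or None when no level can be computed.

    Assumes every component appears as a key of `uses`.
    """
    levels: Dict[str, Optional[int]] = {}
    remaining = set(uses)
    progress = True
    while progress:
        progress = False
        for name in sorted(remaining):
            deps = uses[name]
            # A level can only be assigned once all used components have one.
            if all(dep in levels for dep in deps):
                levels[name] = 1 + max((levels[dep] for dep in deps), default=-1)
                remaining.discard(name)
                progress = True
    # Whatever is left is in a cycle, or depends on a cycle: no level.
    for name in remaining:
        levels[name] = None
    return levels


# Hypothetical components: Ui and Kernel use each other.
uses = {
    "Base": set(),
    "Xml": {"Base"},
    "Ui": {"Xml", "Kernel"},
    "Kernel": {"Ui", "Base"},
}
levels = compute_levels(uses)
```

Here `Base` and `Xml` get levels 0 and 1, while `Ui` and `Kernel` get no level at all, because each one is waiting for the other to be assigned first.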
I came to the conclusion that levelizing code was something relatively cheap because developers instinctively try to respect levels. If you were coding the System.String class, would it feel natural to call some XML API from it? I guess not, and this is because you feel that System.String is something lower level than any XML API. This intuition applies to any coding task, and the code base often ends up not too far from a perfect levelization. That said, the code base can still look completely entangled. For example, here is the indirect dependency matrix of the VisualNDepend namespaces before its levelization.
Each black cell indicates that the 2 corresponding namespaces depend on each other. We mean here indirectly dependent, as in A uses B, which uses C, which uses A. The fact that most cells are black means that nearly every namespace depends, directly or indirectly, on nearly every other namespace. The number assigned to a black cell is the minimal cycle length between the 2 corresponding namespaces. You can display such a cycle by clicking the cell:
Typically, to get a clear picture of such an entangled situation, the dependency matrix is much better suited than a graph. Here is the graph representing the same information as the matrix:
Things are actually a bit better if we use the direct dependency mode, in which the matrix shows only direct dependencies: in green from left to top, in blue from top to left, and in black when A and B are mutually dependent:
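To make the number in a black cell more concrete, here is a minimal Python sketch. It assumes that the cycle length between two namespaces can be read as the shortest round trip along "uses" edges (the shortest path from the first to the second plus the shortest path back). This is an interpretation for illustration and not necessarily the exact computation done by the tool. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Optional, Set


def shortest_path_length(uses: Dict[str, Set[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first search: number of 'uses' edges from start to goal, or None."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return distances[current]
        for used in uses.get(current, ()):
            if used not in distances:
                distances[used] = distances[current] + 1
                queue.append(used)
    return None


def round_trip_length(uses: Dict[str, Set[str]], a: str, b: str) -> Optional[int]:
    """Length of the shortest round trip a -> b -> a, or None if there is none."""
    forth = shortest_path_length(uses, a, b)
    back = shortest_path_length(uses, b, a)
    if forth is None or back is None:
        return None
    return forth + back


# A uses B uses C uses A, and D uses A without being used by anyone.
uses = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}}
cycle_a_b = round_trip_length(uses, "A", "B")  # A -> B, then B -> C -> A
cycle_d_a = round_trip_length(uses, "D", "A")  # no way back from A to D
```

With this reading, the cell for A and B would show 3, while D and A would not be a black cell since nothing leads back to D.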
What we can see is that the Kernel namespace (row and column #13) has a lot of black cells. This is because the Kernel namespace uses a lot of other namespaces and is also used by most of the other namespaces. Splitting this Kernel namespace into something low-level (Kernel interface) and something high-level (Kernel implementation) was the key to getting our code base levelized. And frankly, it didn’t take a lot of time to do this split.
There were a few other problems, because some black cells did not involve the Kernel namespace. These were not hard to fix, thanks to the low/high level developer intuition. For example, by digging with the matrix to see why 2 namespaces are mutually dependent, we obtained the following matrix. We can see many more green cells than blue cells. It means that the namespace on the left (QueryPanel.*) should rely on the namespace on top (GraphObjectModel.*) and not the opposite. We then just have to remove the 6 blue cells by creating an interface or by moving the culprit class (MetricMinMax) to another namespace at a lower level.
Finally, we obtained all our namespaces levelized and as a consequence a triangularized dependency matrix:
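To illustrate what triangularized means here, the following Python sketch orders a few namespaces so that each one comes after everything it uses, then builds the direct dependency matrix in that order. The dependencies listed are invented for the example, and the `Kernel.Interface` / `Kernel.Implementation` names are hypothetical stand-ins for the Kernel split described above. The sketch relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+), which raises an error if the graph contains a cycle.

```python
from graphlib import TopologicalSorter

# namespace -> namespaces it directly uses (illustrative data only)
uses = {
    "Kernel.Interface": set(),
    "GraphObjectModel": {"Kernel.Interface"},
    "QueryPanel": {"GraphObjectModel", "Kernel.Interface"},
    "Kernel.Implementation": {"QueryPanel", "GraphObjectModel", "Kernel.Interface"},
}

# Lowest-level namespaces first: each namespace comes after what it uses.
order = list(TopologicalSorter(uses).static_order())

# matrix[i][j] == 1 when the namespace in row i uses the one in column j.
matrix = [[1 if col in uses[row] else 0 for col in order] for row in order]

# Triangular: nothing on or above the diagonal, so no namespace uses
# itself or a namespace placed after it.
size = len(order)
is_triangular = all(matrix[i][j] == 0 for i in range(size) for j in range(i, size))
```

Once no cycle remains, such an ordering always exists, which is why a levelized code base can be displayed as a triangular matrix.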
More details about this particular case study can be found in this article I wrote 2 years ago.
Since then I have had several real-world opportunities to confirm that re-structuring code (and especially levelizing code) is cheaper than expected. The consequence is that, for a relatively small cost, you can take the time to re-structure your code base without impacting behavior or creating regression bugs.
In an upcoming post, I’ll explain concretely why a clean and levelized architecture brings more agility. Basically, the idea is that once every component has a well-defined level, it is much easier to introduce the abstractions you need to implement new features and requirements you didn’t anticipate. In other words, it becomes natural to continuously adapt the structure of your code to unexpected situations.