I think there is a clear distinction between re-structuring and re-factoring code. It stems from the nature of object-oriented code, which is made of 2 distinct kinds of artifacts: method bodies and fields on one hand, and method, type, and namespace declarations on the other. I would say that working on members (method bodies and fields) is re-factoring, while working on declarations is re-structuring.
At the origin of this distinction (members vs. declarations, re-factoring vs. re-structuring), there are 2 different purposes:
- Members are there to define behavior: they contain the logic that threads execute at runtime.
- Declarations are there to structure, organize, abstract, and componentize the code, to make it understandable and maintainable.
Interestingly, what OOP has brought compared to, say, C is mostly new means of organization, especially classes and abstractions. If you look at a C program, you’ll see that all the statements (do, while, if, switch, function calls, the ternary operator, variables…) existed before OOP became mainstream.
Why could distinguishing between re-factoring and re-structuring be useful? I have noticed several times that while re-factoring code is a complex and error-prone task, re-structuring is fairly seamless and generally doesn’t introduce regression bugs. The distinction matters because if you only have to re-structure some code, and you make sure not to touch method bodies and fields, you will likely avoid a lot of problems. When you are facing major changes, it then becomes worthwhile to organize the work into distinct re-structuring and re-factoring phases.
Concretely, re-structuring implies:
- creating, removing, or renaming assemblies and namespaces,
- renaming types,
- moving types from one namespace or assembly to another,
- creating interfaces for some existing classes, and possibly providing associated factories,
- re-structuring the impacted test methods, without changing the logic of the tests.
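As a minimal sketch of the "create an interface and a factory" item, here is a Python illustration. The post’s context is .NET, where this would be a C# interface; Python’s `abc` module stands in for it here. The names `IReportBuilder`, `ReportBuilder` and `create_report_builder` are hypothetical. Only declarations are added or changed. The body of `build()` is kept verbatim, which is what keeps this step a re-structuring rather than a re-factoring.

```python
from abc import ABC, abstractmethod
from typing import List


class IReportBuilder(ABC):
    """New declaration: an abstraction extracted from an existing class."""

    @abstractmethod
    def build(self, title: str, lines: List[str]) -> str:
        ...


class ReportBuilder(IReportBuilder):
    """Existing (hypothetical) class. Only its declaration changed: it now
    implements IReportBuilder. The method body below is untouched."""

    def build(self, title: str, lines: List[str]) -> str:
        header = title.upper()
        body = "\n".join(f"- {line}" for line in lines)
        return f"{header}\n{body}"


def create_report_builder() -> IReportBuilder:
    """New declaration: a factory, so callers can depend on the abstraction."""
    return ReportBuilder()


if __name__ == "__main__":
    builder = create_report_builder()
    print(builder.build("summary", ["first item", "second item"]))
```

Callers that previously wrote `ReportBuilder()` can switch to `create_report_builder()` without any change in behavior, because no logic was touched.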
I realized that re-structuring was seamless when we decided to remove all dependency cycles between namespaces in the NDepend code base. It was back in 2006, when we introduced the Dependency Structure Matrix in VisualNDepend. The first thing to do was to eat our own dog food and see whether it could really help us re-structure our code. To my surprise, it took us only 3 days to re-structure around 25,000 lines of code.
Levelizing is cheap
Actually, what we did here was a special case of re-structuring: we levelized the code, meaning we assigned a level to each of our namespaces or, in other words, we removed dependency cycles. We could also use the term layering instead of levelizing, to mean that we are creating layers. I explained all this in the post Layering and the Level metric. The following diagram shows why dependency cycles prevent the level metric from being computed for some entangled components (the red ones):
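The rule behind that diagram can be sketched in a few lines of Python. It uses a simplified definition of the level, which is an assumption and not necessarily NDepend’s exact formula:
- a component that uses nothing has level 0;
- any other component has 1 + the highest level among the components it uses;
- a component that sits in a cycle, or uses one that does, has no level (the red ones).

The input format, a dictionary mapping each namespace to the namespaces it uses, is hypothetical.

```python
from typing import Dict, List, Optional, Set


def compute_levels(deps: Dict[str, List[str]]) -> Dict[str, Optional[int]]:
    """Return each component's level, or None when it cannot be computed.

    Simplified rule (an assumption, not necessarily NDepend's definition):
      - no dependencies -> level 0;
      - otherwise 1 + the max level of the components it uses;
      - part of a cycle, or using an undefined level -> None.
    """
    levels: Dict[str, Optional[int]] = {}
    in_progress: Set[str] = set()  # components on the current DFS path

    def visit(node: str) -> Optional[int]:
        if node in levels:
            return levels[node]
        if node in in_progress:
            return None  # back edge: node is inside a dependency cycle
        in_progress.add(node)
        level: Optional[int] = 0
        for dep in deps.get(node, []):
            dep_level = visit(dep)
            if dep_level is None:
                level = None  # undefined levels propagate to users
            elif level is not None:
                level = max(level, dep_level + 1)
        in_progress.discard(node)
        levels[node] = level
        return level

    for node in deps:
        visit(node)
    return levels


if __name__ == "__main__":
    deps = {
        "UI": ["Core", "Util"],
        "Core": ["Util"],
        "Util": [],
        "A": ["B"],
        "B": ["A"],  # A and B form a cycle
        "C": ["A"],  # C uses the cycle, so its level is undefined too
    }
    print(compute_levels(deps))
```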
I came to the conclusion that levelizing code is relatively cheap because developers instinctively try to respect levels. If you were coding the System.String class, would it feel natural to call some XML API from it? I guess not, because you feel that System.String is lower level than any XML API. This intuition applies to any coding task, so a code base often ends up not too far from a perfect levelization. This doesn’t mean that the code base isn’t completely entangled; it may well be, even though only a few changes are needed to levelize it. For example, here is the indirect dependency matrix of the VisualNDepend namespaces before their levelization.
Each black cell indicates that the 2 corresponding namespaces depend on each other. Here we mean indirectly dependent, as in A uses B, which uses C, which uses A. The fact that most cells are black reflects that nearly every namespace depends on nearly every other one. The number in a black cell is the length of the shortest cycle involving the 2 corresponding namespaces. You can display such a cycle by clicking the cell:
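This is one plausible way to compute the number shown in a black cell, sketched in Python. It assumes the number is the length of the shortest closed path going through both namespaces. That length is the shortest path from A to B plus the shortest path from B back to A, each found with a breadth-first search. The namespace names in the usage example are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

Deps = Dict[str, List[str]]  # namespace -> namespaces it uses directly


def shortest_path_length(deps: Deps, start: str, goal: str) -> Optional[int]:
    """Breadth-first search over 'uses' edges: number of edges on a shortest
    path from start to goal, or None if goal is unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in deps.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def min_cycle_length(deps: Deps, a: str, b: str) -> Optional[int]:
    """Length of the shortest closed path a -> ... -> b -> ... -> a, or None
    when a and b are not indirectly dependent on each other (empty cell)."""
    there = shortest_path_length(deps, a, b)
    back = shortest_path_length(deps, b, a)
    if there is None or back is None:
        return None
    return there + back


if __name__ == "__main__":
    # A uses B uses C uses A; D only uses A.
    deps = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
    print(min_cycle_length(deps, "A", "B"))  # 3
    print(min_cycle_length(deps, "D", "A"))  # None
```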
Typically, to get a clear picture of such an entangled situation, a dependency matrix is much better suited than a graph. Here is the graph representing the same information as the matrix:
Things look a bit better in direct dependency mode, where the matrix shows only direct dependencies:
- a green cell when the namespace on the left uses the namespace on top,
- a blue cell when the namespace on top uses the namespace on the left,
- a black cell when A and B use each other:
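The direct-mode coloring rule can be expressed as a small function. This Python sketch follows the color convention stated in the text (green when the row namespace on the left uses the column namespace on top, blue for the opposite, black for both). The dictionary-of-sets input format and the namespace names are hypothetical.

```python
from typing import Dict, List, Set

Deps = Dict[str, Set[str]]  # namespace -> namespaces it uses directly


def cell_color(deps: Deps, row: str, col: str) -> str:
    """Color of the cell at (row on the left, column on top)."""
    row_uses_col = col in deps.get(row, set())
    col_uses_row = row in deps.get(col, set())
    if row_uses_col and col_uses_row:
        return "black"  # mutual dependency: a direct cycle
    if row_uses_col:
        return "green"
    if col_uses_row:
        return "blue"
    return ""  # no direct dependency


def direct_matrix(deps: Deps, names: List[str]) -> List[List[str]]:
    """Whole matrix, rows and columns in the same order; diagonal left empty."""
    return [[cell_color(deps, r, c) if r != c else "" for c in names] for r in names]


if __name__ == "__main__":
    names = ["Kernel", "UI", "Util"]
    deps = {"Kernel": {"UI"}, "UI": {"Kernel", "Util"}, "Util": set()}
    for name, row in zip(names, direct_matrix(deps, names)):
        print(name, row)
```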
What we can see is that the Kernel namespace (row and column #13) has a lot of black cells. This is because the Kernel namespace uses many other namespaces and is also used by most of them. Splitting this Kernel namespace into something low level (the Kernel interface) and something high level (the Kernel implementation) was the key to getting our code base levelized. And frankly, this split didn’t take much time.
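The effect of such a split can be checked on a toy graph. In this Python sketch, the namespace names (`Kernel.Interface`, `Kernel.Implementation`, `Feature.A`, `Feature.B`) and their dependencies are hypothetical. Cycles are detected with Kahn’s topological sort. The algorithm repeatedly removes namespaces whose dependencies have all been removed, and reports a cycle if some namespaces can never be removed.

```python
from collections import deque
from typing import Dict, List

Deps = Dict[str, List[str]]  # namespace -> namespaces it uses directly


def has_cycle(deps: Deps) -> bool:
    """Kahn's algorithm: repeatedly remove namespaces whose dependencies are
    all removed; if some namespaces can never be removed, there is a cycle."""
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    pending = {n: len(set(deps.get(n, []))) for n in nodes}  # unremoved deps
    users: Dict[str, List[str]] = {n: [] for n in nodes}
    for n, ds in deps.items():
        for d in set(ds):
            users[d].append(n)
    ready = deque(n for n in nodes if pending[n] == 0)
    removed = 0
    while ready:
        n = ready.popleft()
        removed += 1
        for user in users[n]:
            pending[user] -= 1
            if pending[user] == 0:
                ready.append(user)
    return removed < len(nodes)


# Before: Kernel uses the feature namespaces and is used by them.
before = {
    "Kernel": ["Feature.A", "Feature.B"],
    "Feature.A": ["Kernel"],
    "Feature.B": ["Kernel"],
}

# After: the low-level part (interfaces) is used by everyone, and the
# high-level part (implementation) uses the features and the interfaces.
after = {
    "Kernel.Interface": [],
    "Feature.A": ["Kernel.Interface"],
    "Feature.B": ["Kernel.Interface"],
    "Kernel.Implementation": ["Feature.A", "Feature.B", "Kernel.Interface"],
}

if __name__ == "__main__":
    print(has_cycle(before), has_cycle(after))  # True False
```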
There were a few other problems, since some black cells did not involve the Kernel namespace. These were not hard to fix, thanks to developers’ low/high level intuition. For example, by digging with the matrix into why 2 namespaces were mutually dependent, we obtained the following matrix. It shows many more green cells than blue cells. Most dependencies already go from the namespaces on the left (QueryPanel.*) to the namespaces on top (GraphObjectModel.*), so the former should rely on the latter, not the other way around. We then just had to remove the 6 blue cells, either by creating an interface or by moving the culprit class (MetricMinMax) to another, lower-level namespace.
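The "many more green cells than blue cells" reasoning amounts to counting type-level dependencies in each direction and cutting the minority. Here is a Python sketch under assumed inputs. The list of (user type, used type) pairs is hypothetical, and so is the placement of MetricMinMax in QueryPanel, although the class name comes from the text. A type’s namespace is taken to be its name minus the last dotted segment.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (type that uses, type that is used)


def namespace_of(type_name: str) -> str:
    """'QueryPanel.QueryEditor' -> 'QueryPanel'."""
    return type_name.rsplit(".", 1)[0]


def edges_to_cut(edges: List[Edge], left_ns: str, top_ns: str) -> List[Edge]:
    """Return the dependencies going in the minority direction between two
    namespaces; removing them leaves a single dependency direction."""
    green = [(u, v) for u, v in edges
             if namespace_of(u) == left_ns and namespace_of(v) == top_ns]
    blue = [(u, v) for u, v in edges
            if namespace_of(u) == top_ns and namespace_of(v) == left_ns]
    # On a tie there is no majority signal; this sketch arbitrarily picks blue.
    return blue if len(blue) <= len(green) else green


if __name__ == "__main__":
    edges = [
        ("QueryPanel.QueryEditor", "GraphObjectModel.Graph"),
        ("QueryPanel.QueryEditor", "GraphObjectModel.Node"),
        ("QueryPanel.ResultView", "GraphObjectModel.Graph"),
        ("GraphObjectModel.Graph", "QueryPanel.MetricMinMax"),
        ("GraphObjectModel.Node", "QueryPanel.MetricMinMax"),
    ]
    cut = edges_to_cut(edges, "QueryPanel", "GraphObjectModel")
    print(sorted({used for _, used in cut}))  # ['QueryPanel.MetricMinMax']
```

The used types in the minority edges point at the culprit class. It can then be moved to a lower-level namespace, or hidden behind an interface.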
Finally, all our namespaces were levelized and, as a consequence, we obtained a triangularized dependency matrix:
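Why levelization yields a triangular matrix can be checked with a short Python sketch. Once there are no cycles, sorting namespaces by level and using that order for both rows and columns puts every dependency cell on the same side of the diagonal. The level rule used here is a simplified assumption: 0 for a namespace that uses nothing, otherwise 1 + the highest level it uses. It only terminates on an acyclic graph, and the namespace names are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List

Deps = Dict[str, List[str]]  # namespace -> namespaces it uses directly


def level_order(deps: Deps) -> List[str]:
    """Sort namespaces from lowest to highest level (ties broken by name).
    Assumes the graph is already levelized, i.e. free of cycles."""

    @lru_cache(maxsize=None)
    def level(name: str) -> int:
        used = deps.get(name, [])
        return 0 if not used else 1 + max(level(u) for u in used)

    names = set(deps) | {u for used in deps.values() for u in used}
    return sorted(names, key=lambda n: (level(n), n))


def is_triangular(deps: Deps, order: List[str]) -> bool:
    """True if, with rows and columns in `order`, every 'row uses column'
    cell lies strictly below the diagonal."""
    index = {name: i for i, name in enumerate(order)}
    return all(index[user] > index[used]
               for user, used_list in deps.items()
               for used in used_list)


if __name__ == "__main__":
    deps = {
        "UI": ["Kernel.Interface", "Util"],
        "Kernel.Implementation": ["Kernel.Interface", "Util"],
        "Kernel.Interface": ["Util"],
        "Util": [],
    }
    order = level_order(deps)
    print(order)
    print(is_triangular(deps, order))  # True
```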
More details about this
particular case study can be found in this article I wrote
2 years ago.
Conclusion
Since then, I have had several real-world opportunities to confirm that re-structuring code (and especially levelizing code) is cheaper than expected. As a consequence, for a relatively small cost, you can take the time to re-structure your code base without changing its behavior or introducing regression bugs.
In a future post, I’ll explain concretely why a clean, levelized architecture brings more agility. Basically, the idea is that once every component has a well-defined level, it is much easier to introduce the abstractions you need to implement new features and requirements you didn’t anticipate. In other words, it becomes natural to continuously adapt the structure of your code to unexpected situations.

