The Importance of Being Explicit

I’ve gotten burned several times lately by little defects that are indirectly caused by implicit “read the tea leaves” style programming. Here’s an example of what I mean, taken from my work last week.

<Rule DayLimit="5" />

It’s just an attribute in an XML configuration file that specifies configurable business rules, no big deal. The problem was that a value of "0" in the DayLimit attribute was interpreted as a completely different business rule than a positive value. Fortunately an automated regression test picked up the forgotten requirement. I think it would have eliminated some confusion if there had been a separate attribute for the "0" case, something like AllowPostDating="False", to be more explicit.
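For illustration, the rule with the separate attribute suggested above might have looked something like this (AllowPostDating is just my name for the hypothetical attribute):

<Rule DayLimit="5" AllowPostDating="False" />

Now the “no post-dating” rule is stated outright instead of hiding inside a magic zero.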


I caused an edge case bug through a little bit of sloppy programming. I was pulling data from some validation tables in the database into an array of objects that would be consumed by business logic classes. In this case it was perfectly legal to have “child” rows even when the “header” row didn’t exist. For orphan records, I just created the header object and assigned a zero value to its Rate property. In some places during business validation I would check whether Rate != 0 to determine if the business entity existed at the header level. This was confusing, but workable, until some automated tests failed with false validation errors when existing header records had a legitimate rate of zero. Once I understood the issue, I corrected the object structure to make the existence test explicit, like the following.

public class ImplicitRecord
{
    private decimal _rate;

    public decimal Rate
    {
        get { return _rate; }
        set { _rate = value; }
    }
}

public class ImplicitBusinessClass
{
    public void Process(ImplicitRecord record)
    {
        // Check if there is ANY rate
        if (record.Rate == 0)
        {
            // create an error message
        }
    }
}

public class ExplicitRecord
{
    private decimal _rate;
    private bool _hasRate;

    public decimal Rate
    {
        get { return _rate; }
        set { _rate = value; }
    }

    public bool HasRate
    {
        get { return _hasRate; }
        set { _hasRate = value; }
    }
}

public class ExplicitBusinessClass
{
    public void Process(ExplicitRecord record)
    {
        // Check if there is ANY rate
        if (!record.HasRate)
        {
            // create an error message
        }
    }
}
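For completeness, here’s a minimal sketch of what the loading side might look like. The loader class, the RATE column name, and the null-for-missing-header convention are all invented for illustration:

using System.Data;

public class ExplicitRecordLoader
{
    // headerRow is null when only orphan "child" rows exist for this key
    public ExplicitRecord LoadHeader(DataRow headerRow)
    {
        ExplicitRecord record = new ExplicitRecord();

        if (headerRow != null)
        {
            // a real header row exists, even if its rate is legitimately zero
            record.Rate = (decimal) headerRow["RATE"];
            record.HasRate = true;
        }

        // for orphan records the placeholder header keeps HasRate = false,
        // so a zero rate can no longer be mistaken for a missing header
        return record;
    }
}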


One of the scariest, most error-prone idioms in all of software development is passing around a Hashtable/ArrayList of Hashtable/ArrayList objects. How many pernicious bugs have been caused by fouling up the key values to the Request, Session, and QueryString collections? Assuming you have a choice in the matter, which class below would you rather consume based on its public API? The answer is ExplicitActionClass, unless you’re being perverse just to spite me. Remember that other developers will follow behind you, so code to reduce the probability of their mistakes. Make the public API easy to use and intention-revealing.

using System.Collections;

public class HashtableActionClass
{
    public ArrayList Execute(Hashtable arguments)
    {
        // unwrap things in the arguments hashtable and perform work
        string userName = (string) arguments["USER_NAME"];
        decimal purchaseAmount = (decimal) arguments["PURCHASE_AMOUNT"];

        // perform work and return an answer

        return new ArrayList();
    }
}

public class ReturnClass
{
}

public class ExplicitActionClass
{
    public ReturnClass Execute(string userName, decimal purchaseAmount)
    {
        ReturnClass returnValue = new ReturnClass();

        // perform work and log results to the returnValue variable

        return returnValue;
    }
}
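To see the difference at the call site, consider this sketch (the key names and values are invented). A mistyped Hashtable key compiles cleanly and only misbehaves at runtime, while the explicit signature won’t compile at all if you get an argument wrong:

using System.Collections;

public class CallSiteComparison
{
    public void Demonstrate()
    {
        // Hashtable version: "USERNAME" instead of "USER_NAME" compiles
        // fine, but Execute() silently reads a null userName at runtime
        Hashtable arguments = new Hashtable();
        arguments["USERNAME"] = "jeremy";
        arguments["PURCHASE_AMOUNT"] = 100.50m;
        ArrayList results = new HashtableActionClass().Execute(arguments);

        // explicit version: misspell a parameter or swap the argument
        // order and the compiler complains immediately
        ReturnClass returnValue = new ExplicitActionClass().Execute("jeremy", 100.50m);
    }
}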


I’ve never coded in C++, but I’m guessing that passing around pointer structures between objects led to some truly wicked bugs.


Evil Databases


Ambiguous database designs can cause even more damage. I’ve often run across database tables whose columns represent very different conceptual things depending upon the values in another column. I know there is a little bit of database inefficiency in having a bunch of columns with lots of null data, but I’d still much rather have separate columns for separate logical concepts and pieces of information.
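As a purely hypothetical illustration (the table and column names are invented), consuming code ends up switching on a discriminator column just to decide what another column means:

using System.Data;

public class AmbiguousRowReader
{
    public void Read(DataRow row)
    {
        // the meaning of the AMOUNT column depends entirely on the
        // value sitting in the RECORD_TYPE column
        string recordType = (string) row["RECORD_TYPE"];
        decimal amount = (decimal) row["AMOUNT"];

        if (recordType == "FEE")
        {
            // here AMOUNT is a flat fee in dollars...
        }
        else if (recordType == "RATE")
        {
            // ...but here the very same column is a percentage rate
        }

        // separate, nullable FEE_AMOUNT and RATE_PERCENT columns would
        // each mean exactly one thing, at the cost of some null data
    }
}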


I’ve been in several situations where the only way to integrate two or more systems was to peek directly into another system’s underlying database. This is fraught with so much risk that it’s borderline insane. It’s risky because you’re duplicating a lot of the logic required to “interpret” the business meaning of the underlying data. On one hand you have to reproduce the interpretation correctly, and on the other you must keep the duplicated logic synchronized through later changes. That synchronization just isn’t going to happen, because the different codebases are probably being built and tested separately. Any change in the database schema becomes risky when there is neither coordinated, large-scale automated testing on every downstream system nor any form of compile-time check keeping the applications synchronized with the database.


At a previous employer we had an absolutely humongous Operational Data Store (ODS) that contained the answer to every question. Many applications in the enterprise touched this monster. A much greener, more idealistic version of myself foolishly tried to give the ODS team a suggestion for optimizing a terribly sluggish view I needed to access. I basically had my ear chewed off and was told that a change like that would take six months of regression testing. I ended up creating a polling mechanism to cache the very important, extremely volatile data every 15 minutes just so our application could function with any kind of decent performance. The biggest usage of our user interface turned out to be a report on the cached data that we’d thrown in at the last minute as a “nice to do” feature request. We almost had to add another web server to the farm just because of the demand for this data that was too difficult to pull out of the ODS.
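Purely as an illustrative sketch (all names are invented, thread synchronization and the actual data access are omitted), the polling cache amounted to something like this:

using System.Data;
using System.Timers;

public class OdsPollingCache
{
    private DataTable _cachedData = new DataTable();
    private Timer _timer;

    public void Start()
    {
        Refresh();

        // re-query the sluggish ODS view every 15 minutes so the
        // application reads from local memory instead
        _timer = new Timer(15 * 60 * 1000);
        _timer.Elapsed += new ElapsedEventHandler(OnElapsed);
        _timer.Start();
    }

    private void OnElapsed(object sender, ElapsedEventArgs e)
    {
        Refresh();
    }

    private void Refresh()
    {
        // placeholder for the ADO.NET query against the ODS view
        _cachedData = new DataTable();
    }

    public DataTable CachedData
    {
        get { return _cachedData; }
    }
}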


I think that people are dangerously overexuberant about SOA in general, but prudent usage of SOA should eliminate the need for all of this fragile database-sharing crap, and that sounds pretty darn good to me (using web services as a thin pass-through data layer instead of raw ADO.NET or JDBC seems rather foolish to me, though).


I feel that coding explicitly is orthogonal to the Static versus Dynamic Typing debate.  “Duck Typing” is one thing, but hidden meaning in code is just plain bad coding.

About Jeremy Miller

Jeremy is the Chief Software Architect at Dovetail Software, the coolest ISV in Austin. Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy is the author of the open source StructureMap tool for Dependency Injection with .Net, StoryTeller for supercharged acceptance testing in .Net, and one of the principal developers behind FubuMVC. Jeremy's thoughts on all things software can be found at The Shade Tree Developer at http://codebetter.com/jeremymiller.
  • http://codebetter.com/blogs/jeremy.miller jmiller

    Josh,

    I just meant that I would never put an unnecessary SOAP layer between an application and its database. A couple years ago some of the clowns on your “Architecture” team thought it was a good idea to make every application access data through SOAP web services no matter what the situation. Thank God we had no real power then. I wouldn’t *ever* allow any other application to connect directly to a production transactional database. I’m not wild about the file extracts because of file locking concerns, but I think you’re on the right path. DTS to push the data extracts into a reporting DB maybe? I also don’t like the integration-by-polling-against-a-reporting-DB strategy either. That sucked royally in production.

  • http://codebetter.com/blogs/jeremy.miller jmiller

    Darrell,

    There are always exception cases where you depart from a normalized database structure. In a logic-intensive application the database should be designed around object persistence. I was thinking specifically about the design of a database table to store an inheritance hierarchy of classes. Check out Fowler’s Single Table Inheritance pattern from the PEAA book. The way you’re suggesting to design tables to store the inheritance relationship is only one option out of many. If a denormalized database makes persistence easier without hurting performance, then normalization is *not* important.

  • http://codebetter.com/blogs/darrell.norton/ Darrell Norton

    If you have lots of null columns, you don’t really have a normalized database. Those should be put into a foreign key table with a “type” column.

    Duck typing is something made up by the Pragmatic Programmers. It is what the REAL intention behind typing is, not the watered down “this is how we implemented it in [a given statically typed language].” But I would never pass in a Hashtable so that my object messages were one parameter. I’d pass in an object on which I could call known methods, dynamic or static doesn’t matter.

  • http://flimflan.com/blog Joshua Flanagan

    You caught my attention with the paragraph in passing about fragile database sharing crap. It is an issue very dear to me right now as I try to figure out an integration strategy for an application that contains a lot of data other people want. I agree SOAP would be overkill, but I wonder what you mean by “raw ADO.NET”? Does that mean giving other users/apps a database username/password to use via ADO.NET? Isn’t that fragile database sharing? I’m really trying to avoid opening up the DB to be used whenever and however by people that get an account.
    Right now, I’m leaning towards flat file extractions being dumped to a network share, for all to consume as they see fit. It seems so ancient and low-tech, but somehow elegant and “future proof”.

  • http://codebetter.com/blogs/jeremy.miller jmiller

    Gary,

    Reflection abuse vs. Hashtable of Hashtables? I’m not sure which one wins the “who’s more evil” smackdown. I actually was trying *not* to rant and say something useful for once here on CodeAdequately.com.

    I, of course, have already ranted a little bit about abusing reflection here–> http://codebetter.com/blogs/jeremy.miller/archive/2005/06/29/130090.aspx

    I don’t remember what system I was looking at that day, but I bet you can make a good guess.

  • http://codebetter.com/blogs/sahil.malik sahilmalik

    See, and when I’m explicit everyone bitches at me.

  • BlackTigerX

    I actually just saw this a few minutes ago. In one database there is this field “BilledCorrect”, and I was told that 1 means it’s correct and 2 means it’s incorrect

    =o|

  • Gary Williams

    I 100% agree. The most awful bugs I have ever had to chase are in cases with ‘soft’ typing. The refactor tools don’t work. Searching can’t find it. Ick. I don’t really even like reflection. Like most tools it can be very handy – and when you need it, irreplaceable. But overuse of reflection is a far too common evil. If you don’t like strongly typed languages (C#, Java) then don’t use them…but don’t break the expectations of the language by walking around the typing just because you can.

    Oooh…was that a rant?