Auto-generated vs. human-generated IDs

Got myself a delicious problem but there is another discussion to be had so I will leave you hanging on that one until a later post.

I’ve already hinted at the software I’m working on for the time being. It’s essentially something to help manage a land surveying office but what it does isn’t really the point. What *is* the point is that it has some major domain object that needs to be managed. In this case, it’s a Job (to use the domain term).

A brief history: Jobs in a land surveying office have historically been a paper-based entity. There is a lot of physical information to collate. Land titles, plans, deeds, even the sketches the crews make in the field. These are all key pieces of information that need to be kept organized in some fashion. And it’s all either visual information or third party data that they have no control over (and sometimes both). So to think that all of this information can be moved into the digital world is a little ambitious at this point.

So historically, they tend to organize them into a Job. And naturally, a job must be assigned a number. And typically, companies will use a numbering system that will convey some information in the number. This can be something relatively simple, like M070123 which indicates that this was the 123rd job in 2007 for the Montenegro office. Or it could be somewhat more complex like BLC-08-A31 which might mean it’s a BLC (domain term, don’t ask ’cause I don’t know) from 2008 from zone A of the city and it’s the 31st job in that zone for the year.
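
To make the embedded metadata concrete, here is a minimal Python sketch that unpacks the simpler of the two formats. The layout (one office letter, a two-digit year, a four-digit sequence) is only inferred from the M070123 example, and the names `JobNumberParts` and `parse_simple_job_number` are hypothetical, not taken from the real system.

```python
import re
from dataclasses import dataclass

# Assumed layout, inferred from the example M070123:
# one office letter, a two-digit year, a four-digit sequence.
_SIMPLE_JOB_NUMBER = re.compile(r"([A-Z])([0-9]{2})([0-9]{4})")


@dataclass(frozen=True)
class JobNumberParts:
    office_code: str  # e.g. "M" for the Montenegro office
    year: int         # four-digit year
    sequence: int     # running count of jobs for that office and year


def parse_simple_job_number(job_number: str) -> JobNumberParts:
    """Split a job number such as 'M070123' into its parts."""
    match = _SIMPLE_JOB_NUMBER.fullmatch(job_number)
    if match is None:
        raise ValueError(f"unrecognized job number: {job_number!r}")
    office_code, two_digit_year, sequence = match.groups()
    # Assumption: every two-digit year belongs to the 2000s.
    return JobNumberParts(office_code, 2000 + int(two_digit_year), int(sequence))
```

The more complex formats, like BLC-08-A31, would each need their own pattern. This sketch does not attempt them.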

However it is generated, the fact remains that they need to maintain a running list of these numbers in a "book" (another domain term, again don’t ask ’cause I’ve never heard of it either) so that they can generate a new one easily. As a new job is ordered, the person recording it must take care to record a new number in the running list so that there are no duplicates.

Now here comes the consultant (i.e. me, and seriously, do NOT ask about this one) to automate some of this data. And as I start working, the inevitable question arises: Why are we still using this archaic process of generating job numbers manually? Computers *love* generating IDs automatically. Databases do it natively. NHibernate can get you a GUID in less time than it takes to remember what the acronym stands for.

And so it was that I made a suggestion: Why not ditch the current mechanism and start over with something more computer-friendly? Like starting with an ID of 1 and going up from there. They can still refer to jobs easily. And though we’re losing that tiny bit of metadata embedded in the previous version, we can glean that information (and a whole lot more) in the form of reports on the system. My job’s easier and they’ve saved money during development.

This being a family business (and my family’s business to boot), they have reservations but defer to the "expert". There’s a lot of "well, there may be some pushback from the others but if you think it’ll be easier, go for it." I’m happy at a job well-done.

So confident am I that I think nothing of an e-mail from my brother asking me to contact one of the "others". That is, someone outside the family but who is actively going to *use* the system. She has some reservations but is leery about bringing them up. She’s young, inexperienced. What could she possibly say to sway the mind of the big, bad consultant?

"How do we enter in existing jobs?"

An honest question that deserves an honest answer, rather than the back-pedalling one I gave. Which basically suggested we’ll have a field for OldJobNumber to handle legacy job numbers. And even as I spoke the words, the whole idea kind of unravelled in my head.

Because the old job number is important. They’ve got cabinets and cabinets filled with files referring to them. And all of a sudden, I’m suggesting they create a whole new filing system. And that’s just the physical aspect. Even within the application I’m writing, I’d have to present the job number differently. I had visions of "if ( old job number exists ) show it, else show job ID" peppered throughout the code.

So I swallowed my pride and admitted I hadn’t thought of that. After which, the floodgates opened and it was admitted that no one was really looking forward to the change.

And after some questioning, it turns out there is a very good reason for this, though not an obvious one. There is something psychological in having metadata in the job number. Sitting in the office, you’ll notice that they are constantly bandying job numbers about. "What’s the status of job M080050?" "Where are the plans for job V070758?"

And when they call the jobs out like this, they can make a mental filter as they try to remember which job it is. You can imagine the thought process: "B-07-C98, that’s that BLC from late last year in the Soho area, I’ve got those plans right here." Compare that job number with another that has an ID of 78945 and it suddenly becomes a lot harder to create that mental filter. In essence, it’s more than an identifier. It’s also a name. And a filter. And even a kind of mini-report.

The lesson learned: Don’t subvert your client’s domain with all this new-fangled computer jargon.

The net result is job numbers will still be auto-generated but they will be generated in the form that they’re used to. It’s not quite as automatic as an auto-increment but it’s still very much algorithmic and can be done somewhat easily by the application.
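
As a rough illustration of "algorithmic but not auto-increment", here is a minimal Python sketch for the simplest format only: office letter, two-digit year, four-digit sequence. That layout is an assumption drawn from examples like M080050, and `next_job_number` is a hypothetical helper, not the application's actual code.

```python
from typing import Iterable


def next_job_number(office_code: str, year: int, existing: Iterable[str]) -> str:
    """Return the next unused number for one office and year.

    Assumed format: office letter + two-digit year + four-digit sequence.
    """
    prefix = f"{office_code}{year % 100:02d}"
    used = [
        int(number[len(prefix):])
        for number in existing
        # Ignore other offices and years, and anything not in this format.
        if number.startswith(prefix) and number[len(prefix):].isdigit()
    ]
    # The first job of the year gets sequence 1.
    return f"{prefix}{max(used, default=0) + 1:04d}"
```

The list passed as `existing` plays the role of the "book". A real version would also need the database to enforce uniqueness, so that two people recording jobs at the same moment cannot be handed the same number.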

But I have *seriously* simplified how they generate them for the sake of this post. The reality is the tasty problem I referred to in the opening paragraph. By way of foreshadowing (or even foreboding), they have not one, not two, but *six* methods of generating a number for a job based on various factors, some of which require intimate knowledge of a map of the area.

Stay tuned!

Kyle the Auto-generated

One final point, because someone may comment on it: I have no intention of dropping the auto-incrementing ID. But like most IDs, it won’t be as in-your-face as I originally expected.
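
A minimal Python sketch of that arrangement, assuming an in-memory store standing in for the real database. `Job` and `InMemoryJobRepository` are hypothetical names used only for illustration. The auto-incrementing ID stays behind the scenes, while the job number is what users type, say and search by, which also means existing jobs can be entered under the numbers already written on their folders.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    job_number: str           # business key: what users see and say aloud
    id: Optional[int] = None  # surrogate key: assigned on save, rarely shown


class InMemoryJobRepository:
    """Stand-in for a real persistence layer (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._next_id = 1
        self._jobs_by_number: Dict[str, Job] = {}

    def add(self, job: Job) -> Job:
        # Job numbers must stay unique, just as in the paper "book".
        if job.job_number in self._jobs_by_number:
            raise ValueError(f"duplicate job number: {job.job_number}")
        job.id = self._next_id  # auto-increment, hidden from the user
        self._next_id += 1
        self._jobs_by_number[job.job_number] = job
        return job

    def find_by_job_number(self, job_number: str) -> Optional[Job]:
        return self._jobs_by_number.get(job_number)
```

With this shape, the "if old job number exists" branching never appears: a legacy job and a new job look identical to the rest of the code.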

  • http://weblogs.asp.net/bsimser Bil Simser

    Entities in the system, by design, have identity. Usually it’s tied to some way to retrieve it from a persistence layer but doesn’t have to be. Each of these jobs you mention has that identity, the ancient formatted Job number. Whether it’s calculated, generated, or manually entered really doesn’t matter. It’s the business key as to how to “identify” something (identify, not find). You might have an ID property (or an Identifiable interface) on the entity so you can retrieve it from some backend system. Think of the ID property as the little old lady in the back room who shuffles around and finds your documents based on some Job number gibberish you rattle off. “Gladis, get me the report on Job M080050 please”. And she will go off and find it in filing cabinet #23, drawer #2, folder #8. I like to call Gladis NHibernate.

  • Matt

    It’s a simple difference between primary key and identifier. It always annoys me when developers think primary keys are good identifiers. Makes me want to shove a GUID down their pipe and sit em down for a long lecture on guid.comb. Which everyone enjoys!!

  • Darrell

    This is a good post to remind all developers to really make an effort to try to get to the bottom of why a certain task is done a certain way before deciding it could or should be changed.

  • http://www.appdev.info D. Lambert

    I used to do some work in configuration of manufactured items, and there was a phenomenon there known as “smart” part numbers. In essence, this was the same problem you ran into taken to the next level, because a part number like “ABC0108-8ZZCA-BK220” is instantly known to mean you’re talking about an “ABC” product, January ’08 revision, with eight “ZZ” thingamabobs equipped with California emission equipment, painted black, with 220V wiring. Instantly, of course, is relative based on whether you’re one of the 30-year veterans shuffling around the plant floor.

    The big problem when these “smart” part numbers get really complicated is that the rules start to break down. Parsing of the part numbers evolves as needed to accommodate new products without respect to whether any parsing rules are broken. So how do you automate this?

    First, as you pointed out, you have to understand the difference between “ID” in a programming or database context and “ID” or “Part Number” as used by a business user. In your case, “Job Number” isn’t really an ID — it’s a name for the Job. People are going to see the “Job Number”, but they’re not going to see your database ID – that’s a further clue that the “Job Number” is a business domain field just like Parcel ID or Surveyor Name.

    You’re right – show the user the Job Number, and help them manage it to the extent you can, but don’t treat it as a substitute for a database ID.

  • http://www.peterritchie.com/blog Peter Ritchie

    Sounds to me like a “Job #” is a representation of other (possibly meta) data. i.e. it’s not “generated” but “calculated”.

    There are two reasons for “auto-generated” ids: one is that it’s an implementation detail–it’s a DB key, for example. The other reason is that it’s a human-readable id–i.e. primarily used by humans and only managed by computers. This latter case seems to be what you’re describing.

    This may be why you’re getting so much pushback: you’re intermingling a human-readable-ID with an implementation-detail-ID. In the distant past, I’ve tried to do the same thing, but have only encountered problems. The biggest problem is keeping the computer-based requirements separate from the human-based requirements. Your description of the various algorithms to “calculate” this ID is a perfect example; if you tried to use that same ID as a DB key you’d drive yourself insane; and what may work today won’t work tomorrow when the domain users want to evolve that ID (and then you’re stuck with migration scripts to get old records in sync with a new schema that does keep the human readable ID separate…).

  • Kyle Baley

    @Garry

    That’s an awesome way of putting it. So many times you hear about consultants (including myself) wanting to change things because “that’s how it’s always been” is the only reason they’re given. When in fact, there are very good reasons why “that’s how it’s always been”. They’ve just been forgotten over the years.

  • http://garryshutler.blogspot.com Garry Shutler

    Sounds to me like one of those things that was worked out to be a good way of doing things so long ago that it’s just become the way it’s done.

    Therefore, when you’ve challenged them on why they do it that way they couldn’t come up with a better reason than “because that’s how we do it” until you dug around and found the underlying motivation.