This post has little redeeming value and it’ll be a bit mean-spirited, but there are some lessons learned at the bottom. At the Agile Austin lunch today Joshua Flanagan and I were talking about how developers are sometimes slow to shake off the limitations of the previous generation of technology. We talked about people with VB6 backgrounds transitioning to .Net who simply can’t make themselves create new classes because object creation was expensive in VB6 (I was that way too at first). I brought up the classic problem of old school COBOL developers who code enormous procedures as a holdover from the days when COBOL subroutines were inefficient (I’ve always thought that I could spot an ex-COBOL programmer coding in any other language).
That conversation inevitably led to a downward spiral of WTF stories. Here’s my clincher in the game of one-upmanship, and I hereby promise to never, ever tell this story again.
Several years ago I was involved with a Death March project to rewrite a very large VB6 application. One of the VB6 client applications in the larger system was a target for a tremendous amount of user and production support venom. I had all of the code available to me, so I decided to see if I could spot what the problem was. I looked at all the forms and didn’t find anything but some innocuous-looking code. Then I accidentally opened the code for the splash screen(!) and out popped 10,000+ lines of VB6 code. Here in all of its glory is what I found:
- The system had a large amount of metadata in about a dozen different tables that needed to be accessed in numerous places. To standardize access to the tables, the developers created a shared DLL. It was a fine idea, but it was a terrible implementation. They were already invested quite heavily in a homegrown CSLA code generation tool to generate middle tier code from the database structures. I’m guessing that they adapted their codegen tool to create metadata classes and data access code for each of the tables. The end result was an object hierarchy with one object for each row in the database. Every single time you created an instance of the main metadata class, you pulled down every single row (and there was a fair amount of data) in all of the metadata tables and created a VB6 object for each row. Every single time. Remember that object creation in VB6 was much more expensive than it is in .Net today. I’m not done yet.
- The collections of the metadata row objects were not hashed, and they didn’t create any kind of find or query helper method on the main metadata class. To access any piece of data you had to iterate through the collection to test each object until you found the one you wanted. In some cases you had to loop twice if the first query didn’t find what it was looking for. The worst thing about this was the absurd number of times the looping code was copied and pasted all through the codebase. I’m not done yet.
- In many cases the metadata class hierarchy was created on the application server and used remotely by a heavy VB6 client via DCOM. All of that repeated iteration over the collections was done over a stateful DCOM connection via the WAN from the Pacific coast or South America to the US. It gets worse.
- Back to the VB6 client that caught all of the flak. In this case the metadata component ran on the client workstation. That’s right, client workstations hitting the database directly (with the connection string information stored on the workstations as well). This particular database server had some severe instability problems due to the number of open database connections (this was back in the Oracle 8.0.5 days, when it wasn’t as resilient to inactive and orphaned connections). I’m not sure which is worse, long running stateful communication over the WAN or heavy database access on a client box. Almost there.
- Lastly, the guilty VB6 client was some sort of polling mechanism that ran continuously. Every 0.4 seconds a timer event would kick off the polling. As you probably guessed, the client created a new instance of the whole metadata hierarchy every single time to look for the exact same piece of data. Every 0.4 seconds it had to pull down all of the data from the database, create all those COM objects, and iterate over the collections to find the exact same little piece of data.
Okay, the maintenance team found the issue pretty quickly and largely fixed the instability of that one client by caching the data. At one time I was standing on a soapbox and defending the rewrite by saying that we could reduce the LoC of the system by nearly an order of magnitude in some cases, just by cleaning up the way the system fetched metadata.
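For illustration only, here is a minimal sketch of what that kind of caching fix might look like. It is in Python rather than VB6, and every name in it (`MetadataCache`, `load_rows`, the five-minute refresh interval) is hypothetical rather than taken from the real system. It assumes the metadata can be treated as simple key/value rows. The idea is to load the rows once, index them in a dictionary instead of looping over a collection, and reuse them across timer ticks.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, str]  # (key, value): a stand-in for one metadata row


class MetadataCache:
    """Loads every row once, indexes it by key, and reuses it until a TTL expires."""

    def __init__(
        self,
        load_rows: Callable[[], Iterable[Row]],  # hypothetical loader hitting the database
        ttl_seconds: float = 300.0,              # assumed refresh interval
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_rows = load_rows
        self._ttl = ttl_seconds
        self._clock = clock
        self._index: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def get(self, key: str) -> Optional[str]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # One trip to the data source, then a dictionary for keyed lookups
            # instead of scanning a collection on every call.
            self._index = dict(self._load_rows())
            self._loaded_at = now
        return self._index.get(key)


def simulate_polling(ticks: int) -> Tuple[int, Optional[str]]:
    """Simulate `ticks` timer events and report how many full loads they caused."""
    loads: List[int] = []

    def fake_load_rows() -> List[Row]:
        loads.append(1)  # count each trip to the pretend database
        return [("plant", "Austin"), ("shift", "2")]

    cache = MetadataCache(fake_load_rows)
    value = None
    for _ in range(ticks):
        value = cache.get("shift")
    return len(loads), value


print(simulate_polling(1000))  # (1, '2')
```

A thousand simulated timer events trigger a single load instead of a thousand. The time-based refresh is only one possible invalidation policy; whether stale metadata is acceptable for five minutes is a decision for the real system.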
The project was an awful experience, but I learned a lot about enterprise development. For those of you who are new to large-scale enterprise development, here are some things to know or think about:
- Network round trips are evil. At lunch today we were laughing at developers who obsess about optimizing string concatenation while writing systems that are chatty to the database. The fastest way to bring an enterprise application to its knees is excessive network traffic.
- If you possibly can, avoid stateful network connections. Stateless connections will lead to better scaling and reliability. For maximum scalability (and I’m thinking horizontal scaling here obviously), a client’s request should be able to be serviced by any of the servers. Then scaling can be accomplished by putting a hardware router in front of your application/web servers.
- Be very cognizant of database connections. Newer database engines seem to be more resilient now, but you can easily overwhelm a database by opening too many connections. There’s a reason why connection pooling and running data access through middle tier servers were, and are, important for databases. Make sure you understand how connection pooling is configured and working in your environment.
- Fancy SOA infrastructure or big Enterprise Application Integration strategies don’t mean crap if the code underneath them sucks. I know there’s a pervasive theory that you *fix* the enterprise architecture by putting the ugly stuff behind abstracted web services first, then replacing the legacy code. Granted, I’m an application guy, but my money is on prioritizing the patching of the existing code.
- Just think about what you’re doing. “I’m writing the same code over and over again. I should eliminate the duplication and then I’ll code faster.” “This seems like it’s harder than it should be. Maybe we should look for a different way?” “This just can’t be the right way to code this.” It’s not.
- Severe schedule pressure leads to crappy code that’ll be more expensive later. I’m sure the developers of that system deserve some smacking, but I know that management put them in some extremely bad situations. At some point I think you just have to scream “no more” in an impossible situation. One of the legends I heard was of the development team being given cots at the factory and told to make the prototype run the factory floor in 6 weeks. I seriously doubt I’d write quality code in that situation.
- Maybe, just maybe, you shouldn’t push a prototype into production, then willy-nilly glom on more code for 3-4 years after that.
- Maybe, just maybe, a mission critical system that is a foundation for your business shouldn’t be built on the cheap.
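The round-trip lesson at the top of this list can be sketched in a few lines. This is a hedged illustration, not code from the system described: the `settings` table, the function names, and the trip counter are all made up, and SQLite runs in-process, so each counted `execute` call only stands in for what would be a network round trip against a remote database.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with 50 hypothetical settings rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO settings VALUES (?, ?)",
        [(f"key{i}", f"value{i}") for i in range(50)],
    )
    return conn


def fetch_chatty(conn: sqlite3.Connection, names: Sequence[str],
                 trips: List[int]) -> Dict[str, str]:
    """One query per name: N round trips for N values."""
    result: Dict[str, str] = {}
    for name in names:
        trips.append(1)  # each execute would be a network round trip
        row = conn.execute(
            "SELECT value FROM settings WHERE name = ?", (name,)
        ).fetchone()
        if row is not None:
            result[name] = row[0]
    return result


def fetch_batched(conn: sqlite3.Connection, names: Sequence[str],
                  trips: List[int]) -> Dict[str, str]:
    """One query for all names: a single round trip."""
    if not names:
        return {}
    trips.append(1)
    placeholders = ", ".join("?" for _ in names)
    rows = conn.execute(
        f"SELECT name, value FROM settings WHERE name IN ({placeholders})",
        list(names),
    ).fetchall()
    return dict(rows)


def compare() -> Tuple[int, int, bool]:
    """Return (chatty trips, batched trips, whether both results match)."""
    conn = make_db()
    names = [f"key{i}" for i in range(50)]
    chatty_trips: List[int] = []
    batched_trips: List[int] = []
    chatty = fetch_chatty(conn, names, chatty_trips)
    batched = fetch_batched(conn, names, batched_trips)
    conn.close()
    return len(chatty_trips), len(batched_trips), chatty == batched


print(compare())  # (50, 1, True)
```

Both functions return the same data; the batched version simply asks for it in one request. Over a WAN, where each round trip can cost tens or hundreds of milliseconds, that difference tends to dwarf anything gained from micro-optimizations like string concatenation.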
BTW, that (now stabilized) VB6 code is still limping along as far as I know, five years later and with at least three major replacement attempts under the bridge. The even older COBOL system that the VB6 system was supposed to replace was still limping along too, the last I knew.