Here are some things I have learned from having just completed a datawarehouse reporting application for a client company:
- When dealing with any sort of impute logic (figuring out who a customer's primary salesperson was, given that they had orders tied to several different salespeople), store the “tier” (i.e., tier 1 is highest $$ sales, tier 2 is most orders, etc.) that the record had information imputed on. This makes debugging by developers and third-party testers so much easier.
- Use set-based updates as much as possible. A row-by-row algorithm executes, at best, at O(n). Set-based algorithms execute more like O(log(n)). Once you reach more than 50 rows or so, depending on how busy the server is, the difference is noticeable.
- Do not use dynamic sql unless you have to. Development and debugging time are all reduced when not having to deal with dynamic sql.
- Automate the deployment. It took too long once we were on site to actually deploy the application. Deployment should be scripted and then tested before going to the client.
- Keep the application prefix to 3-4 characters. We used a 12-character prefix with an underscore, which was way too long. A better system code would have been 3 or 4 characters with no underscore.
- Keep all formatting functionality in user-defined functions (UDFs). UDFs are one of the best features in SQL Server 2000, and keeping all formatting functionality there improves reuse, reduces formatting errors, and improves system flexibility.
- Do not repeat logic in more than one proc. We did a very good job of this, but toward the end, certain changes needed to be reflected in 2 or 3 stored procs, or in several places in the same stored proc (part of this was a byproduct of using dynamic SQL). All of this should have been abstracted out. Other times, due to good encapsulation, changes that seemed like they would take a long time were very quick and easy to fix, with minimal chance of errors.
- Organize the data at the lowest possible level of aggregation. For example, if you are doing summary reports for shipments, but you need a list of individual shipments for part of the report, go ahead and generate the list of shipments as part of the initial extract-transform-load (ETL). You can then do calculations, aggregations, and summations more quickly and easily on this set of data, and when it comes time to print out the inevitable list, you can just select a range of rows. With proper indexing, this is faaaast compared to almost any other method.
