No one would blame you for strictly associating NoSQL with performance. Most of the back and forth about NoSQL (an umbrella term for non-relational storage mechanisms) has squarely put the focus on performance, sites with massive traffic, and server farms. It's an interesting conversation, but one that risks alienating NoSQL from the majority of developers.
Does NoSQL provide us simple developers with any tangible benefit? As a matter of fact, it can: one as significant for us as performance is for Facebook. First, though, you need to understand that all of the tools you've been using to access your data, such as DataSets, Linq2Sql, Hibernate, NHibernate, EntityFramework, ActiveRecord, SubSonic and SQLAlchemy, are meant to help you deal with the well-known object-relational impedance mismatch. The short description is that objects in code form graphs of references and nested collections, while a relational database stores flat rows linked by keys, so moving data to and fro between the two requires coercion. The amount of coercion varies greatly from system to system, as does how visible (or leaky) that coercion is; the tool you use has a significant impact on this as well.
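To make that coercion concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for any relational database. The `Post` class, the `posts`/`tags` schema and the `save`/`load` helpers are hypothetical, chosen only for illustration. A single object with a list of tags has to be split across two tables on the way in and stitched back together on the way out, which is exactly the work the tools listed above try to hide.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Post:
    # An ordinary in-memory object: a scalar field plus a list of strings.
    title: str
    tags: list[str] = field(default_factory=list)


def save(conn: sqlite3.Connection, post: Post) -> int:
    # The object is split across two tables: one row for the post and
    # one row per tag, linked back to the post by a foreign key.
    cur = conn.execute("INSERT INTO posts (title) VALUES (?)", (post.title,))
    post_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tags (post_id, name) VALUES (?, ?)",
        [(post_id, tag) for tag in post.tags],
    )
    return post_id


def load(conn: sqlite3.Connection, post_id: int) -> Post:
    # Reading it back means reassembling the object from separate queries.
    (title,) = conn.execute(
        "SELECT title FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    tags = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM tags WHERE post_id = ? ORDER BY id", (post_id,)
        )
    ]
    return Post(title, tags)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tags (id INTEGER PRIMARY KEY,
                       post_id INTEGER NOT NULL REFERENCES posts(id),
                       name TEXT NOT NULL);
""")
post_id = save(conn, Post("NoSQL for the rest of us", ["nosql", "mongodb"]))
print(load(conn, post_id))
# prints: Post(title='NoSQL for the rest of us', tags=['nosql', 'mongodb'])
```

Even in this tiny case, half the code exists only to translate between the two shapes; an ORM generates the equivalent for you, but the translation is still happening underneath.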
A lot of developers don't feel that the object-relational impedance mismatch is really a problem, or that it even exists. That's only because, with no alternative, you've been dealing with it for so long (possibly your entire programming career, as I have) that you no longer think about it. You've been trained and desensitized to the problem, accepting it as part of programming the same way you've accepted if statements.
By changing how data is stored, NoSQL solutions make the object-relational mismatch no longer apply. Of course, just because the object-relational mismatch is gone doesn't mean something else hasn't taken its place. There are four primary storage techniques used by NoSQL solutions (key-value, document, ColumnFamily and graph). I'm going to look at the two I'm most familiar with: Document (via MongoDB) and ColumnFamily (via Cassandra).
Now, I don't have any NoSQL systems in production. My experience with MongoDB has been as a main contributor to the C# MongoDB driver (Norm), largely focusing on the underlying communication protocol. I've also been writing a sample application as a demo for the driver, and I'm prototyping something here at work (with plans to go into production). My experience with Cassandra is even more limited: I spent this past weekend looking at writing a C# driver for it (Apollo).
What I've noticed with MongoDB specifically (and probably document-oriented databases in general) is that your data layer practically vanishes. This makes sense given the support for arrays and nested documents, as well as the ability to serialize and deserialize objects to and from an unambiguous format built on simple types, like JSON (or its binary counterpart, BSON). An object and everything nested inside it maps more or less directly onto a single document, so there is little left to translate. This is a huge productivity win for developers: shorter development time, less code and therefore fewer bugs.
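For contrast with the relational case, here is a minimal sketch of the document approach. It uses Python's standard `json` module as a stand-in for BSON and does not touch Norm or any MongoDB driver API; the `Post` and `Comment` classes and the `to_document`/`from_document` helpers are hypothetical. The point it illustrates is that the whole object, including its list of tags and its nested comments, round-trips as one document with no join tables.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Comment:
    author: str
    body: str


@dataclass
class Post:
    title: str
    tags: list[str] = field(default_factory=list)
    comments: list[Comment] = field(default_factory=list)


def to_document(post: Post) -> str:
    # asdict() recursively converts nested dataclasses into dicts and lists,
    # which map one-to-one onto JSON objects and arrays.
    return json.dumps(asdict(post))


def from_document(doc: str) -> Post:
    data = json.loads(doc)
    # Only the nested Comment objects need rebuilding by hand here;
    # a typed driver would normally take care of this step.
    data["comments"] = [Comment(**c) for c in data["comments"]]
    return Post(**data)


post = Post(
    "NoSQL for the rest of us",
    tags=["nosql", "mongodb"],
    comments=[Comment("alice", "Nice write-up")],
)
doc = to_document(post)
print(doc)
# prints: {"title": "NoSQL for the rest of us", "tags": ["nosql", "mongodb"], "comments": [{"author": "alice", "body": "Nice write-up"}]}
assert from_document(doc) == post
```

In a real document store the serialized form would be BSON rather than a JSON string, but the shape is the same: arrays stay arrays and nested objects stay nested.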
On the flip side, my initial reaction to the ColumnFamily storage approach (and, I'd assume, Key-Value engines) is that it's even further away from OO than the relational model, so the mismatch is even greater. You end up dealing with individual values (or arrays of individual values), and the language of Cassandra bleeds deeply into your application (much like the language of the RDBMS bleeds into your code when you use DataSets). Again, this isn't a huge surprise, since Cassandra *is* heavily tuned for performance.
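A toy model helps show why this feels further from OO. The sketch below treats a column family as a plain in-memory mapping of row key to column name to raw bytes. It is a deliberate simplification, not Cassandra's API (not Apollo, not the Thrift interface), and it ignores column timestamps and comparators entirely. The `save_post`/`load_post` helpers and the `tag:0000`-style column naming are hypothetical conventions invented for illustration.

```python
# A toy model of a column family: row key -> {column name: raw bytes}.
# Real Cassandra columns also carry timestamps and are ordered by a
# configurable comparator; this sketch ignores both.
ColumnFamily = dict[str, dict[str, bytes]]


def save_post(cf: ColumnFamily, key: str, title: str, tags: list[str]) -> None:
    row = cf.setdefault(key, {})
    # Every value is stored as individually encoded bytes.
    row["title"] = title.encode("utf-8")
    # There is no list type in this model, so each tag becomes its own
    # column, with its position encoded into the column name.
    for i, tag in enumerate(tags):
        row[f"tag:{i:04d}"] = tag.encode("utf-8")


def load_post(cf: ColumnFamily, key: str) -> tuple[str, list[str]]:
    row = cf[key]
    title = row["title"].decode("utf-8")
    # Rebuilding the list means scanning column names in sorted order
    # and decoding each value by hand.
    tags = [
        row[name].decode("utf-8")
        for name in sorted(row)
        if name.startswith("tag:")
    ]
    return title, tags


posts: ColumnFamily = {}
save_post(posts, "post-1", "NoSQL for the rest of us", ["nosql", "cassandra"])
print(load_post(posts, "post-1"))
# prints: ('NoSQL for the rest of us', ['nosql', 'cassandra'])
```

Notice that the application, not the storage engine, has to decide how a list is laid out as columns and how each value is encoded; that is the storage model's vocabulary leaking into your code.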
Ultimately, the drivers and tools you use to communicate with the storage engine are going to have a significant impact. For example, before Norm, the main way to communicate with MongoDB from C# was essentially through glorified dictionaries. Before NHibernate, we were using DataReaders or DataSets. Still, the greater the difference between the OO model and the storage model, the greater the complexity and the leaks. Also, NoSQL drivers are young and have tons of room to grow, whereas the last hot thing to happen on the RDBMS front was Rails' ActiveRecord.
Keep in mind that I am biased. Not only is my knowledge of Cassandra limited, but I've also had a hand in shaping the MongoDB drivers, so MongoDB obviously fits well with my vision of what data access should look like. Maybe as my involvement with the ColumnFamily approach grows, my opinion of the technology will improve too. However, it seems pretty clear to me at an implementation level that document-oriented databases (as well as object-oriented databases like db4o, I'd assume) are relatively close to the OO model used in code, and as such provide the greatest value to programmers.
At this very moment, though, setting aside the risks of new technology and the inconvenience of having to learn and grow, I'd say there is a compelling reason for developers to move away from relational storage engines (at least to prototype and play with). Reduced complexity within the application layer, not performance and scalability, is NoSQL's greatest strength.