BigTable concerns, or “How to put your trust in the cloud”

I didn’t necessarily mean to piggyback off Greg’s two posts on ORMs but c’mon, what’s a hillbilly to do when he perpetuates such negative stereotypes? I mean, before you start knocking it, have any of you *tried* kissing your sister?

He also has some blather in there on RDBMSs and ORMs, so I suppose I should hide my indignation behind something technical.

We’re using BigTable for our project by way of Google App Engine. The decision to use it was pretty easy once we landed on GWT as our platform. The integration ‘twixt GWT and App Engine is pretty seamless and, hey, App Engine uses BigTable.

I’m glossing over the dozens of times we’ve second-guessed ourselves since making the decision, though. The most recent was just yesterday, as a matter of fact, when my friend expressed a couple of concerns:

  • How do we back it up?
  • How do we do ad hoc reports against it?

Over the course of the conversation, these boiled down to: How can we work with BigTable in the way we’re used to working with RDBMSs?

The inevitable option came up. Maybe we shouldn’t use BigTable. Maybe MySQL is more suitable if we’re unsure. That would mean moving off App Engine, though, and we like what we’ve seen of it so far.

It was a bit of an uncomfortable conversation, actually, and this was between two very seasoned developers who have never shied away from new tech. I think the reason for the awkwardness is that we aren’t dealing with someone else’s money. This is a startup, so it’s a decision that he and I are going to have to live with.

In the end, being seasoned developers, we recognized that moving to a new development platform would just replace one set of problems with another. For basic transactions (I hesitate to say OLTP because that would imply I know more about the term than I do), like getting some objects and saving them again, BigTable just plain works. There’s no ORM behind the scenes to map your data structure to the domain model. You create a User and you save it. Any relationships are handled automatically by some magic that is buried in the documentation somewhere, I’m sure. It really is like working with an ORM without actually having to deal with the mapping.

As for our two questions above, we have a tentative solution that I still like a day later, and it solves both problems. Let’s take the second one first:

How do we do ad hoc reports against it?

See, this is where RDBMSs shine, I think. So, breaking down the question, we get: How can we get the benefits of BigTable for transactional work and the benefits of an RDBMS for reporting and ad hoc querying?

Funny how CQRS (Command Query Responsibility Segregation) starts to make sense when you have the right problem staring you in the face. We’ll have a separate relational database for querying. As write requests are sent to BigTable, we’ll also send them to another service elsewhere that queues them up to be processed into the relational database.
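The write-to-BigTable-then-queue-for-SQL idea can be sketched in a few lines. This is a minimal, in-memory illustration under stated assumptions, not our actual implementation: a plain dict stands in for BigTable (it does not use the App Engine datastore API), a standard-library queue stands in for the separate queuing service, and an in-memory SQLite database plays the relational read model. The names `save_user` and `process_pending_changes` are hypothetical.

```python
import json
import queue
import sqlite3

# Stand-in for BigTable: an in-memory dict keyed by entity key.
# This is NOT the App Engine datastore API; it only models "save an entity".
datastore = {}

# Stand-in for the separate service that queues up changes for the read side.
change_queue = queue.Queue()

# Relational read model, used only for reporting and ad hoc SQL.
reporting_db = sqlite3.connect(":memory:")
reporting_db.execute(
    "CREATE TABLE users (entity_key TEXT PRIMARY KEY, name TEXT, city TEXT)"
)


def save_user(entity_key, name, city):
    """Command side: save to the entity store, then publish the change."""
    props = {"name": name, "city": city}
    datastore[entity_key] = props
    # Serialize the change so the queue could be a remote service instead.
    change_queue.put(json.dumps({"kind": "User", "key": entity_key, "props": props}))


def process_pending_changes():
    """Query side: drain the queue and upsert each change into SQL."""
    processed = 0
    while True:
        try:
            change = json.loads(change_queue.get_nowait())
        except queue.Empty:
            break
        reporting_db.execute(
            "INSERT OR REPLACE INTO users (entity_key, name, city) VALUES (?, ?, ?)",
            (change["key"], change["props"]["name"], change["props"]["city"]),
        )
        processed += 1
    reporting_db.commit()
    return processed


# Usage: two users, then an update to the first one.
save_user("u1", "Alice", "Toronto")
save_user("u2", "Bob", "Toronto")
save_user("u1", "Alice", "Nassau")
print(process_pending_changes())  # 3

# Ad hoc reporting now happens in SQL, not against the entity store.
rows = reporting_db.execute(
    "SELECT city, COUNT(*) FROM users GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Nassau', 1), ('Toronto', 1)]
```

Note that the save and the enqueue are two separate steps, so in a real system a crash between them would leave the relational copy out of date; the sketch does not address that.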

This also addresses our first question:

How do we back it up?

The nice thing about this approach is that we now have our offline backup, though of course backing up is only half the solution. We also need some way to restore BigTable from our relational database easily. But the idea seems sound enough, even if the mechanics may prove otherwise.
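The restore half could, in principle, be as simple as reading the relational copy back into entities. This is a hypothetical sketch: SQLite stands in for the relational backup, a dict stands in for BigTable, and `restore_datastore` is an illustrative name, not a real App Engine call. One caveat: this only round-trips whatever the relational schema captured, so any entity property that never made it into a column is lost on restore.

```python
import sqlite3


def restore_datastore(backup_db):
    """Rebuild an entity store (a dict here) from the relational copy.

    A real restore would write each entity back through the datastore API
    instead; that part is assumed, not shown.
    """
    restored = {}
    cursor = backup_db.execute(
        "SELECT entity_key, name, city FROM users ORDER BY entity_key"
    )
    for entity_key, name, city in cursor:
        restored[entity_key] = {"name": name, "city": city}
    return restored


# Simulate the relational backup that the query side has been maintaining.
backup = sqlite3.connect(":memory:")
backup.execute("CREATE TABLE users (entity_key TEXT PRIMARY KEY, name TEXT, city TEXT)")
backup.executemany(
    "INSERT INTO users (entity_key, name, city) VALUES (?, ?, ?)",
    [("u1", "Alice", "Nassau"), ("u2", "Bob", "Toronto")],
)
backup.commit()

datastore = restore_datastore(backup)
print(datastore)
```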

Maybe this sounds unduly complicated. It really doesn’t to me. App Engine and BigTable offer a lot of advantages. They solve problems I don’t want to deal with, most notably scalability. The ones they introduce, backing up and querying, are by contrast pretty simple. Besides, I’m scheduled for Udi’s course in a couple of months anyway.

And for the record, I don’t have any sisters. Just three adventurous brothers.

Kyle the Restored

This entry was posted in Google App Engine, GWT.
  • Kyle Baley

    I dunno. I have pretty specific requirements:

    * Back up and restore the App Engine datastore
    * Manipulate data using regular SQL
    * Process data outside the App Engine request limits

    Unless AppRocket can do these specific things, I’m not sure it’ll work. But I’ll take a look…

  • Paul Prescod

    Is AppRocket the sort of thing you need?

    Have you tried it?

  • Richard

    With ad hoc reports, I can see the need for the relational models. For pre-created reports, there’s a different option that I’ve heard of being used with event-based architectures: have the report itself be a processor of the events in question. For example, an inventory report might start at 0 quantity for every item. Every time there’s a restock event, the report is updated. Every time there’s a destocking event, the report is updated. In this way, the report content is generated by the events that affect it rather than by executing the report. This automatically gives you good performance, since running the report 100 times is no different from running it once: running it is just viewing data that’s already been collected.

  • Kevin Webber

    Regardless of whether we use RDBMSs or NoSQL to store data, we still have data, and we still need to transform and view that data in a million different ways. I actually think NoSQL is causing people to question why business folks need to know SQL at all. SQL is more a symptom of a problem than an effective solution IMO, but it’s still the best solution we’ve got. 😉 For the most part, we write a lot of complex SQL to help business folks plan strategies and tactics based on the info/knowledge we have tucked away on our servers (or in the cloud or wherever). SQL is just a way to get at it all. NoSQL really needs to fill that gap, and its tooling needs to catch up, before it can take on RDBMSs and SQL in certain industries (e.g., products like Hyperion Interactive Reporting). Like anything else, it all boils down to analyzing the requirements of the specific project/app and weighing the tradeoffs of being an early adopter against the potential productivity gains of NoSQL. I’d love to see some numbers on those gains; otherwise it might be like one of those scenarios where one car feels faster than the other even though they travel at the same speed, because one car has a louder muffler. :)

  • Kyle Baley

    That’s a mighty fine question, pete. I’m coming up on five weeks of working with BigTable at all, so my answer is based on the optimism that has guided my career to date: someone must have solved that problem already. And blogged about it.

  • pete w

    I am all for shifting away from a RDBMS but the relational paradigm is geared towards solving some common fundamental needs, and ad-hoc reporting is one of many of those needs.

    Tell me, hillbilly, have you run into the scenario where your model has changed drastically and you must work with a new data model as well as the deprecated data already resident in BigTable? I was wondering how schema flux is handled over time.

  • Kyle Baley

    I suppose it depends on how much you trust Google with your data. Their terms of service state they aren’t responsible for the loss of any data and that backups are your responsibility. But you make a good point, and we talked about this as well. In “Getting Real” terms, this is a decision we can put off until later.

  • Paul Bauer

    Doesn’t Google App Engine obviate the data backup problem?
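Richard’s suggestion of a report that processes events itself, rather than being computed on demand, can be sketched as a small in-memory projection. This is a minimal sketch under stated assumptions: the `Restocked`/`Destocked` event types and the `InventoryReport` class are hypothetical names, and events are handed to the report directly rather than arriving from any real event bus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Restocked:
    """Hypothetical event: quantity units of item were added to stock."""
    item: str
    quantity: int


@dataclass(frozen=True)
class Destocked:
    """Hypothetical event: quantity units of item were removed from stock."""
    item: str
    quantity: int


class InventoryReport:
    """A report kept up to date by handling events, not by running a query."""

    def __init__(self):
        # Every item implicitly starts at 0 quantity.
        self._on_hand = defaultdict(int)

    def handle(self, event):
        if isinstance(event, Restocked):
            self._on_hand[event.item] += event.quantity
        elif isinstance(event, Destocked):
            self._on_hand[event.item] -= event.quantity
        # Events that don't affect inventory are ignored.

    def view(self):
        # "Running" the report just reads totals that were already collected.
        return dict(sorted(self._on_hand.items()))


report = InventoryReport()
for event in [Restocked("widget", 10), Restocked("gadget", 4), Destocked("widget", 3)]:
    report.handle(event)

print(report.view())  # {'gadget': 4, 'widget': 7}
```

Viewing the report 100 times costs the same as viewing it once, because the per-event work already happened in `handle`.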