Migrating to Postgresql with my friend NHibernate

Last night I decided to accelerate our plan to migrate from SQL Server to PostgreSQL and gave it a quick attempt to see what the work involved was. I was hoping that due to our usage of NHibernate, and by having meticulously written database agonistic code the migration would be cake.

Turns out I was right and although there were a couple bugs along the way, the process took less than 30 minutes. Today I want to talk about the small issues we faced in migrating our code, as well as provide an overview of what I’ve learned about PostgreSQL so far.

First though, why migrate? I want to avoid getting rantish [in this post], but the cost of ownership for SQL Server isn’t remotely close to being competitive. SQL Express is great, provided you can live with (or are willing to) its numerous constraints. Bizspark (I only bring it up because I feel someone will mention it in a comment) is a subprime mortgage – it lulls you into a sense of free only to nail you with licensing cost down the road. I’m sure someone can quote some obscure feature that might be needed by some system, but for the rest of us, the only reason to use SQL Server is familiarity. Familiarity is a pragmatic and reasonable reason to pick a technology, but only for so long.

Secondly, why PostgresQL? Well, I’ve used sqlite and MySQL extensively in the past (including submitting patches to their .NET drivers). However, MySQL is currently in a weird state (the sun/oracle thing) and I need ACID compliance and my experience with InnoDB has been average.

Thirdly, its worth pointing out that the system being migrated has a comprehensive set of unit tests, including a number tests which hit the DB. While I generally consider unit tests a design-tool, there’s no doubt that they provided a significant sanity check.

So, what problems did I run into?

Given the use of NHibernate, the application side of things was migrated quiet easily. However, I did have to modify a few things. This is possibly because of flaws in my code (vs limitations of NHibernate) – hopefully someone will point such flaws out.

Unique Constraints

I use unique indexes to ensure that things like user emails are unique throughout the system. However, I haven’t yet figured out a way to catch such exceptions in a db-agnostic manner. My code looked something like:

try
{
    Session.SaveOrUpdate(user);
    transaction.Commit();
    return true;
}
catch (GenericADOException ex)
{
    transaction.Rollback();
    var sql = ex.InnerException as SqlException;
    if (sql != null && sql.Number == 2601)
    {
        return false;
    }
    throw;
}
catch (Exception)
{
    transaction.Rollback();
    throw;
}

Obviously this code isn’t going to work with the PostgreSQL driver. I keep thinking there *is* a way to handle this across multiple databases, but haven’t found the answer yet.

Count – Long vs Int

When I fetch the number of rows for a query – more often than not to do paging – i do a select count(*) with a UniqueResult<int>(). However, PostgreSQL’s count(*) returns a long, while SQL Server returns an int (I’ve run into this before, since sqlite also returns a long). This causes a cast exception. Again, I’m sure there’s a simple solution to this inconsistency.

Bit and Booleans

There was one specific query which tried to retrieve all active users. The IsActive field was declared as a bit in SQL Server (and a boolean in PostgreSQL). The following query didn’t work in PostgresQL “from User where IsActive = 1″. Why? because 1 isn’t a valid boolean, and you’ll get a database error that an integer can’t be converted to a boolean. I had to change the query to use ’1′ (in single quotes) or one of the other valid boolean values for true.

Connection String

Its also worth mentioning that the connection string example listed here, didn’t work for me using the latest Npgsql drivers. Instead of Initial Catalog, I had to use the Database key.

Aside from those little issues, the code was up in running in about 10 minutes. Most of my time was actually spent getting to know PostgreSQL. Here’s what I’ve learned so far.

Learning PostgreSQL

  • PostgreSQL doesn’t consider two null values to be the same, meaning you can put a unique constraint on a nullable column without any problems. In SQL Server you need a Nullbuster column to achieve this same result (its a neat trick, but its neater not having to use it).
  • PostgreSQL’s uniqueidentifier type is called UUID, a bit is a boolean, and a datetime is a timestamp.
  • All of PostgreSQL’s text types are unicode by default (so there is no nvarchar you simply use varchar and get the same benefits)
  • PostgreSQL’s handling of auto incrementing (identity) is pretty different. Instead of declaring a column as int not null identity(1, 1) primary key, you define it as serial not null primary key. “serial” is just some syntactical sugar around what actually happens – PostgreSQL generates a sequence table for that column, and sets the columns default value to the sequence’s next value. Of course, identity(1, 1) is just some syntactical sugar around however SQL Server implements the feature – its just more transparent in PostgreSQL.

Hopefully as the weeks go by, I’ll have more information to share about the migration, and what I’ve learned.

This entry was posted in Uncategorized. Bookmark the permalink. Follow any comments here with the RSS feed for this post.

30 Responses to Migrating to Postgresql with my friend NHibernate

  1. karl says:

    @Dermot:
    Had to try, SQL Server doesn’t like that. it’s possible nhibernate would translate it properly though (but I doubt it). Good to know that’s the preferred way for postgresql though, thx!

  2. Dermot says:

    On the true/false query syntax – will SQL Server (which I’ve never used) not accept syntax like “from User where IsActive”? That is, use your boolean as input to the conditions in the where clause? This is how PostgreSQL “expects” you to do it, and I’d have expected it to be the more portable approach.

  3. karl says:

    @Dave
    I’ve posted a follow up at http://codebetter.com/blogs/karlseguin/archive/2009/10/24/postgresql-day-2.aspx#398028

    Also, postgresql does have partial indexes. PostgreSQL (and MySQL) also have a hash type index, which is great when ur looking something up by equality – like a user by email.

  4. Dave F. says:

    @Karl, would like to hear more about your experiences. I want to try this out in a VM.

    @Harry, MSSQL has indexed views also and I was surprised at the optimizer’s sophistication on when to use it versus using the underlying table’s indexes. MSSQL 2008 also has filtered indexes, that can help further tune the app.

  5. Harry M says:

    Its worth mentioning that MSSQL has indexed views which get auomatically substituted into queries by the query optimizer.

    The reason this is important is that when you find a query is slow and the data would be faster to query denormalised, you can do this by writing a view. With other DBs you have to do something like change your code to point at a table holding a materialized query result, or change your code to have denormalized values on other tables.

    *and I think oracle too

  6. Andrew says:

    @Dmitriy

    All I can say is try it then. :)

    After doing nothing but RoR for a while (for new website development), I started a new website in ASP.NET MVC for a friend and just kept thinking to myself “Why are they making me do all this work!”. Once you’ve gotten used to Convention of Configuration, going back takes a while to get used to.

    That is why I think Microsoft completely misjudged MVC. MVC is a tool to handle a specific problems, but they made it so generic in hopes to handle more problems, that the solution itself became inefficient.

    You miss stuff like GUIs for all your tools, but you get over it rather quickly once you start getting into it.

  7. @Andrew,
    That is what I read everywhere on the net :)
    Just can’t believe that Python+Django is so much better.

    Obviously I have to give it a try. But really looking for good comparisons before that or maybe just to .NET minded and to tight to compile-time checking…

    @Karl,
    Waiting for the blog post from you about that. Will read it with pleasure.
    Thans for the link. I read it.
    Still not blown away with that :)

  8. karl says:

    @Dmitriy
    The topic is something that I hope to blog a lot about as I use the technologies more and more for real production use (who knows, maybe I’ll change my mind). But its really a double whammy.

    Fisrt, dynamic languages make coding far more natural. You aren’t fighting with the language or the compiler, things tend to be far less verbose, and you have the power to do things simply not doable in strongly-typed languages. Things like method-missing and duct typing are the tip of the iceberg.

    Then you have the frameworks, which are far more complete, and far more focused. You might think that the .NET way, of having each technology in its own silo, is best. But the truth is that having a fully integrated stack, from data, to domain to presentation is far more productive.

    I strongly urge you to go through the 4 part django tutorial, its a real gem and will really help you become a better programmer, even if you stick with .NET. It’ll open up the box.
    http://docs.djangoproject.com/en/dev/intro/tutorial01/#intro-tutorial01

  9. karl says:

    @betrand
    I know that on WinMo we need to be careful of concurrency with sqlite, but that might be more of a platform issue. I must say, proving that a product is TOO conservative, isn’t something I’ve spent too much time testing against…sorta refreshing.

    @DannyT
    We aren’t live yet, so migration wasn’t a huge concern. We do have a “beta” db that we did migrate over. I generated INSERT statements using a free tool that I found called SQLDumper from Ruizata Project, and ran the output through a regex text editor (replace “dbo.” and stuff like that), and then simply imported in to the new db. Had we had to do a hot swipe, I would have had to give it a lot more thought.

  10. Andrew says:

    Dmitriy,

    I can’t speak for Karl obviously, but I think once you’ve done you’ve tried RoR or Django it’s difficult to go back to ASP.NET MVC since it feels so incomplete compared to them. FWIW, I really think that Microsoft lost the plot when they put out MVC.

    I use .NET for work, but for my personal projects (some of which are quite large) I’ve started distancing myself from it and I’m not sure I’ll ever go back.

  11. Mark says:

    @Karl – Use Spring.Net and you can get rid of your exception handling. http://www.springframework.net/doc-latest/reference/html/orm.html#orm-hibernate-daos

    And as was said above, use True/False replacement

    In the past, I have had to switch from Sybase to DB2. We had to change column and table names too. Time was measured in hours at most.

  12. Karl,
    Just curious why are Python or Ruby are better/more productive languages for you?

    I saw a lot of talks about them lately but haven’t got my hands to try them.

    Do you have some comparisson for them and .NET langs?

    Cheers.

  13. @Danny: well, what we did has no universal virtues whatsoever so I’d encourage anyone interested to do their own benchmarking with use cases that are relevant to them. We basically hammered a typical application with requests and measured how many per second it actually handled. We were very very surprised by the results, as we had similar preconceptions about it. But really, don’t take my word for it, do your own testing.

  14. DannyT says:

    @Bertrand really? What sort of volumes are you talking about here? That’s quite fascinating as I’ve always been under the impression that SQLite would just fold under anything more than a handful of users.

    @Karl, thanks for sharing, would also be interested in any benchmarks you get round to. Also, how did you go about data migration? Assuming that wasn’t too large a hurdle this exercise really does tout the ROI of using NH.

    “Oh so you spend $x thousand on SQL licensing per year, give me a day and i’ll reduce that to $0, split the difference?”

  15. @karl: yes, we had noticed that too but have you tested their assertion? When we did our own performance tests, including with high traffic, and surprisingly even with high concurrency, we found it to behave remarkably well, actually lightyears ahead of PostgreSQL, despite the known locks that it has because of its design.

  16. karl says:

    @Betrand
    Tthe sqlite website itself very strongly urges developers to look at alternatives when dealing with high concurrency or high volume websites – which is what this application is.

    SQLite is a great database with a number of usage. But where I’d consider Mysql/PostgreSQL/SQLServer/Oracle/XXX generalized DBs, I’d consider SQLite specialized.

  17. So why not SQLite?

  18. karl says:

    @Matthieu
    The move away from .NET isn’t only driven by costs. Its something I rather blog about in the future, when I’ve actually done something about it. In short though, I just find it a lot funner and productive to write in Python or Ruby. I think the languages and frameworks are much better.

  19. Matthieu says:

    Thanks for the details.

    So It’s not only the cost of the RDBMS which is too high ? Have you evaluated the Mono option ?

  20. karl says:

    @Matthieu
    The web app will stay on a windows server, hitting a linux postgresql db.

    The goal is to release a new major portion “phase 2″ (which is conveniently stand-alonish) in Python (Django), and then look at going back to the existing “phase 1″ and converting it from .NET to python.

    The reconversion of phase 1 is a long term plan that may or may not happen. The development of phase 2 in django (and subsequent phases) is something we are very committed to.

    In other words, the goal is to move everything to a LAPP stack.

  21. Matthieu says:

    You develop under Windows with Visual Studio and deploy on Linux with Mono ? It’s an ASP.NET MVC / NHibernate application ?

    What kind of web server do you use with ASP.NET on Mono ? It will be very interesting to get more details !

  22. (Other) Rob says:

    I really wonder why it’s taken so long for people to who want a “free” option to turn to PostgreSQL. I’m not bashing MS SQL (I use it all the time, and I don’t have idealistic opositions to MS – i love MS SQL, actually), but if your project needs a cheaper option, PostgreSQL is the way to go. I have no idea why MySQL got so popular. PostgreSQL was far more mature and fully-featured than MySQL ever was, and unlike the “sort-of-free” nature of MySQL, PostgreSQL was always fully free (as in Free Beer, not just Free Speech).

  23. karl says:

    brad
    The argument is that nice things don’t cost money. PostgreSQL and MySQL aren’t compromises, they are full featured, equally capable and equally proven solutions.

    The argument against SQL Server is why pay for something when you can get the same for free? Would you pay for SQL Server is Microsoft gave it to you for free? Of course you wouldn’t. The only reason to use SQL Server is familiarity, which is the only weak argument I’ve seen actually made.

  24. brad says:

    So your argument against SQL Server comes down to “nice things cost money”? Very weak indeed.

  25. karl says:

    @dave
    I didn’t even think of it. Looking into it now though, but likely to stick with Postgres

  26. dave-ilsw says:

    Did you consider using Firebird?

  27. Symon Rottem says:

    We normally use query.substitutions in config to remap the true/false values (eg. true=1, false=0). This could probably be usedfor strings just as easily. It also means you can use the keywords true and false in your queries so they’re a bit more meaningful.

  28. karl says:

    @Dan:
    No performance numbers. I’m running linux/postgresql on a dual core atom with 4gb of ram, and should be getting a twin machine next week. I’ll put windows/sqlserver on it and likely run simplistic performance scripts against the two. Nothing noticable with the unit tests yet (1 integration tests used SqlBulkCopy, which I’ll have to replace, and am expecting it to run slower).

    @Rob
    Right, same boat…going to a free solution that runs on a free OS, liberates your scaling and hosting options. I can now focus on a host that has cheap RAM/RAID upgrade versus worrying about finding a host that has cheap MS licences (i.e., I’m no longer limited to a bizspark vendor).

  29. Rob says:

    Nice article. I’ve currently got 1 SQL Server db as part of my hosting, but as you mention it is costly. I don’t want the expense of buying a second db, so it restricts me to putting everything in the one db!

    I’ll keep watch on how things go. I’d be tempted to switch to a cheaper alternative should my project ever change from a learning platform to something I actually deploy.

    Nice to see NHibernate plays pretty well with it.

  30. Dan says:

    I’m curious if you’ve done any benchmarking of your application running on PostgreSQL vs SQL Server? You mentioned you had some unit tests that hit the database, have you seen any noticeable difference in execution times for them?