Last night I decided to accelerate our plan to migrate from SQL Server to PostgreSQL and gave it a quick attempt to see what the work involved was. I was hoping that due to our usage of NHibernate, and by having meticulously written database agonistic code the migration would be cake.
Turns out I was right and although there were a couple bugs along the way, the process took less than 30 minutes. Today I want to talk about the small issues we faced in migrating our code, as well as provide an overview of what I've learned about PostgreSQL so far.
First though, why migrate? I want to avoid getting rantish [in this post], but the cost of ownership for SQL Server isn't remotely close to being competitive. SQL Express is great, provided you can live with (or are willing to) its numerous constraints. Bizspark (I only bring it up because I feel someone will mention it in a comment) is a subprime mortgage - it lulls you into a sense of free only to nail you with licensing cost down the road. I'm sure someone can quote some obscure feature that might be needed by some system, but for the rest of us, the only reason to use SQL Server is familiarity. Familiarity is a pragmatic and reasonable reason to pick a technology, but only for so long.
Secondly, why PostgresQL? Well, I've used sqlite and MySQL extensively in the past (including submitting patches to their .NET drivers). However, MySQL is currently in a weird state (the sun/oracle thing) and I need ACID compliance and my experience with InnoDB has been average.
Thirdly, its worth pointing out that the system being migrated has a comprehensive set of unit tests, including a number tests which hit the DB. While I generally consider unit tests a design-tool, there's no doubt that they provided a significant sanity check.
So, what problems did I run into?
Given the use of NHibernate, the application side of things was migrated quiet easily. However, I did have to modify a few things. This is possibly because of flaws in my code (vs limitations of NHibernate) - hopefully someone will point such flaws out.
Unique Constraints
I use unique indexes to ensure that things like user emails are unique throughout the system. However, I haven't yet figured out a way to catch such exceptions in a db-agnostic manner. My code looked something like:
try
{
Session.SaveOrUpdate(user);
transaction.Commit();
return true;
}
catch (GenericADOException ex)
{
transaction.Rollback();
var sql = ex.InnerException as SqlException;
if (sql != null && sql.Number == 2601)
{
return false;
}
throw;
}
catch (Exception)
{
transaction.Rollback();
throw;
}
Obviously this code isn't going to work with the PostgreSQL driver. I keep thinking there *is* a way to handle this across multiple databases, but haven't found the answer yet.
Count - Long vs Int
When I fetch the number of rows for a query - more often than not to do paging - i do a select count(*) with a UniqueResult<int>(). However, PostgreSQL's count(*) returns a long, while SQL Server returns an int (I've run into this before, since sqlite also returns a long). This causes a cast exception. Again, I'm sure there's a simple solution to this inconsistency.
Bit and Booleans
There was one specific query which tried to retrieve all active users. The IsActive field was declared as a bit in SQL Server (and a boolean in PostgreSQL). The following query didn't work in PostgresQL "from User where IsActive = 1". Why? because 1 isn't a valid boolean, and you'll get a database error that an integer can't be converted to a boolean. I had to change the query to use '1' (in single quotes) or one of the other valid boolean values for true.
Connection String
Its also worth mentioning that the connection string example listed here, didn't work for me using the latest Npgsql drivers. Instead of Initial Catalog, I had to use the Database key.
Aside from those little issues, the code was up in running in about 10 minutes. Most of my time was actually spent getting to know PostgreSQL. Here's what I've learned so far.
Learning PostgreSQL
- PostgreSQL doesn't consider two null values to be the same, meaning you can put a unique constraint on a nullable column without any problems. In SQL Server you need a Nullbuster column to achieve this same result (its a neat trick, but its neater not having to use it).
- PostgreSQL's
uniqueidentifier type is called UUID, a bit is a boolean, and a datetime is a timestamp.
- All of PostgreSQL's text types are unicode by default (so there is no
nvarchar you simply use varchar and get the same benefits)
- PostgreSQL's handling of auto incrementing (identity) is pretty different. Instead of declaring a column as
int not null identity(1, 1) primary key, you define it as serial not null primary key. "serial" is just some syntactical sugar around what actually happens - PostgreSQL generates a sequence table for that column, and sets the columns default value to the sequence's next value. Of course, identity(1, 1) is just some syntactical sugar around however SQL Server implements the feature - its just more transparent in PostgreSQL.
Hopefully as the weeks go by, I'll have more information to share about the migration, and what I've learned.
Posted
Thu, Oct 22 2009 9:23 AM
by
karl