Moving and Moving

A little while ago I decided to move my blog to openmymind.net. I owe a lot to Brendan and CodeBetter, but I have an itch to talk about a broader set of topics than what I feel is appropriate here. I’m also becoming increasingly disenchanted with MS and .NET and I know the constant nagging is seen as noise by a lot of people. If you’re interested, I hope you’ll follow me there.

Also, in less than 8 days, I’m packing a couple of suitcases and moving from sparsely populated Canada to densely populated Hong Kong. I’ll be joining a talented team of Java and .NET developers in the financial industry. The offer was appealing in a number of ways, but their intense focus on TDD, their ALT.NET mentality and their desire to bring in Rails were what really won me over.

See you on the virtual and geographical flip-side.

Posted in Uncategorized | 5 Comments

Contributing to OSS, a Git Bootcamp

So you want to contribute to an OSS project, but it’s hosted on GitHub and you don’t know where to start. This guide will cover the basics you’ll need to get contributing – something made relatively easy by Git itself.

First you’ll need to install a Git client. We’ll be using msysgit, so grab the latest full installer from: http://code.google.com/p/msysgit/downloads/list

Run the installer. I’ve disabled Shell integration (but you don’t have to). What you want to do is make sure you pick Bash Only and Windows Style when those screens come up. After it’s done installing, you should have a Git Bash shortcut on your desktop. Edit the shortcut and change the “Start in” path to where you do most of your work (e.g., most of my source code is in c:\work). Now start it up and you should get a bash shell. For the time being you just need to know that cd works the way it does in DOS, and that you use ls -l instead of dir.

Now we’ll do some basic configuration. In the bash shell, enter:

git config --global user.name "Your Name"
git config --global user.email "your@emailaddress.com"
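
If you want to confirm the settings took, you can read them back. This is a minimal sketch: it points HOME at a throwaway directory (an assumption made only so the demo doesn't touch your real settings); in your own shell you would skip that part and just run the git config lines.

```shell
# Demo only: use a throwaway HOME so your real global settings are untouched.
HOME=$(mktemp -d)
export HOME

git config --global user.name "Your Name"
git config --global user.email "your@emailaddress.com"

# Asking for a key without a value prints what is currently stored.
name=$(git config --global user.name)
email=$(git config --global user.email)
echo "$name <$email>"
```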

Next, create a free account on GitHub. We now need to create a public/private keypair – don’t get discouraged, you only have to do it once and it isn’t hard. In your bash shell, type:

ssh-keygen -t rsa -C "your@emailaddress.com"

It’ll ask you where to save the key; just hit enter to accept the default value (the value shown inside the parentheses). Next it’ll ask you to create a passphrase, so type something in, hit enter, then confirm it.

The last step in getting set up is to tell GitHub about your public key. Go into your Account Settings on GitHub and click the “SSH Public Keys” button and then the “Add another public key” link. Your title can be something like “My Home PC”. In the Key textarea, paste the contents of:

c:\users\YourName\.ssh\id_rsa.pub

If you are using XP then replace c:\users\ with C:\Documents and Settings\. If you still can’t find the file, go to your bash shell, type cat ~/.ssh/id_rsa.pub and copy and paste it from there (you may need to open the window’s properties (top-left icon) and enable “Quick Edit Mode” in the options).

Ok, now we can get rolling. In GitHub find the project page that you’d like to contribute to. In this example we’ll pretend you wanna work a bit on http://github.com/karlseguin/Metsys.WebOp.

In the top right, hit the “Fork” button. You’ll be taken to your fork of the Metsys.WebOp project, something like: http://github.com/YourName/Metsys.WebOp

You should also see Private, Read-Only and Http Read-Only buttons. Make sure Private is selected (should be anyways) and copy the URL. Next, go into your bash and type:

git clone THEURL Metsys.WebOp

You should now have a copy of Metsys.WebOp on your computer. You can open the project, make changes, add files, delete files, whatever.

Rather than continuing linearly through what you do, we’ll switch to talking about what you can do.

First and foremost, the most important command is git status. Type this within the Metsys.WebOp folder (cd Metsys.WebOp) or any subfolder and you’ll see the current state of your repository. If you actually READ everything, it’ll normally tell you what you likely want to do next.

Whenever you add a file or files, you’ll want to type git add . to stage them (again, git status will make it clear that files exist which need to be added). The . refers to the current folder, so everything new or changed beneath it gets added. If you need to add a file which is being ignored (there’s a .gitignore text file in the root of the project which has a bunch of patterns to ignore), you use the git add -f FileName command (you’ll typically use this when you want to add a 3rd party reference .dll).
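
Here is a small sketch of the difference between git add . and git add -f. It runs in a throwaway repository, and the file names (ThirdParty.dll, Program.cs) are made up for the illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init --quiet demo
cd demo

# Ignore every .dll, much as a project's .gitignore might.
printf '*.dll\n' > .gitignore
echo "binary placeholder" > ThirdParty.dll   # hypothetical 3rd party reference
echo "class Program {}" > Program.cs         # hypothetical source file

# A plain add silently skips anything matching .gitignore.
git add .
tracked_before=$(git ls-files ThirdParty.dll)

# -f forces the ignored file into the index.
git add -f ThirdParty.dll
tracked_after=$(git ls-files ThirdParty.dll)

echo "after 'git add .':  [$tracked_before]"
echo "after 'git add -f': [$tracked_after]"
```

git ls-files lists what is in the index, so it is a quick way to check whether a file actually got added.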

Whenever you want to commit your changes, type git commit -a -m 'Your Commit Message'. The -a flag commits every tracked file that you have changed or deleted; brand new files still need a git add first.

Committing only updates your local repository. You’ll want to push this back to GitHub by using git push (and git pull does the opposite, updating your local copy with the remote (GitHub) one).
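
Put together, the day-to-day cycle looks roughly like the sketch below. To keep it runnable without a network connection, a local bare repository (fork.git) stands in for your GitHub fork; that stand-in, the notes.txt file and the commit messages are all assumptions made for the demo.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for your fork on GitHub (assumption: a local bare repository).
git init --quiet --bare fork.git
git --git-dir=fork.git symbolic-ref HEAD refs/heads/master

# Cloning an empty repository prints a warning, which is fine here.
git clone --quiet fork.git Metsys.WebOp
cd Metsys.WebOp
git symbolic-ref HEAD refs/heads/master
git config user.name "Your Name"
git config user.email "your@emailaddress.com"

# New file: status shows it as untracked, so add it, then commit.
echo "first draft" > notes.txt
git status --short
git add .
git commit --quiet -m 'Add notes'

# Changed file that is already tracked: commit -a picks it up.
echo "second draft" > notes.txt
git commit --quiet -a -m 'Update notes'

# Nothing has left your machine until you push.
git push --quiet origin master

pushed=$(git --git-dir="$work/fork.git" rev-list --count master)
echo "commits on the fork: $pushed"
```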

Where things get interesting is when you want to actually contribute to another project. Since you’ve pushed to your GitHub repository you could go to your GitHub project and hit the “Pull Request” button. However, to increase the likelihood of having your changes accepted you should do everything you can to make it painless for others to merge your changes in. You do that by first merging their latest copy with your changes (of course, since you just forked the repository there will be no differences in this case).

First you’ll add a reference to the main repository by typing:

git remote add KarlsFork git://github.com/karlseguin/Metsys.WebOp.git

then you pull that repository into your working copy:

git pull KarlsFork master
(master is the main branch, don't worry about branching for now).

Now execute git status

If there are any merge conflicts you’ll have to resolve them (the conflicted files look like typical diff output). Once they are resolved, you re-add them using git add . and then commit your changes with git commit -a -m 'Merged Karls master'. Push using git push and you can finally send a pull request from your project’s GitHub page (if there is a long list of recipients, try to only send the notice to the core developers you know are responsible for it).

By doing a merge before asking for a pull request, you ensure that pulling your changes into the main copy is a 1 command affair.
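
The merge-before-you-ask routine can be tried out locally. In this sketch a bare repository (main.git) plays the main project, "seed" plays its maintainer and "yours" is your working copy; all of those names and file contents are assumptions made for the demo. The --no-rebase flag is there because recent Git versions may refuse to pull diverged histories until you say whether you want a merge or a rebase.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the main repository (assumption: a local bare repository).
git init --quiet --bare main.git
git --git-dir=main.git symbolic-ref HEAD refs/heads/master

# The maintainer publishes a first commit.
git clone --quiet main.git seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "v1" > README.txt
git add .
git commit --quiet -m 'Initial commit'
git push --quiet origin master
cd ..

# You take a copy and commit a change of your own.
git clone --quiet main.git yours
cd yours
git config user.name "Your Name"
git config user.email "your@emailaddress.com"
echo "my feature" > feature.txt
git add .
git commit --quiet -m 'Add feature'
cd ..

# Meanwhile the main repository moves on.
cd seed
echo "v2" > README.txt
git commit --quiet -a -m 'Update README'
git push --quiet origin master
cd ..

# Before asking for a pull, merge the main repository's latest into your copy.
cd yours
git remote add KarlsFork "$work/main.git"
git pull --quiet --no-rebase --no-edit KarlsFork master
git status --short
total=$(git rev-list --count HEAD)
echo "commits after the merge: $total"
```

Your copy now holds both the maintainer's update and your feature, tied together by a merge commit, so pulling it into the main repository shouldn't need any further merging.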

As you continue to work and move forward, you’ll want to pull from the main frequently, in order to avoid having to do complex merges. If the project is busy, you might even add other forks (using git remote add) and merge with them (sometimes the main fork doesn’t integrate often).

There’s a bunch more stuff you can, and will eventually need to do. You can learn a lot from:
http://learn.github.com/ or from TekPub’s Mastering Git series: http://tekpub.com/preview/git


NoSQL For The Rest Of Us

No one would blame you for strictly associating NoSQL with performance. Most of the back and forth about NoSQL – an umbrella term given for non-relational storage mechanisms – has squarely put the focus on performance, sites with massive traffic, and server farms. It’s an interesting conversation, but one that risks alienating NoSQL from the majority of developers.

The Problem

Does NoSQL provide us simple developers with any tangible benefit? As a matter of fact, it can – one as significant for us as performance is for Facebook. First though you need to understand that all of those tools you’ve been using to access your data, such as DataSets, Linq2Sql, Hibernate, NHibernate, EntityFramework, ActiveRecord, SubSonic and SQLAlchemy, are meant to help you deal with the well-known object-relational impedance mismatch. The short description is that data stored in code (typically using an Object Oriented approach) and data stored in a relational DB require coercion to move to-and-fro. The amount of coercion will vary greatly from system to system, as will the visibility (or leakiness) of said coercion (the tool you use has a significant impact on this as well).

A lot of developers don’t feel that object-relational impedance mismatch is really a problem or even exists. That’s only because, with relational storage as the only option, you’ve been dealing with it for so long – possibly your entire programming career (like me) – that you don’t think about it. You’ve been trained and desensitized to the problem, accepting it as part of programming the same way you’ve accepted if statements.

A Solution?

By changing how data is stored, the object-relational mismatch no longer applies to NoSQL solutions. Of course, just because the object-relational mismatch is gone doesn’t mean something else hasn’t taken its place. There are 4 primary storage techniques used by NoSQL solutions. I’m going to look at the two I’m most familiar with: Document (via MongoDB) and ColumnFamily (via Cassandra).

Now I don’t have any NoSQL systems in production. My experience with MongoDB has been as a main contributor to the C# MongoDB driver (Norm) – largely focusing on the underlying communication protocol. I’ve also been writing a sample application as a demo for the driver and am prototyping something here at work (with plans to go into production). My experience with Cassandra is even more limited – having spent this past weekend looking at writing a C# driver for it (Apollo).

What I’ve noticed with MongoDB specifically (and probably document-oriented databases in general) is that your data layer practically vanishes. This makes sense given the support for arrays and nested documents, as well as the ability to serialize and deserialize from a non-ambiguous and simple-type protocol like JSON (or BSON). This is a huge productivity win for developers – shorter development time, less code and therefore fewer bugs.

On the flip side, my initial reaction to the ColumnFamily storage approach (and, I’d assume, Key-Value engines) is that it’s even further away from OO than a relational model – thus the mismatch is even greater. You end up dealing with individual values (or arrays of individual values) and the language of Cassandra bleeds deeply into your application (much like the language of an RDBMS bleeds into your code when you use DataSets). Again, not a huge surprise since Cassandra *is* heavily tuned for performance.

The Drivers

Ultimately, the drivers and tools you use to communicate with the storage engine are going to have a significant impact. For example, before Norm, the main way to communicate with MongoDB from C# was essentially through the use of glorified dictionaries. Before NHibernate we were using DataReaders or DataSets. However, the greater the difference between OO and storage model, the greater the complexity and leaks. Also, NoSQL drivers are young and have tons of room to grow, whereas the last hot thing to happen on the RDBMS front was Rails’ ActiveRecord.

Conclusion

Keep in mind that I am biased. Not only is my knowledge of Cassandra limited, but I’ve also had a hand in shaping the MongoDB drivers – obviously MongoDB fits well with my vision of what data access should look like. Maybe as my involvement in the ColumnFamily approach grows, so too will my opinion of the technology. However, it seems pretty clear to me at an implementation level that document-oriented databases (as well as object-oriented databases, like db4o, I’d assume) are relatively close to the OO model used in code, and as such provide the greatest value to programmers.

At this very moment though, risks of new technology aside and the inconvenience of having to learn and grow, I’d say that there is a compelling reason for developers to move away from relational storage engines (at least to prototype and play with). Reduced complexity within the application layer, not performance and scalability, is NoSQL’s greatest strength.


MongoDB, 5 characters, and a free job board

Today I came across shapado.com – a StackExchange-like open source system running on Ruby and MongoDB. It took a couple of clicks and a few keystrokes, and I had http://jobs.shapado.com/ set up and running for free. It was a quasi joke at first, but I figured it might be helpful to get this up and running. So, if you have any jobs to post, or would like to request work, please post away :)

You can also set up your own, or download the source from the seemingly unreliable gitorious.

You can also help out by voting this up on reddit

Unit Test the Behavior, Not the Implementation

As you write tests, you’ll often come across situations where the code which exhibits a certain behavior is different from the code which causes the behavior to exist. Consider a simple case in Metsys.Bson – a BSON serializer. You can configure the serializer and deserializer to use a different name for a property by doing something like:

BsonConfiguration.ForType<User>(t => t.UseAlias(u => u.Id, "_id"));

It turns out that this feature was implemented without ever touching the core serializer or deserializer. Both of those rely on the Name property of a neat class called MagicProperty. The logic behind what the name of the property is was encapsulated within MagicProperty. Therefore the serializer and deserializer were able to work as-is, relying on the Name property to be whatever was right.

The point though is that while the code to make this feature work exists largely in MagicProperty (and the related TypeHelper), I strongly consider this a behavior of the serializer and deserializer. Therefore, the tests to make sure this works are done against those, rather than MagicProperty and TypeHelper. Here’s what one of those tests looks like:

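[Fact]
public void UsesAliasWhenSerializing()
{
    // Map both properties to aliases in the serialized document.
    BsonConfiguration.ForType<Skinny>(t =>
    {
        t.UseAlias(p => p.Nint, "id");
        t.UseAlias(p => p.String, "str");
    });

    var result = Serializer.Serialize(new Skinny { Nint = 43, String = "abc" });

    // Bytes 0-3 hold the document length and byte 4 the first element's type,
    // so the first element name ("id" plus a null terminator) starts at byte 5.
    Assert.Equal((byte)'i', result[5]);
    Assert.Equal((byte)'d', result[6]);
    Assert.Equal((byte)0, result[7]);

    // The second element name ("str") comes after the first value and its own type byte.
    Assert.Equal((byte)'s', result[13]);
    Assert.Equal((byte)'t', result[14]);
    Assert.Equal((byte)'r', result[15]);
    Assert.Equal((byte)0, result[16]);
}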

There isn’t any mention of, or reference to, the implementation. When I talk about writing effective unit tests, this is largely what I’m trying to convey. You should mostly focus on testing what your code is doing, not how it’s doing it. It’s also worth mentioning that TDD can really help you shine here – by writing the test first based on what I felt the behavior was, I didn’t run the risk of getting confused by what the implementation turned out to be. Had I done the implementation first, I might have immediately started testing MagicProperty and TypeHelper and considered my job done – without ever writing a test against the intended behavior.
