Metsys.Bson – the BSON Library

Earlier this month I detailed the implementation of the BSON serialization we used in NoRM – the C# MongoDB driver. I’ve since extracted the serialization/deserialization code and created a standalone project for it – in the hopes that it might prove helpful to someone. If you need an efficient binary protocol to transfer data, look no further.

There are two methods you need to be aware of: Serializer.Serialize and Deserializer.Deserialize.

User u1 = new User{...};
byte[] bytes = Serializer.Serialize(u1);
User u2 = Deserializer.Deserialize<User>(bytes);

You can also configure some behavior:

BsonConfiguration.ForType<User>(t => t
    .UseAlias(u => u.Id, "_id")
    .Ignore(u => u.FullName)
    .IgnoreIfNull(u => u.Status));

IgnoreIfNull exists because a value explicitly set to null can mean something different than no value at all (something we ran into when implementing the MapReduce functionality of NoRM). The other two are hopefully more obvious.

Json.NET also provides a BSON implementation. While it may have more features (I’m going to assume it does), a simple test showed Metsys.Bson running about 4 times faster.

You can grab it from: http://github.com/karlseguin/Metsys.Bson


ASP.NET Performance Framework

At the start of the year, I finished a 5-part series on ASP.NET performance – focusing on largely generic ways to improve website performance rather than specific ASP.NET performance tricks. The series covered a number of topics, including merging and shrinking files, using modules to remove unnecessary headers and set caching headers, enabling cache busting and automatically generating cache-busted references in CSS, as well as an introduction to nginx.

Yesterday I managed to put a number of those things together into a framework which I hope will make it easier for developers to leverage these best practices. The project is called Metsys.WebOp, and you can download it from github. It comes with a sample application, which not only shows how to use the framework, but also has all the documentation you’ll hopefully need.

The first part of the framework is a console application which is meant to be run as part of your build process. It’s driven by a text file which supports 4 commands – merging files together, shrinking them, generating cache-busting hashes and pre-zipping files. Here’s what the sample app’s command file looks like:

#combine JS files
combine: js\all.js: js\jquery.js, js\jquery.rollover.js, js\web.js

#combine css files
combine: css\merged.css: css\reset.css, css\main.css

#shrink the generated files
shrink: js\all.js, css\merged.css

#generate cache-busting resource hashes
busting: hashes.dat

#pre-zip our files
zip: js\all.js, css\merged.css

The next part is meant to be used from within an MVC application (it wouldn’t take too much effort for someone to make it work with WebForms) – first by allowing you to configure the runtime component, and then by providing extension methods on HtmlHelper. Essentially this gives you 4 methods: Html.IncludeJs, Html.IncludeCss, Html.Image and Html.ImageOver. You can also toggle debug mode, which’ll make all of this transparent during development (nothing worse than dealing with merged and shrunk files in development).
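To give a rough sense of how that looks in a view – treat the paths and parameters here as placeholders, since the real signatures may differ slightly – you’d end up writing something like:

<%= Html.IncludeCss("css/merged.css") %>
<%= Html.IncludeJs("js/all.js") %>
<%= Html.Image("images/logo.png", "logo") %>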

The last part is a couple of HttpModules which make everything possible. The Zip module will return the pre-zipped files (generated by the build process) should the browser accept zipped content. The WebOp module will remove unnecessary headers and add caching headers to js, css and images – only really practical if you are also using the cache-busting features.

You can download the project from http://github.com/karlseguin/Metsys.WebOp.

You might also be interested in checking out the mscd project, which does a lot of the same stuff, but is probably more mature.


WebForms vs MVC (again)

There’s a new video up on www.asp.net which aims to help developers pick between ASP.NET WebForms and ASP.NET MVC. The video boils down to 5 benefits per technology which Microsoft thinks you should consider.

Let’s go over the points, shall we? First, ASP.NET WebForms:

1 – Familiar control and event-based programming model

The claim here is that the ASP.NET model is comfortable for WinForm programmers (thankfully this unbiased analysis left out who it’s more familiar for). This is largely accurate, but disingenuous. The differences between web and desktop cannot be overstated, nor can one overstate how bad ASP.NET (or any other framework) is at hiding those differences. “Familiar” is probably the right word to use so long as you recognize that, in this case, at best it means superficial; at worst, a serious pain in the ass. Your knowledge of building a VB6 app will allow you to write a “Hello World” web application – great.

Familiarity becomes a liability when it forces a square peg into a round hole.

It also largely relies on your inability (or unwillingness) to learn. Today, next month or even next year may not be the right time for you to learn something new – that’s fine. Eventually though, sticking with what you know, only because you know it, will kill your career and possibly part of your spirit.

2 – Controls encapsulate HTML, JS and CSS

It’s true that in ASP.NET WebForms controls can, and frequently do, encapsulate HTML, JS and CSS. How this adds “value” is beyond me. You can’t, and shouldn’t, try to build websites without a solid command of HTML, JS and CSS. Whatever programming language and framework you use, the ultimate output of any website is HTML, CSS and JavaScript. Your server code essentially generates a stream of characters, which a browser loads and renders. To suggest, or think, that generating HTML, CSS or JavaScript in C# has any advantage is insane. It’ll be more complicated to learn, do and maintain – and the end result will be inferior. It’s like saying we should write C# in VB.NET; or drive cars by bolting planes to the roof and getting in the cockpit.

3 – Rich UI controls included – datagrids, charts, AJAX.

Point 3 is a different perspective on point 2, which is a different way of saying point 1. However, it is the most interesting and important perspective. Fancy tables and charts, as well as client-side behavior, shouldn’t be a server-side concern. This is fundamental to what we all know about good and bad design. Classic ASP was a mess because it intermingled presentation code with server-side code. Yet the claimed value of WebForms is that presentation logic is now a server-side concern. Do you really believe this? Would you consider generating your HTML from a stored procedure?

The claim also implies that by using ASP.NET MVC, you won’t be able to have a rich UI. In truth, you’ll not only have access to a wider range of controls; you’ll also avoid a bunch of poor abstractions, and generate JavaScript by writing JavaScript, CSS by writing CSS and HTML by writing HTML.

4 – Browser differences handled for you

I’m guessing the claim is that some of the controls mentioned in point 3 might render different HTML based on the requesting browser. Guess what: most jQuery (or any other JS framework) plug-ins are fully compatible with all relevant browsers because they too can generate different HTML. In fact, doing this on the client side is almost always better – since you can tell the exact capabilities of the browser.

Also, it would probably be better if you generated correct HTML, CSS and JS in the first place – something you can’t normally control using ASP.NET WebForms. So not only is this really just a benefit because IE is a pain, it’s only worth mentioning because points #1, #2 and #3 mean that you’ve completely lost control over doing it right in the first place.

5 – SharePoint builds on WebForms

Yes, if you use SharePoint, you’ll have to use WebForms.

Now on to ASP.NET MVC:

1 – Feels comfortable for many traditional web developers

ASP.NET WebForms is familiar while ASP.NET MVC is comfortable – that’s helpful. Nonetheless, when I see “traditional” I think “not modern”. A more honest counterpoint to the WebForms claim would be: “A more natural way to build web applications”. WebForms tried to help WinForm developers transition to the web. ASP.NET MVC is a model that better reflects the realities of programming on the web. It’s more than just comfort, and has nothing to do with tradition.

2 – Total control of HTML markup

HTML, JS and CSS are yours to command. That doesn’t mean you can’t use controls to speed up development and improve your applications. The way this is worded sure makes it sound like MVC is a lot more work than WebForms though. It isn’t.

3 – Supports Unit Testing, TDD and Agile methodologies

I’m not sure what the technology stack has to do with the development methodology, so we’ll just ignore the last part. That aside, it’s true that MVC makes it possible to unit test your code. The counterpoint is that WebForms is essentially impossible to unit test. This also understates the architectural superiority of MVC’s design – it doesn’t just allow you to leverage a number of best practices, it is itself built around those same practices. Code that can be unit tested, regardless of whether it is or not, is almost always superior to code that cannot.

4 – Encourages more prescriptive applications

So if ASP.NET MVC lets you build your application the way you should be building it, should you infer that ASP.NET WebForms forces you to build applications the wrong way? Yes, you should.

5 – Extremely flexible and extensible

Both frameworks share this value – but ASP.NET MVC is more about building on top of existing code, while ASP.NET WebForms is more about hacking things until they work. If you think this means that ASP.NET MVC can only be useful once you’ve extended it, you are wrong. It works great out of the box and is feature rich.

Other Stuff

The video goes on to make weird assertions, like suggesting that because the two stacks are so similar and share so much infrastructure, you can always turn back and pick the other one if you feel you’ve made the wrong choice. The better solution is to pick the right technology up front – backtracking months or years into your project doesn’t sound like good advice to me.

It also mentions that it’s common to have some pages handled by MVC and others by WebForms. It’s good to know that you can do this – especially since it’s a good way to upgrade from WebForms to MVC. However, I’d hardly call it common or even recommended. It’s a useful transitional tool which you should aim to get out of as quickly as possible.

Ultimately, the first 4 values of WebForms all boil down to the same thing: there’s a System.Web.UI namespace which represents the wrong way to build a web app. There are good reasons to pick WebForms – but they all come down to time and the practicalities of learning new things (and SharePoint). I won’t tell you that you have to learn MVC, because that may not be practical for you. I’ll repeat what I’ve said before: ASP.NET MVC and WebForms DO NOT serve different purposes, and one is not better suited for a particular type of application than the other (except SharePoint). They are completely overlapping technologies, and ASP.NET MVC is superior to WebForms.


The 8th Phase

I once wrote a semi-serious post entitled The 7 Phases of Unit Testing. The phases are:

  1. Refuse to unit test because “you don’t have enough time”
  2. Start unit testing and immediately start blogging about unit testing and TDD and how great they are and how everyone should do it
  3. Unit test everything – make private methods internal and abuse the InternalsVisibleTo attribute. Test getters and setters or else you won’t get 100% code coverage
  4. Get fed up with how brittle your unit tests are and start writing integration tests without realizing it.
  5. Discover a mocking framework and make heavy use of strict semantics
  6. Mock absolutely everything that can possibly be mocked
  7. Start writing effective unit tests

I think the cycle I went through is extremely healthy – unit testing is something best learnt from practice and is something you refine over time. Judging by the comments on the original post, a lot of you agree.

Recently though, I’ve felt like adding another stage:

8. Sometimes the best tests aren’t unit tests

After a while, it becomes obvious that some tests are significantly more meaningful when you expand your scope – say, to include hitting an actual database. Narrow unit tests and wider integration tests can always work together; but I’ve found that, in some cases, more comprehensive tests can replace corresponding unit tests. This may not be proper, but it is practical.

When I say unit test, I mean the smallest possible unit of code – generally a behavior. Most methods are made up of 1 or more behaviors. As a rough goal, I’d say you shouldn’t have too many methods with more than 6 behaviors. As an obvious example, this method in NoRM helps identify the type of the items in a collection:

public static Type GetListItemType(Type enumerableType)
{
    if (enumerableType.IsArray)
    {
        return enumerableType.GetElementType();
    }
    if (enumerableType.IsGenericType)
    {
        return enumerableType.GetGenericArguments()[0];
    }
    return typeof(object);
}

Clearly, this method is a good candidate for 3 or 4 unit tests (one when the type is an array, one when it’s a generic, one when it’s something else, and maybe one when it’s null).
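To illustrate, here’s roughly what those tests could look like – assuming an NUnit-style framework, and with ReflectionHelper standing in as a hypothetical home for the method above:

[Test]
public void ReturnsTheElementTypeForArrays()
{
    Assert.AreEqual(typeof(int), ReflectionHelper.GetListItemType(typeof(int[])));
}
[Test]
public void ReturnsTheFirstGenericArgumentForGenericCollections()
{
    Assert.AreEqual(typeof(string), ReflectionHelper.GetListItemType(typeof(List<string>)));
}
[Test]
public void ReturnsObjectForAnythingElse()
{
    Assert.AreEqual(typeof(object), ReflectionHelper.GetListItemType(typeof(ArrayList)));
}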

As your code moves closer to the boundaries of 3rd party components, the value of unit testing may suffer. You’ll still get the benefits of flushing out coupling and enabling safe refactoring (which shouldn’t be underestimated), but you’ll likely miss out on making sure things will work like they should in production. The solution can be to expand the scope of your tests to include the 3rd party component.

The most common example is database code. Testing a Save method by mocking the underlying layer might work, but there’s value in making sure that the object actually does get saved. That isn’t to say that a single test that hits the database is good enough – your Save method might be made up of multiple behaviors, some of which are better validated with one form of testing than another.

Really, that’s one of the key things to remember as you walk down this path – don’t think that just because your method is actually saving an object, your job is done. There are likely other behaviors that aren’t being tested at all. It’s easy to abuse these types of tests and get a false sense of security. The other key is to make sure that such a test runs like a unit test – namely, that it’s fast, doesn’t require any manual setup, and doesn’t depend on or break any other test.
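As a rough sketch of what that might look like – the UserRepository, the TestDatabase helper and FindByName are all hypothetical here – a Save test against a real database could be as simple as:

[Test]
public void SaveActuallyPersistsTheUser()
{
    //wipe the collection so this test doesn't depend on, or break, any other test
    TestDatabase.Clear("Users");
    var repository = new UserRepository(TestDatabase.ConnectionString);

    repository.Save(new User { Name = "goku" });

    Assert.IsNotNull(repository.FindByName("goku"));
}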

Lately, I’ve seen interest in using in-memory databases for this type of thing. The benefit is that they are super fast, don’t leave stale data and don’t require special setup. On the downside, you still aren’t truly testing the most fundamental behavior of your method – that an object will be saved to the database you’ll actually use in production. Even with the best O/R tool, I’ve seen code work against one database but not against another – due to a bug on my part. Writing a script that can automatically and quickly set up and tear down against the final database, and having your team members set up a local database, may or may not work for you (it’ll depend on the nature of your team and your system).

Ultimately, the most important thing is that you have automated tests which aren’t a nightmare to set up, maintain or run. Integration tests have more dependencies and thus are more fragile, but they can be an efficient way to verify correctness.


BSON Serialization

BSON is a binary-encoded serialization of JSON-like documents, which essentially means it’s an efficient way of transferring information. Part of my work on NoRM, the MongoDB driver discussed in more detail by Rob Conery, is to write an efficient and maintainable BSON serializer and deserializer. The goal of the serializer is that you give it a .NET object and get back a byte array of valid BSON. The deserializer does the opposite – give it a byte array and out pops your object. Of course, there are limits to what they can do – they are mostly meant to be used against POCO/domain entities.

Grammar
The first thing to understand when building serializers is how to read a grammar. In programming languages, a grammar is a way to express the valid keywords and values a parser might run into. Both the JSON and BSON grammars are great to learn from, given how simple yet powerful they are. The JSON grammar, available on the homepage of json.org, gives a nice representation of what valid JSON should look like. The BSON grammar, available at bsonspec.org under the specification button, follows a more traditional dialect. Essentially, you have symbols on the left and expressions on the right. The expressions can, and often will, be made up of additional symbols and/or actual values. Eventually though, you’ll end up with a symbol which is only made up of values – which means you can stop going down the rabbit hole. It’s also very common for a child symbol to reference a parent symbol – but eventually something breaks the cycle.
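For example, the handful of productions we’ll need for the walkthrough below – paraphrased from the spec, and trimmed down to just the boolean case – look like this:

document ::= int32 e_list "\x00"
e_list   ::= element e_list | ""
element  ::= "\x08" e_name "\x00"   (boolean false)
           | "\x08" e_name "\x01"   (boolean true)
e_name   ::= cstring
cstring  ::= (byte*) "\x00"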

An Example
So, say we wanted to serialize the following JSON:

{"valid": true}

Everything in BSON starts with a document. From the BSON specification, we can see that a document is made up of a 32-bit integer (representing the total size of the document, including the integer itself), another symbol called an e_list, and finally a termination byte. As a start, we’d have something like:
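[ int32 length ][ e_list ][ \x00 ]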

Now, an e_list itself is made up of a symbol called an element, followed by either another e_list or a blank string. An element is made up of a single type byte (with \x08 representing a boolean), a symbol called e_name, and a byte value for true or false. So now we have:
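[ int32 length ][ \x08 ][ e_name ][ \x01 (true) ][ \x00 ]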

The only thing missing now is our e_name (which represents the word “valid” in the original JSON). An e_name is really just a cstring, which is our string UTF-8 encoded into an array of bytes with a trailing byte of \x00:
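e_name = 0x76 0x61 0x6C 0x69 0x64 0x00   ("valid" as UTF-8 bytes, plus the trailing \x00)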

Our final byte array looks something like:
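0x0D 0x00 0x00 0x00   0x08   0x76 0x61 0x6C 0x69 0x64 0x00   0x01   0x00

That’s 13 bytes in total (hence the leading 0x0D): the length, the boolean element for “valid”, and the document terminator.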

Serializing a single boolean value might be the simplest of cases, but once you understand that, you’re well on your way to being able to serialize anything. Sure, serializing an array might be a bit trickier, since an array is serialized as its own nested document (with the indexes as the key names) – but the challenge is mostly implementation versus conceptual.

What’s the Length?
It may surprise you at first, but the most difficult part to implement is actually determining the length of a document (or of the various other symbols which have a length). The problem is that we don’t know the length until after we’ve serialized the contents. Some implementations will essentially serialize the object graph twice – first to calculate lengths, then to write out the array. In NoRM we do things more efficiently. We keep a linked list of documents, and a pointer to the current document. A document is a very simple object – it keeps track of where it started, who its parent is (null in the case of the root document) and how much data has been written. When a new document is needed – say when we start serialization, or when the grammar dictates that we need a new document (arrays or nested objects) – we mark where we are and write out a placeholder length. Then, when the document ends, we seek back to our placeholder and write out the real length. The relevant code looks like:

  private void NewDocument()
  {
      var old = _current;
      //we start Written at 4 because the length prefix itself counts towards the length
      _current = new Document { Parent = old, Start = (int)_writer.BaseStream.Position, Written = 4 };
      _writer.Write(0); //length placeholder
  }
  private void EndDocument(bool includeEeo)
  {
      var old = _current;
      if (includeEeo)
      {
          //the \x00 byte that terminates a document
          Written(1);
          _writer.Write((byte)0);
      }
      _writer.Seek(_current.Start, SeekOrigin.Begin);
      _writer.Write(_current.Written); //overwrite the document length placeholder
      _writer.Seek(0, SeekOrigin.End); //back to the end
      _current = _current.Parent;
      if (_current != null)
      {
          //a nested document's length counts towards its parent's length
          Written(old.Written);
      }
  }
  private void Written(int length)
  {
      _current.Written += length;
  }

EndDocument is pretty interesting. Since the length of a nested document contributes to the length of the parent document, we need to make sure to update the parent (now current) document with the length of the nested one.
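To make that concrete, consider serializing {"nested": {"valid": true}}. The inner document is the 13 bytes we worked out earlier. The outer document is then 4 (its own length prefix) + 1 (the \x03 embedded-document type byte) + 7 (“nested” plus its trailing \x00) + 13 (the inner document) + 1 (the terminator) = 26 bytes. When EndDocument runs for the inner document, its 13 bytes get added to the parent’s Written count, which is how the outer length comes out right.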

Conclusion
Everything else is pretty straightforward in terms of serialization – largely reliant on reflection and reflection helpers. We use Jon Skeet’s reflection-to-delegate approach to make things even faster (something I truthfully don’t fully understand). Currently our implementation has some coupling to other NoRM components. Hopefully one day the BSON stuff will be stand-alone. If you can’t wait, you can either use another library like Json.NET (which is more mature anyway), or spend a few minutes (it shouldn’t take more than that) pulling out our serializer/deserializer.
