CodeBetter.Com

Karl Seguin

.NET From Ottawa, Ontario
  • And we're off...

    Yesterday was my last day at Fuel. It was a hard decision, but one I felt was necessary in order to grow.

    I'm heading back into more enterprise-type work. I'll essentially be building management systems and services for emergency-room medical equipment. Our team's portion of the work isn't as critical as the medical devices themselves, but there's still an extremely high need for quality.

    From my observations, our field suffers badly from high turnover. Part of that is the economic flux in software development - I think most companies are still struggling to figure out how much a software developer is actually worth to them in an open market. A major problem is how deceptively simple it is to write software - but how difficult it is to write good software. A young company can start off hiring a group of junior developers at $40K (the going rate in Ottawa as far as I can tell), but as soon as the jobs start to get bigger and expectations greater, they seem unable to make that jump to $80K. "We can get 2 programmers for that price," they reason. Of course, as has been shown numerous times before (don't managers read these things?!), top programmers are unbelievably cheap relative to their productivity.

    We generally learn from our failures and successes, and by being exposed to a variety of programs and fellow programmers. Stay at one place for too long, and you get really good at doing one thing, but opportunities to learn become few and far between. I think good developers start to thirst for failure - or at least the risk of failure. I'm not talking about the failure that comes from an unreasonable deadline either, but from true problem solving.

    Which of course leads to another big reason I see developers leaving their jobs - they want to do things differently. The most drastic example is the move to Agile methodologies. It's easy to get caught up in the Agile hype - in large part because a lot of it is common sense - but the reality is that most companies aren't ready for, or capable of, such a huge transition. It's far easier to change companies than to try to change a company.

    I think all of these problems can be summed up by companies not understanding software development or developers. Some companies don't even realize they are in the software business...yikes!

    People often say the grass is greener on the other side. But sometimes it really is greener on the other side. I guess we'll soon find out.

  • Custom ASP.NET Page Tracing

    Starting a new large-scale project with an ASP.NET front-end, I wanted to add a mechanism to get constant feedback about what my app was doing. Something similar to the built-in tracing mechanism, but far more lightweight. <%@Page trace="true"...%> is great for specific cases, but it probably isn't something you want to leave on all the time while developing – and even if you do, there's so much noise you might miss out on those vital pieces of information.

    So after googling around for a good solution and being surprised that I couldn't find anything, I decided to write my own. I can't help but think that there's a well known solution that everyone (but me) uses, so if there is, please tell me! Otherwise here are the very simple steps to reproduce what I have so far.

    The approach is simple: store log messages in a List within the HttpContext, and dump the list to the page. First, we start off with the class that'll hold each trace message. Within this class we'll hold the actual message, a type (info, debug, error), the elapsed time in milliseconds from the first logged message and the delta in milliseconds from the previous message:

    public class TracerItem
    {
       public TracerItemType Type { get; set; }
       public DateTime Dated { get; set; }
       public string Message { get; set; }
       public int Delta { get; set; }
       public int Elapsed { get; set; }
    
       public TracerItem() {}
    
       public TracerItem(DateTime dated, TracerItemType type, string message, int elapsed, int delta)
       {
          Dated = dated;
          Type = type;
          Message = message;
          Elapsed = elapsed;
          Delta = delta;
       }
    }
    
    public enum TracerItemType
    {
       Info = 1,
       Debug = 2,
       Error = 3,
    }
    

    There's really nothing to explain there. Next, we'll create our static Tracer class, which defines the very important Log method:

    public static class Tracer
    {
       public static List<TracerItem> GetLogs()
       {
          var items = (List<TracerItem>) HttpContext.Current.Items["__tracer"];
          if (items == null)
          {
             items = new List<TracerItem>();
             HttpContext.Current.Items["__tracer"] = items;
          }
          return items;
       }
       public static void Log(string message, params object[] arguments)
       {
    #if TRACER
          Log(message, TracerItemType.Info, arguments);
    #endif
       }
       public static void Log(string message, TracerItemType type, params object[] arguments)
       {
    #if TRACER
          var now = DateTime.Now;
          var items = GetLogs();
          var delta = items.Count == 0 ? 0 : (int)now.Subtract(items[items.Count - 1].Dated).TotalMilliseconds;
          var elapsed = items.Count == 0 ? delta : items[items.Count - 1].Elapsed + delta;
          items.Add(new TracerItem(now, type, string.Format(message, arguments), elapsed, delta));
    #endif
       }
    }
    

    The GetLogs method initializes and gets our list of trace messages (this method isn't thread-safe). Notice that I use a preprocessor directive within the Log methods. According to this codeproject article, the JIT compiler will optimize out empty methods. This means that if we don't define the TRACER symbol our log code won't have any impact on performance (it also means that you need to define the TRACER symbol for it to work).
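    One thing the #if approach doesn't remove is the work done at the call site: the arguments are still evaluated and the params array is still allocated before the empty method is called. As an alternative, here's a minimal sketch using System.Diagnostics.ConditionalAttribute, which tells the compiler to drop the calls themselves when the symbol isn't defined. ConditionalTracer and its Messages list are hypothetical stand-ins for the Tracer class above, and TRACER is assumed to be defined through the project's conditional compilation symbols (or a #define at the top of the calling file).

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-in for the Tracer class; only the attribute usage matters here.
public static class ConditionalTracer
{
    public static readonly List<string> Messages = new List<string>();

    // Calls to this method (including the evaluation of their arguments) are
    // omitted by the compiler unless TRACER is defined where the call is compiled.
    // Methods marked [Conditional] must return void.
    [Conditional("TRACER")]
    public static void Log(string message, params object[] arguments)
    {
        Messages.Add(string.Format(message, arguments));
    }
}
```

    The trade-off is that the symbol has to be defined in every project that calls Log, not just in the project that contains the tracer.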

    Next, we can use our classes to actually log events. For example, we might want to trace a specific cache miss:

    var user = (User) Cache.Get(string.Format("userbyid_{0}", id));
    if (user == null)
    {
      Tracer.Log("Cache miss => user by id {0}", id);
      //get the user
    }
    return user;
    

    The only thing left to do is display the actual content. You'll probably want to change this to fit your site, but this is what I did within a UserControl for an ASP.NET MVC site (loaded via RenderUserControl from my master):

    <% var items = Tracer.GetLogs(); %>
    <div id="tracer">
       <div class="item header">
          <div class="index">#</div>
          <div class="elapsed">elapsed</div>
          <div class="delta">delta</div>
          <div class="message">message</div>
       </div>
       <% for(int count = 0; count < items.Count; ++count) { %>
       <div class="item <%=items[count].Type %>">
          <div class="index"><%= count+1 %></div>
          <div class="elapsed"><%= items[count].Elapsed %>ms</div>
          <div class="delta"><%= items[count].Delta %>ms</div>
          <div class="message"><%= Server.HtmlEncode(items[count].Message) %></div>
       </div>
       <% } %>
    </div>
    

    Along with the accompanying CSS:

    #tracer{width:1000px;margin:0 auto;padding:10px;}
    #tracer .item {clear:both;}
    #tracer .header{font-weight:bold;}
    #tracer .item div{float:left;text-align:center;}
    #tracer .Info{color:#888;}
    #tracer .Debug{color:#2B2;}
    #tracer .Error{color:#B22;}
    #tracer .item div.index{width:20px;}
    #tracer .item div.delta{width:90px;}
    #tracer .item div.elapsed{width:90px;}
    

    The whole thing is rendered on top of my header, so I always see what's going on. I am interested in expanding all of this into a stand-alone library. I also want to figure out how to hook into NHibernate's logging and add an "NHibernate" tab so you can see all of the SQL being executed.

  • What's in a Title?

    The relative hype around the Foundation ebook has been pretty fun. Today I noticed a very detailed (and positive) review of the book. Which is, of course, flattering.

    If there's one thing a few people don't care for though, it's the title. They don't feel that it properly captures the spirit of the book, or that it isn't as marketable as it could be. I think if this was a "real" book, the publisher would have insisted on something different - likely with the words "ALT.NET" in big print.

    However, given that it wasn't commercial, I had the luxury of being a little cleverer. The point, which I think most people get, is that this stuff *IS* fundamental. If enterprise developers think "fundamentals" means if-statements, recursion and hash algorithms, then we're in trouble. IoC might not be what they teach in school - but it should be. Or at least it should be the first thing you teach yourself. It reminds me of the funny pro-Google quote:

    "Google uses Bayesian filtering the way Microsoft uses the if statement"

    If I had to do it again though, I'd probably change the name. My ego can't stand knowing I might have gotten more praise with a better title.

     

    Posted Aug 14 2008, 11:39 AM by karl with 6 comment(s)
  • Got Shoes?

    An article I wrote for DotNetSlackers on Ruby Shoes just got published. Shoes is a little framework a colleague turned me onto - it's for building cross-platform desktop applications in Ruby. I'm not sure it's quite ready for prime time, but I do think it's a far better way to learn Ruby than something like Rails. Rails is just too complicated and leaks too much to serve as a good learning tool.

    Shoes also has a crazy manual that's completely out there. It almost reads like a bedtime story (the first 20 pages are pretty fluffy though and can easily be skipped) yet still manages to act as a good introduction and reference manual. It's comparing apples to oranges, but the Shoes manual is now my favorite programming reference (overtaking The C Programming Language).

    Oh, and since we're talking about programming books, the Dragon Book has always been very very low on my list...

  • Back to Basics: LinkedLists

    I tend to subscribe to the belief that programmers with some C background are typically better off than those without. This is largely because C is far less abstract from the underlying O/S and hardware than languages like C# or Java – specifically the memory model. This kind of knowledge is just handy to have – even for day to day programming.

    I've covered fundamental memory topics before, including chapter 7 of the Foundation series (free ebook) as well as with the arrays are just fancy pointers post. It's time to put some of that information to quasi-practical use.

    One of the first real programming classes I had dealt with data structures. Technically all the primitive types you deal with are simple data structures - a System.Int32 is a data structure capable of holding a 4 byte value representing an integer. What I want to talk about though are slightly more complex data structures – Arrays, Linked Lists, HashTables and Binary Trees (maybe). I think it's great that .NET and Java come loaded with a bunch of data structures. While you can probably get away without ever knowing how they really work, there might be instances where it'd be useful to know how things work under the hood – besides, I think this is just really fun stuff to mess around with.

    Arrays

    We've already covered arrays in depth. In case you forgot, the most important point to know is that arrays are cut up from a contiguous block of memory. It's this fact that lets them be indexed speedily using simple pointer arithmetic. If you want to access the 4th array element (index 3), you simply need to take your pointer, which is pointing at the start of the array, and increment its memory location by 3 times the size of the type in the array. This explains why an array's dimension needs to be defined as soon as you create it. It also explains why you can't simply expand the size of an array – there's a good chance that the memory directly next to the array is being taken by something else. The only way to grow an array is to reallocate a larger chunk of contiguous memory and copy the old array into the new one. This is exactly the strategy employed by many .NET classes – including the StringBuilder and ArrayList (if you're interested, open up Reflector, and look at the Capacity property on the ArrayList class).
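    To make the reallocate-and-copy strategy concrete, here's a minimal sketch of a growable array. GrowableIntArray is a hypothetical class written for illustration, not the actual ArrayList source, and the starting capacity of 4 and the doubling growth factor are assumptions on my part.

```csharp
using System;

// Hypothetical, simplified growable array of ints; not the real ArrayList.
public class GrowableIntArray
{
    private int[] _items = new int[4]; // fixed-size, contiguous backing array
    private int _count;

    public int Count { get { return _count; } }
    public int Capacity { get { return _items.Length; } }

    public int this[int index]
    {
        get
        {
            if (index < 0 || index >= _count)
            {
                throw new ArgumentOutOfRangeException("index");
            }
            return _items[index];
        }
    }

    public void Add(int value)
    {
        if (_count == _items.Length)
        {
            // The array can't be extended in place: allocate a larger block
            // and copy the existing elements into it.
            int[] larger = new int[_items.Length * 2];
            Array.Copy(_items, larger, _count);
            _items = larger;
        }
        _items[_count++] = value;
    }
}
```

    Adding a fifth item to this sketch triggers one copy and leaves the capacity at 8, so most calls to Add never touch the old elements.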

    Linked Lists

    The fact that .NET's ArrayLists take care of this automatically (and rather efficiently) is pretty impressive. In school though, for reasons I'll guess at later, we didn't build our own ArrayList, but rather built well-known LinkedLists. As of .NET 2.0 there's actually a built-in generic LinkedList (Java has had one for a while as far as I know). In most situations you'll want to use an ArrayList, but LinkedLists do serve a unique purpose – and again, I just think it's good stuff to know as it illustrates how memory works. So, let's take a look at how to build a LinkedList in C#. We'll keep it basic, and won't worry about implementing typical collection interfaces (IEnumerable). There's already a working LinkedList, we just want to explore the data structure to get an even greater understanding of programming.

    What Are Linked Lists?

    If an array can be defined as a data structure based on a contiguous chunk of memory (which is synonymous with fixed-size), then a LinkedList should be defined as a data structure based on dispersed chunks of memory. As such, LinkedLists can grow to unlimited sizes (only limited by available memory). The fundamental aspect of a LinkedList is that each element within the list points to the next element in the list. This is necessary since items aren't placed one after the other and simple pointer arithmetic won't work - you can't just do pointer + sizeof(int) to figure out where the next element is - it could be anywhere!

    How do we do it?

    Building a linked list is pretty simple. We'll need a generic LinkedList class which will expose our Add and Remove methods (and Find and whatever else we want), as well as a LinkedListItem class, which will wrap the added item and include a reference to the next item.

    Here's a memory representation of what it'll look like:

    [Image: Linked List memory 1]

    Notice that our containing LinkedList class has a reference to our first item within the linked list. This is necessary otherwise we'd have no point of access to any of the items.

    LinkedList v1

    For the first version we'll focus on getting the core components - it won't be able to do anything, but it will set the groundwork for v2.

    public class LinkedList<T>
    {
       private LinkedListItem _first;
    
       public LinkedList(){}
       public LinkedList(T firstItem)
       {
          _first = new LinkedListItem(firstItem);
       }
    
       private class LinkedListItem
       {
          public readonly T Item;
          public LinkedListItem Next;
    
          public LinkedListItem(T item)
          {
             Item = item;
          }
       }
    }

    All we can do with this code is instantiate a new instance of the LinkedList class and pass in the first element - we can't add more items, delete items or retrieve them. However, it does illustrate the purpose of the LinkedListItem class as a wrapper.

    LinkedList v2

    For this version of our list we'll add the ability to add new items to the list by adding two methods. We need an Add method to actually add a new element to our list, but we also need a method to find the last element of our list.

    public void Add(T itemToAdd)
    {
       if (_first == null)
       {
          _first = new LinkedListItem(itemToAdd);
          return;
       }
       LinkedListItem last = FindLastItem();         
       last.Next = new LinkedListItem(itemToAdd);
    }
    
    private LinkedListItem FindLastItem()
    {
       if (_first == null)
       {
          return null;
       }
       LinkedListItem item = _first;
       while (item.Next != null)
       {
          item = item.Next;
       }
       return item;
    }

    There's a very common pattern with LinkedLists - walking through the elements. To find the last entry - the one where Next is null - we need to go through each item. We can't just index a spot and jump to it.
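    The same walk is what you'd use to hand the items back to a caller. Here's a minimal sketch, assuming we did want IEnumerable after all; WalkableList and Node are hypothetical names used to keep the example self-contained, and aren't part of the versions built in this post.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical singly-linked list, trimmed down to show the walking pattern.
public class WalkableList<T> : IEnumerable<T>
{
    private Node _first;

    public void Add(T item)
    {
        Node node = new Node(item);
        if (_first == null)
        {
            _first = node;
            return;
        }
        Node last = _first;
        while (last.Next != null)
        {
            last = last.Next; // walk to the end, one hop at a time
        }
        last.Next = node;
    }

    // The same walk, but yielding each item to the caller along the way.
    public IEnumerator<T> GetEnumerator()
    {
        for (Node current = _first; current != null; current = current.Next)
        {
            yield return current.Item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    private class Node
    {
        public readonly T Item;
        public Node Next;

        public Node(T item)
        {
            Item = item;
        }
    }
}
```

    With that in place, a plain foreach over the list visits the items in insertion order.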

    LinkedList v3

    You wouldn't know it from the above code, but one of the things LinkedLists excel at is speedily inserting records. In fact, a high insert/retrieval ratio is probably the only good reason to use them on a modern system. The way we've coded it now, inserts are actually a pretty slow linear operation - the longer the list gets, the longer it takes to find the last element. However, with the simple addition of a reference to the Last item we can make insertion a constant and very fast operation. Here's a visual representation of the memory for our improved LinkedList:

    [Image: Linked List memory 2]

    Here's the code (LinkedListItem hasn't changed, so it's been left out):

    public class LinkedList<T>
    {
       private LinkedListItem _first;
       private LinkedListItem _last;
    
       public LinkedList(){}
       public LinkedList(T firstItem)
       {
          _first = new LinkedListItem(firstItem);
          _last = _first;
       }
       public void Add(T itemToAdd)
       {
          if (_first == null)
          {
             _first = new LinkedListItem(itemToAdd);
             _last = _first;
             return;
          }         
          _last.Next = new LinkedListItem(itemToAdd);
          _last = _last.Next;
       }
       private class LinkedListItem{...}
    }
    

    Why LinkedLists?

    We'll take a break from making improvements to our code to discuss what the appeal of LinkedLists is. Obviously, a main advantage is that they aren't fixed in size, like arrays. Another advantage is that thanks to our last code change, insertion is a constant and speedy operation. Yet another advantage is that an item can be added to the middle of the linked list with minimal effort - we don't need to shift our entire list, simply reroute our next pointer. Finally, since LinkedLists don't require contiguous memory, they can be used with highly fragmented memory space.

    Insertion into an ArrayList is also fast, but not always constant. Once the ArrayList fills, it must grow, and the insert that triggers the growth has to pay for copying every existing element. Additionally, when it does grow, a contiguous block of memory must be found to meet the new size. Of course, LinkedLists do have one major drawback: random access is brutally slow. We have to walk through the list to find the desired item.
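    Here's a rough sketch of what adding to the middle looks like on a singly-linked list. InsertableList, Node, InsertAfter and ToList are hypothetical names for this example only. Note that finding the spot is still a linear walk; it's the rerouting itself that's constant.

```csharp
using System.Collections.Generic;

// Hypothetical singly-linked list showing a mid-list insert.
public class InsertableList<T>
{
    private Node _first;
    private Node _last;

    public void Add(T item)
    {
        Node node = new Node(item);
        if (_first == null)
        {
            _first = node;
            _last = node;
            return;
        }
        _last.Next = node;
        _last = node;
    }

    // Inserts newItem right after the first occurrence of existing.
    // Returns false if existing isn't in the list.
    public bool InsertAfter(T existing, T newItem)
    {
        for (Node current = _first; current != null; current = current.Next)
        {
            if (EqualityComparer<T>.Default.Equals(current.Item, existing))
            {
                Node node = new Node(newItem);
                node.Next = current.Next; // the new node takes over the rest of the list
                current.Next = node;      // and the found node now points at it
                if (current == _last)
                {
                    _last = node;         // we appended, so the tail moves too
                }
                return true;
            }
        }
        return false;
    }

    // Walks the list and copies the items out, handy for inspecting the order.
    public List<T> ToList()
    {
        List<T> items = new List<T>();
        for (Node current = _first; current != null; current = current.Next)
        {
            items.Add(current.Item);
        }
        return items;
    }

    private class Node
    {
        public readonly T Item;
        public Node Next;

        public Node(T item)
        {
            Item = item;
        }
    }
}
```

    Nothing after the insertion point is shifted or copied; only two references change.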

    LinkedList v4

    Deleting from a LinkedList is a little tricky (at least the way we've set it up). We don't just remove the item, we also have to reroute our list. The item that pointed to our removed item must now point to the removed item's next. If 1 --> 2 --> 3 and we remove 2, then 1-->3. This is pretty tricky to cleanly implement as-is. One way we can improve the maintainability of our LinkedList is to change each item so that it not only tracks the Next item in the list, but also the Previous one (this is called a doubly-linked list).

    public class LinkedList<T>
    {
       private LinkedListItem _first;
       private LinkedListItem _last;
    
       public LinkedList(){}
       public LinkedList(T firstItem)
       {
          _first = new LinkedListItem(firstItem);
          _last = _first;
       }
       public void Add(T itemToAdd)
       {
          if (_first == null)
          {
             _first = new LinkedListItem(itemToAdd);
             _last = _first;
             return;
          }         
          _last.Next = new LinkedListItem(itemToAdd);
          _last.Next.Previous = _last;
          _last = _last.Next;
       }
    
       public void Remove(T itemToRemove)
       {
          LinkedListItem found = Find(itemToRemove);
          if (found == null)
          {
             return;
          }
          //this is the _first element
          if (found.Previous == null)
          {
             _first = found.Next;
          }
          else
          {
             //reroute the list
             found.Previous.Next = found.Next;
          }
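          //this is the last element
          if (found.Next == null)
          {
             _last = found.Previous;
          }
          else
          {
             //reroute the back-pointer too, otherwise the next item still points at the removed one
             found.Next.Previous = found.Previous;
          }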
       }
       private LinkedListItem Find(T itemToFind)
       {
          if (_first == null)
          {
             return null;
          }
          for (LinkedListItem current = _first; current != null; current = current.Next)
          {
             //static object.Equals copes with null items
             if (object.Equals(current.Item, itemToFind))
             {
                return current;
             }
          }
          return null;
       }
    
       private class LinkedListItem
       {
          public readonly T Item;
          public LinkedListItem Next;
          public LinkedListItem Previous;
    
          public LinkedListItem(T item)
          {
             Item = item;
          }
       }
    }

    Fin

    Today, it's safe to consider LinkedLists as highly specialized data structures. However, back when memory was a real constraint, they were a core part of any program. Even the choice of creating a doubly-linked list could be difficult as every item within the list consumed an extra reference (8 bits on an 8-bit platform, 16 bits on a 16-bit platform and so on). Even if LinkedLists aren't as useful today as they once were, they're still great for illustrating how our code interacts with system memory. They also show us how arrays and ArrayLists really work, by providing an interesting contrast. Thanks Mr. Woollard for teaching me all about data structures!

  • Beautiful Code

    I'd like to think that I'm a pretty passionate programmer, but I don't think I'd describe code as being beautiful. Maybe I've seen one or two nice editor themes that really stand out, but actual code is just a bunch of ASCII characters as far as I'm concerned. At best I'll find myself impressed with a program's organizational structure.

    Every now and again though, I'll see a visualization of code that's either really cool or downright breathtaking.

    Game of Life

    Probably the best known one that I can think of is John Conway's Game of Life. Back in 1970, Conway devised a "game" with 4 simple rules which dictate whether a given cell is on or off based on the on/off state of neighbouring cells. When properly initialized, interesting patterns emerge. You can check out a Java implementation here and a number of videos about all types of implementations on youtube. It's pretty astounding how four simple rules (which are equally simple to implement) can produce something so rich.
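    To show just how little code those rules need, here's a minimal sketch of a single generation step. The Life class, the bool[,] grid and the choice to treat everything beyond the edges as off are my own assumptions; the rules are the standard ones (an on cell stays on with 2 or 3 on neighbours, an off cell turns on with exactly 3, everything else is off).

```csharp
// Hypothetical helper: one generation of Conway's Game of Life on a fixed grid.
public static class Life
{
    public static bool[,] Step(bool[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        bool[,] next = new bool[rows, cols];
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                int neighbours = CountNeighbours(grid, r, c);
                // An on cell survives with 2 or 3 neighbours; an off cell
                // is switched on by exactly 3. Every other cell ends up off.
                next[r, c] = grid[r, c]
                    ? (neighbours == 2 || neighbours == 3)
                    : neighbours == 3;
            }
        }
        return next;
    }

    // Counts the on cells among the 8 surrounding cells; cells outside the grid count as off.
    private static int CountNeighbours(bool[,] grid, int row, int col)
    {
        int count = 0;
        for (int dr = -1; dr <= 1; dr++)
        {
            for (int dc = -1; dc <= 1; dc++)
            {
                if (dr == 0 && dc == 0) continue; // skip the cell itself
                int r = row + dr;
                int c = col + dc;
                if (r >= 0 && r < grid.GetLength(0) && c >= 0 && c < grid.GetLength(1) && grid[r, c])
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

    Feed it three on cells in a horizontal row and the next generation is the same three cells standing vertically, the well-known "blinker".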

    Sorting

    One of my personal favourites though is the visualization of sorting algorithms. I'm pretty far from an algorithm freak, but there's something just fascinating about watching these go. It isn't so much how they individually look, but rather the diversity of how they operate - which translates into diverse visual representations. Take a look at this Java applet and hit the "Start All" button in the upper left corner. I find both Bitonic and Heap sorts mesmerizing. And, if you're interested in a more informative version, check this not-so-pretty Java applet, which steps through the Java code for each sorting phase.

    Math

    It'd be silly not to mention more mathematically based patterns. Fractals can get pretty crazy, but even the simple Sierpinski Triangle can be hypnotic (maybe it's because I like moving things). There are also Voronoi Diagrams and Delaunay Triangulation, which I know nothing about, except that they are much more fun with the "More Colorful" checkbox checked.
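    The Sierpinski Triangle is also a nice example of how small the code behind these patterns can be. Here's a minimal text-only sketch using a well-known bitwise trick: the cell at column x, row y is filled whenever (x & y) is 0. The Sierpinski class and Render method are hypothetical names, and size is assumed to be a power of two so the triangle comes out complete.

```csharp
using System.Text;

// Hypothetical helper that draws a Sierpinski Triangle as text.
public static class Sierpinski
{
    public static string Render(int size)
    {
        StringBuilder output = new StringBuilder();
        for (int y = 0; y < size; y++)
        {
            for (int x = 0; x < size; x++)
            {
                // A cell is filled when x and y share no set bits.
                output.Append((x & y) == 0 ? '*' : ' ');
            }
            output.Append('\n');
        }
        return output.ToString();
    }
}
```

    Try Render(32) in a console window with a fixed-width font to see the pattern properly.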

    Tierra

    Although not visually represented, my all time favourite has to be Thomas Ray's Tierra computer simulation. Thomas Ray essentially created evolution within his computer by creating self-replicating code which randomly mutated itself. Out of this seeming chaos emerged parasites, liars and even a more efficient algorithm than the one he himself programmed. The best description, which is absolutely worth the short read, is the first two pages of chapter 15 in Kevin Kelly's most excellent Out Of Control book (and they happen to be freely available - page 1 and page 2).

    Terrarium

    If you agree with me and find Tierra amazing, you'll probably be interested in Microsoft's Terrarium. This is considerably more complex than everything else we've looked at so far, but it's also the best suited to helping you learn. Terrarium was a learning tool built by Microsoft meant to showcase .NET and help new developers learn the framework and languages. You essentially create an insect by inheriting from a base object and overriding behaviour - such as what to do when you run into another insect or need to find food. You'd then release your code on a multiplayer server and could see how your insect did compared to everyone else's. I did play with it for a few months years ago, and while the idea was certainly really neat and forward thinking, the implementation was a little weak. Anyways, if you're interested, it looks like the project might be getting resurrected.

    Assembly

    The last thing I'd like to share are the winners of the annual Assembly contest. Although there are a lot of different categories, the ones that always draw my attention are the file-size limited ones. For example, take last year's winner in the 4K category, Candystall by Pittsburgh Stallers vs Loonies. You can watch the video here and download the .exe here. Would you be able to do that in a file under 4k in size? I wouldn't! If you're willing to add a bit more file size, you can also have the world's smallest 3D game engine: kkrieger. At 96k it truly is hard to believe.

    Fin

    There are countless more, and if you know of anything really neat, please leave a comment. As a final entry, check out Radiohead's House Of Cards video - filmed without a camera.

  • Looking for .NET projects to contribute to

    I've been getting a fair number of messages from developers who are enthusiastic about DDD and the common ALT.NET toolset, and who are looking for a good project to contribute to - as a means of doing some hands-on learning and helping out the community at the same time. I know one place to start might be Jeff Atwood's open source list, but I was hoping someone might have more specific ideas.

    I think one of the challenges is that many open source projects tend to be tools for developers, and as such they are either pretty complicated, or don't relate well to typical enterprise development. The other problem is that many open source projects are well established and have a large codebase, which makes them significantly harder to get involved with.

    Anyways, if anyone knows of a good open source project that a junior developer would be able to understand and contribute to, please let us know. (Self-promotion welcome)

     

  • Password : You're doing it wrong

    I'd like to think that I deal with passwords the way most developers do. When dealing with registration or something else that requires the user to provide a password, I follow some general guidelines:

    1. The password must be a minimum length, normally no less than 6 characters
    2. The password's maximum length is very high (200 characters)
    3. I'll typically check for at least a mix of letters and numbers. For applications with considerably more sensitive data, I'll have more requirements – such as mixed casing or special characters.
    4. Hash the password with a salt (the salt can be a fixed string, or something more unique to the user - again based on sensitivity). Salting means that if someone gets access to a dump of your Users table, they'll still have a hard time logging into the system with a dictionary attack.
    5. Since hashes can't easily be reversed, send out a new password to users when they forget their password and have them change the temporary password as soon as they log in.
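    Guideline 4 can be sketched roughly as follows. To be clear about the assumptions: this swaps a plain salted SHA1 for PBKDF2 (via .NET's Rfc2898DeriveBytes), uses a random per-user salt rather than a fixed string, and the PasswordHasher class, the sizes and the iteration count are all hypothetical choices for illustration. You'd store the salt next to the hash in your Users table.

```csharp
using System.Security.Cryptography;

// Hypothetical helper: salted, deliberately slow password hashing with PBKDF2.
public static class PasswordHasher
{
    private const int SaltSize = 16;       // bytes; assumed size
    private const int HashSize = 32;       // bytes; assumed size
    private const int Iterations = 100000; // assumed work factor, tune for your hardware

    // A fresh random salt per user means identical passwords get different hashes.
    public static byte[] NewSalt()
    {
        byte[] salt = new byte[SaltSize];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(salt);
        }
        return salt;
    }

    public static byte[] Hash(string password, byte[] salt)
    {
        using (Rfc2898DeriveBytes kdf = new Rfc2898DeriveBytes(password, salt, Iterations, HashAlgorithmName.SHA256))
        {
            return kdf.GetBytes(HashSize);
        }
    }

    // Re-hashes the attempt with the stored salt and compares the results
    // without exiting early on the first mismatched byte.
    public static bool Verify(string password, byte[] salt, byte[] expectedHash)
    {
        byte[] actual = Hash(password, salt);
        if (actual.Length != expectedHash.Length)
        {
            return false;
        }
        int difference = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            difference |= actual[i] ^ expectedHash[i];
        }
        return difference == 0;
    }
}
```

    Note that nothing here limits the length or the characters of the password; the string goes in as-is, which is the whole point of guidelines 1 to 3.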

    All this is pretty vanilla, and you can swap the SHA1 hash implementation for some encryption or whatever else tickles your fancy. What I've noticed though isn't that developers are hitting some type of technical hurdle when dealing with passwords, but rather a usability one. The point behind guidelines 1, 2 and 3 is that users ought to be able to enter anything they want as a password, provided it meets a minimum set of guidelines. As developers, we should try very hard never to impose restrictions which limit the effectiveness of a password. Lately though, I've been astonished at some of the limits sites impose on passwords – forcing me to come up with a less secure password than what I would have liked.

    Here are some popular sites which have such restrictions:

    • digg only accepts letters and numbers
    • SourceForge only accepts letters and numbers (when you change your password, SourceForge even goes through the trouble of showing you a little dynamic update as you type: weak > normal > strong, and then "invalid character" when you enter an exclamation mark)
    • Passport limits passwords to 16 characters (isn't Microsoft a champion of passphrases?)
    • MySpace only accepts passwords up to 10 characters long, but at least requires 1 number or punctuation character
    • Wikipedia let me register with a password of 'a', but at least has very informative help on choosing a strong password. (Additionally, given what an anonymous user can do on Wikipedia, I'm not too disappointed in this policy)

    And for sites with better policies:

    • Ebay, PayPal and Google have useful help and accept special characters (PayPal and Google even require at least 8 characters),
    • Twitter and Facebook don't have any "choosing a strong password" help, but still seem to accept everything

    The most shameful example I've come across though is completely unacceptable – not only because of the ridiculous limitations it puts on passwords, but also because of the type of data it's responsible for. The Bank of Montreal's Mosaik Credit Card site (BMO is a major Canadian bank) has a password limit of 8 characters and only accepts letters and numbers (there's actually a maxlength="8" attribute on the form).

    Here's a simple rule to follow. If Windows Calculator displays the total number of possible combinations in non-exponential form, your password guidelines suck.
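    If you don't have Windows Calculator handy, the arithmetic is a single call. This is a minimal sketch; PasswordSpace and Combinations are hypothetical names, and it only counts passwords of exactly the given length (shorter ones add a little, but don't change the picture).

```csharp
using System.Numerics;

// Hypothetical helper for sizing up a password policy.
public static class PasswordSpace
{
    // Number of distinct passwords of exactly `length` characters
    // drawn from an alphabet of `alphabetSize` symbols.
    public static BigInteger Combinations(int alphabetSize, int length)
    {
        return BigInteger.Pow(alphabetSize, length);
    }
}
```

    With letters and numbers only (62 symbols) and 8 characters, Combinations(62, 8) comes to 218,340,105,584,896, a mere 15 digits. Allow the 95 printable ASCII characters and 20 characters, and Combinations(95, 20) is a 40-digit number.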

  • My Thoughts on ASP.NET's MVC

    Like me, you might have been surprised that the foundation series didn't have a chapter on the MVC pattern. I'm no fan of the existing page model (I actually think it's horrible), and I've successfully used MonoRail on a few projects, so it would have made for a good topic. My reasons for not including something on MVC were simple: we were and continue to be flooded with MVC information (as though it's a brand new invention), and I didn't think I could explain MVC using MonoRail effectively (I find it has a steep learning curve). I considered using RoR, but figured that would confuse people even more.

    Hopefully though, if you're a fan of the foundation series, you've already downloaded the learning application which puts the theory to practice using ASP.NET's MVC framework. So, what do I think about ASP.NET MVC? Overall I've been very impressed. I can't think of a good reason for starting a new project using the WebForms model - or MonoRail for that matter (sorry). If you're an ASP.NET developer, it's really a no brainer.

    I do have two major issues with it though. First, if you come from almost any other MVC framework (MonoRail, Django, RoR, Akelos, etc...) you might be expecting an actual Model framework - instead you get an empty Model folder. In other words, the MVC framework doesn't add anything to the .NET O/R Mapping / DAL story. From Microsoft's point of view this makes sense, since they feel that they are already offering solid solutions - DataSets, SqlDataSources, LINQ to SQL, Entity Framework. Truth be told, this is fine with me, as it lets me use NHibernate. I just think, given what the other MVC frameworks offer, it's a little dishonest - you'll end up disappointed if you're expecting to be able to do this out of the box:

    public class Car : ActiveRecord
    {
    }
    ....
    Car.FindById(1);

    My real problem though is simply that neither C# nor VB.NET lends itself all that well to view logic. Jeff Atwood actually just blogged the same criticism. Jeff uses RoR to highlight the problem. I don't fully agree. I won't say that RHTML is great, but I will say that it's far better than C# or VB.NET. I think views need a specialized language - I'm sure that anyone who's done some significant work in either RoR or Django would agree. There are solutions available now - NVelocity and Boo (I assume you could use it with the MVC framework?), but I'm just going to trudge along with C# until IronRuby is a viable solution.

    Aside from that, everything is pretty solid - routes work great, helper methods are adequate (they're starting to add more and more), and testing is actually doable - I haven't run into any problems, but from what I've read things aren't 100% perfect yet (either way, it's a huge step up from WebForms).

    So, to recap. MVC good. WebForms Bad. C# in views less than ideal. Empty Model folder = M. Oh, and download the learning application!

    Posted Jul 22 2008, 09:49 PM by karl with 8 comment(s)
  • Foundations of Programming - Learning Application

    If you're anything like me, you probably learn a lot better by going through code rather than reading books. I'm happy to release the Foundations of Programming Learning Application - it's a complete solution meant to show what was covered in the Foundations series. It's a Visual Studio 2008 solution.

    You can download it here. It should require no configuration (my fingers are crossed on that one) and ought to just run out of the box. There are comments sprinkled all over to help explain things or provide some insight. No doubt there'll be typos, since I'm nothing without Word.

    (you can grab the free ebook from: http://codebetter.com/blogs/karlseguin/archive/2008/06/24/foundations-of-programming-ebook.aspx)

    What is it?
    It's a sample awards website - with categories and nominees. The root container is called a Round - a sample Round would be called 'The 2008 CodeBetter Awards'. A Round has a state (planning, announcements, voting, winners) and a number of Categories (Best Blogger, Best Blog Post, Best Open Source Project, ...), with each Category having Nominees (Title, Summary, Link, Author...). The website uses ASP.NET MVC Preview 4 - I don't think you'll need to install anything extra as all the DLLs are included with the project. I'm using an SQLite database with a relative path to the file, so all should work as-is. Dummy data is already loaded.

    The web application mostly shows a read-only view of the data. There's also a sample console application that does more administrative stuff (it isn't interactive, it just runs through 4 steps or so). You can run the administrative portion over and over again - the first step is to clean itself up. The admin part basically adds a new round, with categories and nominees.

    Of course, there's a project full of unit tests as well.

    I tried to keep everything simple and straightforward (which is largely why I didn't want to build a whole web-based admin module and user registration and all that). Like most, I'm pretty new to ASP.NET MVC. Some might think my views have too much code; I think they have the perfect amount. There's extensive use of lambdas, so if you have a hard time reading them, I hope my excessive examples will help illuminate them.

    Posted Jul 18 2008, 09:34 AM by karl with 21 comment(s)
  • Announcing the .NET Extension Library

    285 days ago I blogged about my dislike for extension methods. Extension methods aren't very discoverable, and they can lead to poor communication (between members of a team, from project to project, and between a developer and the code he or she is trying to read).

    I still think extension methods have serious shortcomings, but I've softened my view on them quite a bit. I don't know what changed - maybe it was my extended foray into Ruby or spending the last few weeks neck-deep in the ASP.NET MVC framework - but I decided to attempt to put out a standard extension library for .NET developers. One of my gripes with extension methods is that they can make projects rather inconsistent: a new developer might be surprised that string.Left suddenly doesn't work on a new project. Ideally, having a well-known extension library might help solve that problem.

    The .NET Extension Library isn't just about extension methods though. It's about providing core functionality to projects. For example, I've blogged a few times about how hard it is to unit test code that caches - largely because there isn't a cache interface. This is a problem no more. The .NET Extension Library has an ICache interface, a CacheFactory (which really just returns HttpRuntime.Cache - for now), as well as enhancements to the cache API. Here's one of my favourite examples of what you can do:

    User user = CacheFactory.GetInstance.Fetch(
         "user.{0}".Sub(userId), 
         () => _dataStore.GetUser(userId),   
         3.DaysFromNow());
    

    Fetch is a mix between Get and Insert: it returns the cached value if there is one, and otherwise runs the callback and caches the result. My second favourite addition is the set of extensions to IEnumerable, which bring all collections/arrays in line with List when it comes to methods like Each, TrueForAll, Find, FindAll and more. You no longer need to use the static Array methods.
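
    If you've never written an extension method yourself, here's roughly what helpers like Sub, DaysFromNow and Each could look like. This is a minimal sketch written for this post, not the library's actual source - the SketchExtensions class name and the method bodies are my assumptions about how such helpers are typically built.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: not the actual .NET Extension Library code.
public static class SketchExtensions
{
    // "user.{0}".Sub(42) is shorthand for string.Format("user.{0}", 42)
    public static string Sub(this string format, params object[] args)
    {
        return string.Format(format, args);
    }

    // 3.DaysFromNow() returns a DateTime three days after the current local time
    public static DateTime DaysFromNow(this int days)
    {
        return DateTime.Now.AddDays(days);
    }

    // Runs an action on every item of any sequence (arrays included),
    // not just List<T>.
    public static void Each<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (T item in source)
        {
            action(item);
        }
    }
}
```

    With that class in scope, new[] { 1, 2, 3 }.Each(n => Console.WriteLine(n)) works on a plain array, which is the whole point of extending IEnumerable rather than List.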

    Input is always welcome, as are suggestions and/or additions. As much as I'm up for adding useful extension methods, I'd rather focus on the kind of classes that show up in every project, such as an abstraction for logging.

    Anyways, check it out at CodePlex: http://www.codeplex.com/nxl

  • Scale Cheaply - Memcached

    I generally subscribe to the attitude that premature optimizations are evil, but I strongly believe that a robust caching strategy should evolve alongside the rest of the system. Waiting too long makes it hard to cleanly and thoughtfully add caching. Besides, in my experience, a considered caching strategy generally means I worry less about performance in other areas - especially data access and data modelling. In other words, I can build those complex parts for maintainability, as opposed to having to worry about the cost of each individual query.

    .NET developers are pretty cache-savvy - thanks in large part to the powerful System.Web.Caching namespace and ASP.NET's simple-to-use OutputCaching capabilities. For that reason, and the fact that it tends to be very application specific, I don't want to go over how to decide what to cache, how to deal with synch issues, updates and so on. Instead, I specifically want to talk about Memcached.

    You're probably already familiar with Memcached - it's a highly efficient distributed caching system. It's used generously by all the big web 2.0 players (in May 2007 it was revealed that Facebook relies on 200 16GB quad-core dedicated Memcached servers). Interest in Memcached from the .NET community has been relatively low (although over the last year more and more people are talking about it). Frankly, if you're doing anything that requires horizontal scaling you're seriously shooting yourself in the foot by overlooking it. It runs on Windows - although we run it on Linux and there's really no reason for you not to learn that too!

    Fundamentally, there are two problems with the built-in cache. First, it's limited to the memory of a single system, which happens to be shared with the rest of your application domain. Secondly, if you have two servers, each with their own in-memory cache, users are likely to see very weird synching issues. Memcached isn't as fast as in-memory caching, but will scale to a virtually unlimited amount of memory. There isn't any redundancy or failover, simply memory spread across multiple servers.
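
    "Memory spread across multiple servers" works because every client maps a given key to the same server. Here's a deliberately naive sketch of that idea. The ServerPicker class is hypothetical, and real client libraries use their own (often consistent-hashing) schemes rather than a plain modulus, so treat this as an illustration of the principle only.

```csharp
using System;

// Hypothetical helper: shows the idea of key-to-server mapping,
// not how any particular Memcached client does it.
public static class ServerPicker
{
    // Maps a cache key to a server index in [0, serverCount).
    // Uses an FNV-1a style hash over the key's characters so the result is
    // stable across processes (string.GetHashCode makes no such promise).
    public static int PickServer(string key, int serverCount)
    {
        if (serverCount < 1) throw new ArgumentOutOfRangeException("serverCount");
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in key)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return (int)(hash % (uint)serverCount);
        }
    }
}
```

    Since every web server computes the same index for "user.42", they all read and write that entry on the same Memcached node, which is what removes the synching issues you get with one in-memory cache per web server.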

    The best part is that it literally takes seconds to get it up and running. First, download a Windows build onto your development machine (look for the win32 binary of memcached). Unzip the package somewhere; I put mine in c:\program files\memcached\. Next, from the command line, run memcached -d install. This will install memcached as a service. You can run memcached -h for more command-line options. You'll need to start the service (I also changed my startup type to manual, but that's completely up to you).

    The next step is to install the client library. I suggest Enyim Memcached from CodePlex. The project comes with a sample configuration file, which you should be able to easily incorporate into your web.config or app.config. While developing, only put one server, 127.0.0.1 on port 11211 (which is the default). You also need to add a reference to the two DLLs.

    Aside from that, you basically program against a simple API. You create an instance of MemcachedClient (it's thread-safe so you can use a singleton, or re-create it since it's inexpensive to create), and call Store, Get or Remove (or a few other useful methods) like you would the normal cache object. As I've blogged about before (here and here), I'm a fan of hiding all of this behind an interface to ease mocking and swapping.
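
    As a rough idea of what "hiding all of this behind an interface" can look like, here's a minimal sketch. The ICache interface and the DictionaryCache class are names invented for this example; the in-process implementation is the kind of stand-in you'd use in unit tests, while a production implementation would forward the same three calls to a MemcachedClient.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction; the names are illustrative, not from any library.
public interface ICache
{
    void Store(string key, object value, DateTime expiresAt);
    T Get<T>(string key);
    void Remove(string key);
}

// In-process stand-in for tests. Not thread-safe, and expired entries are
// only ignored on read, never actively evicted.
public class DictionaryCache : ICache
{
    private readonly Dictionary<string, KeyValuePair<object, DateTime>> _items =
        new Dictionary<string, KeyValuePair<object, DateTime>>();

    public void Store(string key, object value, DateTime expiresAt)
    {
        _items[key] = new KeyValuePair<object, DateTime>(value, expiresAt);
    }

    public T Get<T>(string key)
    {
        KeyValuePair<object, DateTime> entry;
        if (_items.TryGetValue(key, out entry) && entry.Value > DateTime.Now)
        {
            return (T)entry.Key; // Key holds the cached value, Value holds the expiry
        }
        return default(T); // miss or expired
    }

    public void Remove(string key)
    {
        _items.Remove(key);
    }
}
```

    Code that only knows about ICache doesn't care whether it's talking to Memcached, HttpRuntime.Cache or a dictionary, which is exactly what makes it easy to test.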

    Here's an example:

    MemcachedClient client = new MemcachedClient();
    client.Store(StoreMode.Set, "Startup", DateTime.Now, DateTime.Now.AddMinutes(20));
    DateTime startup = client.Get<DateTime>("Startup");
    client.Remove("Startup");
  • Get Your Func On

    I've noticed that I have a two-step pattern for learning new framework or language features. I'm guessing this is pretty typical for most people. First, I'll use the feature within framework classes or 3rd party DLLs. Then I'll leverage it more directly within my own code. What's surprising to me is the length of time which occurs between step 1 and step 2.

    Take generics for example. Back in the 1.x days, I wrote a ton of repetitive classes that inherited from CollectionBase. So when 2.0 came out, I immediately and aggressively started to use generic collections. However, it was quite some time later (a year?) until I wrote my own class that leveraged them directly. Today, I don't write a new generic class every day, but I do consider them an important part of my toolbox and kinda wonder what took me so long to take them up.

    I have a feeling that many developers are in the same boat - it's easy to consume code that implements new features, but not so easy to grasp how to implement those same features ourselves.

    As it turns out, the other day, I had another such ah-hah moment with the System.Func generic delegate. Like me, you've probably consumed it often, or at least one of its cousins: System.Action and System.Predicate. I thought I'd show how I used it, in hopes that it might open up some possibilities for you.

    Overview

    First though, a brief overview. The three delegates above are essentially shortcuts that save you from having to write your own common delegate. The most common one is probably Predicate<T>, which returns a boolean. Predicate<T> is used extensively by the List<T> and Array classes. The most obvious is the Exists method:

    List<string> roles = user.Roles;
    if (roles.Exists(delegate(string r) { return r == "admin";}))
    {
       //do something
    }
    

    or the lambda version (which I much prefer)

    if (roles.Exists(r => r == "admin"))
    {
    }

    Func<T> is a lot like Predicate, but instead of returning a boolean it returns T. Also, Func<T> has multiple overloads that let you pass 0 to 4 input parameters into the delegate. Action<T> is like Func<T> except it doesn't return anything - it does an action.
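
    To make the differences concrete, here are the delegates side by side. The DelegateDemo class and its members are just made-up names for this sketch.

```csharp
using System;

public static class DelegateDemo
{
    // Predicate<T>: takes a T, returns bool.
    public static readonly Predicate<string> IsAdmin = r => r == "admin";

    // Func<T, TResult>: takes a T, returns a TResult (always the last type argument).
    public static readonly Func<int, string> Describe = n => "item " + n;

    // Func<TResult>: no inputs at all, just a return value.
    public static readonly Func<int> Answer = () => 42;

    // Action<T>: takes a T, returns nothing.
    public static readonly Action<string> Print = s => Console.WriteLine(s);
}
```

    Calling them looks like calling any method: DelegateDemo.IsAdmin("guest") is false, DelegateDemo.Describe(3) is "item 3" and DelegateDemo.Answer() is 42.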

    Code Decoupling

    So, how can you make use of these within your own code? Well, here's what I did. First, I'm a big proponent of caching, as well as a big fan of unit testing. However, the two don't easily go hand-in-hand because Microsoft doesn't provide an interface to their built-in cache, which leads to tight coupling (which of course makes it difficult to change caching implementation down the road, and impossible to unit test). The first thing to do is create your own interface; a simple start might look like:

    public interface ICacheManager
    {
       T Get<T>(string key);
       void Insert(string key, object value);
    }

    Next comes our first implementation:

    public class InMemoryCacheManager : ICacheManager
    {
        public T Get<T>(string key)
        {
            return (T) HttpRuntime.Cache.Get(key);
        }
        public void Insert(string key, object value)
        {
             HttpRuntime.Cache.Insert(key, value);
        }
    }

    Func Fights Repetition

    So, what does all this have to do with System.Func? Well, the above code is used in a very repetitive manner: get the value from the cache, if it's null, load it from somewhere and put it back in the cache. For example:

    public User GetUserFromId(int userId)
    {
        ICacheManager cache = CacheFactory.GetInstance;
        string cacheKey = string.Format("User.by_id.{0}", userId);
        User user = cache.Get<User>(cacheKey);
        if (user == null)
        {
           user = _dataStore.GetUserFromId(userId);
           cache.Insert(cacheKey, user);
        }
       return user;
    }

    After a year or so of writing code like this, I figured there must be a better way, which of course is where Func comes in. Ideally, we'd like to get the value, and provide our callback code all at once. So, let's change our interface:

    public interface ICacheManager
    {
       T Get<T>(string key, Func<T> callback);
       void Insert(string key, object value);
    }

    The second parameter is the delegate we'll want to execute if Get returns null. Of course our delegate will return the same type (T) as Get would - just like in the above case where we expect a User from both Get and our data store. Here's the actual implementation:

    public T Get<T> (string key, Func<T> callback)
    {
       T item = (T) HttpRuntime.Cache.Get(key);
       if (item == null)
       {
           item = callback();
           Insert(key, item);
       }
       return item;
    }

    How do we use the above code?

    public User GetUserFromId(int userId)
    {
       return CacheFactory.GetInstance.Get(
           string.Format("User.by_id.{0}", userId),
           () => _dataStore.GetUserFromId(userId));
    }

    I know the () => syntax might be intimidating (especially if you aren't familiar with lambdas), but all it is is a parameterless delegate.

    Of course, this system can easily be expanded to add additional caching instructions (absolute/sliding expiries, dependencies and so on) via overloaded Get<T> and Insert members.
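
    As one possible shape for such an overload, here's a sketch of a Get that also takes an absolute time-to-live. To keep it self-contained it stores items in a dictionary instead of HttpRuntime.Cache, and the ExpiringCacheManager name and timeToLive parameter are my own inventions for the example.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a dictionary-backed stand-in for the real cache.
public class ExpiringCacheManager
{
    private class Entry
    {
        public object Value;
        public DateTime ExpiresAt;
    }

    private readonly Dictionary<string, Entry> _items = new Dictionary<string, Entry>();
    private readonly object _lock = new object();

    // Returns the cached value if present and not expired; otherwise runs the
    // callback, caches its result for timeToLive, and returns it.
    public T Get<T>(string key, Func<T> callback, TimeSpan timeToLive)
    {
        lock (_lock)
        {
            Entry entry;
            if (_items.TryGetValue(key, out entry) && entry.ExpiresAt > DateTime.UtcNow)
            {
                return (T)entry.Value;
            }
        }

        // The callback runs outside the lock so a slow load doesn't block other
        // keys; the trade-off is that two threads may both load the same key.
        T item = callback();

        lock (_lock)
        {
            _items[key] = new Entry { Value = item, ExpiresAt = DateTime.UtcNow.Add(timeToLive) };
        }
        return item;
    }
}
```

    Calling code stays a one-liner: cache.Get("User.by_id.5", () => LoadUser(5), TimeSpan.FromMinutes(20)), where LoadUser stands in for whatever your data store exposes.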

    (I just noticed this example also highlights how to use generics within your own code too!)

  • Scale Cheaply - Sharding

    There are a lot of expensive ways to scale your database – all of which are highly touted by the big three database vendors because, well, they want to sell you all types of really expensive stuff. Despite what an “engagement consultant” might tell you though, most of the high-traffic websites on the web (Google, Digg, Facebook) rely on far cheaper and better strategies: the core of which is called sharding.

    What’s really astounding is that sharding is database agnostic – yet only the MySQL crowd seem to really be leveraging it. The sales staff at Microsoft, IBM and Oracle are doing a good job selling us expensive solutions.

    Sharding is the separation of your data across multiple servers. How you separate your data is up to you, but generally it’s done on some fundamental identifier. For example, if we were building a hosted bug tracking site, our data model would likely look something like:

    Every Client is pretty much isolated from all other Clients. So if we put all of Client 1’s data on Server 1 and Client 2’s data on Server 2, our system will run just fine. This scales out horizontally infinitely well (there’s little to no overhead). Our first 500 clients can all go on our first server, at which point we can introduce a second database server and place our next 500 clients. Servers need only be added when actually needed, and there’s no need for management servers, load balancers or anything else – just straight database connections.

    One of the disadvantages of sharding is that it does impact your code. You need to figure out which database to connect to. For our simple scenario above, this isn’t too difficult:

     
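    // Calling code: ask the data layer for the right connection.
    using (SqlConnection connection = GetConnection(clientId))
    {
     ...
    }

    // Data layer: clients 1-500 live on the first server, everyone else on the second.
    private static SqlConnection GetConnection(int clientId)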
    {
       string connectionString;
       if (clientId <= 500)
       {
          connectionString = _connectionStrings[0];
       }
       else
       {
          connectionString = _connectionStrings[1];
       }
       return new SqlConnection(connectionString);
    }
    
    This is a simplified example, but should be pretty easy to expand on. Another approach is to use a modulus to figure out which connection string to use, something like:
     
    return new SqlConnection(_connectionStrings[clientId % _connectionStrings.Length]);
    

    This brings up another problem with sharding (a big one) – repartitioning your data. If we pick the above modulus algorithm with 2 servers and 2 clients then:
         Client 2 will be associated to ConnectionString[0] (2 % 2 == 0)
         Client 1 will be associated to ConnectionString[1] (1 % 2 == 1)

    If we now add a bunch of clients along with a 3rd server, then our code expects to find Client 2 on a different server (2 % 3 == 2). Essentially what this means is that you’ll need a repartitioning strategy – whether that’s an advanced connection manager configuration approach, or bulk copy scripts. The good news is that all of this should be deep inside your data layer and completely hidden from your calling code. There are many ways to handle this; pick whatever seems simplest.
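
    To get a feel for how much data a modulus scheme forces you to move, here's a small sketch that counts the clients whose shard changes when the number of servers changes. The Repartitioning class and its method names are made up for this example.

```csharp
using System;

public static class Repartitioning
{
    // Index of the shard a client maps to under the modulus scheme.
    public static int ShardFor(int clientId, int shardCount)
    {
        return clientId % shardCount;
    }

    // Counts how many client ids in 1..clientCount map to a different shard
    // once the shard count changes from oldShards to newShards.
    public static int CountMoved(int clientCount, int oldShards, int newShards)
    {
        int moved = 0;
        for (int id = 1; id <= clientCount; id++)
        {
            if (ShardFor(id, oldShards) != ShardFor(id, newShards))
            {
                moved++;
            }
        }
        return moved;
    }
}
```

    Repartitioning.CountMoved(6, 2, 3) returns 4: going from 2 to 3 servers relocates four of the first six clients (only clients 1 and 6 stay put). That's why the range-based approach, or a lookup table, is often easier to live with than a raw modulus.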

    The last hurdle to overcome is actually sharding your data. Our hosted bug tracking example was pretty straightforward, but even it has limitations. When a client creates a new account they are asked to submit their subdomain of choice. We need to check whether that subdomain is available or not – which isn’t trivial since our data is spread all around. Similarly, when a user logs in, we don’t yet know which client they belong to, therefore we can’t figure out which database server to hit for authentication. In such cases, rather than sharding data on a key, you shard on purpose. Essentially, this means you have a database dedicated to your Users table, as well as a ClientHost table which does nothing more than provide a single place to look up whether a host is available or not. Again, this is something that your data access layer must be aware of.
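
    Here's one way the data layer might combine the two ideas: a dedicated database for the shared lookups, plus range-based shards for everything that belongs to a client. This is only a sketch - the ShardMap class, its method names and the connection strings passed to it are all hypothetical.

```csharp
using System;

// Hypothetical router living deep inside the data layer.
public class ShardMap
{
    private readonly string _directoryConnectionString; // Users and ClientHost live here
    private readonly string[] _clientConnectionStrings; // one entry per client shard
    private readonly int _clientsPerShard;

    public ShardMap(string directoryConnectionString, string[] clientConnectionStrings, int clientsPerShard)
    {
        _directoryConnectionString = directoryConnectionString;
        _clientConnectionStrings = clientConnectionStrings;
        _clientsPerShard = clientsPerShard;
    }

    // Sharded on purpose: logins and subdomain checks always go to one place.
    public string ForDirectory()
    {
        return _directoryConnectionString;
    }

    // Sharded on key: clients 1..N on the first server, N+1..2N on the second, and so on.
    public string ForClient(int clientId)
    {
        if (clientId < 1) throw new ArgumentOutOfRangeException("clientId");
        int index = (clientId - 1) / _clientsPerShard;
        if (index >= _clientConnectionStrings.Length) throw new ArgumentOutOfRangeException("clientId");
        return _clientConnectionStrings[index];
    }
}
```

    With 500 clients per shard, ForClient(500) resolves to the first connection string and ForClient(501) to the second, while authentication and host lookups go through ForDirectory() before the client id is even known.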

    Despite these issues, sharding is my preferred database scaling choice by far. All the issues can be fixed with a bit of code deep within your data layer. The performance advantage AND cost advantage make it a no-brainer. The only reason to consider clustering is for high availability scenarios, or in cases where your bottleneck is data that cannot be easily split. Also, keep in mind that sharding typically plays nice with replication or clustering, so these aren't necessarily exclusive strategies.

  • Foundations of Programming Ebook

    I'm excited to finally release the official, and completely free, Foundations of Programming EBook. This essentially contains all 9 Foundation parts including a conclusion and some typical book fluff (table of contents, acknowledgements and so on). A number of spelling errors were corrected, along with some small technical changes and clarifications - largely based on feedback, so thanks to everyone who provided it! Otherwise it's exactly the same as what's been posted here over the past several months.

    Download it from http://codebetter.com/files/folders/codebetter_downloads/entry179694.aspx

    Download the Learning Application from: http://codebetter.com/blogs/karlseguin/archive/2008/07/18/foundations-of-programming-learning-application.aspx


    If the above link fails, you can also get it from http://www.openmymind.net/FoundationsOfProgramming.pdf

    Posted Jun 24 2008, 09:53 PM by karl with 84 comment(s)