ASP.NET Performance – Part 3 – Cache Busting

In part 2 of the series we looked at ways to tweak our headers to maximize performance. One of those tweaks was the addition of far-future expiry headers for static files (images, css, js). What happens, though, when you need to change those files? We can change the file on the server, but clients that got the previous version won’t request the new one, since the headers that came with the original told them not to.

You may be thinking that a simple solution would be to set a shorter expiration time. But that only masks the problem while negating the benefit we’re after. What we want is a solution that maximizes caching yet lets us update content on demand.

HTTP caching works off the requested URL. The browser makes a request for a URL and gets a response, which contains headers as well as the body. Any subsequent request to the same URL should honor any caching headers that were returned with the original response. To make matters better (or worse, depending on your point of view), any proxy between you and the client can also follow this same logic. Thus, a user who visits your site might get the original content, but subsequent visits by that user, or by anyone else behind the same proxy requesting the same file, will get a cached version from the proxy. This is great because a far-future expiry can turn tens of thousands of hits into one, but it also means that we really do need to come up with a good cache-busting solution.
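As a concrete sketch of what this looks like on the wire (the host, paths and dates here are illustrative, not from a real site), a static asset served with a far-future expiry might carry headers like:

```
GET /images/logo.png HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Content-Type: image/png
Expires: Thu, 31 Dec 2020 23:59:59 GMT
Cache-Control: public, max-age=31536000
```

Until that date passes, a well-behaved browser or proxy won’t re-request /images/logo.png at all; it serves its cached copy.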

(As a side note, since the cache is keyed off the requested URL, you should really consider leveraging something like Google’s JavaScript hosting. Instead of linking to jquery.js in your /js folder, you can link to http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js. If a user has previously browsed a site that linked to the Google version, it’ll already be in their cache when they visit your site, and your page will load faster. Google sets the expiry header to 1 year. On the downside, it decreases your site’s availability by introducing another system that can break: Google.)

You may be thinking that a good solution is to simply change the name of the file. For example, if you’re currently linking to logo.png and need to make some changes, you could switch it to logo2.png. This would work, but it’s a very manual, and therefore risky, process. What if you forget that a particular page links to your logo (like that HTML email template that gets sent out every now and again)?

Instead, what’s typically done, and what we’ll do here, is to include some type of version in the querystring. And, instead of incrementing integers (which can be tricky to manage, even if we automate it), we’ll use a hash of the file’s content. So, instead of linking to /images/logo.png, we’ll actually link to /images/logo.png?asdaslkj3918.

To achieve this, we’ll need to do two things. First, we’ll generate a file hash for each of our assets and make the hashes available to our system. Second, we’ll read the list of hashes and use it whenever we create a link to an asset.

Exactly how you implement the first part is up to you. What I’ll show here does it as part of the build process, much like we saw in Part 1. The output of the first step is simply a text file that looks something like:

/Content/css/main.css|a5feab8f
/Content/images/logo.png|fa80c7bc
/Content/images/modal_close.png|a5722737
/Content/images/modal_close_o.png|0ddd517d
/Content/js/dd_roundies.js|955e9bc5

Our program will take three arguments, the root directory of our site, the asset folder, and the output file name. So, given the following parameters:

c:\projects\coolsite\ assets\ assethash.dat

our application will recurse through the c:\projects\coolsite\assets\ folder and generate a file in the above format at c:\projects\coolsite\assethash.dat.

Here’s the code:

internal class Program
{
    private static IDictionary<string, string> _hashes;

    private static void Main(string[] args)
    {
        var root = args[0];       // e.g. c:\projects\coolsite\
        var assetPath = args[1];  // e.g. assets\
        var outputFile = args[2]; // e.g. assethash.dat

        _hashes = new Dictionary<string, string>();
        ProcessDirectory(root, assetPath);
        WriteFile(root + outputFile);
    }

    private static void ProcessDirectory(string root, string assetContainer)
    {
        foreach (var directory in Directory.GetDirectories(root + assetContainer))
        {
            if (directory.Contains(".svn")) { continue; }
            ProcessDirectory(root, directory.Remove(0, root.Length));
        }
        foreach (var file in Directory.GetFiles(root + assetContainer))
        {
            // Store the path relative to the site root, with forward slashes.
            var assetName = file.Remove(0, root.Length - 1).Replace('\\', '/');
            _hashes.Add(assetName, CreateSignature(file));
        }
    }

    private static string CreateSignature(string file)
    {
        byte[] bytes;
        using (var hash = new Crc32())
        {
            // Hash the raw bytes - reading the file as ASCII text would
            // mangle binary assets such as images.
            bytes = hash.ComputeHash(File.ReadAllBytes(file));
        }

        var data = new StringBuilder();
        Array.ForEach(bytes, b => data.Append(b.ToString("x2")));
        return data.ToString();
    }

    private static void WriteFile(string path)
    {
        using (var sw = new StreamWriter(path))
        {
            foreach (var kvp in _hashes)
            {
                sw.WriteLine("{0}|{1}", kvp.Key, kvp.Value);
            }
        }
    }
}

First we build up a dictionary of paths (relative to the root of our site) and hash values. For my “hash” I’m using a CRC32, specifically Damien Guard’s C# implementation. The nice thing about CRC32 is that we end up with pretty small values. The downside is that collisions are more likely (though still rather unlikely; I haven’t really tested this, so maybe a truly collision-resistant hash, like MD5, which can collide but only in extremely rare cases, makes more sense).
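If the collision risk worries you, swapping in MD5 requires nothing beyond the BCL. This is a hypothetical alternative, not the implementation used above; the `Md5Signature` class name is mine:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// A sketch of an MD5-based signature, as an alternative to CRC32.
// The values are longer (32 hex chars vs 8) but collisions are,
// for practical purposes, a non-issue.
internal static class Md5Signature
{
    public static string Create(byte[] data)
    {
        using (var md5 = MD5.Create())
        {
            // BitConverter gives "90-01-50-..."; strip the dashes and lowercase.
            return BitConverter.ToString(md5.ComputeHash(data))
                .Replace("-", "")
                .ToLowerInvariant();
        }
    }

    public static string Create(string file)
    {
        // Hash the raw bytes so binary assets (images) work too.
        return Create(File.ReadAllBytes(file));
    }
}
```

Dropping this into `CreateSignature` is a one-line change; everything else in the program stays the same.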

Next we dump the dictionary to our file – nothing much to explain there.

This is our postbuild line:

$(SolutionDir)..\Tools\ContentSigner\CodeBetter.Web.ContentSigner.exe $(ProjectDir) assets\ assethash.dat

Now we need to use the generated file. First we’ll start off with three vanilla helper functions:

public static class HtmlAssetExtensions
{
    public static string IncludeCss(this HtmlHelper html, string name)
    {            
        return string.Format("<link href=\"/assets/styles/{0}.css\" rel=\"stylesheet\" type=\"text/css\"></link>", name);
    }
    public static string IncludeJs(this HtmlHelper html, string name)
    {
        return string.Format("<script src=\"/assets/js/{0}.js\" type=\"text/javascript\"></script>", name);
    }
    public static string Image(this HtmlHelper html, string name, int width, int height, string alt)
    {    
        return string.Format("<img src=\"/assets/images/{0}\" width=\"{1}\" height=\"{2}\" alt=\"{3}\" />", name, width, height, alt);
    }
}

Next we’ll modify our helper class to load the text file back into a dictionary (this could be handled in a separate class, but for simplicity, we’ll just stuff all the logic in here):

public static class HtmlAssetExtensions
{
    private static IDictionary<string, string> _assetHashes = LoadHashes();

    private static IDictionary<string, string> LoadHashes()
    {
        var hashes = new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase);
        using (var sr = new StreamReader(HttpRuntime.AppDomainAppPath + "assethash.dat"))
        {
            while (sr.Peek() >= 0)
            {
                var parts = sr.ReadLine().Split('|');
                hashes.Add(parts[0], parts[1]);
            }
        }
        return hashes;
    }

    // ... the IncludeCss, IncludeJs and Image methods from above ...
}

Next we need to use the data:

public static string IncludeCss(this HtmlHelper html, string name)
{
    name = GetVersionedName(string.Format("/assets/styles/{0}.css", name));
    return string.Format("<link href=\"{0}\" rel=\"stylesheet\" type=\"text/css\"></link>", name);
}
//...similar changes apply for the js and image methods.

private static string GetVersionedName(string name)
{
    string hash;
    if (!_assetHashes.TryGetValue(name, out hash))
    {
        return name;
    }
    return string.Concat(name, '?', hash);
}
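To see what the lookup actually does, here’s a self-contained sketch of the same logic with a hard-coded dictionary (the hash value is made up; `VersionDemo` is a name I’ve invented for illustration, not part of the real helper class):

```csharp
using System;
using System.Collections.Generic;

// A minimal, standalone version of the GetVersionedName lookup,
// seeded with a fake entry so its behavior can be demonstrated.
internal static class VersionDemo
{
    private static readonly IDictionary<string, string> AssetHashes =
        new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase)
        {
            { "/assets/styles/main.css", "a5feab8f" }
        };

    public static string GetVersionedName(string name)
    {
        string hash;
        // Unknown assets fall through unversioned rather than throwing,
        // so a missing hash degrades gracefully.
        return AssetHashes.TryGetValue(name, out hash)
            ? string.Concat(name, '?', hash)
            : name;
    }
}
```

So IncludeCss("main") ends up emitting a link to /assets/styles/main.css?a5feab8f, while an asset missing from assethash.dat is linked to unversioned.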

And that, my dear reader, is all it takes. There is one problem with our implementation: changes to assethash.dat aren’t going to be noticed by our running application. That’s something I leave to you to figure out – though it isn’t too hard to implement (hint: you could just store the dictionary in the HttpRuntime.Cache with a file dependency).
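One way to follow that hint is a sketch along these lines (assumes an ASP.NET context; the cache key and the `GetHashes` wrapper are my invention, and `LoadHashes` is the method shown earlier):

```csharp
// The CacheDependency evicts the entry whenever assethash.dat changes on
// disk, so the next request transparently reloads the new hashes.
private static IDictionary<string, string> GetHashes()
{
    var hashes = (IDictionary<string, string>)HttpRuntime.Cache["assetHashes"];
    if (hashes == null)
    {
        hashes = LoadHashes();
        HttpRuntime.Cache.Insert("assetHashes", hashes,
            new CacheDependency(HttpRuntime.AppDomainAppPath + "assethash.dat"));
    }
    return hashes;
}
```

The helper methods would then call GetHashes() instead of reading the static field directly.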

In the next [very short] part, we’ll make a single small tweak to our helper functions that’ll help address three other YSlow recommendations. Stay tuned.


6 Responses to ASP.NET Performance – Part 3 – Cache Busting

  1. karl says:

    @Dan:
    Great catch. I’ve said it before, I blog more to learn than to teach. I’ll add a 6th part that will address that as part of the build process. Have it working here.

  2. Dan Sargeant says:

    It’s a nice idea to use the hash of a file to determine when it was last updated; however, the html helper extensions only cover some of the references to the assets. In most websites, most of the images are referenced within the css, and the helper functions are of no use there.

  3. Rob says:

    @Herman

    Totally agree. In fact it’s not part of the true HTTP caching spec. Browsers like Firefox and IE (behaving badly) actually ignore this and still cache the file even with a querystring in it.

    However stricter clients (Opera?) ignore the querystring and hence don’t cache the file.

    It’s always better to go the slightly harder route: place it in the actual filename and use a rewrite rule.

    Ours all gets generated at build time. I wrote a piece of ruby that builds a hashtable of images containing the image path, and revision number. It also keeps track of the svn revision it last checked up to.

    Then it calls svn, gets all the changes from when it last checked. For each image change, it gets the svn revision and adds it to the hashtable.

    I then use the hashtable to go through all of our css files, and using a regex find the images, and modify their path, adding in the svn revision.

    We then use a rewrite rule to do the rest. Doesn’t take long once the initial run has been done. And it gives us per image caching.

  4. av says:

    great stuff, thanks!

  5. Herman says:

    Karl,

    I am afraid simply appending a query string to a static resource is not going to solve the problem. There are proxies out there that will not cache URLs with a query string. See this post from the author of YSlow: http://www.stevesouders.com/blog/2008/08/23/revving-filenames-dont-use-querystring/

  6. Horses says:

    In the past, rather than using a hash of the file I’ve used File.GetLastWriteTime(…).Ticks. It might not be quite as accurate as your method in tracking changes to files, but it’s far simpler.