ASP.NET disk-based caching

I was recently playing with the idea of implementing disk-based caching for pages that have a high read-to-write ratio. On top of being practically read-only, these pages also require quite a bit of database work to put together, so they are ideal candidates for caching. I dismissed the idea of using the built-in ASP.NET caching for two reasons. First, the number of pages needing to be cached is in the thousands, making it impractical from a memory standpoint. Second, since the site is hosted in a web farm, each server would need to build up its own cache, which partly defeats the point of caching and introduces the possibility of different servers holding out-of-sync copies.

The final solution may be to use memcached, but I really think that disk-based caching of these pages to a SAN (via iSCSI) is going to be more than good enough and simpler to manage.

The code hasn’t been deployed, nor has it been tested beyond the basic point of making sure it runs as expected from VS.NET under the load of a single browser (and thus a single thread).

To make this work, we’ll need three classes: an HttpModule to hook into the page lifecycle, a custom filter to capture the output stream, and a configuration class to help us set the system up.

We’ll take a quick first stab at our custom HttpModule:

public class DiskCacheModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        application.BeginRequest += ApplicationBeginRequest;                        
    }

    private void ApplicationBeginRequest(object sender, EventArgs e)
    {
        var context = HttpContext.Current;
        var request = context.Request;
        var url = request.RawUrl;
            
        var response = context.Response;
        var path = GetPath(url);
        var file = new FileInfo(path);
        // serve from cache only if the file exists and was written within the last 5 minutes
        // (LastWriteTime returns a sentinel date for missing files, so check Exists explicitly)
        if (file.Exists && DateTime.Now.Subtract(file.LastWriteTime).TotalMinutes < 5)
        {                
            response.TransmitFile(path);
            response.End();
            return;
        }
        try
        {
            // cache miss (or stale entry): tee the response into the cache file as it is written
            var stream = file.OpenWrite();
            response.Filter = new CaptureFilter(response.Filter, stream);            
        }
        catch (Exception)
        {
            //todo log this error
        }
        
    }

    public void Dispose()
    {
        
    }
    
    private static string GetPath(string url)
    {
        var hash = Hash(url);
        //todo change the hardcoded value
        return string.Concat("c:\\temp\\", hash);
    }
    private static string Hash(string url)
    {
        // MD5 is used only to turn the url into a safe, fixed-length file name, not for security
        using (var md5 = new System.Security.Cryptography.MD5CryptoServiceProvider())
        {
            var bs = md5.ComputeHash(Encoding.ASCII.GetBytes(url));
            var s = new StringBuilder();
            foreach (var b in bs)
            {
                s.Append(b.ToString("x2"));
            }
            return s.ToString();
        }
    }
}

As you can see, we cache based on the RawUrl property of the HttpRequest object. This is just a simple example; you could key the cache on virtually anything – including QueryString or Form parameters, Cookies or Server Variables. The first thing we figure out is where the file should be stored (the GetPath and Hash methods): we take our RawUrl, MD5 it, and append the hash to our [ugly] hard-coded folder. Next we create a FileInfo based on the path and check whether the file was written less than 5 minutes ago. If it was, we use the highly efficient TransmitFile to send the file down to the browser. If the file wasn’t there, or was stale, we let the normal ASP.NET processing take place – with the only difference being that we inject our CaptureFilter into the HttpResponse.
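
One thing the snippet doesn’t show is wiring the module up. As a rough sketch – the namespace and assembly names below are placeholders for whatever your project actually uses – registration in web.config looks something like this (it goes under system.webServer/modules instead if you’re running the IIS7 integrated pipeline):

<system.web>
  <httpModules>
    <!-- hypothetical type/assembly names; substitute your own -->
    <add name="DiskCacheModule" type="MyApp.DiskCacheModule, MyApp" />
  </httpModules>
</system.web>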

HttpResponse filters are nothing more than classes that inherit from System.IO.Stream, and the only methods that really matter are Write and Close (although we have to implement a bunch of other useless junk):

public class CaptureFilter : Stream
{
    private readonly Stream _responseStream;                        
    private readonly FileStream _cacheStream;

    public override bool CanRead
    {
        get { return false; }
    }
    public override bool CanSeek
    {
        get { return false; }
    }
    public override bool CanWrite
    {
        get { return _responseStream.CanWrite; }
    }
    public override long Length
    {
        get { throw new NotSupportedException(); }
    }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public CaptureFilter(Stream responseStream, FileStream stream)
    {
        _responseStream = responseStream;
        _cacheStream = stream;         
    }
    
    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }
    public override void SetLength(long length)
    {
        throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
    public override void Flush()
    {
        _responseStream.Flush();
        _cacheStream.Flush();
    }        
    public override void Write(byte[] buffer, int offset, int count)
    {
        _cacheStream.Write(buffer, offset, count);
        _responseStream.Write(buffer, offset, count);
    }
    public override void Close()
    {
        _responseStream.Close();
        _cacheStream.Close();            
    }
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _responseStream.Dispose();
            _cacheStream.Dispose();
        }
        base.Dispose(disposing);
    }
}

There really isn’t much to it: our constructor expects the Response’s main stream as well as a FileStream to write to (which was created in the HttpModule by calling OpenWrite on our FileInfo). Whenever we’re given a buffer, we write it to both streams.
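
If you want to see the tee behavior in isolation, here’s a quick sanity check – a throwaway sketch, not part of the module, assuming the usual System, System.IO and System.Text imports:

// throwaway check: the same bytes should land in both streams
using (var response = new MemoryStream())
using (var cache = File.OpenWrite(Path.GetTempFileName()))
using (var filter = new CaptureFilter(response, cache))
{
    var bytes = Encoding.UTF8.GetBytes("<html>hello</html>");
    filter.Write(bytes, 0, bytes.Length);
    filter.Flush();
    Console.WriteLine(response.Length); // 18 bytes in memory, and on disk
}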

In its simplest form, this is all we need, but let’s add a configuration class to make it a little more practical. Our configuration class will define regular expression patterns which we’ll match the RawUrl against. If it’s a match, we’ll cache the file. If it isn’t, we’ll let ASP.NET do its thing.

public class DiskCacheConfiguration
{
    private readonly IList<Regex> _patterns;        
    private static readonly DiskCacheConfiguration _instance = new DiskCacheConfiguration();          
    
    private DiskCacheConfiguration()
    {
        _patterns = new List<Regex>();  
    }           
    public static void Initialize(Action<DiskCacheConfiguration> action)
    {
        action(_instance);
    }

    public DiskCacheConfiguration AddPattern(string pattern)
    {
        _patterns.Add(new Regex(pattern, RegexOptions.Compiled));
        return this;
    }
    public static bool IsMatch(string url)
    {
        foreach(var pattern in _instance._patterns)
        {
            if (pattern.IsMatch(url))
            {
                return true;
            }
        }
        return false;
    }
}

The two public methods, AddPattern and IsMatch, are called from our modified HttpModule:

public void Init(HttpApplication application)
{
    application.BeginRequest += ApplicationBeginRequest;            
    DiskCacheConfiguration.Initialize(c => c.AddPattern("^/Home/About$")
                                            .AddPattern("^/default.aspx"));
}

private void ApplicationBeginRequest(object sender, EventArgs e)
{
    var context = HttpContext.Current;
    var request = context.Request;
    var url = request.RawUrl;
    
    if (!DiskCacheConfiguration.IsMatch(url))
    {
        return;
    }
            
    ....
}

Again, you could do a lot more, such as adding ignore rules, more detailed patterns, or rules based on various headers. One of the things I’m worried about is how this will work under heavy load. TransmitFile appears to use FileShare.Read, and OpenWrite uses FileShare.None – which seems perfect for our cause, unless the heavy reads cause the writing thread to block (which won’t happen if file access is queued, but I don’t know whether it is).
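
One possible mitigation – purely a sketch I haven’t tested, and the helper below is hypothetical rather than part of the module – is to write the captured output to a temporary file and only move it into the cache path once it’s complete, so readers never open a half-written entry (File.Move is atomic when source and destination are on the same NTFS volume):

// hypothetical helper: swap a fully written temp file into the cache path
// so concurrent readers never observe partial content
private static void PromoteToCache(string tempPath, string cachePath)
{
    if (File.Exists(cachePath))
    {
        File.Delete(cachePath); // File.Move won't overwrite an existing file
    }
    File.Move(tempPath, cachePath);
}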


12 Responses to ASP.NET disk-based caching

  1. Sampy says:

    You shouldn’t hook BeginRequest for this but rather ResolveRequestCache. That event occurs after AuthenticateRequest and AuthorizeRequest so if your Auth system is using those pipeline events to verify your users, you won’t be exposing a security hole.

    Velocity or memcache might also be a better solution for this as opposed to the filesystem contention issues you’d get with writing to a shared drive. If you’re getting so much traffic that you have to start caching the output of your pages like this, filesystem contention is going to be a real issue very quickly. The old Channel 9 wiki (a very, very early version of FlexWiki) used to read and write to disk for edits and this often caused the site to grind to a halt and force a reset.

  2. Victor Kornov says:

    @karl:
    http://lmgtfy.com/?q=asp.net+disk+output+caching

    Although I’m perfectly fine with your code 😉

  3. karl says:

    @Konstantin:
    Yup, I mentioned that at the end of the post. Definitely the biggest issue with the code. Locking will work in a single-server environment, not sure what the simplest solution is in a farm though.

  4. Konstantin says:

    What about concurrency? What if several threads arrive at a stale page simultaneously and try to write to the same file? Maybe some locking is needed here?

  5. karl says:

    @Victor:
    I know they are introducing such a model in ASP.NET 4.0, should be interesting.

    @Mark
    Caching this in ASP.NET’s process’ memory just doesn’t fly. The minute you introduce a 2nd webserver (which I mentioned), you run the very real risk of getting sync issues – one webserver will cache a different version than the other.

    The only possible in-memory solution is a distributed caching engine (i.e., memcached).

  6. Mark says:

    +1 for asp.net output caching.. buy more memory with the days you save from not developing this..

  7. Victor Kornov says:

    I think asp.net output caching with filesystem storage provider could work here too.

  8. karl says:

    gunnar:
    Not sure, but your comment does raise a good point. When there IS a cached file, before we TransmitFile, we should also set the cache headers so that the file is appropriately (and aggressively) cached by the client.

    nariman:
    I thought about that…definitely a bit more complicated. You are right about the headers…my initial version didn’t have pattern matching and .css files were being cached (because in VS.NET’s webserver everything is handled by the ASP.NET pipeline). These files were UTF8-encoded, which was actually causing some problems, since the encoding type wasn’t being sent out with TransmitFile.

  9. Nariman says:

    Nice post.

    You’re wise to suspect that reading and writing to the same file will likely produce some locking issues – please post back the details when you encounter them! In one site we worked on, ASPXs were written out to disk preemptively (as opposed to the on-demand approach above). The difference with doing it preemptively is that you avoid the ‘gating’ problem, where all concurrent requests go through the full execution during a cache miss, as opposed to one fetching it while the others wait. Unfortunately, I don’t remember what the remedy was, but I guess always writing to a new file is one approach.

    Another thing I would look into is whether the correct headers appear in the response, whether the request was *.aspx or extensionless or other types for that matter.

  10. lmk says:

    Interesting matter – could you also post some exercises to solve along with articles like this?

  11. Sergejus says:

    I like your idea of using an Action delegate inside the singleton Initialize method!

  12. Gunnar says:

    How are HTTP HEAD requests handled? Is it IIS that responds with 304 (not changed) automatically, or is it just not implemented here?