I was recently playing with the idea of implementing disk-based caching for pages that have a high read-to-write ratio. On top of being practically read-only, these pages also require quite a bit of database work to put together, so they are ideal candidates for caching. I dismissed the idea of using the built-in ASP.NET caching for two reasons. First, the number of pages needing to be cached is in the thousands, making it impractical from a memory standpoint. Second, since this is hosted in a web farm, each server would need to build up its own cache, which partly defeats the point of caching and introduces the possibility of different servers having out-of-sync caches.
The final solution may be to use memcached, but I really think that doing disk-based caching for these pages to a SAN (via iSCSI) is going to be more than good enough and simple to manage.
The code hasn't been deployed, nor has it been tested beyond the basic point of making sure that it runs as expected from VS.NET under the single load of a browser (and thus a single thread).
To make this work, we'll need three classes: an HTTP module to hook into the page lifecycle, a custom filter to capture the output stream, and a configuration class to help us set the system up.
We'll take a quick first stab at our custom HttpModule:
    using System;
    using System.IO;
    using System.Text;
    using System.Web;

    public class DiskCacheModule : IHttpModule
    {
        public void Init(HttpApplication application)
        {
            application.BeginRequest += ApplicationBeginRequest;
        }

        private void ApplicationBeginRequest(object sender, EventArgs e)
        {
            var context = HttpContext.Current;
            var request = context.Request;
            var url = request.RawUrl;
            var response = context.Response;
            var path = GetPath(url);
            var file = new FileInfo(path);
            //a missing file reports a LastWriteTime far in the past, so it counts as stale
            if (DateTime.Now.Subtract(file.LastWriteTime).TotalMinutes < 5)
            {
                response.TransmitFile(path);
                response.End();
                return;
            }
            try
            {
                var stream = file.OpenWrite();
                //OpenWrite doesn't truncate an existing file, so clear out the stale copy
                stream.SetLength(0);
                response.Filter = new CaptureFilter(response.Filter, stream);
            }
            catch (Exception)
            {
                //todo log this error
            }
        }

        public void Dispose()
        {
        }

        private static string GetPath(string url)
        {
            var hash = Hash(url);
            //todo change the hardcoded value
            return string.Concat("c:\\temp\\", hash);
        }

        private static string Hash(string url)
        {
            var md5 = new System.Security.Cryptography.MD5CryptoServiceProvider();
            var bs = md5.ComputeHash(Encoding.ASCII.GetBytes(url));
            var s = new StringBuilder();
            foreach (var b in bs)
            {
                //"x2" already produces lowercase hex
                s.Append(b.ToString("x2"));
            }
            return s.ToString();
        }
    }
As you can see, we cache based on the RawUrl property of the HttpRequest object. This is just a simple example, but you could cache on virtually anything - including QueryString or Form parameters, Cookies or Server Variables. The first thing we figure out is where the file should be stored (the GetPath and Hash methods). We take our RawUrl and MD5 it. We append that to our [ugly] hard-coded folder. Next we create a FileInfo based on the path and check whether the file was written less than 5 minutes ago. If it was, we use the highly efficient TransmitFile to send the file down to the browser. If the file wasn't there, or was stale, we let the normal ASP.NET process take place - with the only difference being that we inject our CaptureFilter into the HttpResponse.
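To make the "cache on virtually anything" point concrete, here is a minimal sketch of a key builder that folds extra values (a query string parameter, a cookie value, and so on) into the hash. CacheKey, For and varyBy are names invented for this illustration, and the newline separator and UTF-8 encoding are assumptions rather than anything the module above does:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the module above.
public static class CacheKey
{
    // Builds a 32-character hex file name from the URL plus any extra
    // values the cached output should vary by.
    public static string For(string rawUrl, params string[] varyBy)
    {
        var key = new StringBuilder(rawUrl);
        foreach (var value in varyBy)
        {
            // '\n' is an assumed separator; it keeps ("a", "bc") distinct from ("ab", "c").
            key.Append('\n').Append(value ?? string.Empty);
        }
        using (var md5 = MD5.Create())
        {
            var bytes = md5.ComputeHash(Encoding.UTF8.GetBytes(key.ToString()));
            var hex = new StringBuilder(bytes.Length * 2);
            foreach (var b in bytes)
            {
                hex.Append(b.ToString("x2"));
            }
            return hex.ToString();
        }
    }
}
```

From the module, a call might look like CacheKey.For(request.RawUrl, request.QueryString["page"]), with the result used in place of Hash(url) inside GetPath.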
HttpResponse Filters are nothing more than classes that inherit from System.IO.Stream, and the only important methods are Write and Close (although we have to implement a bunch of other useless junk):
    public class CaptureFilter : Stream
    {
        private readonly Stream _responseStream;
        private readonly FileStream _cacheStream;

        public override bool CanRead
        {
            get { return false; }
        }

        public override bool CanSeek
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return _responseStream.CanWrite; }
        }

        public override long Length
        {
            get { throw new NotSupportedException(); }
        }

        public override long Position
        {
            get { throw new NotSupportedException(); }
            set { throw new NotSupportedException(); }
        }

        public CaptureFilter(Stream responseStream, FileStream stream)
        {
            _responseStream = responseStream;
            _cacheStream = stream;
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotSupportedException();
        }

        public override void SetLength(long length)
        {
            throw new NotSupportedException();
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotSupportedException();
        }

        public override void Flush()
        {
            _responseStream.Flush();
            _cacheStream.Flush();
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            _cacheStream.Write(buffer, offset, count);
            _responseStream.Write(buffer, offset, count);
        }

        public override void Close()
        {
            _responseStream.Close();
            _cacheStream.Close();
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing)
            {
                _responseStream.Dispose();
                _cacheStream.Dispose();
            }
        }
    }
There really isn't much to it: our constructor expects the main Response's Stream as well as a FileStream to write to (which was created in the HttpModule by calling OpenWrite on our FileInfo). Whenever we are given a buffer, we write it to both streams.
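One side effect of writing straight to the final cache path is that the file's LastWriteTime looks fresh while the page is still being captured, so another request could in principle be handed a half-written file. A possible way around that, sketched here under the assumption that the temporary file lives next to the final one on the same volume, is to capture into a temporary file and swap it into place once the write is complete. CacheFileWriter, WriteThenSwap and the ".tmp" suffix are made-up names for this illustration, not part of the filter above:

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the filter above.
public static class CacheFileWriter
{
    // Writes the content to "<finalPath>.tmp" first and only then swaps it
    // into place, so readers of finalPath never see a partial file.
    public static void WriteThenSwap(string finalPath, Action<Stream> writeContent)
    {
        var tempPath = finalPath + ".tmp";

        // FileShare.None makes a second concurrent writer fail fast instead
        // of interleaving its output with ours.
        using (var temp = new FileStream(tempPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            writeContent(temp);
        }

        if (File.Exists(finalPath))
        {
            // Replaces the existing cache file; null means no backup copy is kept.
            File.Replace(tempPath, finalPath, null);
        }
        else
        {
            File.Move(tempPath, finalPath);
        }
    }
}
```

In the filter itself, the equivalent would be to open the temporary path in the module and do the swap in Close, after the cache stream has been closed. This has not been tested under load either.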
In its simplest form, this is all we need, but let's add a configuration class to make it a little more practical. Our configuration class will define regular expression patterns which we'll match the RawUrl against. If it's a match, we'll cache the file. If it isn't, we'll let ASP.NET do its thing.
    public class DiskCacheConfiguration
    {
        private readonly IList<Regex> _patterns;
        private static readonly DiskCacheConfiguration _instance = new DiskCacheConfiguration();

        private DiskCacheConfiguration()
        {
            _patterns = new List<Regex>();
        }

        public static void Initialize(Action<DiskCacheConfiguration> action)
        {
            action(_instance);
        }

        public DiskCacheConfiguration AddPattern(string pattern)
        {
            _patterns.Add(new Regex(pattern, RegexOptions.Compiled));
            return this;
        }

        public static bool IsMatch(string url)
        {
            foreach (var pattern in _instance._patterns)
            {
                if (pattern.IsMatch(url))
                {
                    return true;
                }
            }
            return false;
        }
    }
The two public methods, AddPattern and IsMatch, are called from our modified HttpModule:
    public void Init(HttpApplication application)
    {
        application.BeginRequest += ApplicationBeginRequest;
        //the dot is escaped so it only matches a literal "."
        DiskCacheConfiguration.Initialize(c => c.AddPattern("^/Home/About$")
                                                .AddPattern("^/default\\.aspx"));
    }

    private void ApplicationBeginRequest(object sender, EventArgs e)
    {
        var context = HttpContext.Current;
        var request = context.Request;
        var url = request.RawUrl;
        if (!DiskCacheConfiguration.IsMatch(url))
        {
            return;
        }
        ....
    }
Again, you could do a lot more, such as adding ignore rules, more detailed patterns, or rules based on various headers. One of the things I'm worried about is how this will work under heavy load. TransmitFile appears to use FileShare.Read, and OpenWrite uses FileShare.None, which seems perfect for our cause, unless the heavy reads cause the writing thread to block (which won't happen if file access is queued, but I don't know if it is).
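As a rough idea of what ignore rules could look like, here is a small sketch in which an ignore pattern always wins over an include pattern. CacheRules, AddIgnore and ShouldCache are names invented for this example; it is a standalone class rather than a change to DiskCacheConfiguration:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical rule set: a URL is cached only if it matches an include
// pattern and matches no ignore pattern.
public class CacheRules
{
    private readonly List<Regex> _include = new List<Regex>();
    private readonly List<Regex> _ignore = new List<Regex>();

    public CacheRules AddPattern(string pattern)
    {
        _include.Add(new Regex(pattern, RegexOptions.Compiled));
        return this;
    }

    public CacheRules AddIgnore(string pattern)
    {
        _ignore.Add(new Regex(pattern, RegexOptions.Compiled));
        return this;
    }

    public bool ShouldCache(string url)
    {
        // Ignore rules are checked first so they override any include rule.
        foreach (var pattern in _ignore)
        {
            if (pattern.IsMatch(url))
            {
                return false;
            }
        }
        foreach (var pattern in _include)
        {
            if (pattern.IsMatch(url))
            {
                return true;
            }
        }
        return false;
    }
}
```

For example, new CacheRules().AddPattern("^/Home/").AddIgnore("\\?") would cache /Home/About but skip any URL under /Home/ that carries a query string.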
Posted Sat, Aug 15 2009 8:45 PM by karl