Collection Hosting Using Windows Azure

As I’ve mentioned in a couple of earlier posts, Pivot is driven by Deep Zoom image pyramids.  This means that for a collection of trade card images, many additional images are generated – one set of tiles for each zoom level.  The net result is a LOT of generated files.  For example, the MSDN Magazine Pivot collection has roughly 2,400 article trade cards.  Generating the Deep Zoom collection from that set of input images yields roughly 100,000 files with a total collection size of just under 500 MB – that works out to around 40 generated files per trade card once you account for the tiles at each pyramid level, the per-image .dzi descriptors, and the collection-level .dzc thumbnails.

As you can see, a ton of images get generated, and a fair number of bytes need to be stored, cached, etc.  For the MSDN Magazine collection, hosting everything in our own data center architecture wasn’t really a viable option for several reasons – the most notable being that we don’t have a great story for managing large collections of images that change on a schedule different from the monthly platform release schedule.  Fortunately, Azure blob storage provides a really simple solution to this problem, as well as a few others.  Here’s how I set up Azure hosting for the MSDN Magazine Pivot collection.

CDN Caching

One of the nice things about hosting the collection in Azure blob storage is that enabling CDN edge caching is as simple as one button click (see the upper left corner in the image below).  Once enabled, content is served from a CDN endpoint URL rather than directly from the accountname.blob.core.windows.net origin – which is why the embed code later in this post references az7446.vo.msecnd.net instead of a blob.core.windows.net address.  For more information about CDNs in general, check out http://en.wikipedia.org/wiki/Content_delivery_network.

[Image: azure-portal-storage – the Azure portal storage account page, with the CDN option in the upper left corner]

Silverlight Client Access Policy

I knew pretty early on that I wanted to host my cxml and Deep Zoom image pyramids in Azure blob storage.  However, my initial plan was to store the Silverlight xap file on our own Web platform and simply pull in the data.  As all of you Silverlight aficionados already know, while this is totally doable, you need to ensure that there is a client access policy XML file (clientaccesspolicy.xml) at the root of whatever domain is serving the resources you’re trying to access.  For Azure blob storage, you’ll want to put this file in the special $root container.  Doing this applies the policies you declare to all containers in that blob storage account.
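
For reference, here’s a rough sketch of what creating and uploading that policy file can look like in code, using the same v1.x StorageClient types as the uploader shown later in this post.  The client variable is assumed to be a CloudBlobClient like the one created there, and the wide-open domain policy is just an example – tighten it to taste:

private static void UploadClientAccessPolicy(CloudBlobClient client) {
    // the standard "allow all callers" Silverlight policy; restrict the
    // <domain uri="*"/> entry if you only want specific sites calling in
    const string policy =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
        "<access-policy><cross-domain-access><policy>" +
        "<allow-from http-request-headers=\"*\"><domain uri=\"*\"/></allow-from>" +
        "<grant-to><resource path=\"/\" include-subpaths=\"true\"/></grant-to>" +
        "</policy></cross-domain-access></access-policy>";

    // blobs in the special $root container are served from the root of the
    // storage account URL, which is exactly where Silverlight looks for the policy
    var root = client.GetContainerReference("$root");
    root.CreateIfNotExist();

    var blob = root.GetBlobReference("clientaccesspolicy.xml");
    blob.Properties.ContentType = "application/xml";
    blob.UploadText(policy);
}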

Like I said, my original plan was to host the collection in Azure and the xap containing the Pivot viewer on our servers.  However, as I progressed in developing the solution, I realized that nothing prevented me from hosting the xap itself in blob storage along with the collection data – and that by doing so, I would get 2 additional benefits:

  1. I no longer needed a clientaccesspolicy.xml file, since the xap and the data were now served from the same domain.
  2. By simplifying things with #1, I could make the entire magazine Pivot experience available to anybody who wants to embed it in his or her own site – for example, check out this post on Mike Taulty’s blog.  Because everything needed for the Pivot experience now sits up in blob storage, the only code anybody needs to include the experience in a site is the Silverlight object creation code shown below:
<object data="data:application/x-silverlight-2," height="100%" type="application/x-silverlight-2" width="100%">
   <param name="source" value="http://az7446.vo.msecnd.net/msdn-magazine/Magazines.PivotViewer.xap">
   <param name="onError" value="onSilverlightError">
   <param name="background" value="white">
   <param name="minRuntimeVersion" value="4.0.50401.0">
   <param name="autoUpgrade" value="true">
   <param name="enableHtmlAccess" value="true">
   <param name="initParams" value="collection=http://az7446.vo.msecnd.net/msdn-magazine/msdnmagazine.cxml, defaultViewState=%24facet0%24=Issue%20Date&amp;%24view%24=2">
   <a href="http://go.microsoft.com/fwlink/?LinkID=149156&amp;v=4.0.50401.0" style="text-decoration:none;" >
      <img alt="Get Microsoft Silverlight" src="http://go.microsoft.com/fwlink/?LinkId=161376" style="border-style:none">
   </a>
</object>
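
That’s literally all there is to it – drop that markup into any page, and the viewer xap, the cxml, and all of the Deep Zoom imagery are pulled straight from blob storage via the CDN.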

Uploading

As I’m sure you can imagine, uploading 100,000 files to Azure can take a bit of time – and fortunately, from a brief bit of looking around, it seems as though there are now more commercial tools that enable a performant upload experience.  At the time I was working on the magazine collection, however, there weren’t many tools that could do what I needed – so I wrote my own.  The requirements (and the resulting program) are pretty simple: recursively process a directory and upload every file it contains.  Additionally, because I’ve got a rockin’ machine with plenty of processor cores to go around, I made sure to take advantage of the parallelism provided by the parallel extensions in .NET 4.  I was completely new to the Azure client libraries but was fortunate enough to get help from Steve Marx (the guy on the Azure team who did the Netflix collection) – so many thanks to Steve!

At any rate, the code to do the upload looks like this:

private static void Main(string[] args) {
    // requires: Microsoft.WindowsAzure and Microsoft.WindowsAzure.StorageClient
    // (the v1.x StorageClient library), plus System.IO and System.Threading.Tasks
    if (args.Length < 3) {
        Console.WriteLine("Pivot.BlobStorageUpload.exe CollectionPath AccountName AccountKey");
        return;
    }

    // normalize to a full path with no trailing separator so the
    // relative-path math in UploadDirectoryRecursive works
    var pivotCollectionFldr = Path.GetFullPath(args[0]).TrimEnd(Path.DirectorySeparatorChar);
    var accountName = args[1];
    var key = args[2];

    const string magazineContainer = "msdn-magazine";

    var storageAccount = CloudStorageAccount.Parse(
        string.Format("DefaultEndpointsProtocol=http;AccountName={0};AccountKey={1}", accountName, key));
    var client = storageAccount.CreateCloudBlobClient();

    var container = client.GetContainerReference(magazineContainer);
    container.CreateIfNotExist();    // make sure the container exists before uploading

    UploadDirectoryRecursive(pivotCollectionFldr, container);

    Console.WriteLine("\nFinished!  Press any key to continue...");
    Console.ReadKey();
}

private static void UploadDirectoryRecursive(string pivotCollectionFolder, CloudBlobContainer container) {
    string cxmlPath = null;

    // upload up to 16 files concurrently
    Parallel.ForEach(EnumerateDirectoryRecursive(pivotCollectionFolder),
                        new ParallelOptions { MaxDegreeOfParallelism = 16 },
                        filePath => {
                            if (Path.GetExtension(filePath) == ".cxml")    // save #####.cxml for last
                                cxmlPath = filePath;
                            else {
                                // use the path relative to the collection root as the blob name,
                                // with forward slashes so the blob URLs match the Deep Zoom layout
                                var blobAddressUri = Path.GetFullPath(filePath)
                                    .Substring(pivotCollectionFolder.Length + 1)
                                    .Replace('\\', '/');
                                var blob = container.GetBlobReference(blobAddressUri);
                                UploadFile(filePath, blob);
                            }
                        });

    // finish up with the cxml itself, so the collection only becomes
    // visible once all of the images it references are in place
    if (cxmlPath != null)
        UploadFile(cxmlPath, container.GetBlobReference(
            Path.GetFullPath(cxmlPath).Substring(pivotCollectionFolder.Length + 1).Replace('\\', '/')));
}

private static IEnumerable<string> EnumerateDirectoryRecursive(string root) {
    // lazily walk the directory tree; on .NET 4 this could also be written as
    // Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
    foreach (var file in Directory.GetFiles(root))
        yield return file;
    foreach (var subdir in Directory.GetDirectories(root))
        foreach (var file in EnumerateDirectoryRecursive(subdir))
            yield return file;
}

private static void UploadFile(string filePath, CloudBlob blob) {
    var extension = Path.GetExtension(filePath).ToLowerInvariant();

    // let the CDN cache the cxml for 30 minutes and everything else for 2 hours
    blob.Properties.CacheControl = extension == ".cxml" ? "max-age=1800" : "max-age=7200";

    // set an explicit content type so browsers and the Pivot client handle each blob correctly
    switch (extension) {
        case ".xml":
        case ".cxml":
        case ".dzi":
        case ".dzc":
            blob.Properties.ContentType = "application/xml";
            break;
        case ".jpg":
            blob.Properties.ContentType = "image/jpeg";
            break;
        case ".ico":
            blob.Properties.ContentType = "image/x-icon";
            break;
    }

    blob.UploadFile(filePath);
    Console.Write("*");    // cheap progress indicator
}
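
Running the tool is then just a matter of pointing it at the output folder from the collection build – the path and account name here are placeholders:

Pivot.BlobStorageUpload.exe C:\collections\msdn-magazine myaccount myaccountkey

One improvement worth mentioning that isn’t in the tool above: the StorageClient library lets you attach a retry policy to the client, so a transient network hiccup doesn’t kill a multi-hour upload partway through.  Something along these lines – the count and backoff values are just illustrative:

client.RetryPolicy = RetryPolicies.RetryExponential(5, TimeSpan.FromSeconds(2));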
