Data and Design: The Chicken and Egg Problem

Since it seems that summer (all 3 weeks of it) is now over here in the northwestern United States, I’ll start this post with a winter sports anecdote.  When explaining the differences between skiing and snowboarding to people who are not winter-sports-minded (like my parents who live in Texas), I usually mention that while snowboarding takes a little longer to get the basics, once you get it, there’s not that much more to it (Shaun White acrobatics aside).  Skiing, on the other hand, is an activity where with a little coordination, you can get to an average level pretty quickly – but it takes a lot more practice to get really good at it.

Creating a Pivot collection is kind of like skiing in that regard.  It’s pretty easy to take some data, shove it into a tool (or through an XSLT file), and get a file that will enable Pivot to move some rectangles around on the screen.  However, to create a truly compelling and effective collection requires a good bit more thought – and at least in my case, quite a few iterations.  An effective collection follows a design path very similar to that in creating info graphics – it must be visually compelling so that people actually bother to look at it, but it should be equally data-rich, and the visual cues should serve to focus the data without being distracting.  Further, because the Pivot experience exists on multiple levels, the design challenge is even more pronounced than in a static info graphic (more on this in a bit).

A quick disclaimer: I’m not claiming to have got this 100% right with the MSDN Magazine collection – this feels like one of those things I could refine and iterate on for a long, long time.  I did, however, go through quite a few design reviews with the Pivot UX team, so I hope to at least tell you some of the feedback that they gave me along the way.  I also want to tell you the approach I took for getting to a place where I was able to iterate pretty quickly.

1) What’s the Goal | What’s the Grain?

Since the magazine collection went live, I’ve talked to several different groups who have wanted to create collections for their own content.  However, many come to the conversation without having a clear goal in mind for what they hope to accomplish with the collection.  For some things, having something cool might be in itself sufficient (though even that is debatable), but remember that a Pivot collection follows many of the same rules of info graphics – and as such serves to communicate some specific (and hopefully intended) message to your users.  So what’s your message?  As I mentioned in my first post, my goal with the MSDN Magazine collection was to enable users to explore article content in a more immersive and meaningful way than the current search or navigation facilities.  Put another way, the Pivot experience is juxtaposed with navigation and search experiences, both of which tend to be static and decontextualized in nature.

Just answering this simple question about what you’re trying to accomplish with the collection will help to drive some fundamental design decisions, such as:

  • What is the grain of a single item (which will ultimately be your trade card)?
  • Do I actually have 2 collections that need to be composed in a linked collection?
  • How do my users generally set about to accomplish the goals that I hope to improve by my collection?

For example, in the case of the magazine collection, I knew from my goals that the grain of an item would be an article.  I also considered whether or not it made sense to pull issues out into a separate collection and have the articles collection link to it.  In the end, I decided that having an issues collection didn’t really add a lot of value in helping people to explore article content, so while it would have been a cool excuse for me to build a linked collection, it hit the cutting room floor.  Finally, by looking at the existing MSDN Magazine Web site, I was able to get an initial sense of how users navigate to article content today.  This shaped the initial data facets that I implemented in the collection.

2) Start with the Data – Think Dimensionally

Speaking of those facets, let’s talk about the data.  We live in a world where it feels as though every data problem is looked at through the lens of the relational database – and that’s tragic.  Don’t get me wrong – there are many great things that you can do with relational databases, and a normalized design is the right solution to plenty of data problems out there – but not all.  When we think about problems that fall into the categories of reporting (particularly the ad-hoc type), data analysis and exploration, the more appropriate data model is often denormalized and dimensional rather than normalized and relational.  As a quick illustration, consider how you might model the MSDN Magazine data in a traditional, normalized SQL database.  It might look something like this:

[Image: normalized database design for the MSDN Magazine data]

Now, this kind of design might be great if I were creating a transactional system where I’m doing lots of inserts, but for something like a reporting store (or a Pivot collection) where you’re focusing on reads, this is a suboptimal data structure because getting any kind of useful data set out of it requires joining tables together.  Instead, think about how you might create this in Excel – might be something more like this:

[Image: the same data laid out as a single flat sheet in Excel]

Note that there’s now lots of data duplication (in this example, some of the duplicates are more realistic than others) – this is just fine for reporting scenarios where what we care about most is our engine’s ability to do things like grouping, filtering, and sorting.  This is the kind of data structure that tools like SQL Server Analysis Services work best on, and it’s the kind of data structure that Pivot expects for its collections – so best that you go ahead and get into that mindset early on.
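To make the flattening concrete, here is a minimal sketch in Python.  The table names, fields, and rows are all made up for illustration (they are not the actual magazine schema); the point is only that the joins happen once, at extract time, and the output repeats issue data on every row.

```python
# Hypothetical normalized "tables"; names and rows are illustrative only.
issues = {1: {"issue": "December", "year": 2008}}
authors = {10: "Author A", 11: "Author B"}
articles = [
    {"id": 100, "title": "Article One", "issue_id": 1, "author_ids": [10, 11]},
    {"id": 101, "title": "Article Two", "issue_id": 1, "author_ids": [10]},
]


def denormalize(articles, issues, authors):
    """Return one flat row per (article, author) pair.

    Issue data is deliberately duplicated on every row so that grouping,
    filtering, and sorting need no joins later on.
    """
    rows = []
    for article in articles:
        issue = issues[article["issue_id"]]
        for author_id in article["author_ids"]:
            rows.append({
                "title": article["title"],
                "issue": issue["issue"],
                "year": issue["year"],
                "author": authors[author_id],
            })
    return rows


flat_rows = denormalize(articles, issues, authors)
for row in flat_rows:
    print(row)
```

Two articles become three rows here because one of them has two authors, which is exactly the kind of duplication that is fine in a read-focused store.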

3) Get to a Working State

Many of the people I’ve talked to about the magazine collection and about building Pivot collections more generally come in with the mindset of needing to plan the entire collection, from data to UI, before they start implementing any of it.  Given the number of iterations I personally went through with my UI (the trade card design), I can say with confidence that unless you are equal parts developer and brilliant graphic artist, if you wait until you have all parts clearly understood, you may never get anything done.  One of the great things about developing Pivot collections is that you can pass a CXML file to the viewer and if the viewer can’t find your images (your trade cards), it will render placeholder images for all of the items specified in your CXML file.  Everything in your collection will function exactly as you would expect, from filtering and sorting to the item information pane when you click on an item.

[Image: the Pivot viewer rendering placeholder images for the items in a collection]

The moral here is simple – focus first on your data and create a CXML.  You can then view it in the Pivot viewer.  Iterate on your CXML until you’ve got your data exactly how you want it.  THEN, worry about your trade card design.
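As a rough sketch of that data-first step, the Python below writes a bare-bones CXML document from a list of flat rows.  Treat it as an assumption-laden illustration rather than a reference: the element and attribute names (Collection, FacetCategories, FacetCategory, Items, Item, Facets, Facet) and the namespace URI are written from memory of the CXML format and should be checked against the Pivot documentation, and the function name, sample item, URL, and image base path are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace as I recall it from the CXML format; verify before relying on it.
CXML_NS = "http://schemas.microsoft.com/collection/metadata/2009"


def build_cxml(name, facet_types, items, img_base="images_not_built_yet.dzc"):
    """Build a minimal CXML string.

    facet_types maps facet name -> CXML type name (for example "String").
    items is a list of dicts with "name", "href", and "facets" keys.
    img_base is a placeholder path; the images do not need to exist yet.
    """
    root = ET.Element("Collection", {
        "Name": name,
        "SchemaVersion": "1.0",
        "xmlns": CXML_NS,
    })
    categories = ET.SubElement(root, "FacetCategories")
    for facet_name, facet_type in facet_types.items():
        ET.SubElement(categories, "FacetCategory",
                      {"Name": facet_name, "Type": facet_type})

    items_el = ET.SubElement(root, "Items", {"ImgBase": img_base})
    for index, item in enumerate(items):
        item_el = ET.SubElement(items_el, "Item", {
            "Id": str(index),
            "Img": "#" + str(index),
            "Name": item["name"],
            "Href": item["href"],
        })
        facets_el = ET.SubElement(item_el, "Facets")
        for facet_name, value in item["facets"].items():
            facet_el = ET.SubElement(facets_el, "Facet", {"Name": facet_name})
            # The value element is named after the facet's type.
            ET.SubElement(facet_el, facet_types[facet_name],
                          {"Value": str(value)})
    return ET.tostring(root, encoding="unicode")


cxml_text = build_cxml(
    "Sample Articles",
    {"Topic": "String", "Author": "String"},
    [{
        "name": "Article One",
        "href": "http://example.com/article-one",  # hypothetical URL
        "facets": {"Topic": "Data Access", "Author": "Author A"},
    }],
)
print(cxml_text)
```

Because nothing here touches image generation, a script like this can be rerun on every data tweak, which is what makes the quick iteration possible.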

4) Conflate Dimensions Based on the Goal

In some of my initial iterations of the MSDN Magazine Pivot collection, I had aspects of my publish date spread out between 2 different filter options.  Year was one option – it was stored as an integer value – and looking back, I think that I did this primarily so that I would get the cool slider bar control for the filter UX.  I also had the issue name, which was in most cases the issue’s publication month, sorted alphabetically. 

Now there were really a couple of problems with this approach which weren’t apparent until a few iterations in (and some feedback from the Pivot design team).  The first problem was that I was using 2 different filters to filter on the same logical bit of data from the user perspective.  Second, my approach had me storing year as a number and months as a string – and this brought a variety of additional technical challenges.  For example, even though I had formatted my years to show up as integers, the slider control was still enabling me to select fractional values – and most people don’t find a lot of value in filtering by a date range starting with 2008.12345.  The other major challenge I ran into was with months being stored as strings.  As you know, the months of the year do not sort chronologically when sorted alphabetically – which led me to have to create a custom sort order in my CXML.  While this is absolutely doable (and not at all difficult), it was also something that caused me to revisit the goals behind having these filters there in the first place.

In the case of year/issue, the goal was to enable users to filter or pivot by the date that the issue was published.  The manner in which I had exposed this information was disjointed and, as it turned out, not even complete (for example, how do you know when the ‘Visual Studio 2008 Launch Issue’ was published?).  After reflecting on these sorts of issues for a bit, I decided to forgo the cool slider control, and conflate year and issue into a single ‘Published Date’ category of a date data type.
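Here is one way the conflation could look in an extract script, sketched in Python.  Everything in it is an assumption for illustration: the function name, the choice of the first day of the month, and the overrides table for issues whose names are not months (the special issue and its date below are invented placeholders, not real publication data).

```python
from datetime import date

MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December"]
MONTH_NUMBERS = {name: number
                 for number, name in enumerate(MONTH_NAMES, start=1)}


def published_date(year, issue_name, overrides=None):
    """Combine a year and an issue name into a single date value.

    Issues named after a month map to the first day of that month.
    Any other issue must appear in overrides (issue name -> date),
    otherwise we fail loudly instead of guessing.
    """
    overrides = overrides or {}
    if issue_name in overrides:
        return overrides[issue_name]
    month = MONTH_NUMBERS.get(issue_name)
    if month is None:
        raise ValueError("No publication date known for issue %r" % issue_name)
    return date(year, month, 1)


# Invented placeholder entry; a real extract would look this up.
special_issues = {"Hypothetical Special Issue": date(2009, 3, 1)}

dates = [
    published_date(2009, "April"),
    published_date(2008, "December"),
    published_date(2009, "Hypothetical Special Issue", special_issues),
]
print(sorted(d.isoformat() for d in dates))
```

With a real date value, chronological sorting and range filtering come for free, so neither the custom month sort order nor the fractional-year problem arises.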

5) What are the 30,000 Foot Insights?

Early on in the development of the magazine collection, I was fortunate to meet with one of the main UX designers on the Live Labs Pivot team.  One of the biggest challenges that this designer put to me when we first began discussing the idea was: ‘That’s great that you can put 2500 articles in the Pivot browser, but what kind of insights can you convey when you’re looking at all 2500 articles in a single, zoomed-out view?’  This was (and continues to be) a really interesting question for me – how can you show insight from such a high level view?  What tools do you even have to work with at that level?  The obvious answer to that last question (though I’m sure that there are also less-obvious answers) is color.  So then came the question – what kinds of things can I show using color?  One option that I considered and ruled out was topic.  I ruled it out for 2 reasons – 1) I had too many topics, and too many colors would have yielded a very muddy picture when zoomed out; 2) multiple topics could be mapped to a single article – again, it just doesn’t paint a very clean picture.

In the end, I decided that viewership was a data point which would always be specific to a single article and which could be divided into however many discrete ‘viewership groups’ I wanted to show in my design.  Therefore, I set up my data extract routine (more on that later) to pull the previous month’s page views for each article and put it into a low/middle/high group.  And that’s what the background colors mean in the magazine collection.  The previous month’s page views are conveyed from the lightest background color (least viewed) to the darkest (most viewed).
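The grouping step might be sketched like this in Python.  The post does not depend on any particular cutoff, so the equal-thirds split by rank is purely an assumption for the example, as are the function name, the sample page view counts, and the hex colors (placeholders running from light to dark).

```python
# Placeholder colors, lightest (least viewed) to darkest (most viewed).
BACKGROUNDS = {"low": "#DCE6F1", "middle": "#95B3D7", "high": "#366092"}


def viewership_groups(page_views):
    """Map each article id to 'low', 'middle', or 'high'.

    page_views maps article id -> previous month's page views.
    Articles are ranked by views and split into three equal-sized
    groups (an assumed rule; fixed thresholds would work as well).
    """
    ranked = sorted(page_views, key=page_views.get)
    count = len(ranked)
    groups = {}
    for rank, article_id in enumerate(ranked):
        if rank * 3 < count:
            groups[article_id] = "low"
        elif rank * 3 < 2 * count:
            groups[article_id] = "middle"
        else:
            groups[article_id] = "high"
    return groups


# Made-up page view counts for six hypothetical articles.
sample_views = {"a": 5, "b": 50, "c": 500, "d": 10, "e": 100, "f": 1000}
sample_groups = viewership_groups(sample_views)
for article_id, group in sorted(sample_groups.items()):
    print(article_id, group, BACKGROUNDS[group])
```

Keeping the number of groups small is the point: three shades still read cleanly when thousands of cards are zoomed out, where a color per topic would not.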

About Howard Dierking

I like technology...a lot...
This entry was posted in MSDN Magazine, Pivot.