Astoria, SSIS Adapters and BDD – Take 1

Yea I know – 3 things you never thought you would hear mentioned together.  And to be really honest, I didn’t start out trying to build this in a BDD fashion – and since I’m totally new to the disciplines of BDD, I may not have been all that successful.  But hey, it’s all about the journey, and so I wanted to at least go ahead and start posting about the work I’ve been doing here, partly to share and partly to get your input ways to improve my design/specs.

So here was the initial goal – I was studying ADO.NET Data Services (previously known as Astoria) when I remembered this article that I once read in the Architecture Journal about using RSS as a technique for surfacing data to load into an enterprise data warehouse.  That led me to think that it would be a cool opportunity to simultaneously learn Astoria and feed my BI sweet tooth by writing a SSIS data source that could consume an Astoria endpoint.  Given the fact that more and more systems are exposing data using syndication, it seemed like creating an SSIS data source was something that could be helpful beyond my simply wanting to play with the technology.

Wait – ADO.NET Data Services what??

Just in case you weren’t 100% sure what ADO.NET Data Services is, it’s technology to expose data and enable operations on data via RESTful operations.  You can find out much more about it here and here, but for the purposes of this exercise, here are the 3 things you need to know.

  • It uses all standard HTTP operations – which means that for getting data, you simply format a HTTP GET request for the service URL plus additional path information and optional querystring arguments to tell the framework how to shape the data.
  • While there’s not a dependency on the Entity Framework, there is a dependency on the Entity Data Model in that metadata is exposed as CSDL.
  • Though you have a couple formatting options when getting data, the primary one (and the one used by the data source component) is to get the data as an ATOM feed.  The data source component can then project the XML into the appropriate output buffer (or buffers as you’ll see later).

Wait – SSIS what??

Sql Server Integration Services is Microsoft’s ETL tool that ships with SQL Server.  You can start learning more about it here, but for the purpose of this exercise, here are the 3 things you need to know.

  • The unit of execution in SSIS is the package.  This is really similar nomenclature to DTS because, not surprisingly, SSIS evolved out of DTS.
  • Every package contains 2 kinds of logic: 1 “Control Flow” and zero-to-many “Data Flows”.  Control flow is meant for larger operations like copying files or running some kind of database maintenance operation (and while interesting, it’s not something I care about for this exercise).  The real meat of SSIS (IMHO) happens within a data flow – this is where data is extracted from a source, massaged, and put into a destination.  Data flows maintain the entire pipeline of operations between source and destination via in memory buffers.
  • Data flows use data source components to extract data from the source and add it to the appropriate buffers in the pipeline.  Data source components are also used to provide metadata to the SSIS design-time experience so that package designers can use it to clean up and massage the data using transformation components (** for those experienced SSIS folks, it’s worth noting that there is an XML data source component that is a part of the product, and since ADO.NET Data Services returns data as an ATOM feed, is is absolutely possible to consume it directly.  However, in order to take this approach, you will need to generate a XSD schema and associate it with the component at design time for the metadata – and every time you change the shape of your data services query, you will have to re-generate and re-associate a new schema.  This was one of the primary motivators for me to create a custom data source).

So then, the initial goals were as follows -

  • Create an SSIS data source component that could consume an ADO.NET Data Services endpoint
  • Ensure that the component participated in both the design time and runtime experiences
  • Ensure that the component was able to handle projections of ADO.NET Data services queries that took advantage of the $expand clause

As I got into the development I was pretty quickly able to accomplish the first 2 goals.  For example, given the following query configuration..

ConfigDialog

I was able to get the data shape using ADO.NET Data Service’s $metadata option and expose it back to the designer..

ColumnMappings

I was even able to run the package for a simple data services query and pipe the results to a flat file..

simpleExample

However, when I started implementing that 3rd goal of managing projections, I quickly realized my quick and dirty project need more thought in its design.  This drove me to add a few more goals.

  • Project should separate the core logic of the component from the SSIS infrastructure – which is heavily reliant on inheritance
  • Core component logic should be independently testable without requiring a running data services endpoint

Separating the logic out into a different assembly wasn’t really all that painful – just a few keystrokes of Resharper, a couple file moves, a reference added, and I was good to go.  However, because this started as a quick and dirty project (I should have known better), I hadn’t designed for data source independence and as such had some larger refactorings to do.  And in that process (** keep in mind that this started as an remains primarily a learning exercise for me – I would NEVER recommend taking this kind of “while I’m at it approach” on a real project), I decided that my tests weren’t really where I wanted them and that I had always wanted to get my hands dirty with BDD anyway, so why not?  It’s just one more goal.

  • Tests should be written as specifications

I’ve read from some that BDD is really meant more for testing larger behaviors in business applications.  However, it feels pretty good (though initially uncomfortable) at the grain at which I’m using it.  For example..

public class when_products_resource_is_queried : with_northwind_CSDL
{
    Establish context = () => queryBuilder.ResourcePath = "Products";

    Because of = () => query = queryBuilder.GetQuery();

    It should_have_8_properties = () => 
        query.DataShape.First().Properties.Count.ShouldEqual(8);

    It should_have_a_single_resource = () => 
        query.DataShape.Count().ShouldEqual(1);

    It should_have_1_key_property = () => 
        query.DataShape.First().Keys.Count.ShouldEqual(1);

    It should_have_correct_query_string = () =>
        query.ToString().ShouldEqual(string.Format(
            "{0}/Products", Constants.SERVICE_URI));
}

Ok – so those were the goals as they evolved. Going forward, my plan is to dig into the logic design and specifications, then talk specifically about projections, then finally talk about how this stuff works with SSIS (and how I would like to see the SSIS API simplified).  Sorry to not get into all that tonight, but as my friend Glenn has advised me countless times, if you wait to blog until everything is nice and tidy and completed, you may never blog again.

About Howard Dierking

I like technology...a lot...
This entry was posted in ADO.NET Data Services, BDD, SSIS. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • http://codebetter.com/members/hdierking/default.aspx hdierking

    Hi Todd – posted the source as a MSDN Code gallery resource at http://code.msdn.microsoft.com/ssisastoria

  • http://toddmcdermid.blogspot.com Todd McDermid

    When you get this completed, I’d love to have you post either the binaries and/or the source. If you do – please send me a note via http://ssisctc.codeplex.com where I maintain a directory of extensions for SSIS.
    Thanks!

  • http://codebetter.com/members/hdierking/default.aspx hdierking

    No – I was referring to behavior driven development (http://en.wikipedia.org/wiki/Behavior_driven_development).

  • BDD NHOI DKWII

    by your article, one can only assume that by BDD you mean business driven development, but is that accurate? You ASSUME the reader automagically knows what BDD is. It could be three other things, or whatever the end reader makes up.