Astoria, SSIS Adapters and BDD – Projecting Projections

The big idea behind ADO.NET Data Services is that it enables a data source (generally a relational data source) to be exposed via a RESTful service.  In the context of fetching data – which is the principle concern of our SSIS data source component, data is returned as either an ATOM feed or as a JSON list.  Despite the particular data format, the general principles are the same – you’re taking a relational model and projecting it in a more serial structure.  Now, if all you’re querying is a single table, it doesn’t matter all that much – you’re simply taking a tabular data set and serializing each row as an entry in the ATOM feed.  However, ADO.NET Data Services provides an expansion feature that traverses a relationship and inserts the child data inline in the resulting feed.  Keep in mind that the ADO.NET Data Services API is based on HTTP – and the expansion feature is exposed simply as a querystring parameter.

Looking At the Data

Consider the following query: http://localhost:7558/northwind.svc/Products

This will return the following ATOM feed (fragment):

<feed xml:base="http://localhost:7558/Northwind.svc/" 
   xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" 
   xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" 
   xmlns="http://www.w3.org/2005/Atom">
   <title type="text">Products</title> 
   <id>http://localhost:7558/northwind.svc/Products</id> 
   <updated>2009-11-17T22:45:22Z</updated> 
   <link rel="self" title="Products" href="Products" /> 
   <entry>
      <id>http://localhost:7558/Northwind.svc/Products(1)</id> 
      <title type="text" /> 
      <updated>2009-11-17T22:45:22Z</updated> 
      <author>
         <name /> 
      </author>
      <link rel="edit" title="Products" href="Products(1)" /> 
      <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Categories" 
         type="application/atom+xml;type=entry" 
         title="Categories" 
         href="Products(1)/Categories" /> 
      <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Order_Details" 
         type="application/atom+xml;type=feed" 
         title="Order_Details" 
         href="Products(1)/Order_Details" /> 
      <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Suppliers" 
         type="application/atom+xml;type=entry" 
         title="Suppliers" 
         href="Products(1)/Suppliers" /> 
      <category term="NorthwindModel.Products" 
         scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" /> 
      <content type="application/xml">
         <m:properties>
            <d:Discontinued m:type="Edm.Boolean">false</d:Discontinued> 
            <d:ProductID m:type="Edm.Int32">1</d:ProductID> 
            <d:ProductName>Chai</d:ProductName> 
            <d:QuantityPerUnit>10 boxes x 20 bags</d:QuantityPerUnit> 
            <d:ReorderLevel m:type="Edm.Int16">10</d:ReorderLevel> 
            <d:UnitPrice m:type="Edm.Decimal">18.0000</d:UnitPrice> 
            <d:UnitsInStock m:type="Edm.Int16">39</d:UnitsInStock> 
            <d:UnitsOnOrder m:type="Edm.Int16">0</d:UnitsOnOrder> 
         </m:properties>
      </content>
   </entry>
   ...
</feed>

As you can see, for a single resource query, the resulting feed is pretty simple with a feed consisting of multiple entries.  Each entry (in addition to the basic metadata) contains properties for the actual row values and links for relationships.  To leverage the expansion functionality – in this case, we’ll expand Suppliers, we would modify the query to look like the following: http://localhost:7558/northwind.svc/Products?$expand=Suppliers

From this query, we would get the following, expanded result:

<entry>
   <id>http://localhost:7558/Northwind.svc/Products(1)</id> 
   <title type="text" /> 
   <updated>2009-11-17T23:04:46Z</updated> 
   <author>
      <name /> 
   </author>
   <link rel="edit" title="Products" href="Products(1)" /> 
   <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Categories" 
      type="application/atom+xml;type=entry" title="Categories" href="Products(1)/Categories" /> 
   <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Order_Details" 
      type="application/atom+xml;type=feed" title="Order_Details" href="Products(1)/Order_Details" /> 
   <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Suppliers" 
      type="application/atom+xml;type=entry" title="Suppliers" href="Products(1)/Suppliers">
      <m:inline>
         <entry m:etag="W/"'49%20Gilbert%20St.','London','EC1%204SD'"">
            <id>http://localhost:7558/Northwind.svc/Suppliers(1)</id> 
            <title type="text" /> 
            <updated>2009-11-17T23:04:46Z</updated> 
            <author>
               <name /> 
            </author>
            <link rel="edit" title="Suppliers" href="Suppliers(1)" /> 
            <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Products" 
               type="application/atom+xml;type=feed" title="Products" href="Suppliers(1)/Products" /> 
            <category term="NorthwindModel.Suppliers" 
               scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" /> 
            <content type="application/xml">
               <m:properties>
                  <d:Address>49 Gilbert St.</d:Address> 
                  <d:City>London</d:City> 
                  <d:CompanyName>Exotic Liquids</d:CompanyName> 
                  <d:ContactName>Charlotte Cooper</d:ContactName> 
                  <d:ContactTitle>Purchasing Manager</d:ContactTitle> 
                  <d:Country>UK</d:Country> 
                  <d:Fax m:null="true" /> 
                  <d:HomePage m:null="true" /> 
                  <d:Phone>(171) 555-2222</d:Phone> 
                  <d:PostalCode>EC1 4SD</d:PostalCode> 
                  <d:Region m:null="true" /> 
                  <d:SupplierID m:type="Edm.Int32">1</d:SupplierID> 
               </m:properties>
            </content>
         </entry>
      </m:inline>
   </link>
   <category term="NorthwindModel.Products" 
      scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" /> 
   <content type="application/xml">
      <m:properties>
         <d:Discontinued m:type="Edm.Boolean">false</d:Discontinued> 
         <d:ProductID m:type="Edm.Int32">1</d:ProductID> 
         <d:ProductName>Chai</d:ProductName> 
         <d:QuantityPerUnit>10 boxes x 20 bags</d:QuantityPerUnit> 
         <d:ReorderLevel m:type="Edm.Int16">10</d:ReorderLevel> 
         <d:UnitPrice m:type="Edm.Decimal">18.0000</d:UnitPrice> 
         <d:UnitsInStock m:type="Edm.Int16">39</d:UnitsInStock> 
         <d:UnitsOnOrder m:type="Edm.Int16">0</d:UnitsOnOrder> 
      </m:properties>
   </content>
</entry>

While this example demonstrates expansion of a single link resource, ADO.NET Data Services allows you to expand a single link, multiple links, or multiple nested links.  The funny thing about this is that the SSIS data flow engine takes much more of a relational view of the world with its stream architecture.  This means that we’re starting with a relational model, projecting into a hierarchical model, and then projecting back to a relational model.  The question, then, is how we go about designing (and testing) this projection.

To start, here’s a few high level design principles:

  • Every resource usage (primary and all expanded) should yield a unique output stream in an SSIS data flow which is visible in the data flow designer
  • Composition of output streams (e.g. joining) should be done via data flow designer transform components

A Unique Stream Per Resource

Every resource that is returned as part of an ADO.NET Data Services query, whether it is the primary resource (e.g. Products) or an expanded resource, should have a unique output in the data flow designer.  Moreover, if a resource is referenced in more than 1 role (e.g. http://localhost:7558/northwind.svc/Products?$expand=Suppliers/Products), an output stream should be created for each role.  The goal of having multiple output streams is that each one can then by manipulated independently using the data flow designer.

multi-output-designer

Enabling Composition

One of the primary reasons behind creating unique streams per resource *role* is to enable granular composition/branching inside the data flow designer (as shown above).  The service metadata provided by a data service provides a type definition for each resource managed by the service.  This makes it initially straightforward for determining data shape from a data services query.  However, in order to effectively project the parent/child relationship specified with the expansion feature, we need to make sure we include the identifiers of the parent record in context of the child record.  I started with the following specifications:

public class when_employees_resource_query_expands_territories : with_northwind_CSDL
{
    Establish context = () => {
        queryBuilder.ResourcePath = "Employees";
        queryBuilder.OperationsString = "$expand=Territories";
    };

    Because of = () => query = queryBuilder.GetQuery();

    It should_declare_2_resources = () => query.DataShape.Count().ShouldEqual(2);

    It should_have_1_reference_property_for_territories =
        () => query.DataShape.ElementAt(1).Properties
            .Where(p => p.GetType() == typeof (RefProperty)).Count()
            .ShouldEqual(1);

    It should_have_2_properties_for_territories =
        () => query.DataShape.ElementAt(1).Properties
            .Where(p => p.GetType() == typeof (Property)).Count()
            .ShouldEqual(2);

    It should_have_correct_query_string = 
        () => query.ToString().ShouldEqual(
            string.Format("{0}/Employees?$expand=Territories", Constants.SERVICE_URI));

    It should_have_employees_as_first_resource = 
        () => query.DataShape.ElementAt(0).Name.ShouldEqual("Employees");

    It should_have_territories_as_the_second_resource =
        () => query.DataShape.ElementAt(1).Name.ShouldEqual("Territories");
}

First, note the with_northwind_CSDL base class.  Aaron brought up in an earlier post that this “with…” convention was one that he was actually moving further away from, so in this point, I want to call out more specifically what is happening here.  The basic point of this shared context is to setup a mock reader to get the XML metadata from a file rather than from the $metadata clause, which is what is used at runtime.

public class with_northwind_CSDL
{
    const string NORTHWIND_METADATA_PATH = "svc_metadata.xml";
    protected static DataServicesQuery query;
    protected static IDataServicesQueryBuilder queryBuilder;

    Establish context = () => {
        var mockMetadataReader = new Mock<IReader>();
        mockMetadataReader.Setup(r => r.GetData(Constants.SERVICE_URI + "/$metadata"))
            .Returns(new StreamReader(NORTHWIND_METADATA_PATH));

        queryBuilder = new DataServicesQueryBuilder(new CSDLResourceBuilder(), 
            mockMetadataReader.Object) {ServiceURI = Constants.SERVICE_URI};
    };
}

Setting this up in a base class rather than in every spec class saved me a fair amount of repetitive typing – especially since my specifications here are pivoting on how the query shapes the data shape rather than on how the metadata is acquired.

So once I had my basic specs setup, all that remained was the implementation.  As you can see from the following, the basic task of creating the data shape is effectively divided among 3 types:

 

core

  • the data reader is responsible for getting the metadata from some source (in the running service, this is done using a HTTP GET request – for my specs, this was mocked with an XML file reader)
  • the resource builder is responsible for producing resources (the main elements of the data shape) from the metadata provided by the data reader
  • the query builder is responsible for knowing how to interpret the query URI and ensuring that parent/child projections are correctly constructed.

Based on this separation of roles, you can see that the bulk of the projection logic resides in the query builder class – specifically in the GetResources method.  This logic looks like the following:

IEnumerable<Resource> GetResources() {
    var metadata = _metadataReader.GetData(string.Format("{0}/$metadata", ServiceURI));
    _resourceBuilder.Initialize(metadata);

    // initialize resourcs collection with primary resource
    var resources = new List<Resource> {
                                           _resourceBuilder.GetResource(ResourcePath.Split(new[] {'/'}).Last())
                                       };

    //add resources specified by the presence of an $expand query operation
    if (!string.IsNullOrEmpty(OperationsString)) {
        var expandOperation = QueryOperation.ParseQueryOperations(OperationsString).Expand();

        foreach (var expandedPath in expandOperation.ResourcePaths)
            if (expandedPath.Contains("/")) {
                var pathAxes = expandedPath.Split(new[] {'/'});
                for (var i = 0; i < pathAxes.Length; i++)
                    resources.Add(_resourceBuilder.GetResource(pathAxes[ i ], resources.Last()));
            }
            else
                resources.Add(_resourceBuilder.GetResource(expandedPath, resources.Last()));
    }

    return resources;
}

As you can see, this method is basically getting resource names from the data services query string and then asking the resource builder to build those resources.  For expanded resources, the builder is asked to produce the resource incorporating the keys of the parent resource.

In sum, I think that dealing with $expand is one of the more interesting aspect of this component, so I wanted to spend a bit more time going through it.  To see all of the components working together, you can download the entire source code at http://code.msdn.microsoft.com/ssisastoria.  But in sum, the problem looks like: DB (relational) –> Data Services (hierarchical) –> SSIS (relational).

About Howard Dierking

I like technology...a lot...
This entry was posted in ADO.NET Data Services, BDD, MSpec, SSIS. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • http://codebetter.com/members/hdierking/default.aspx hdierking

    @Glenn – more like bad copy/paste on my part – just updated

  • http://codebetter.com/members/gblock/default.aspx Glenn Block

    Nice Article Howard. Are those duplicate specs above is this some mSpec magic?

    It should_have_employees_as_first_resource and It should_have_employees_as_the_first_resource

    and

    It should_have_employees_as_second_resource and It should_have_employees_as_the_second_resource