Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

Once upon a time this blog was a hive of activity. Now however its pretty lifeless as you can probably tell so if are pining for more of the same you can find me over at http://sqlblog.com/blogs/jamie_thomson. I look forward to seeing you there!

Astoria for SSIS

I have been talking about Astoria on this blog for some time now. For those who don't know what Astoria is the 50000ft view of it is that its a technology that allows us to expose data over HTTP thus anything that talks HTTP can consume the data. HTTP in this context is the lowest-common-denominator because its simple and nearly everything can talk HTTP these days so there is great potential for this technology in loosely-coupled architectures.

When I first heard about Astoria it piqued my interest because I am first and foremost a data guy and moving data from one place to another is the problem that I wrestle with day-in day-out. Furthermore, my current project is attempting to come up with a new way of delivering business intelligence to the masses using data services - its an approach that we call Service Oriented Business Intelligence (SoBI). If you've read my blog before you'll know that my weapon of choice for shifting data is SQL Server Integration Services (SSIS) and I immediately saw a possible synergy between Astoria and SSIS.

SSIS is a tool that enables us to combine data from multiple heterogeneous sources (such as databases, web services, files) and typically you would put that data somewhere else (again, such as a database or a file). However, there is another scenario in which SSIS can be used; that is enabling other applications to consume data from a SSIS package via an ADO.Net datareader. I supposed then that it may be possible to have an Astoria data service consume data from one of these ADO.Net datareaders, thus enabling us to expose data from multiple heterogeneous sources via an HTTP interface.

 

The thought that this may be possible both fascinated and excited me. Well, as much as it is possible to get excited about data anyway. Hence when Astoria main man Pablo Castro posted this blog entry entitled "Astoria data sources and system layering" on 27th September 2007 which talked about different ways that the Astoria team were considering making data available to an Astoria data service I was all over it. Pablo outlined four characteristics that a data source would have to fulfill in order for Astoria to use it:

  • Surface data as reasonable units we can expose as resources. For example, we could say that each CLR object is a resource
  • Addressability: each resource (say CLR object) needs to be addressable. For us to create an address we need to be able to figure out what members are the “keys” on that resource
  • Query composition: Astoria URIs are a simple form of queries. We need to be able to formulate and execute queries against the data source without knowing the specifics of the target data source. We also need to be able to “compose” queries; e.g. to take a given query and say add sort order to it
  • Update: we need to be able to pull a bunch of resources (say, again, CLR objects), make some modifications, and then push the changes back into the data source.

Does SSIS exhibit those characteristics?  Let's investigate.

"Surface data as reasonable units we can expose as resources" No problem here. The datareader destination component is what SSIS uses to expose data via an ADO.Net datareader and that can be considered a resource in the same way that a table can be. It also returns data in a flat rowset - just like a table.

"Each resource needs to be addressable" Again, I think we are OK here. Typically the way we consume data from a datareader destination is by specifying the name and location of a SSIS package (which is just a file after all) and the name of the datareader destination within it. Thus, the datareader destination is addressable.

"compose queries; e.g. take a given query and add sort order to it" Here the story isn't so good. We can parameterize our calls to a SSIS package all right and thus change what data the datareader destination returns. However, its not possible to do this in a generic way that would allow us to always specify (e.g.) a sort order or a filter. That would require the package to be built to a pre-defined pattern that provided an interface that enabled these sorts of operations. In Pablo's blog post he talks about IQueryable as that interface and I can't fathom how it could ever be possible to consume a SSIS package in a way that conforms to IQueryable.

"Update" No way. The data from SSIS' datareader destination is consumable but that is all. You cannot issue an update against that datareader and expect that the updates are pushed all the way back to the heterogeneous sources that it ultimately pulls the data from.

So SSIS fails the pre-requisites on two out of four counts so ostensibly it looks like this is a no. I am going to have to find another way of exposing data from SSIS via HTTP data services.

-Jamie

Published 29 September 2007 19:13 by jamie.thomson

Comments

 

Business » Astoria for SSIS said:

September 29, 2007 20:12
 

HP said:

Jamie perhaps you should also look at www.snaplogic.com which again uses a restful architecture, however this is open source so if it doesn't do what you want fire up you favorite text editor and hack away.

It works cross platform as well.

October 5, 2007 21:01
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems