Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

Once upon a time this blog was a hive of activity. Now however its pretty lifeless as you can probably tell so if are pining for more of the same you can find me over at http://sqlblog.com/blogs/jamie_thomson. I look forward to seeing you there!

SSIS: Libraries of tasks and transformations

If you've been reading this blog for a while you'll have realised that I think SQL Server Integration Services (SSIS) is a major major improvevement on the product that it is replacing (DTS2000) however I do think that Microsoft have missed a trick somewhere along the line. Allow me to explain.

Imagine a situation where an organisation has a central data repository. Many of my SSIS packages may need to access this data so I am going to be building the same data extraction routine multiple times.

The new SSIS design paradigm means that each package contains 1 control-flow that consists of multiple tasks (data-flows are examples of tasks); so workflow has been seperated from data manipulation whereas in DTS2000 these were both built on the same design surface. The process of building an SSIS package is to instantiate a new control-flow and then build the tasks within that. In other words a task is associated with 1 and only 1 control-flow. It is possible to use that task in other control-flows using copy and paste but of course if you then need to make changes you need to make the changes in 2 places.

This is exhibited in the example from above. I can build a task (probably a data-flow) that extracts data from a table in the data repository and share that around my packages using copy and paste but if the structure of the data repository changes then I have many tasks that I need to change.

It would be better if I could build that task once and instantiate it in each of my control-flows. That way I only have to make changes in one place and the changes would be reflected everywhere that this task is used. This is analogous to a concept that we as software engineers are all familar with ....REUSABILITY!!!! In short I would like to be able to build libraries of reusable tasks that I can then drag and drop into my control-flows.

We can take this idea a little further. Instead of just building reusable tasks let's also build reusable transformations. For example if I have a derived column transformation that extracts the requested URI from a web log record, that is a nice bit of code that I am potentially going to need many times. If I can build that transformation just once and then instantiate it in mutliple data-flows then I save myself a heck of alot of work.

I hope this concept makes sense to people. I know that the SSIS dev team have talked about doing this in the next version. Here's hoping they do it bcause this is numero uno on my wish list for SSIS200?.

-Jamie

 

Published 05 February 2005 12:50 by jamie.thomson

Comments

 

jamie.thomson said:

I see your point Jamie... However - I think that a lot of what you are looking for can be accomplished by using sub-packages. That way you can call the same sub-packages from different packages, and any modifications would only need to be done in one place = the sub-package.
February 24, 2005 12:30
 

jamie.thomson said:

Hi Michael,
Thanks for you interest.

You're right, it CAN be accomplished using sub-packages but there are definate disadvantages (and limitations in fact) to doing this, not least that you then have more packages to manage.
There is also the argument that using sub-packages rather than pre-configured objects is like using a sledgehammer to crack a nut. It over-complicates things.

In the meantime I'm doing exactly what you suggest, using sub-packages.

cheers
Jamie
February 24, 2005 12:40
 

jamie.thomson said:

Michael,
I did some testing around this and, for reasons of performance, it looks as though calling child packages for single discrete operations is not really going to be viable.

Read more here: http://blogs.conchango.com/jamiethomson/archive/2005/03/01/1100.aspx

-Jamie
March 2, 2005 16:10
 

TrackBack said:

March 22, 2005 16:55
 

TrackBack said:

March 22, 2005 16:57
 

TrackBack said:

May 9, 2005 14:40
 

TrackBack said:

May 10, 2005 15:37
 

Jamie Thomson - Life, the universe and SSIS! said:

I've been working with SQL Server Integration Services (SSIS) for quiet some time now and as time...
March 24, 2006 10:31
 

SSIS Junkie said:

Here's a little tip for you. Some database pros are skeptical about the arrival of the XML data type

January 16, 2007 18:09
 

SSIS Junkie said:

I've been working with SQL Server Integration Services (SSIS) for quiet some time now and as time goes

January 16, 2007 18:18
 

Ivan Peev said:

Jamie,  your waiting is now officially over. What MS has failed to deliver, we have successfully developed. We have implemented another reusability extension, this time for the standard Data Flow task. You can now export data flow logic and reuse it in other packages without a need to copy-and-paste. You can also implement your own setup UI for your data flow with SSIS script.

You may check our blog about it here: http://cozyroc.wordpress.com/2007/12/17/cozyroc-ssis-12-beta-2-released/

and download it from here:

http://www.cozyroc.com

December 21, 2007 14:23
 

JOE said:

MS should take a look at INFORMATICA & learn how to make a robust ETL tool.  It has all the features people want & use but are missing in SSIS.

February 14, 2008 16:07
 

jamie.thomson said:

Joe,

I couldn't agree more that Informatica is a more complete, robust ETL tool. I used Informatica long before my first exposure to SSIS and it is that experience that prompted this blog entry and others like it.

Having said that, SSIS also has a lot of features that Informatica does not (or didn't when I used it) and hence I would appreciate your comment a lot more if you listed some specific examples rather than a generalised comment like this. What features do you mean? Who are these "people" you speak of?

I personally really miss the inate ability that Informatica has of instantiating a mapping (analogous to a SSIS dataflow) in as many workflows as you want. That's the biggest gap in SSIS and I have opined about it many a time on this blog.

-Jamie

February 15, 2008 01:57
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems