Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

SSIS Nugget: Multiple outputs from a synchronous script transform

It is generally accepted that a synchronous script component outputs the same number of rows as are input. Well I have news for you, this doesn't have to be the case. If you have a synchronous script component with multiple outputs then you can choose to send each input row to many or none of these outputs. As a result it is possible to have a synchronous script component that outputs less or more rows than arrive at it.

I started to think what this could be useful for. Why would you want a row to be sent to multiple outputs, or even no output at all? Well, the following is the simplest scenario I could come up with. Imagine you work for one of the direct marketing (i.e. junk mail) companies that we all know and love. You could have thousands, perhaps millions, of names, addresses, email addresses of potential customers for which you have a plethora of demographic data. You also have a number of marketing campaigns that you want to target at certain people depending on their demographic profile. We should expect that people will fall into many of these demographic categories. For example, you may want to market:

  • Baby products to people with at least one child
  • Double glazing to people that are homeowners
  • Car insurance to those that own a car

Obviously people are going to fall into more than one of these groups, in fact quite conceivably all of them. How would you build a data-flow to determine those three pots of data normally? Well seeing as we're going to be filtering records we're obviously going to need a conditional split right? But there's a problem with that...a conditional split transform can send a row down only one of its outputs but clearly some customers are going to fall into more than one of our three categories. So, we need to employ a multicast transform and 3 different conditional split transforms, like so:

20050905CondSplitDF.JPG


It seems a bit inefficient to have three times as much data produced by the multicast and three different conditional split transforms though, doesn't it? Instead, lets use a script transform to produce the data that we need. Here you can see the three outputs employed by our script transform:

20050905ScriptOutputs.JPG

Its important that these are synchronous outputs therefore their SynchronousInputID property must have the same value as the component input's ID property. Also, the three outputs must have their ExclusionGroup property set to the same non-zero value.

Here you can see the (very simple) code that produces our data:

20050905ScriptCode.JPG

As you can see we are able to direct each row to more than one output, thus alleviating the need for three different conditional split transforms. That's the key to this scripted solution, the greater breadth of functionality afforded us by the script component means that we can satisfy a requirement in one simple component whereas previously we needed more than that (four in fact).

Running the data-flow gives us the same results as before.

20050905ScriptDF.JPG


And there you have it, its as simple as that! Yet another use of the script component that reduces the amount of functionality that we have to build into our data-flows.

I wanted to test performance of the 2 slightly different solutions so I upped the number of rows running through the data-flows from 18484 to 443616 (using some UNIONs in my source SQL statement). The scripted solution executed in 15 seconds whereas the conditional split solution executed in 17 seconds so not only is the scripted solution quicker to build, it executes quicker too.


And yes, you can download this from here: http://blogs.conchango.com/Admin/ImageGallery/blogs.conchango.com/jamie.thomson/20050905MultipleOutputs.zip. You will need access to the AdventureWorksDW database and will also need Konesans' Trash Destination Adapter installed.

-Jamie

Published Monday, September 05, 2005 9:43 AM by jamie.thomson

Comments

 

Kirk Haselden said:

As it turns out, both of these are identical under the covers. Since both transforms are synchronous, they are both producing the exact same number of buffers and rows. The additional or duplicate rows are an illusion. The Dataflow Task actually tracks what buffer columns and rows are visible to the downstream transforms, but doesn't copy any buffers or rows. It simply "exposes" them with row and column views.
Truly, the synchronous outputs only send the same number of columns as they receive on their inputs.
K
March 15, 2006 1:25 AM
 

SSIS Junkie said:

Many ex-DTS users are miffed that the SSIS expression language does have an equivalent of the VB Script

May 3, 2007 11:49 PM
 

BI Thoughts and Theories said:

It's been tough finding time to write any content recently, but this was a scenario that came up recently

July 12, 2007 3:20 AM
 

KingKong said:

Is it mandatory to handle records that do not pass throught a conditional split by sending them to a recordset for example. Are there any ill effects of choosing to just ignore non matches?

October 4, 2007 7:46 AM
 

jamie.thomson said:

King Kong,

Good question. No, you do not have to handle them and there are no ill effects other than having records in the pipeline that you don't actually require. But that's not a biggie.

If you think of conditional split as being the WHERE clause in a SQL statement then you can understand why you don't have to handle all outputs - you can leave them dangling.

-Jamie

October 4, 2007 1:36 PM
 

KingKong said:

Thanks Jamie,

My collegue though is adamant though that they need to be handled as otherwise the execution would produce warning for unhandled records. what do you make of it?

cheers

October 5, 2007 4:50 AM
 

jamie.thomson said:

KingKong,

If (Big if) I have understood the problem correctly then no, you won't get errors. Try it out and find out.

Perhaps your colleague is thinking of the LOOKUP component.

-Jamie

October 5, 2007 1:33 PM
 

KingKong said:

Thanks again Jamie, but this pedantic collegue does not want to live with the warning messages the unhandled records generate. You just cant win some.

October 8, 2007 1:18 AM
 

jamie.thomson said:

King Kong,

I don't understand. I've just built a quick and dirty package containing a Conditional Split that has an output that doesn't go anywhere. I don't get any warnig messages.

I am on SP2. What are you on?

Thanks

Jamie

October 8, 2007 11:59 PM
 

KingKong said:

Hi Jamie,

    On SP2 as well i presume. Will have to check that with the sys admin.

I am now writing a package that deals with file handling. I am to write the name of the file and the datestamp as the first line in a file and the total number of rows as the last line. Do u recommend a script task or is there an easier way to get this done?

Thanks again

KingKong

October 9, 2007 2:20 AM
 

jamie.thomson said:

October 9, 2007 2:30 AM
 

KingKong said:

Hi Jamie,

That was quite useful thanks. I am supposed to be writing the data to the same file. Do i do this by opening up the file in question via a script and insert the required information?

thanks

October 9, 2007 3:45 AM
 

SSIS Junkie said:

A message was posted on the SSIS forum earlier today asking about ExclusionGroups. BOL is a little light

January 5, 2008 11:22 PM
 

Col said:

I have a case where I want a single input row to generate 1 or more rows in the output (based on a value in the input row). How would I go about that?

I just tried using Multicast with 2 outputs to the same destination flat file but it gave an error saying that the output file was already in-use.

May 13, 2008 2:24 PM
 

jamie.thomson said:

Col,

You can't insert to the same file from two destination components. Each will have its own handle to the file.

Union the 2 data paths together and insert them using a single destination.

-Jamie

May 13, 2008 2:33 PM
 

TxJL said:

Jamie

I'm calling DirectRowToOutput() twice to generate two output rows for one input row.  I would like to write a different value for each of the output rows (see below), but when I add a data viewer after the component, both rows show "test2".  Any ideas on how to accomplish this?

       Row.errdesc2 = "test1"

       Row.DirectRowToOutput0()

       Row.errdesc2 = "test2"

       Row.DirectRowToOutput0()

I'm doing some error checking on an input row and each input row must pass multiple checks.  I'm trying to write an output row with an error description for each failed check.

Thanks,

J

May 15, 2008 12:26 AM
 

jamie.thomson said:

Hi there TxJL,

Interesting scenario you've got there. I would drect you to Kirk Haselden's first comment to this blog entry where he explains how the appearance of a row going down 2 outputs is just an illusion. Under the covers its still the same row - hopefully this helps to explain why you are seeing what you are seeing.

If you really do want to create 2 output rows from one input row then you're going to need to make this an asynchronous component.

Hope that helps.

-Jamie

May 15, 2008 9:57 AM
 

Nicolas A said:

I've done almost the same as you TxJL

Row.errdesc2 = "test1"

Row.DirectRowToOutput0()

Row.errdesc2 = "test2"

Row.DirectRowToOutput1()

And after that I use a Union All component.

It seems working

May 27, 2008 10:01 PM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems