It is generally accepted that a synchronous script component outputs the same number of rows as are input. Well, I have news for you: this doesn't have to be the case. If you have a synchronous script component with multiple outputs then you can choose to send each input row to many of those outputs, or to none of them at all. As a result, it is possible to have a synchronous script component that outputs fewer or more rows than arrive at it.
I started to think about what this could be useful for. Why would you want a row to be sent to multiple outputs, or even to no output at all? Well, the following is the simplest scenario I could come up with. Imagine you work for one of the direct marketing (i.e. junk mail) companies that we all know and love. You could have thousands, perhaps millions, of names, addresses and email addresses of potential customers, for which you have a plethora of demographic data. You also have a number of marketing campaigns that you want to target at certain people depending on their demographic profile. We should expect that people will fall into many of these demographic categories. For example, you may want to market:
- Baby products to people with at least one child
- Double glazing to people who are homeowners
- Car insurance to those who own a car
Obviously people are going to fall into more than one of these groups, in fact quite conceivably all of them. How would you normally build a data-flow to determine those three pots of data? Well, seeing as we're going to be filtering records, we're obviously going to need a conditional split, right? But there's a problem with that: a conditional split transform can send a row down only one of its outputs, yet clearly some customers are going to fall into more than one of our three categories. So we need to employ a multicast transform and three different conditional split transforms, like so:
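For reference, the split condition in each of the three conditional splits would be something like the following. These are SSIS expressions; the column names (TotalChildren, HouseOwnerFlag, NumberCarsOwned) are my assumption based on the DimCustomer table in AdventureWorksDW, so adjust them to whatever your own schema uses:

```
Baby products  : TotalChildren >= 1
Double glazing : HouseOwnerFlag == "1"
Car insurance  : NumberCarsOwned >= 1
```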
It seems a bit inefficient to have three times as much data produced by the multicast and three different conditional split transforms though, doesn't it? Instead, let's use a script transform to produce the data that we need. Here you can see the three outputs employed by our script transform:
It's important that these are synchronous outputs, so their SynchronousInputID property must have the same value as the component input's ID property. Also, the three outputs must have their ExclusionGroup property set to the same non-zero value.
Here you can see the (very simple) code that produces our data:
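If you can't see the screenshot, the idea is captured by a VB.NET sketch along these lines. I'm assuming input columns called TotalChildren, HouseOwnerFlag and NumberCarsOwned (as per DimCustomer in AdventureWorksDW) and outputs named BabyProducts, DoubleGlazing and CarInsurance; when ExclusionGroup is non-zero, SSIS generates one DirectRowTo&lt;OutputName&gt; method per output, named after whatever your outputs are actually called:

```vb
' Sketch only - assumes demographic columns TotalChildren, HouseOwnerFlag
' and NumberCarsOwned on the input, and three synchronous outputs named
' BabyProducts, DoubleGlazing and CarInsurance sharing one ExclusionGroup.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    ' A row can be directed to any number of the outputs - or to none at all.
    If Row.TotalChildren >= 1 Then
        Row.DirectRowToBabyProducts()
    End If

    If Row.HouseOwnerFlag = "1" Then
        Row.DirectRowToDoubleGlazing()
    End If

    If Row.NumberCarsOwned >= 1 Then
        Row.DirectRowToCarInsurance()
    End If

End Sub
```

Note that, unlike a conditional split, nothing here is mutually exclusive: a customer with a child, a house and a car gets directed to all three outputs.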
As you can see, we are able to direct each row to more than one output, thus eliminating the need for three different conditional split transforms. That's the key to this scripted solution: the greater breadth of functionality afforded us by the script component means that we can satisfy the requirement in one simple component, whereas previously we needed more than that (four components, in fact).
Running the data-flow gives us the same results as before.
And there you have it, it's as simple as that! Yet another use of the script component that reduces the amount of functionality we have to build into our data-flows.
I wanted to test the performance of the two slightly different solutions, so I upped the number of rows running through the data-flows from 18,484 to 443,616 (using some UNIONs in my source SQL statement). The scripted solution executed in 15 seconds whereas the conditional split solution executed in 17 seconds, so not only is the scripted solution quicker to build, it executes quicker too.
And yes, you can download this from here: http://blogs.conchango.com/Admin/ImageGallery/blogs.conchango.com/jamie.thomson/20050905MultipleOutputs.zip. You will need access to the AdventureWorksDW database and will also need Konesans' Trash Destination Adapter installed.