One of the most widely used components in SSIS is the Sort component.
As many of you will know, the Sort component is a fully blocking component rather than a partially blocking component or a row transform component. [N.B. This paper from Elizabeth Vitt talks about fully blocking, partially blocking, and row transform components, as does this webcast from Donald Farmer.]
The Sort component actually has two functions:
- It sorts data (that much is obvious)
- It can eliminate duplicates
The reason that the Sort component has this second function is that in order to determine duplicates it must do a Sort at some point - hence it makes sense to put these two functions into the same component.
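To make the "sort first, then drop duplicates" idea concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not the actual SSIS implementation; the function name `sort_and_dedupe` and the sample rows are made up for this example.

```python
from itertools import groupby


def sort_and_dedupe(rows, key=lambda row: row):
    """Hypothetical helper: remove duplicates by sorting first.

    sorted() has to read every input row before it can return,
    so no output row is produced until the whole input is consumed.
    That is what makes this approach fully blocking.
    """
    ordered = sorted(rows, key=key)
    # After sorting, rows with equal keys sit next to each other,
    # so keeping the first row of each group removes the duplicates.
    for _, group in groupby(ordered, key=key):
        yield next(group)


rows = [("b", 2), ("a", 1), ("b", 2), ("c", 3), ("a", 1)]
result = list(sort_and_dedupe(rows))
# result == [("a", 1), ("b", 2), ("c", 3)]
```

The output is both sorted and free of duplicates, which mirrors the two functions listed above.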
I've been using the Sort component a lot today to eliminate duplicates and as I was doing so, something occurred to me. I wanted my output to have duplicates eliminated but I didn't care whether the output was sorted or not. Therefore, why do I have to wait until ALL the rows have been consumed from the upstream buffer by the Sort component before it starts to pass rows to the new downstream buffer? If I don't care about the output being sorted, why can't the component pass each new row into the output buffer without waiting for all rows from upstream? In other words, why can't my component that eliminates duplicates be a partially blocking component rather than a fully blocking component?
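The behaviour I have in mind could look something like the following Python sketch. It assumes that remembering the keys seen so far in an in-memory set is acceptable; `dedupe_streaming` is a hypothetical name for illustration, not an existing SSIS component or API.

```python
def dedupe_streaming(rows, key=lambda row: row):
    """Hypothetical helper: remove duplicates without sorting.

    Each row is passed downstream the first time its key is seen,
    so output starts flowing before the input has been fully read.
    """
    seen = set()  # keys already emitted (assumes keys are hashable)
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield row  # emitted immediately, in arrival order


rows = [("b", 2), ("a", 1), ("b", 2), ("c", 3), ("a", 1)]
result = list(dedupe_streaming(rows))
# result == [("b", 2), ("a", 1), ("c", 3)]
```

The rows come out in arrival order rather than sorted order, which is exactly the trade I'm happy to make. The cost is that the set of distinct keys still has to be held in memory for the lifetime of the data flow.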
At that point I paid a visit to Connect and submitted this suggestion. Namely, I want a new component, a partially blocking component, that eliminates duplicates but doesn't bother to sort the output for me. If you think this would be useful then click through and add a comment to the Connect submission (12 hours after I originally posted this, some people already have). We're more likely to get it that way. And reply to this post as well (you'll need to sign up) - let me know what you think.