While I was at the PASS Summit recently I was lucky enough to meet SQL Server guru and all round nice bloke Chris Hedgate. Chris had some polite things to say about my blog but did offer some constructive criticism - I should try and put more introductory type stuff on here cos most of what i write is too in depth for people just starting out with SSIS.
Fair point I thought, and I promised that I'd come up with some such content. Well...its only taken me two months!!
I've built a package that demos how SSIS handles one of the most common ETL operations - aggregation. Let's be a bit more specific though and explain exactly what I've defined.
In this mythical scenario I am processing log files from an online retailer. I have 3 log files in total:
- Views - The number of times that a customer clicked a link to view details on a product
- Picks - The number of times a customer put a particular product in their shopping cart
- Purchases - The value of each customer's purchase of each product
Here's a screenshot of my data-flow:

The three "boxes" (properly called source components) represent my three source files.
The "Union All" component amalgamates those three streams of data into one - very similar to the UNION operator in ANSI SQL. Let me explain how the UNION ALL component works by showing you the editor (which is displayed by doubleclicking on the component):

Some things to notice here:
- My three inputs are listed along the top. You can also see a column headed "Output Column Name" which specifies the name of the columns output from the Union All component
- Columns in each of the three inputs are mapped to these output columns. Notice that each input has a Customer column and a product column. these are the columns that we are going to GROUP BY when we aggregate later on in our data-flow.
- The inputs do not contain all of the columns in the output. Where an input does not contain the column you can see <ignore> which will result in NULL values in our data-flow. We will deal with these later.
After the Unon All component comes our Sort component. This component does exactly what it says on the tin - it sorts the data. We have to tell it how to sort the data though and we do that, again, in the component editor:

More things to notice:
- We are sorting by the columns that we are going to GROUP BY when we aggregate
- We can specify whether to sort ascending or descending
Next is the all important Aggregate component. This is the thing that takes our sorted data and groups it over some defined columns and carries out an aggregate function on each of the other columns. The Aggregate component doesn't actually require the inputted data to be sorted, so the prior Sort component wasn't actually required, I just included it so as to demo some extra functionality. Here's the Aggregate editor:

Note that we specify whether to group by a column or to apply a aggregate function to it.
At this point we can view the data at runtime using the very handy Data Viewers that SSIS provides. Here is a Data Viewer that shows us what the data looks like in the pipeline when it comes out of the Aggregate component:

This is pretty good. We have aggreated our data and can easily see, per customer and product, what they are interested on our website and what they have spent their money on. Notice though that we still have some NULL values in there and in an ideal world we'd set them to be zero instead. We can do that using a Derived Column component:

Here we have used SSIS's expression language (which I promise you is very easy to learn) to check for a NULL value in the pipeline. If it finds one it replaces it with a zero. Simple.
So there you have it - an easy aggregator. I dare say that if you're still reading then you may be interested in running this for yourself. Well I've provided the package and the three source files in a zip file here: http://blogs.conchango.com/Admin/ImageGallery/blogs.conchango.com/jamie.thomson/20051116AggregatorDemo.zip
All you have to do to run the demo is place the three source files into C:\temp, add the package to an existing Integration Services project, and run it. You'll see just how easy it is to quickly build working, production-quality packages.
Please let me have feed back. Did you find this easy? Did the demo work? Should I not have bothered? :)
-Jamie