"I have a file that is a list of orders. Each record in the file contains the order number and the item that has been ordered along with information about that item (e.g. Quantity ordered). I want to split the file into an OrderHeader and an OrderDetail table with RI between those tables. How do I do that?"
I have seen the above question posed numerous times, in various derivations, on the SSIS forum. It could also be rephrased as "How do I load a dimension and a fact table when the source data comes from the same place?" Its a very common request. I'll try to articulate the question a bit more here. Basically, people want to turn a flat file like this:

which contains a list of orders (you'll notice that some order numbers are repeated) into this normalised model:

The first thing to realise about this requirement is that it cannot be achieved in a single data-flow. This is because the values in OrderHeader.OrderHeaderID will be generated (either by SSIS or by an IDENTITY) and therefore will not be available for OrderDetail until after the data-flow has completed. Hence, we need a data-flow that loads OrderHeader and a second data-flow that loads OrderDetail. This second data-flow will get the required generated OrderHeader.OrderHeaderID and put it into OrderDetail.OrderHeaderID.
OK, so accepting that we need two data-flows it could be assumed that we need to access the source file twice. In actual fact this is not the case because:
- For large amounts of data this would be detrimental to package execution
- Repeating work when you don't need to is inefficient, prone to error and inconsistency, and quite simply a bad idea.
The altenative here to extracting the same data twice is to use SSIS raw files.
Here's the steps that you would need to take to accomplish this:
- Extract data from the file inside a SSIS data-flow
- Split the data into two data-paths using a MULTICAST transform
- Get a list of all the distinct order numbers and push into the OrderHeader table (in this example we have used an identity for OrderHeader.OrderHeaderID but we could have very easily just generated an ID in the data-flow)
- Get all of the order detail records and push them into a raw file.
Steps 1-4 all occur in the first data-flow. Here is a screenshot of that data-flow:

and in the second data-flow:
- Extract data from the OrderDetail raw file
- Lookup the newly created OrderHeader.OrderHeaderID
- Insert data to OrderDetail
Here again is a screenshot of that second data-flow

And that's it. Obviously this is a very simplistic example because the text file would likely have much more data in it but the principle remains the same.
The key learning point to take away from this post is that raw files can be very powerful tools in your SSIS arsenal. I use them extensively in my daily work and have, in the past, provided a number of examples of using them:
Lots of examples there that demonstrate just how useful raw files can be!
Lastly, if you want to run the package that I built to demo what I have talked about here then download the attachment (link below).
-Jamie
UPDATE 2006-10-23: This could also be achieved by defining priorities to destinations within a data-flow and therfore being able to say "Populate destinationA before destinationB". This feature is called 'Intrinsic Flow Priority' and is not currently available in SSIS. I have requested it at Microsoft Connect and you can see the request here:
Intrinsic Flow Priority
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=178058
I am very keen that this feature makes it into a future version of SSIS so please click on this link and your weight to the request in the form of a comment and a reason why this will be useful to you. Simply voting for it won't make much difference.