I have just arrived home from an excellent BI-flavoured meeting of Tony Rogerson's UK SQL Server User Group. Aside from some great content the thing I love about these events is the change to meet up with people from around the BI community - people that I've met both in person and online in the past few years.
So Rastko Petrovic, Mat Stephen, Simon Sabin, Tony, Sanjay Nayyar...great to see you guys again. To Chris Webb, Sutha Thiru and Mark Hill...glad to finally put some faces to your names - I just wish I'd had more of a chance to have a natter with you guys!
Chris opened the evening by giving a great demo of some of the new features in MDX. That was really valuable to me due to the work I am currently undertaking in my day-to-day role. MDX Scripts are going to relieve some of my headaches I'm sure!
Mark followed by talking through his experiences of implementing a SSAS solution for a 2 billion row fact table. It was really valuable to hear some first-hand experience of the pain that he and Rastko have gone through on their current project and I'll definately be applying some of those learnings to what I'm doing now.
I was slightly disappointed to hear some criticism of SSIS's performance against none-SQL Server data sources but I can't argue with hard facts and Mark raised some serious concerns that I hope the SSIS team are going to take on board.
Simon followed up with a whistlestop demo of how to load and analyse unstructured ASP.Net exception files using SSIS. I love Simon's style of presenting. He comes across as being utterly unprepared (I'm sure he won't mind me saying that) and yet somehow manages to get his point across through sheer tenacity, wit and crazy exuberance.
Part of Simon's demo was showing how to loop over a set of files using a ForEach Loop and append the results to a raw file so that the full set of data from all source files can be processed as a whole. In order to do this Simon showed how it is necassary to, prior to the ForEach Loop execution, create an empty raw file with the same metadata as the raw file required to load the data into. His method of doing this was very simple - use a script component. All well and good.
I made a slight fool of myself by piping up in the middle of Simon's presentation that creating the empty raw file manually like that wasn't necassary - it is possible to make SSIS create the raw file on the first iteration of the loop- but when he challenged me to prove it I couldn't remember how. I was sure it was possible though.
Well, naturally, afterwards I went away to try and prove myself right and thankfully I'm not really as foolish as I might have first appeared. It IS possible to loop over files, creating the raw file on the first iteration and then appending to it thereafter. The trick is to set ValidateExternalMetadata=False on the raw file destination adapter as shown here:
If you want to see this for yourself then download a demo package from here. There are no connection managers in the package so running it will not be a problem. Here's a screenshot of the control-flow:
The only thing you need to make sure of is that it has read-write access to C:\temp because that is where it creates the raw file. Here's a screenshot of the data-flow:
So, the upshot of this is:
-You can append to a raw file within a ForEach loop
-It is possible to configure the raw file destination so that on the first iteration of the ForEach loop the raw file will get created.
Its an important point because being able to process data from multiple files all at once is an common requirement. Simon himself gave a clear example of this by taking a LOOKUP component out of the ForEach loop so that it only had to execute (and therefore cache all its data) once. Leaving it inside would be a major faux pas in terms of performance.
So all in all a good night and thanks to Chris, Mark, Simon and Tony for that. And a lesson learnt for me - don't jump headlong into someone else's demo unless you can back up what you are saying!