Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

SSIS: Loading a Kimball data warehouse using SSIS

I have just been watching Joy Mundy's webcast "Loading a Kimball method data warehouse using SQL Server Integration Services". I highly recommend it - Joy has done great job in outlining the basic skeleton for loading a data warehouse and covers some valuable concepts such as:

  • Using a parent-child package structure (or master-child as she refers to it)
  • Parent package configurations
  • The value of proper Audit techniques (something of which I'm a huge proponent)
  • The usefulness of variables especially when combined with expressions
  • Using ROWCOUNT component
  • Surrogate key lookup failures
  • The use of error flows

If I think about it there hasn't been, up until this, a webcast that covers the end-to-end build of a data warehouse solution like Joy's does. For that reason I hope any newcomer to SSIS goes and watches it.

There were a few remarks made by Joy that I'd just like to elaborate on.

1. SQL Server Destination

Joy stated that the SQL Server Destination adapter should never be used because it tries to commit all rows in a single transaction whereas the OLE DB Destination does not. That's a valuable point to make however I would warn against a carte blanche approach to your choice of destination adapter - the SQL Server Destination adapter has its place and in a heads-down race between the two adapters, SQL Server Destination would win every time. My advice would be to test the two approaches yourself and see what works best for you.

 

2. Communication between a child package and a parent package

Joy states that it is not possible to pass information from a child package back up to the package that calls it. Instead, Joy describes a useful technique for storing information such as this in an external table that can be accessed by both packages.

Unfortunately Joy's assertion that the child package cannot pass data back to the parent package is simply not true. I have previously briefly described a technique for doing exactly this right here. To put it more succinctly, a script task can "see" any variables scoped to ancenstral containers in a parent package and therefore can write values to them (and of course read from them as well).

It is important that people understand this because it can be a very very useful tecunique when building SSIS solutions. At Conchango we use it in two places:

  1. To pass values written by the ROWCOUNT component back up to a custom logging mechanism in the parent package.
  2. To concatenate the name of the child package to a variable called PackageStack in the ultimate ancestral package - again for logging purposes.

 

3. Parent Package Configuration

Joy ambiguously suggests that to use a parent package configuration you have to have identically named variables in your parent and child packages. She didn't say that explicitly but anyone unfamilar with SSIS may wrongly conclude that that is what she is alluding to. I just want to clarify that this is not the case, parent package configurations don't even need ANY variables to be in the child package, let alone identically named ones.

 

4. Constructing dynamic SQL

Joy suggests that a script task can be used to construct a SQL statement dynamically by saving it to a variable. That is true but I would argue for the case of a different method - use an expression. To do this, set property EvaluateAsExpression=TRUE and build your dynamic SQL statement using in the Expression property.

Using an expression has the advantage of less executables (i.e. tasks) appearing in your package. On the other hand, it does raise a slight risk of SQL Injection. Its your choice which method to use.

Also bear in mind that the current (in SSIS2005) maximum length of an expression is 4000 characters so if your expression exceeds that you will have to use Joy's suggested method.

 

-Jamie

 

Published Monday, August 14, 2006 9:22 PM by jamie.thomson

Comments

 

SSISGuru said:

Please Help...

Constructing dynamic SQL

Joy suggests that a script task can be used to construct a SQL statement dynamically by saving it to a variable. That is true but I would argue for the case of a different method - use an expression. To do this, set property EvaluateAsExpression=TRUE and build your dynamic SQL statement using in the Expression property.

Using an expression has the advantage of less executables (i.e. tasks) appearing in your package. On the other hand, it does raise a slight risk of SQL Injection. Its your choice which method to use.


I am using OLE DB SOURCE  in the "DATA FLOW TASK" and setting the "Data Access Mode:" as "SQL command from variable".

I assgin value to the variable (say Dynamic_Query) as "Select Person_ID,Start_Date,End_Date from Registrations where
Person_ID=? and Last_Update_Datetime=?".

I check using the script task that the variable contains the complete SQL Statement with the correct values for the parameters...  However, I am getting the following error when I run the package (in the using the SSIS Designer mode).


Warning: 0x800470C8 at Extract Student Activity, OLE DB Source [1]: The external metadata column collection is out of synchronization with the data source columns. The column "Person_ID" needs to be added to the external metadata column collection.

The column "Start_Date" needs to be added to the external metadata column collection.

The column "End_Date" needs to be added to the external metadata column collection.

The "external metadata column "end_date" (190)" needs to be removed from the external metadata column collection.

The "external metadata column "start_date" (187)" needs to be removed from the external metadata column collection.

The "external metadata column "person_id" (184)" needs to be removed from the external metadata column collection.

Error: 0xC004706B at Extract Student Activity, DTS.Pipeline: "component "OLE DB Source" (1)" failed validation and returned validation status "VS_NEEDSNEWMETADATA".


Could you please point me to the right direction as to what might be missing here?

Thanks in advance.

Rohit
August 15, 2006 2:41 AM
 

Mark said:

I had the same error. 1) Try to temporarily remove parameters from the SQL command, then enter all further data flow transformations & destination and check/set appropriate using of your DB fields (Person_ID, Start_Date, End_Date). Then restore parameters in the SQL command. If this doesn't help, 2) drag-n-drop to your data flow a new OLE DB SOURCE and configure it for using instead of the erroneous one (and delete the latter).

February 1, 2007 2:47 PM
 

Robert said:

Today is March 26, 2007.

Is there any way I can watch

Joy Mundy's webcast "Loading a Kimball method data warehouse using SQL Server Integration Services".

It was in the past.

Does Microsoft archive it?

Thanks,

Robert

March 26, 2007 5:11 PM
 

Ben Harris said:

You can click on the link and register as if it were a new webcast.  You will be directed to a page where you can replay the recorded webcast.

July 20, 2007 3:42 PM
 

SSIS Junkie said:

As pointed out by Andy , Steve Fibich has a great blog entry explaining how it is possible to pass values

October 6, 2007 6:19 PM
 

Sukumar said:

You also need to assign a value (in this case the select statement that you used in the script task) to the variable that you created at design time. Otherwise, it can't find the metadata at design time.

July 3, 2008 3:29 PM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems