Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

SSIS: How to load related tables

It is a very common scenario to be able to load two tables in the same data-flow that have a foreign key relationship between them. The question of how to do this gets raised very often on the SSIS forum so I figured I'd try and address it here.

Basically, its not possible to do this without using a workaround. There are two possible workarounds that I can think of:

1) Load the table containing the primary/unique key constraint within the data flow and drop the data destined for the table with the foreign key constraint to a raw file using the raw file destination adater. You can then load the second table in a subsequent data flow by pulling the data straight out of the raw file using a raw file source adapter. I have discussed this technique in a bit more detail here: http://blogs.conchango.com/jamiethomson/archive/2006/05/22/3974.aspx

2) Disable the foreign key constraint prior to loading the two tables and then re-enable it again afterwards. Disabling can be done using the following DDL:

ALTER TABLE <table_name> NOCHECK CONSTRAINT <constraint_name>

You can re-enable again after executing the data-flow with the following DLL:

ALTER TABLE <table_name> WITH CHECK CHECK CONSTRAINT <constraint_name>

Don't forget the "WITH CHECK" part when you are re-enabling the constraint otherwise the data that you have just loaded will not not be checked for referential integrity.

 

I always used to use method 1 but of late I have started to use method 2, simply because it is quicker to build and you don't have to drop data out into raw files.

Which method do you use?

-Jamie

 

Published Friday, February 17, 2006 6:02 PM by jamie.thomson

Comments

 

V. Iacoboni said:

I'm building a SSIS project that gets reloaded from a source system nightly. In order to perform a TRUNCATE TABLE on the primary key table, you've got to DROP (not just disable) the foreign key constraints. I did this in a script transform that produces DropFkScript and CreateFkScript variables, which are then executed before and after loading, respectively.
February 21, 2006 2:00 PM
 

Daniel Read said:

Hi, Jamie. Thanks for this post. However, I'm a little confused about what problem we're trying to solve.

Is the issue that the first table has an identity-based PK that needs to be known for the subsequent insert into the second table? If so, then wouldn't an OLE-DB Command transformation with an output parameter mapped to an existing field on the row do the trick? In other words:

- add placeholder int field to row using Derived Column

- use OLE-DB Command transform to call an insert stored procedure that returns the new identity in an output parameter; map the output parameter to the int field created in the previous step

- use a subsequent OLE-DB Command transfor to insert into the second table, mapping the newly initialized int field to the FK column in the table

What do you think? Or am I discussing a different topic altogether?

Dan
March 1, 2006 4:01 PM
 

jamie.thomson said:

Dan,

No, that's not the problem. The problem is that you cannot load two tables that are linked by a foreign key from within the same data-flow. Well, SOMETIMES you can but it depends on which order data gets committed - something that you have no control over.

-Jamie

January 26, 2007 9:34 PM
 

TheViewMaster said:

Good post,

It would also be helpful to add a SQL query to find out which FK's you need to disable,

Thanks

May 30, 2007 10:09 PM
 

Renato said:

Hi, ver useful blog, but i did'nt get this one right. where do i have to insert that script (ALTER TABLE <table_name> NOCHECK CONSTRAINT <constraint_name>), isn't DDL tasks supposed to be used on Analysis services.

November 22, 2007 1:16 PM
 

jamie.thomson said:

Hi Renato,

Sorry, I don't understand the question. This blog entry has no relevance to Analysis Services.

-Jamie

November 22, 2007 5:08 PM
 

Renato said:

Hi, thanks for the quick answer.

I know this post isn't about analysis services, but you mentioned about using a DDL to disable constraints before doing the insert. right? well the only place in SSIS I found something about DDL is in the toolbar in control flow, and that's an analysis services task.

I hope it's more clear now.

November 22, 2007 5:42 PM
 

jamie.thomson said:

DDL stands for "data definition language". It isn't something specific to Analysis Services and is actually more readily associated with relational databases.

-Jamie

November 22, 2007 5:48 PM
 

Judah said:

If the source tables have the relationships defined, you could do this:

1.  Use the proc defined here:  http://www.databasejournal.com/img/usr_sp_drop_constraints.sql  to drop all constraints on just the problem tables. (found via trial and exception)

2.  Then use the Copy SQL Objects to copy the tables (after step one is done).  If you include the option to copy foreign keys then, it will recreate the ones found on the source tables.

December 11, 2007 4:42 PM
 

Merl said:

Maybe a dumb question: doesn't the OLEDB destination Fast Load / Check Constraints option handle this situation? E.G. If I make a single data flow with two destinations, and the destinations are set for OLEDB with Fast Load, there's an option for Check Constraints that can be switched off, and it seems like the resulting behavior is the same as if you'd disabled constraints and then enabled them again.

January 28, 2008 5:46 PM
 

Merl said:

Ah, I think I just answered my own question: if the load into the FK constrained table commits first, before all the data in the parent table has finished loading, it'll fail. Nevermind :-).

It's really a shame that there's no way to make a precedence constraint, inside a single data flow, requiring one destination load to complete before another. It seems like that'd also solve this issue.

January 28, 2008 5:52 PM
 

jamie.thomson said:

Good point Merl, I honestly don't know the answer. Something to check out definitely.

-Jamie

January 28, 2008 5:53 PM
 

jamie.thomson said:

Merl,

"It's really a shame that there's no way to make a precedence constraint, inside a single data flow, requiring one destination load to complete before another."

I can see why that might seem to help but it doesn't really "fit" with the notion of a dataflow being an executable - i.e. a single unit of work.

Having said that, I've talked a lot in the past about something called Intrinsic Flow Priority which I think is exactly what you're talking about.:

http://blogs.conchango.com/jamiethomson/archive/2006/05/22/3974.aspx

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=178058

Feel free to click-through and add your comments to the connect submission (please :)

-Jamie

January 28, 2008 5:59 PM
 

Merl said:

I did manage to create another method (I'm sure someone else has done this too, so I'm not claiming to have invented this :-). This I find to be a little cleaner/more in the "spirit" of the data flow notion, but be warned that I'm fairly sure it would be too slow for a large quantity of data where performance is important. The integration I am working on involves collecting invoices in files from an FTP site, staging them locally and then pushing the staged data into various databases. Each has a header that is numbered using Identity in the target database, and a related table of detail rows. I found that the SSIS destination components won't allow precedence constraints as noted above, but the OLEDB Command component will, and will pass rows "though" to subsequent transformations. That means I can make a stored proc in the target database (with an output parameter to collect new row IDs) for inserting a header row, and a second for inserting a detail. In the data flow, I set up the data source as a join between incoming headers and details so I get a row for every detail, but each one contains the header data as well. I then multicast this into two pipelines, one for headers and one for details. In the header pipeline a Sort with Remove Duplicates filters the rows to unique header data. The resulting headers are then passed through an OLEDB command component to insert each one using the stored proc, and fetch the new ID from the proc's output parameter. The output of that component is then merge-joined back with the original detail rows from the multicast. The join results can then be put through a second OLE DB command component to insert the detail rows including the parent row IDs. That's basically it ( though my real process is "decorated" with some other stuff not relevant to this question. So, if you face this issue with a modest quantity of data, I think this works, and it's all one flow/one transaction, with no temp files, and the FKs remain in place in your target DB.

February 14, 2008 8:00 PM
 

bdeem said:

In the past I have used Method 2 (Drop/Add FK with check) when importing a large amount of data.  750MB worth of data that gets cleaned resulting in 120k Parent Rows with 300k Child rows.  I used a custom script component to split into two outputs (Parent and Child) going to an SQL Destination.  This loaded very fast for the amount of data involved/network traffic.  Thanks goes out to Jamie for this idea.

I have also used XML to handle the parent -child problem.  I generate a unique ID for each Parent using a custom script component.  Then, use a custom script component to form the XML and output each row uniquely with the attached XML.  Then, use a stored procedure that takes as an input the Parent Data as columns and the child data as an XML column.  (The stored procedure performs the business logic, and handles conversion of XML to the child table).

Why use the XML method?  Entity Attribute Value (EAV) scenarios.  For instance, you have to accommodate multiple import files where the number of columns keep changing.  So perform an UNPIVOT, convert to XML, and import using a stored procedure.  What rocks about this is that when it's time to do a new import you copy an existing one, update your UNPIVOT to include the new columns, and done.

May 6, 2008 5:26 PM
 

ssis check table for row said:

July 10, 2008 3:11 PM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems