
SSIS Junkie

SSIS: Checking if a row exists and if it does, has it changed?

I've been writing about SSIS for quite some time now and probably one of the more popular posts I have written is this one:

Get all from Table A that isn't in Table B
(http://www.sqlis.com/default.aspx?311)

which I wrote way back in 2004.

It's not popular because of the quality of the writing, oh no. It's because of the subject matter. It details two ways that an SSIS developer might discover whether a row already exists in the target:

  • Use a MERGE JOIN or
  • Use a LOOKUP

Of the two options I think it's fair to say that over time, the LOOKUP component has proved to be the more popular choice.

Well, interestingly, Greg van Mullem has devised another way of achieving the same thing and he has written about it here: http://www.mathgv.com/sql2005docs/SSISTransformScriptETL.htm

Greg's method uses a script component to do the comparison between source and target, and highly useful code it is too. I recommend you digest Greg's article fully because there is a lot to learn from it - particularly in regard to accessing relational sources from within a script component.

 

The main reason Greg devised his own method was that he determined that while the LOOKUP option was useful, it had its limitations. He figured that while this method could determine that a row already existed in the target, it couldn't determine whether the row had changed. Well, I beg to differ. The key to determining whether the row has changed is to make columns from the LOOKUP dataset available in the pipeline downstream of the LOOKUP component - that way they can be compared with the incoming values using a DERIVED COLUMN or CONDITIONAL SPLIT component.

That was a rather verbose explanation so let me try and demo it. I have built a package like so:

 

The sequence container on the left hand side merely sets up some lookup data for us. It creates it in a table called tempdb..CustomerLookupData. Here's the entire contents of that table after it has been created and populated.

 

Very simple stuff!

Now here is the real crux of this post. The data-flow that contains my LOOKUP component:

The important things to show you here are the LOOKUP component and the CONDITIONAL SPLIT. Here's the LOOKUP:

Remember that you need to set up the LOOKUP component to redirect error rows. This means that any rows that don't find a match in the lookup cache will go down the error output. It isn't an error per se, but for these purposes it proves to be very useful.

What I need to point out to you here is that I am putting a column from the lookup dataset (i.e. from the CustomerLookupData table) into the pipeline after I have done the lookup and I have given it the alias LookupAge.

In the CONDITIONAL SPLIT pictured here:

I compare LookupAge with the Age column of the incoming row. If they are different then I direct them to the "Age has changed" output, otherwise they are the same and they go to the default output called "Age has stayed the same".
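The expression behind the "Age has changed" output can be sketched in the SSIS expression language like so (a sketch based on the column names above; LookupAge is the alias given to the lookup column):

```
Age != LookupAge
```

Rows for which this evaluates to true are sent to the "Age has changed" output; everything else falls through to the default "Age has stayed the same" output.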

I've got data viewers on the three outputs:

  • New Customers
  • Age has changed
  • Age has stayed the same

so we can see what appears in each data path. Here is the executed data-flow:

As you can see the package has successfully determined:

  • The new customer(s)
  • The existing customer(s) whose age(s) have changed
  • The existing customer(s) whose age(s) have NOT changed

 

I'm not trying to say that this is the right way and Greg's is wrong - I'm a great advocate of there being more than one way to achieve a goal within SSIS. I was slightly concerned however that anyone reading Greg's article may think that this wasn't possible without resorting to writing code - and that simply isn't the case.

If you want to try this for yourself then download the package from here. It executes against the SQL Server instance called localhost so make sure you have an instance of SQL Server running there - or just change the connection manager.

Hope that is useful

 

-Jamie

 

UPDATE 2006-10-23: Someone who goes by the pseudonym "The Viewmaster" has done a performance comparison of the various methods of achieving this. I will let you read the results for yourself here.

Published Tuesday, September 12, 2006 9:31 PM by jamie.thomson
Attachment(s): UpdateAndInsert.zip

Comments

 

Dheepa said:

Jamie,

How could I do this for each table in the database instead of writing a separate data-flow task or package for each one? I know I could configure the source and destination tasks from variables, but how will I do it for the lookup transformation? Please advise.

Dheepa

October 9, 2006 8:27 PM
 

SSIS Junkie said:

For some time now I have been promising Marco Russo from sqlbi.eu that I would take a look at their TableDifference

November 28, 2006 5:57 AM
 

javan said:

Is it possible to update a blob in the way you have suggested?

January 12, 2007 4:04 AM
 

Evan said:

How do you add graphical text comments to the package?

February 25, 2007 5:11 PM
 

jamie.thomson said:

Evan,

You can't (currently).

-Jamie

February 25, 2007 5:22 PM
 

Evan said:

Thanks for the response. Although this is unfortunate news, it means I can stop wasting my time looking around for a solution or playing with the toolbars in BIDS ... :)

Do you know if MS intends to add this functionality? (Some things I do miss from DTS.)

February 25, 2007 7:47 PM
 

jamie.thomson said:

Evan,

Don't know to be honest.

Let me just clarify exactly what you mean. Do you mean you want to be able to drop images into annotations on the design surface?

I did ask for this functionality once and they said they'd look at it for a future version. That was a long time ago though.

-Jamie

February 25, 2007 8:59 PM
 

Evan said:

Jamie,

I think we're talking about the same thing. In the DTS designer, you could click on the [A|B] icon and drop text annotations into the package, purely for documentation purposes. BIDS doesn't seem to have this functionality.

Evan

February 26, 2007 12:13 AM
 

jamie.thomson said:

Evan,

Ohhhh right! I misunderstood. Don't worry - you can do that. Right-click on the design surface and select 'Add Annotation'.

-Jamie

February 26, 2007 12:24 AM
 

Evan said:

Jamie,

Thanks. I see it now...

BTW, I love your site, especially because it contains working packages.

regards, Evan.

February 26, 2007 4:06 AM
 

bkallich said:

Let us say that a package uses the merge join option to insert new rows and it uses an OLE DB Destination with the fast load, table lock, and batch size (1000) options. The table lock option on the destination forces us to put a NOLOCK hint in the OLE DB Source query (on the destination table). Otherwise, they run into locking contention (one is requesting an exclusive table lock when the other is requesting a shared table lock). This works most of the time but occasionally I see that SQL Server does not honour the hint (one scenario: when the target table does not have a clustered index on the join key column) and runs into locking contention. Have you seen this? Do you rely on the NOLOCK hint too?

March 12, 2007 9:19 PM
 

jamie.thomson said:

Hi bkallich,

This is a good question. In practice I have always used the LOOKUP method rather than MERGE JOIN so I have never run into this problem, although I can see how it could happen. It won't happen with LOOKUP if you're using full caching mode because the lookup cache is populated (and the lock released) prior to execution of the data-flow.

If you're experiencing this problem then an easy workaround is to drop the data into a raw file and do the insertion in another data-flow.

-Jamie

March 12, 2007 10:04 PM
 

bkallich said:

Thanks Jamie. I will try the lookup transformation instead. I asked the same question on MSDN too and Phil Brammer made a comment to the same effect. Sorry about asking the same question twice.

March 12, 2007 11:33 PM
 

Greg said:

I may be mistaken but when I attach a SQL Destination to the error output of a LOOKUP the Destination "times out" if there is no error output.  This could be a potential pitfall to be aware of.

May 3, 2007 10:30 PM
 

FO said:

Hi Jamie,

Can we or should we use this approach as an alternative to the SCD Transformation? For large-volume dimensions the SCD seems to be too slow and I would like to use only built-in SSIS transformations (i.e. I cannot use the TableDifference transformation from Alberto Ferrari).

Thanks,

Great site, it helps a lot.

FO

June 7, 2007 2:50 PM
 

jamie.thomson said:

FO,

Yes, you can use this instead of the SCD. Under the covers the SCD uses a lookup anyway. I prefer to use the lookup because it gives you more control.

Perhaps you could use the SCD wizard to build the bare bones of the data-flow and then change it accordingly to use the technique above. If you wanted to.

-Jamie

June 7, 2007 4:43 PM
 

Maggie said:

Hi all - great article and comments, the best I've found. I'm very new to SSIS and still trying to find my way around. Can someone tell me whether there are disadvantages to using the SCD compared to lookups and the OLE DB Command? I need to be able to update and insert from source to destination, and also delete redundant records at the destination that are no longer in the source.

thanks Maggie

June 18, 2007 5:04 PM
 

jamie.thomson said:

Maggie,

The main disadvantage of the SCD component is that you don't really have the ability to configure it like you can a LOOKUP component.

Don't forget that if you use the SCD wizard it'll probably still produce an OLE DB Command so there's no getting away from the pesky things.

-Jamie

June 18, 2007 5:19 PM
 

FO said:

Hi Jamie,

I ran into another problem using this technique, and I saw that you pointed it out in one of your discussion groups (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=717965&SiteID=1).

My main problem is in the Conditional Split when my data-flow contains a NULL column as well as my lookup column. It generates an error because it doesn't like to have a NULL value instead of a Boolean.

My concern (maybe trivial for you) is how I can achieve the proper split to avoid this kind of error. I know it's achieved using ISNULL functions, but I can't seem to find the proper way to do it.

My expression looks like:

(FINAL_Name != Lookup_Name && (!ISNULL(FINAL_Nom_Mun) && !ISNULL(Lookup_Name)) but this will work only when, in fact both Final and Lookup columns are Null. What happens when only one column is NULL. How can I write an expression that will take this possibility in consideration?

Thanks again

FO

June 18, 2007 9:34 PM
 

FO said:

Sorry about the error above. Note that my expression will only work when both inputs are not null.

(FINAL_Name != Lookup_Name && (!ISNULL(FINAL_Name) && !ISNULL(Lookup_Name)))

How can I achieve my proper split when only one column is NULL or both are Null?

Thanks again

FO

June 18, 2007 9:39 PM
 

jamie.thomson said:

FO,

Can you use the || (i.e. OR) operator? Like this:

(FINAL_Name != Lookup_Name && (!ISNULL(FINAL_Name) || !ISNULL(Lookup_Name)))

-Jamie

June 18, 2007 9:45 PM
 

FO said:

Jamie,

No, I`ve tried to use it but with no luck. Even with only the expression

FINAL_Name != Lookup_Name, if one column of the Lookup is Null, I get an error.

Any work around?

Thanks

June 18, 2007 10:10 PM
 

jamie.thomson said:

If you told me the error message I might have more of a chance.

-Jamie

June 18, 2007 10:30 PM
 

FO said:

Jamie, sorry about that,

Here's more details,

For the following cases:

1) If the Lookup_Nom_Mun value is NULL and the FINAL_Nom_Mun is not NULL

2) if the Lookup_Nom_Mun value is not NULL and the FINAL_Nom_Mun is NULL

3) if both columns are Null

With the expression FINAL_Nom_Mun != Lookup_Nom_Mun in the Conditional Split

I get the error message below:

[Check for Modifications [95974]] Error: The expression "FINAL_Nom_Mun != Lookup_Nom_Mun" on "output "Modified Rows (SCD1)" (102221)" evaluated to NULL, but the "component "Check for Modifications" (95974)" requires a Boolean results. Modify the error row disposition on the output to treat this result as False (Ignore Failure) or to redirect this row to the error output (Redirect Row).  The expression results must be Boolean for a Conditional Split.  A NULL expression result is an error.

Note that both columns are varchar.

Thanks

FO

June 19, 2007 2:29 PM
 

jamie.thomson said:

Ah, yes, this is a bit of a funny. You have to explicitly cast the result of the expression to what you want it to be. Change:

FINAL_Nom_Mun != Lookup_Nom_Mun

to be

(DT_BOOL)FINAL_Nom_Mun != Lookup_Nom_Mun

Explained more here: http://blogs.conchango.com/jamiethomson/archive/2006/10/12/SSIS_3A00_-NULLs-in-expressions-gotcha.aspx

-Jamie

June 19, 2007 5:42 PM
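For reference, a NULL-safe version of the comparison can be sketched in the SSIS expression language (column names as used in the thread); it treats a NULL on exactly one side as a change and NULLs on both sides as no change:

```
(ISNULL(FINAL_Nom_Mun) && !ISNULL(Lookup_Nom_Mun))
|| (!ISNULL(FINAL_Nom_Mun) && ISNULL(Lookup_Nom_Mun))
|| (!ISNULL(FINAL_Nom_Mun) && !ISNULL(Lookup_Nom_Mun) && FINAL_Nom_Mun != Lookup_Nom_Mun)
```

Because every branch guards its comparison with ISNULL tests, the expression always evaluates to a Boolean rather than NULL.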
 

FO said:

This problem is giving me a headache...

I've changed my expression to

(DT_BOOL)(FINAL_Nom_Mun != Lookup_Nom_Mun)

but I get the same error message as before.

[Check for Modifications [95974]] Error: The expression "(DT_BOOL)(FINAL_Nom_Mun != Lookup_Nom_Mun)" on "output "Modified Rows (SCD1)" (102221)" evaluated to NULL, but the "component "Check for Modifications" (95974)" requires a Boolean results. Modify the error row disposition on the output to treat this result as False (Ignore Failure) or to redirect this row to the error output (Redirect Row).  The expression results must be Boolean for a Conditional Split.  A NULL expression result is an error.

Any more ideas?

Thanks again for your help, it's much appreciated

FO

June 19, 2007 7:43 PM
 

jamie.thomson said:

No idea, sorry.

You should check the SSIS forum. More pairs of eyes there that can help.

-Jamie

June 19, 2007 7:51 PM
 

FO said:

Thanks for your help

FO

June 20, 2007 2:57 PM
 

Scott McKechnie said:

Hi Jamie, thanks - your site is great and I have learned a lot from you. I have tried this solution and it works great. I am still new to SSIS and was wondering how you would check whether more than one column has changed.

I would appreciate the help as I have a solution but it is very long-winded and heavy on the system. Also, how would you do the updates and inserts once the conditional split has completed?

Thanks in advance Scott

July 11, 2007 2:16 PM
 

jamie.thomson said:

Hi Scott,

If you want to check for more than one column changing then you just need more conditions in the expression in your conditional split. Separate them with the || operator (i.e. logical OR).

Use the OLE DB Command to do the updates, an OLE DB Destination to do the inserts.

-Jamie

July 11, 2007 4:56 PM
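Jamie's suggestion can be sketched like so, assuming a hypothetical second column Name with lookup alias LookupName alongside the Age example from the post:

```
Age != LookupAge || Name != LookupName
```

A row is routed to the "changed" output if any one of the compared columns differs.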
 

Scott McKechnie said:

Thanks Jamie it works like a dream. I really appreciate the help.

July 12, 2007 1:05 PM
 

Nick Van Dyk said:

Hey Jamie,

Great stuff. I continue to use your site to answer all of my questions. I understand your lookup to compare values above, but the table I am working with can be up to 6 million rows at some properties and I need to look for changes in 40 different columns. I have tried implementing this lookup and it is taking all of my memory to cache the whole table. Is there a better way to do this when dealing with large datasets? Any advice would be greatly appreciated.

Nick

July 17, 2007 1:59 AM
 

jamie.thomson said:

Hi Nick,

If memory is a blocker then you're going to have to do a non-cached lookup. That seems like the only option if you are stuck on using the Lookup component.

It's possible to use the Merge Join to achieve the same as the Lookup so maybe try that. It requires sort operations but it does mean that you don't have to have the whole lookup set in memory at once. This article explains how Merge Join can be used to achieve the same as the Lookup: http://www.sqlis.com/311.aspx

-Jamie

July 17, 2007 5:07 AM
 

Nick Van Dyk said:

Hey Jamie,

I implemented the following 5 approaches: merge join, cached lookup, non-cached lookup, Slowly Changing Dimension, and a script similar to Greg's above. The non-cached lookup worked the fastest with the script not too far behind it, but I had to stop the others due to paging. You were right again. Thanks for the help.

Nick

July 18, 2007 12:14 AM
 

ParkAve300 said:

Jamie,

Having formerly been employed at Conchango, I definitely had a feeling of 'small world' after I first came across your blog a few months ago when I began developing ETL solutions with SSIS. It's definitely one of the more helpful SSIS resources online.

I've been trying to implement a package based on the ideas in this article as well as the other articles referenced at the top of this one.

Update or Insert into Table 1 in DB_A on Server_A from

Table 2 (approximately 6 million records) in DB_B on Server_B

I've tried the Merge Join method as well as the Lookup (no cache) method. Both methods seem to choke: the Merge Join method chokes on the sort component for Table 2, the Lookup method chokes at the lookup component.

Do I have any hope of implementing this in SSIS or is the data volume just too much for these types of operations to handle?

Thanks.

July 23, 2007 9:30 PM
 

jamie.thomson said:

Hi ParkAve300,

How many rows are in Table1?

Sorting 6m records in the Sort component is guaranteed to be a problem. If the data is in a table already, why not sort it using an ORDER BY clause when you extract the data?

cheers

Jamie

July 24, 2007 3:31 AM
 

Mark W said:

Hi Jamie

Nice site. I like the Lookup method, but my destination table (Table B) uses a unique identity key (Is Identity=Yes, Increment=1) which will not exist in my source table (Table A). Table A can contain updates to existing records and new records.

So after detecting which records in Table A are new and which exist already in Table B - using the Lookup task - then working out which columns in the existing records have changed using the conditional task,

how can I get the existing record data output to UPDATE rather than INSERT into the final destination (Table B).

Will the destination data flow always do a SQL INSERT or will it do a SQL UPDATE - if so how do I tell it?

The Table B is one of the lookup tables, so I can't dump it at the start otherwise all history will be lost, and there will be nothing to merge.

I guess I could make a duplicate of Table B first - to Table C, then truncate Table B so it removes ALL records, then use the above data-flow tasks to load it all back in again by joining Table A and Table C. This will populate Table B with new identity keys for all rows - old and new. But that seems clumsy.

July 24, 2007 3:17 PM
 

jamie.thomson said:

Mark,

The OLE DB Command transform can be used to do updates. Check it out here: http://search.live.com/results.aspx?FORM=QBJK&q1=macro%3Ajamiet.ssis&q=ole+db+command

There is a problem with it though. It does row-level operations so it is slow. So, another option is to push all the data into a temporary table and issue a set-based update from an Execute SQL Task.

-Jamie

July 24, 2007 7:17 PM
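The temp-table approach Jamie describes can be sketched in T-SQL; the table and column names here are purely illustrative. The data-flow bulk-loads the changed rows into a staging table, and an Execute SQL Task then runs something like:

```sql
-- Set-based update from a staging table (all names are hypothetical)
UPDATE dst
SET    dst.Age = stg.Age
FROM   dbo.Customer AS dst
INNER JOIN dbo.Customer_Staging AS stg
        ON dst.CustomerName = stg.CustomerName;

-- Optionally clear the staging table ready for the next run
TRUNCATE TABLE dbo.Customer_Staging;
```

This issues one set-based statement instead of the row-by-row round trips the OLE DB Command would make.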
 

ParkAve300 said:

Jamie,

Thanks for the reply. During the first run through Table 1 will have maybe up to 100,000 records. After the first run it will have 6M+. I tried the Merge Join method first and when it choked, I tried the Lookup method which also choked. So, I just didn't think to go back and refine the first attempt.

But after your post I did that and it does work though it's not exactly fleet of foot. It takes about 45 minutes to run through the first time. On a subsequent run through when it is making no inserts/updates it takes about 25 minutes to run.

Though the speed is less than ideal, it will be running in the middle of the night as part of a scheduled process so it isn't the end of the world.

Thanks Again.

July 24, 2007 7:33 PM
 

jamie.thomson said:

I'd be interested to see how the 45 mins/25 mins is broken down. I'm guessing that on the first execution it's the insertion that is taking the bulk of the time, hence it's a perf-tuning task to get this down. I would suggest the bottleneck is outside of the pipeline. Just a thought.

On subsequent executions the bottleneck is likely to be fetching data from source (6m from both places if I understand correctly). That will take time although again, could be improved.

Headline: Find out where the bottlenecks are. And read this: http://blogs.conchango.com/jamiethomson/archive/2006/06/14/SSIS_3A00_-Donald-Farmer_2700_s-Technet-webcast.aspx

-Jamie

July 24, 2007 7:43 PM
 

ParkAve300 said:

Thanks for the link. I'll definitely give it a look, though I probably won't have a chance till the weekend since it's over an hour long. Too bad there isn't a transcript available.

Most of the extra time in the 45-minute run is during the INSERT (there are very few UPDATEs). The MERGE JOIN also appears to take longer, possibly because on the first run there is a huge difference between the 2 source tables while on the second run there is nothing being merged.

July 24, 2007 8:36 PM
 

SSIS Junkie said:

I work predominantly in the data integration space and I suspect most of you reading this do too. One

August 8, 2007 1:56 AM
 

alvon said:

hi Jamie,

It's very useful help you've got there. I'm sure many who read it would find it a relief, like me. However, how about involving two different databases - MySQL and SQL Server Mobile Edition? You see, I have a desktop system (web-based actually) that uses MySQL and I have been asked to build its mobile version on a Pocket PC, which is TOTALLY new to me. I am having trouble deciding what to do or where to start. What do you suggest? I have installed SQL Server onto my Pocket PC and done the normal sync between desktop and Pocket PC using the same database, but I'm lost when it involves two different databases. Should I install the MySQL mobile edition instead?

Thank you in advance. I really do appreciate it.

August 14, 2007 5:51 AM
 

Roopa said:

Hi,

I have created the package in the same manner. But in the OLE DB Command I am using the Oracle provider and it shows an error while mapping the columns.

Error at Data Flow Task [OLE DB Command [8717]]: An OLE DB error has occurred. Error code: 0x80040E51. An OLE DB record is available. Source: "OraOLEDB" Description: "Provider cannot derive parameter information and SetParameterInfo has not been called"

Unable to retrieve destination column descriptions from the parameters of the SQL command.

Any suggestions for this case?

Regards,

Roopa

September 5, 2007 6:13 AM
 

Arthur said:

Hi,

I started with the SCD component. For a short time now I have used the lookup and split instead. But the trouble with both of these options is NULLs.

In the Conditional Split you can use the ISNULL function. But in the SCD this is not possible and that's where the trouble starts: when NULL is compared with something else, by default the condition is FALSE. When I use ANSI_NULLS, it seems to work in Management Studio:

set ansi_nulls on

if NULL <> 's' select 'bout' else select 'test' (RESULT is 'test')

set ansi_nulls off

if NULL <> 's' select 'bout' else select 'test' (RESULT is 'bout')

I would like to know if it's possible to use ANSI_NULLS OFF in SSIS. I can't find a place where I can set this option. When I use a SQL task before the update it still does not work.

Can you help me out here?

September 5, 2007 1:04 PM
 

jamie.thomson said:

Arthur,

Simple answer... there is no equivalent of ANSI_NULLS OFF in SSIS.

-Jamie

September 5, 2007 4:56 PM
 

Arthur said:

Jamie,

Thanks for your reply. Now I can stop searching for it :-)

But does this mean I simply can't use the SCD component because we have NULL values? Or is there another way to work around these NULL values?

with regards,

Arthur

September 6, 2007 12:06 PM
 

Michael Kramer said:

OK, so a simple question. I've been using the LOOKUP to see if the row needs to be changed, but my tables have like 100 columns and using the OLE DB Command is driving me nuts, because there is no nice GUI for it. No "build SQL".

So I've been using the SCD to build the SQL. Is this correct or am I missing something?

I mean, doing an UPDATE or INSERT to a table is so standard - why is this such a hard thing to find out how to do? I feel like I'm doing the wrong thing.

Here is what I'm doing, please tell me I'm doing it right.

Data comes in -> if the mod date has changed then -> update all columns, else -> insert new row. I use a LOOKUP to test if the mod date != lookup mod date.

Simple, right? Now, with LOOKUP, a new row is an ERROR? This does not sound right.

Michael

September 20, 2007 2:56 AM
 

jamie.thomson said:

Michael,

By default the new row goes down the ERROR output, correct. You can reconfigure the LOOKUP so that the row does not get diverted to the error output and then later use a conditional split to get your inserts and updates (records to be inserted will have a NULL in the lookup column).

The OLE DB Command isn't great, you are right. One other way around it is to push the data into a temporary table and issue a set-based update from the Exec SQL Task.

By the way, you're not using OLE DB Command to do inserts are you? That's wrong!

-Jamie

September 20, 2007 7:29 PM
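Jamie's alternative configuration (don't redirect on the LOOKUP, then split on the lookup column) can be sketched as two conditional split outputs, assuming the lookup column has been aliased LookupModDate (a hypothetical name for this example):

```
Output "New rows":      ISNULL(LookupModDate)
Output "Changed rows":  !ISNULL(LookupModDate) && ModDate != LookupModDate
```

Anything matching neither condition falls through to the default output and can be treated as unchanged.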
 

Michael Kramer said:

No, I use a Destination component to do inserts. Now, is the OLE DB Command just slow for UPDATEs, so that I should write a set-based update (which I don't know what that is - I'm assuming it's like writing a stored proc that cursors through the rows to update them)? Or is it the same thing, just not very user-friendly?

September 20, 2007 8:47 PM
 

jamie.thomson said:

Michael,

By set-based update I mean just a normal UPDATE statement that you would run from any query tool, i.e.:

UPDATE destination
SET ...
FROM destination
INNER JOIN temp_table
  ON ...

You'll still have to write 100 SET predicates but at least you won't have the nasty job of matching question marks.

-Jamie

September 20, 2007 8:56 PM
 

Michael Kramer said:

THANK YOU!  I knew I was missing something simple!  Things are working much better now.

-Michael

September 21, 2007 8:48 PM
 

Ravish said:

Guys, I am new to SSIS. I tried understanding this package and am ashamed to ask this simple question: is CustomerLookupData the destination or the source? I mean, from where will this package read data, against what will it validate data, and finally where will it modify data? Please help.

September 28, 2007 12:11 PM
 

jamie.thomson said:

Ravish,

It entirely depends on where the lookup data is. It's impossible for anyone else to answer this for you.

-Jamie

September 28, 2007 10:54 PM
 

Bob said:

Jamie,

Excellent info.  I have a question, though.  How do you handle a check for a change after the age check?  Let's say you want to check if the name changed as well.

Thanks,

- Bob

October 4, 2007 3:24 AM
 

jamie.thomson said:

Bob,

Put an extra condition in the filter expression. Seperate them with the OR operator.

-Jamie

October 4, 2007 4:53 AM
 

Ranz said:

I cannot use a lookup with a column of type float (double-precision float, DT_R8).

The lookup has a red X on it and says "input column "columnName" (xxxxx) has a datatype which cannot be joined on".

I checked the data types of the inputs of the 2 tables (which I want to look up against each other for conflicting rows) and they are the same type, DT_R8.

Any suggestion please?

October 31, 2007 8:10 AM
 

7 said:

I can't believe it is so difficult to insert or update data with SSIS. I have sold a company SQL Server 2005 on the basis that it is easy to do.

Guess who looks like a complete and utter dodo now!!!

November 21, 2007 5:15 PM
 

jamie.thomson said:

Hi 7,

What exactly are you finding difficult? Perhaps someone here can help.

-Jamie

November 21, 2007 5:36 PM
 

DF said:

Hi Jamie,

How can I identify the updates using an Execute SQL Task?

I think BINARY_CHECKSUM is not the correct way of doing this. Right?

If I'm using an Execute SQL Task for the updates, can I use the same thing for the inserts? Or is it better to follow your approach?

Cheers

DF

November 22, 2007 11:04 AM
 

jamie.thomson said:

DF,

I can't really answer that question. It's entirely up to you and your circumstances to determine how you should solve it.

Solving it in T-SQL is pretty easy:

UPDATE t1
SET t1.attribute_col = t2.attribute_col
FROM t1
INNER JOIN t2
  ON t1.identifying_col = t2.identifying_col
WHERE t1.attribute_col <> t2.attribute_col

Something like that anyway.

-Jamie

November 22, 2007 5:07 PM
 

Neil Qiu said:

Could you show me how to configure the lookup error output? And how to do the inserting? In my case, suppose I look up on a column named "column1"; it correctly routes to the error output when a row doesn't exist in the destination table. However, the inserted row in the DB has a NULL value for column "column1". I don't know what's wrong. Any help on this?

mail: niel_qiu@msn.com

November 29, 2007 3:40 AM
 

Free.spirit.on.air said:

Hi Jamie,

In a lookup operation, if we have more than one value in the reference table corresponding to the lookup column, does it return the combination of all the values or does it stop retrieving after the first value is encountered?

For eg: In the above blog there are two records corresponding to Susan

Susan 26

Susan 29

What if there is a record with values   "Susan 30" coming as input and a lookup is done on the reference table with customer name as lookup column and age is retrieved from the reference table.

What will be the output for the lookup??

I tried it for a similar scenario and am getting output as

Susan 30 26

Only one record is returned.

December 24, 2007 12:19 PM
 

jamie.thomson said:

That is expected behaviour; only one will be returned.

If you want both then you're looking at a Merge Join.

-Jamie

December 28, 2007 4:53 PM
 

Ashish Sinha said:

Jamie,

How can I update a row in source in the same data flow? Example - Source is table A, which has 'Id' and 'clubId' as fields. I pass  'clubId' field through derived column transformation and want to update 'clubId' back in the source A for the same 'Id'. How can I do that in the same data flow?

January 26, 2008 8:19 PM
 

jamie.thomson said:

Use the OLE DB Command component. Mind you, if this is all going on in the same database, why not just use the Execute SQL Task?

-Jamie

January 26, 2008 8:25 PM
 

Ashish Sinha said:

1. Is there any other way apart from using Execute SQL Task and OLE DB Command?

2. I have never used OLE DB Command Component. Could you please give me some pointers?

January 26, 2008 8:53 PM
 

JK said:

Is there any way to only pull out the key identifier along with the field that is different instead of the entire record that is different?

Thanks for your help!

February 21, 2008 12:21 AM
 

jamie.thomson said:

JK,

I'm not sure what you mean. What do you mean by "pull out"? What are you going to do with the key identifier column and the differentiating column(s) that you are not going to do with all the others?

-Jamie

February 21, 2008 2:11 PM
 

Keith Lewis said:

Jamie,

What about rows that got deleted from the source tables? Is there a better way of doing it other than reversing the source and lookups and going through all the rows to see what doesn't match, then deleting?

I am using this type of solution to update a data warehouse and this problem keeps catching me. A user will input a temporary order into the system and my dtsx package will pick it up and bring it over to the warehouse. Then the user deletes the order and there is nothing in my package to check for now-nonexistent rows within the data warehouse.

Thanks,

Keith

February 29, 2008 2:56 PM
 

jamie.thomson said:

Keith,

There's no "one size fits all" answer to this because if a row in the destination is not in the pipeline data then there could be a number of reasons, most likely one of two:

1) The row has disappeared

2) You're only fetching deltas from the source.

If 1) is true then yeah, you basically have to reverse the process and find everything that's in the destination that isn't in the source.

-Jamie

February 29, 2008 3:07 PM
 

MOin said:

Hi,

Jamie,

I am new to SSIS; can you help me out with this? I have a source (flat file) and target (SQL Server 2005). The source has 2 columns, Col1 and Col2, and the target also has 2 columns, with Col1 being the PK. I have created a package that checks the target: if the record exists it updates it, and if it does not it inserts it. My package looks like:

Source - Lookup on target - Conditional Split - Derived Column - and then 2 OLE DB Destinations, 1 for inserts and 1 for updates.

In the Lookup I have created a relationship between Col1 from the source and Col1 from the target, with Col2 as the lookup column. I connected the red output to one OLE DB Destination for the inserts (with the error configuration set to redirect rows). The green output I took to the Conditional Split, with a condition saying Col2 from the source is not equal to the lookup Col2. When I run the package it inserts new records but does not update existing ones.

I tried to add a Derived Column after the Conditional Split, but I can't work out how to write an expression that updates the Col2 records that changed at the source. Can you help me with this scenario? I would appreciate it if you could get back to me ASAP.

Thanks

April 24, 2008 5:11 PM
 

jamie.thomson said:

MOin,

You'll need an OLE DB Command component in order to do the UPDATE. Do you have one of those?

-Jamie

April 25, 2008 10:32 AM
 

MOin said:

Jamie,

No I don't, but if I place one won't it slow down performance, since it does the updates row by row? And where should I place it - before the OLE DB Destination? My package consists of:

Source (flat file) - Lookup (target) - Conditional Split - Derived Column - OLE DB Destination. Please get back to me ASAP.

Thanks

April 25, 2008 2:52 PM
 

jamie.thomson said:

Moin,

There are 2 ways to do updates in SSIS:

1. Use an OLE DB Command component in the dataflow. This is slow because, as you correctly observe, it does the updates row-by-row.

2. Push the data into a temporary table (by temporary I mean one that exists purely for this purpose - it doesn't necessarily mean #table or ##table) and use an Execute SQL Task to issue a set-based UPDATE. This will be much quicker for larger volumes but adds complexity.
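The set-based UPDATE in option 2 can be sketched as follows, using SQLite via Python to stand in for the Execute SQL Task's statement (the `Customer`/`Staging` table names and columns are hypothetical; the correlated-subquery form is used because it is portable, whereas the exact T-SQL would likely use `UPDATE ... FROM`):

```python
import sqlite3

# Hypothetical target table plus a staging table holding the changed rows
# that the data flow has already landed.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customer (Col1 INTEGER PRIMARY KEY, Col2 TEXT);
    CREATE TABLE Staging  (Col1 INTEGER PRIMARY KEY, Col2 TEXT);
    INSERT INTO Customer VALUES (1, 'old'), (2, 'unchanged');
    INSERT INTO Staging  VALUES (1, 'new');
""")

# One set-based statement, as an Execute SQL Task would issue it,
# instead of one OLE DB Command execution per row.
con.execute("""
    UPDATE Customer
    SET Col2 = (SELECT s.Col2 FROM Staging s WHERE s.Col1 = Customer.Col1)
    WHERE EXISTS (SELECT 1 FROM Staging s WHERE s.Col1 = Customer.Col1)
""")
rows = list(con.execute("SELECT Col1, Col2 FROM Customer ORDER BY Col1"))
print(rows)  # [(1, 'new'), (2, 'unchanged')]
```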

-Jamie

April 25, 2008 3:01 PM
 

Moin said:

Jamie,

I used an OLE DB Command with "update table set col2 = src.col2 where col1 = src.col1", and in the control flow I added an Execute SQL Task issuing the same command. I also added data viewers to watch the data movement. Everything appears to work and it reports success, but no rows have been updated. Any suggestions as to why it is not updating the records?

Thanks

April 25, 2008 4:40 PM
 

jamie.thomson said:

Not really, no. Try using SQL profiler to capture the SQL statement that gets issued.

Not really sure why you have OLE DB Command AND Execute SQL Task. It should be one or the other.

-Jamie

April 25, 2008 4:48 PM
 

MOin said:

Jamie,

I got it working using the Lookup, Conditional Split, OLE DB Command and an Execute SQL Task in the control flow. Thanks a lot for sharing the information.

April 25, 2008 9:10 PM
 

Mukunda said:

Hi jamie

We are now converting our DTS packages to SSIS packages, but I am facing a problem when reading data from a flat file. I have n rows in my flat file and some of the rows are missing one column delimiter. This works fine in DTS with the same flat file, but not in SSIS: a row that has one column delimiter too few swallows the next row as well. I think the row delimiter is only honoured when all the column delimiters are present. Please give me some suggestions regarding this. Thanks in advance.

Mukunda

April 30, 2008 10:30 AM
 

jamie.thomson said:

Mukunda,

You've stumbled on what has probably been the biggest complaint from people migrating from DTS to SSIS. SSIS is much stricter about the format of the files it will accept - it expects column delimiters wherever you tell it there are going to be column delimiters.

Take a read here to learn more and to find how you handle this situation in SSIS:

Flat File Connection Manager issues

(http://blogs.conchango.com/jamiethomson/archive/2007/05/15/SSIS_3A00_--Flat-File-Connection-Manager-issues.aspx)

-Jamie

April 30, 2008 10:53 AM
 

TimB83 said:

Hello,

I have used MS SQL Server 2005 for a long time, but I had never used SSIS before. For the last few days I have been trying to use SSIS to transfer data and it works very well. But my source database contains some datasets that already exist in my destination database, so SSIS can't insert them because there is a primary key constraint. SSIS should check whether the datasets exist:

- if they are existing and they are exactly the same, then they should be ignored

- if they are existing and the data is changed, then it should be changed/ updated in the destination database

- if the dataset isn't existing, it should be inserted in the destination database

I saw your example, but my problem is creating this workflow: I don't know which settings to choose and what exactly to do in each menu.

Do you know if there is a little tutorial or help for this problem anywhere? I have searched websites, forums, books, PDF files and so on - but nowhere was there a really helpful example for me :-(

I hope you can give me a hint?

Best regards,

Tim

May 6, 2008 3:10 PM
 

jamie.thomson said:

Tim,

Everything you need is linked to above. Including an example package.

-Jamie

May 6, 2008 5:01 PM
 

Bruno Gomes said:

Jamie,

Please consider changing your scenario like this: maintain Names and Ages but add Genders. And now consider that you only want to update the Age (in those cases where it has changed) if the Gender of that record is the same of the previous record where an Age update was performed.

Something like this:

SRC TABLE

KEVIN | 23 | M

MIKE | 24 | M

SUSAN | 29 | F

LKP TABLE

KEVIN | 22

MIKE | 23

SUSAN | 28

We want to update KEVIN's Age from 22 to 23 and MIKE's from 23 to 24 (cos MIKE's Gender is the same as KEVIN's); but we don't want to update SUSAN's Age (although it has changed) because SUSAN's gender is not the same as KEVIN's.

Brgds,

BG

May 8, 2008 2:03 PM
 

PK said:

Hi Jamie thanks your site is great I have learned a lot from you.

my source table contains 3 records

1    PATRICK  25

2    JANE 30

3    PATRICK  25

The first time the package runs it inserts PATRICK twice in the destination table.

How can I avoid this?

May 20, 2008 2:32 PM
 

Alejandro (gustavo.a.martinez@gmail.com) said:

Jamie,

This article has been quite useful to me; it helped me understand SSIS better. The problem I have at the moment is that I update my local database a different way. I generate a variable, "MaxFecha", in which I obtain the maximum value of a date field in my local SQL Server database; then, in the SQL query I use against the source database (Oracle 10g), I include a WHERE clause that brings back all the rows where a date field is greater than that variable. The problem is that when I try to map the parameter in the OLE DB Source it throws a "SetParameterInfo" error at me which, according to what I saw on the MS site, is an error in the Microsoft OLE DB provider - something I am not using, since I am using the Oracle provider for OLE DB. Could you give me a hand with this? Pardon the language, I am using a translator (Babelfish). Thank you very much!

June 17, 2008 5:10 PM
 

Alejandro (gustavo.a.martinez@gmail.com) said:

I should add that the source table currently has 1,300,000 rows, which makes it very inefficient to reload in full at least 2 or 3 times per day. I only need to add the latest rows, that is to say, those with a date/time field later than the maximum I have in the local table (that is how I recognise the new ones).

Tnks!

June 17, 2008 9:40 PM
 

Narayan Pavgi said:

I do not see this solution as being at all scalable in the big bad real world out there. I have to perform an insert-if-new-else-update across 25 tables with 250 columns in all (or say 50 tables with 500 columns), and I do not see myself adding Lookups and Conditional Splits for each of those in an advanced tool like SSIS. I am surprised an "ETL" solution does not have a simpler answer to this problem; I think that is what has led to so many folks asking for one. I call for an efficient drag-and-drop transform.

June 29, 2008 5:34 AM
 

jamie.thomson said:

Narayan,

Perhaps you could outline how it could be improved and then submit it to http://connect.microsoft.com/sqlserver/feedback

Could you explain the "problem" as you see it?

-Jamie

June 30, 2008 10:13 AM
 

Gregg Robertson's Blog said:

In early August I signed-up with Adcenter Analytics , a free service that can provide analytical information

October 29, 2008 1:25 PM
 


Merrill Aldrich said:

I see and grapple with this question a lot - how to incrementally merge a small source file into a large destination table, using upsert, where the data is all in nullable columns (read "from Excel"). I like and use this general solution, but the beastly part is that the tables always have MANY columns, not one or two. "Bob" asked exactly this question above. The hard/time consuming bit is composing the expression in the conditional split to compare a long list of nullable columns. I generally opt to write an expression that finds the identical rows, which can be discarded. The default output is then the rows with differences, which go to the OLEDB update command. Posting a sample of the expression next...

November 21, 2008 11:10 PM
 

Merrill Aldrich said:

OK, so the expression to compare nullable columns requires combining the isnull() function with the basic column value to test both possibilities. For this, an "inline if" works. For example, if you have a string datatype, and null or empty string is equivalent, then you can do

(ISNULL(existingVal) ? "" : existingVal)

This will substitute the empty string for any nulls

Once the null is replaced, you can do a comparison

(ISNULL(existingVal) ? "" : existingVal) == (ISNULL(newVal) ? "" : newVal)

This will compare the results, which are now null-free

November 21, 2008 11:16 PM
 

Merrill Aldrich said:

Tricky bit: the magic value you substitute for null (like the empty string above) has to be the same data type as the column. For example:

(ISNULL(existingDATE) ? (DT_DBDATE)"1900-01-01" : existingDATE) == (ISNULL(newDATE) ? (DT_DBDATE)"1900-01-01" : newDATE)

Same idea for numbers, etc.
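Merrill's substitute-a-sentinel pattern is language-agnostic; a minimal sketch of the same idea in Python (the `null_safe_eq` helper and its arguments are hypothetical names, with `None` standing in for NULL):

```python
import datetime

def null_safe_eq(existing, new, sentinel):
    # Replace None with a sentinel of the column's own type, then compare -
    # mirroring (ISNULL(x) ? sentinel : x) == (ISNULL(y) ? sentinel : y).
    e = sentinel if existing is None else existing
    n = sentinel if new is None else new
    return e == n

print(null_safe_eq(None, "", ""))                           # True: NULL == empty string
print(null_safe_eq("abc", None, ""))                        # False: value differs from NULL
print(null_safe_eq(None, None, datetime.date(1900, 1, 1)))  # True: both NULL
```

One caveat worth noting: a real value equal to the sentinel (an actual 1900-01-01 date, say) becomes indistinguishable from NULL, so the sentinel should be a value that cannot occur in the data.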

November 21, 2008 11:18 PM
 

SSIS Junkie said:

In October 2004 I was in Orlando airport returning home from the annual SQL PASS summit and I happened

August 29, 2009 11:30 PM