
SSIS Junkie

SSIS: Data flows or T-SQL

Update below

There has been, and still is, quite a lot of debate within Conchango as to whether one should use SSIS data-flows or stored procedures for ETL. Traditionally we have implemented ETL solutions using stored procedures; DTS was used, but as little more than a workflow engine, occasionally using the data-pump for importing data from files. I dare say that many people took this approach with DTS.

Now that SSIS has come along you have a much more powerful data movement engine than the DTS data pump and also a lot more inherent functionality. But is that compelling enough to stop writing T-SQL stored procedures and relying on SSIS merely as a workflow engine? Well, here's a comparison of the two approaches. These are only my opinions and I dare say I've forgotten some important things, but I think I've raised the salient points:

Advantages of using Stored Procedures

  • A SQL statement will outperform an SSIS data-flow when the data transform is table-to-table on the same server (see the sketch after this list)
  • No new learning curve, as ETL developers will already know how to write SQL code; SSIS data-flows are a completely new technology
  • Utilise transactions within the database rather than relying on MSDTC
  • Easier to generate stored procedures from known metadata than it is to generate a data-flow (at the time of writing, anyway)
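
To make the first and third points concrete, here is a minimal sketch of a set-based, same-server transform wrapped in a database transaction; the table and column names are hypothetical:

BEGIN TRANSACTION;

-- Set-based transform: one INSERT...SELECT, executed entirely inside the engine
INSERT INTO dbo.FactSales (DateKey, ProductKey, Quantity, Amount)
SELECT  CONVERT(CHAR(8), s.SaleDate, 112),
        p.ProductKey,
        s.Quantity,
        s.Quantity * s.UnitPrice
FROM    staging.Sales AS s
JOIN    dbo.DimProduct AS p ON p.ProductCode = s.ProductCode;

COMMIT TRANSACTION;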

Advantages of using an SSIS data-flow

  • Handle data from heterogeneous sources in the same place
  • Consume data from sources that can't be accessed using a SQL statement
  • Data can be transformed without the need for an intermediate staging area
  • If data does need to be persisted temporarily it can be written to a raw file, whereas T-SQL requires temporary tables, which may carry a management/security overhead
  • Extra transformation functionality (e.g. fuzzy matching, data mining, text mining, inserting into Analysis Services)
  • Visual representation of the "work"
  • Bad data can be captured to a different data sink for examination later
  • Exception handling
  • Use .Net data providers as an alternative to OLE DB providers (e.g. mySAP Business Suite .Net Data Provider)
  • Data-flows are, to some extent, self-documenting
  • User-controlled parallel execution of data-flows is possible, which it isn't really in the inherently batch-oriented world of stored procedures
  • "Heavy lifting" of data can occur on a different machine from the one storing the data (thanks to John in the comments section)
  • By default, events containing very pertinent information, such as '"component "<component name>" (5824)" wrote 2398156 rows', are raised

 

These are essentially feature lists and they should all be considered. Remember, though, that in situations with complex transformation requirements you are more likely to "hit the wall" with T-SQL because there are things that it simply cannot do. For simpler implementations the lines become more blurred and you have a choice to make.

What is clear is that there is no "one size fits all". If you have to make a decision about which approach to take then consider these points and decide which approach best suits your requirements.

And let me know what approach you prefer. I'm interested to know.

-Jamie

UPDATE, 9th April 2006: I've just come across a very useful webcast that explains how to integrate hand-coded ETL scripts into SSIS. It is here: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032276873&EventCategory=5&culture=en-US&CountryCode=US. What makes this relevant to this blog post is that the use of T-SQL and of SSIS components is compared in places, so it will help you to make a rational judgement on which method you should go for.

 

Published Tuesday, March 14, 2006 11:26 PM by jamie.thomson

Comments

 

Roman Podlinov said:

Hello! I have found your blog extremely useful. Thank you very much. But I have a question; could you answer it please? What is better to use - an SSIS package, replication or T-SQL - if I need a copy of a table (actually, only some columns) on another linked server? I'm going to update the copied table regularly (e.g. every 15 minutes). The copied data will be used only for reads (for joins), but the table with the copied data must be available for queries all the time. Actually, I have several such tables with 100K rows each. Which way do you prefer?
March 15, 2006 12:57 PM
 

John said:

How about the performance impact on the source server, since the stored procs have to do all the work on the server itself, while the data flow can be pulling data (async, as per your last post) and doing the heavy lifting on a different machine?



March 15, 2006 5:35 PM
 

plafromboise said:

While I agree that the learning curve for SSIS can be steep, I think the benefits of using SSIS for ETL far outweigh the few benefits that you listed for stored procedures. So, personally, I have to vote for SSIS.
March 16, 2006 8:22 PM
 

Roman Podlinov said:

>how about the performance impact on the source server
Yes, but I can pull the data from the destination server. Actually, I keep an open mind about it; I want to find the best solution for my task. :)
>So, personally, I have to vote for SSIS.
Thank you very much. I've selected SSIS. In any case, it will be useful to study this service.
Regards,
Roman

March 17, 2006 6:22 AM
 

Marco Russo said:

I disagree with some of your points. While SSIS can eliminate a staging area, often this is a bad idea: we tried to use the no-staging approach when importing a lot of XML log files and in the end we decided that it was a bad idea. Error handling, debugging and complex transformations are a lot easier when you have the source table in a relational store.
SCD components are another grey area of SSIS; we improved performance by replacing SCD components with our own custom-made components and/or scripts.
But in the end we definitely prefer SSIS to stored-procedure transformations: SSIS packages are a lot easier to manage when they become complex, and documentation is a lot simpler with SSIS graphical paths; if you see ETL development as something that needs maintenance after months or years, you will definitely appreciate SSIS's visual representation of the "work".
March 19, 2006 10:08 PM
 

jamie.thomson said:

Hi Marco,
Thanks for your comments.

Allow me to just clarify one thing. I'm not making any recommendations here, nor am I advocating a particular methodology or technology. The intention was to give people food for thought when considering how to produce a solution to a problem. If I have misled in this regard then I apologise, perhaps I'll address the wording of the post.

In regard to your specific example, I would never advocate a carte blanche approach of eliminating a staging area. Indeed, I expect that in most circumstances a staging area most certainly WOULD be necessary. Note the use of the word "can" in that bullet point.

If you have any experiences to share re: XML files, SCDs, ETL methodologies etc... in regard to SSIS then I would love to read about them on your blog (which I, of course, have got bookmarked).

Regards
Jamie
March 20, 2006 9:16 AM
 

Marco Russo said:

Hi Jamie,

I understand you were not recommending anything; my use of English is not always very appropriate... You don't have to apologise, don't worry!

I'm currently involved in several projects and I'm dedicating my spare time to a document about SSAS modelling techniques based on many-to-many relationships. I will try to post more, I promise :-) - we had some experience with processing large XML files that is worth sharing.

Hi.

Marco
March 20, 2006 6:33 PM
 

ken ambrose said:

I didn't see listed on the SSIS side what I consider to be TWO huge benefits of using SSIS vs. T-SQL: structured exception handling, and free auditing/logging.

Granted, if you change all your SPs to .NET code, as you now can in 2005, you do get structured exception handling - but then again, that is not T-SQL...

March 31, 2006 11:04 PM
 

jamie.thomson said:

Hi Ken,
Great points. I added Exception Handling to the list but not auditing/logging - that's more an advantage of SSIS itself than of the data-flow -v- T-SQL question!

-Jamie
April 1, 2006 10:56 AM
 

Dev said:

Hi Jamie

I have to do a data migration task from an old DB to a new DB. The schema of the new DB is different; e.g. a table in the old DB that had a primary key (say, id) is now split into two new tables, tbl1 and tbl2. Also, the old table had, say, 10 columns which need to be split among the two new tables (a master table and some child tables with id as a reference key). In addition, some columns in the old table are no longer required in the new tables and some new columns are introduced in the new tables.

I have read up on SSIS and I feel that my task does not require much data transformation. I just need to insert data into the new database.

I don't find any particular task in SSIS that will help me significantly as opposed to writing an SP or script containing several insert statements.

It would be really helpful if you could guide me on whether I should be using a script or SSIS for this task.

If you recommend using SSIS, then what are the tasks I should be using (e.g. a Data Flow task or an Execute SQL task)?

The databases are on two different servers and this migration will be a one-time activity.

Thanks,

Dev

May 30, 2007 1:48 PM
 

jamie.thomson said:

Dev,

If this is a one-time task and the databases reside on the same server then I would probably just write some SQL to do it.

-Jamie
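
For illustration only, a minimal sketch of the kind of one-off SQL Jamie describes, splitting one old table into a master and a child table; the object names are hypothetical and assume the two databases can see each other (e.g. on the same server or via a linked server):

-- Populate the master table from the old table (hypothetical names)
INSERT INTO NewDb.dbo.tbl1 (id, Name, Country)
SELECT  id, name, country
FROM    OldDb.dbo.OldTable;

-- Populate the child table, carrying id through as the reference key
INSERT INTO NewDb.dbo.tbl2 (id, Phone, Email)
SELECT  id, phone, email
FROM    OldDb.dbo.OldTable;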

May 30, 2007 7:40 PM
 

Dev said:

Hi Jamie

Thanks for the reply.

May 31, 2007 2:50 PM
 

Jack said:

Nice post. Thanks.

June 6, 2007 1:05 PM
 

Garry said:

Really important.

June 6, 2007 2:45 PM
 

Harold said:

How about a case where it would be better to use a stored procedure as the source for a data flow? I cannot find a way to do this with SSIS. Any ideas?

Thanks

June 26, 2007 10:24 PM
 

jamie.thomson said:

June 26, 2007 10:56 PM
 

awilkes said:

Not being a DTS dweeb at the start of my BIS project, I used VB.Net to run a bunch of stored procedures. This turned out to be a good thing when I had to put a variable in the FROM statement of several procedures. I just built the procedures from inside VB, passing in the variable I needed.

I've read elsewhere about ETL systems in general. The general problem is that they handle generic problems well but specific, painful, beat your brains out problems not so well.

I'm going to use SSIS for my incremental loads. I'm looking to use the SPs from the VB loader. That's a quick and easy solution. I'm not sure how much of the SP code will be converted along the way.

July 16, 2007 6:47 PM
 

Saad Ahmad said:

1. The performance issue is a major one. I was involved in a project where we were pulling data from AS/400 to SQL 2005. The data was to be fetched into a staging area first. The SSIS job never finished but the stored procedure finished in 15 minutes. The basic reason is that SSIS fetches the data and then transforms it, whereas using T-SQL you can do a SELECT * INTO <staging table> FROM <source> - and that will outperform SSIS greatly. This will remain the case until SSIS is optimized to look at the logic and translate it into direct SQL calls.

2. Multiple data sources - here too I find the standard SQL approach of using linked servers much easier. With SSIS data flow tasks the connection information was embedded in the task and that causes issues when moving stuff from dev to prod. With linked servers that is an outside entity and the movement between environments was easier.

3. I have not tried it yet, but SQL 2005 allows stored procedures to be written using .NET languages as well.

SSIS is definitely a good thing as long as the user employs it properly. Its sequencing and documentation features are great, but if someone thinks that they can do a real data warehousing project without needing SQL/T-SQL they will be disappointed.
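
For reference, the staging pattern described in point 1 is just a set-based pull, optionally over a linked server; a minimal sketch, with hypothetical linked-server and table names:

-- SELECT...INTO creates and populates the staging table in one set-based statement
-- (hypothetical linked server and table names)
SELECT  o.OrderId,
        o.CustomerId,
        o.OrderDate,
        o.Amount
INTO    staging.Orders
FROM    AS400Linked.SalesDb.dbo.Orders AS o
WHERE   o.OrderDate >= '20070101';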

August 15, 2007 3:46 PM
 

jamie.thomson said:

Saad,

Thanks for your comments. Very interesting, though I must take issue with a couple of things.

1) If the SSIS job never finished then this points to another problem. Once you have solved that problem then you can compare performance. If all you are doing is staging data then yes, there is no reason why SSIS would be quicker, but it is not right to base a premise that "T-SQL is quicker" on this simple scenario. T-SQL will be the quicker option in many, many scenarios; it is up to the developer to find out whether it is optimal.

2) I certainly can't argue with your opinion about using linked servers. I personally prefer to use data-flows but of course it's a matter of opinion. I do disagree, though, that connection info is embedded in the task. Firstly, it is in the connection manager. Secondly, if you are using configurations (which you ALWAYS should be) then the connection info is taken outside of the package and moving through environments is a breeze. Again, just my opinion, which anyone reading can take or leave.

3) Very true.

Thanks again.

-Jamie

August 15, 2007 4:23 PM
 

Data_Guy said:

Hi Jamie,

How about implementing a cursor inside a data flow - is that possible? I would like to do row-by-row processing and, based on conditions, update or insert into tables. How would I go about doing that? Also, I personally feel that if you have to implement case logic in your transformations it is easier to do that in T-SQL than in data flows. But the above comments are really helpful.

Thanks

August 23, 2007 3:06 PM
 

jamie.thomson said:

Data_Guy,

I suppose the most analogous thing to what you're talking about would be to use an asynchronous script component to loop over the rows.

Another option would be to load the rows into a Recordset Destination and then loop over it using a ForEach loop. That's a really powerful concept in SSIS and there are plenty of examples around of doing it (including on my blog).

Why do you think case logic is easier in T-SQL? I'm not disputing your view, just interested to know that's all.

-Jamie

August 23, 2007 4:30 PM
 

Data_Guy said:

Hi Jamie,

I once tried implementing case logic using SSIS some time back, but I didn't find any transformation within the data flow that would do it; finally I had to give up and ended up using T-SQL to do it. Maybe there is a way but I'm not aware of it. It would be helpful if you could let me know.

Thanks

August 23, 2007 5:30 PM
 

jamie.thomson said:

Data_Guy,

The Conditional Split component or Derived Column component both provide the ability to implement case logic.

The Derived Column component can introduce a new column whose value is based on conditional logic. Alternatively, it can change the value of an incoming column based on some conditional logic.

The Conditional Split component will take one incoming data path and split it into many data paths. You use conditional logic to determine which data path each row goes down.

-Jamie

August 23, 2007 8:42 PM
 

Data_Guy said:

Hi Jamie,

I haven't tried that yet; I will do so soon. One question I had: in the Derived Column or Conditional Split transformation I don't see any expression for CASE. Will I have to use something else instead of CASE? How do I go about doing that? Can you please point me to an example? I'm sure there must be something I can use. Thanks for your input.

Thanks

August 23, 2007 10:06 PM
 

jamie.thomson said:

Data_Guy,

The operator you're after is the 'conditional operator' (trust me, it's there). To get the equivalent of a CASE statement you just need to nest them.

-Jamie
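
To illustrate, here is a simple three-way banding written as a T-SQL CASE expression, with what the equivalent nested SSIS conditional operator expression would look like shown as a comment; the table and column names are hypothetical:

-- T-SQL CASE expression (hypothetical table and columns)
SELECT  OrderId,
        CASE
            WHEN Amount >= 1000 THEN 'Large'
            WHEN Amount >= 100  THEN 'Medium'
            ELSE 'Small'
        END AS SizeBand
FROM    dbo.Orders;

-- Equivalent SSIS Derived Column expression using nested conditional operators:
-- Amount >= 1000 ? "Large" : Amount >= 100 ? "Medium" : "Small"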

August 24, 2007 12:16 AM
 

Data_Guy said:

Hi Jamie,

Thanks for the input. I tried it and it works great; maybe the only reason I never tried it before was that I was just being lazy or just ignorant. I think this is really cool - thanks for the enlightenment :) .

Thanks

August 24, 2007 2:53 PM
 

Martin Smith said:

Anyone know if I'm missing something on the dataflow task here?

I've got a source csv file with no header row and around 150 columns to go into a SQL database. The SQL database has all the datatypes properly defined and the column order is in the same order as the csv columns.

I had 2 major annoyances whilst setting this up in a dataflow task.

1) I could not see any way of mapping the csv columns to the database columns by index. I had to do it all manually by pressing the down key and the enter key 150 times in the mappings box.

2) I had loads of validation errors where the inferred datatypes for the CSV flat file connection manager did not convert to the mapped SQL datatype. What I wanted at this point was a way of setting up the CSV datatypes from the mapped SQL data types but I could not see a way of doing this and again had to do it manually.

Could I have avoided the manual approach and done either of these things automatically?

September 22, 2007 10:51 PM
 

Martin Smith said:

This query helped save my sanity and made things a bit less tedious when manually setting up field lengths where the inferred field length was shorter than I actually wanted to allow. I found the whole user experience of setting up data flows unbelievably poor. Please tell me that I am missing something here.

DECLARE @tBLaLL TABLE
(
    RANK INT IDENTITY(0,1),
    length INT,
    xtype INT
)

INSERT INTO @tBLaLL (length, xtype)
SELECT length, xtype
FROM syscolumns
WHERE OBJECT_NAME(id) = 'Fidessa_TRADING_POSITIONS'
ORDER BY colorder

SELECT * FROM @tBLaLL
WHERE xtype = 167 AND length > 0
ORDER BY length, RANK

September 29, 2007 4:23 PM
 

jamie.thomson said:

Hi Martin,

I'm confused. Where are you running that query and what has it got to do with setting up a dataflow?

If the component you are using incorrectly guesses at field lengths (and remember, it is only a guess, so it's perfectly plausible that it may be wrong) then simply change the column lengths within the component.

-Jamie

September 29, 2007 4:36 PM
 

Emrys said:

"Secondly, if you are using configurations (which you ALWAYS should be) then the connection info is taken outside of the package and moving through environments is a breeze."

This is surely some kind of joke. Although package configurations are much better than trying to alter all your configuration by hand when deploying to a new environment, there's no way anyone could reasonably call them 'a breeze', given that they're a nightmare to get working, often stop being loaded for no apparent reason when you deploy to a new machine, and, more importantly, don't really cope well with the (fairly common) situation where you have multiple environments on the same machine (try setting up an indirect configuration when the only thing you have available to indirect with is a machine-wide property, i.e. an environment variable or a file in a known location). It can all be dealt with, sure, but only after a lot of head-wall interaction.

October 1, 2007 12:30 PM
 

jamie.thomson said:

No, it's no joke. I can only talk about my own experiences and I've never had any difficulty using configurations (that I can remember, anyway).

I have NEVER seen them "stop being loaded on a new machine".

Multiple environments on the same machine is definitely a problem, that's true. I never run environments like this, for that very reason.

-Jamie

October 1, 2007 1:58 PM
 

Martin Smith said:

Hi Jamie,

Yes the component is incorrectly guessing field lengths and data types.

As this is being imported into a SQL database and the columns are mapped what I wanted was a way to set the flat file datatypes to correspond with the SQL datatypes according to the mapped columns rather than having to set up the whole thing manually.  

I found setting up the string lengths in the flat file connection manager mind-numbingly tedious when dealing with large numbers of columns, but at least the above query allowed me to order the DT_STRING columns so that those with the same length were next to each other and I could select them all in the UI using the CTRL key.

Now I see that I have another problem though where numbers are being rounded due to incorrectly guessed data types so it looks like I'm going to need to manually review every numeric datatype as well in 10 different CSV files with up to 150 columns. *sigh*

October 1, 2007 2:10 PM
 

Sumit said:

Hi Jamie,

Does the scenario (Data Flow vs T-SQL) change if you are on a 64-bit platform (both OS and SQL Server) and you have 8GB RAM (with a possibility of adding more RAM if need be)?

I suppose data flows use memory, as against T-SQL using the physical disk, and accessing memory will always be faster than accessing physical disk.

Thanks

Sumit

October 9, 2007 4:18 PM
 

jamie.thomson said:

Sumit,

That's entirely up to you to decide. Really all I'm doing here is pointing out some things to consider. I'm not recommending you go with one approach or the other.

-Jamie

October 9, 2007 4:23 PM
 

Sundar said:

Hi Jamie,

My thoughts, which might bring smiles to the T-SQL lovers vs. SSIS debate.

You mentioned:

Advantages of using an SSIS data-flow

Handle data from heterogeneous sources in the same place

You could do this with T-SQL too, using SQLCMD (a new built-in utility in 2005).

Consume data from sources that can't be accessed using a SQL statement

There is nothing like this. If SSIS can access it, T-SQL can also access it; it's just a matter of knowing the proper linked-server kind of approach, or the right command-line utility and parameters, so that you can access it in combination with SQLCMD again.

Data can be transformed without the need for an intermediate staging area

I think people have already discussed this; it is not a good idea all the time. Just to add to that, you could use table variables and temporary tables to get the same effect of avoiding staging.

If data does need to be persisted temporarily it can be written to a raw file, whereas T-SQL requires temporary tables, which may carry a management/security overhead

I don't know how SSIS does this, but I guess it just uses table variables to achieve it. But then the question is why Microsoft is not lifting out the concept of temporary tables altogether, if they cannot serve our needs and just become overhead.

I think temporary tables are very useful if you use them properly. You can put indexes on them to get performance.

Just an open thought.

Extra transformation functionality (e.g. fuzzy matching, data mining, text mining, inserting into Analysis Services)

This is just build-up. Nobody would do this in automation. It is definitely helpful during development, but not once it gets into production.

Visual representation of the "work"

100% agree.

Bad data can be captured to a different data sink for examination later

I never tried this, but it is a good thought. It could also be done in T-SQL.

Exception handling

T-SQL also has exception handling: TRY/CATCH blocks (see the sketch below).

Use .Net data providers as an alternative to OLE DB providers (e.g. mySAP Business Suite .Net Data Provider)

Just verify whether SQLCMD uses OLE DB or .NET providers to connect to heterogeneous data sources.

Data-flows are, to some extent, self-documenting

Good, agreed. But T-SQL is easier to manage technically.

User-controlled parallel execution of data-flows is possible, which it isn't really in the inherently batch-oriented world of stored procedures

Not to object, but parallel processing and batch processing are trade-offs. Depending on the situation, most of the time batch processing is more reliable.

"Heavy lifting" of data can occur on a different machine from the one storing the data (thanks to John in the comments section)

I did not understand this well enough to comment.

Cheers

Sundar
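
For completeness, a minimal sketch of the T-SQL structured error handling Sundar mentions (available since SQL Server 2005); the table names are hypothetical:

BEGIN TRY
    BEGIN TRANSACTION;

    -- Hypothetical set-based load step
    INSERT INTO dbo.DimCustomer (CustomerKey, CustomerName)
    SELECT CustomerId, CustomerName
    FROM   staging.Customer;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

    -- Re-raise the error message to the caller (2005-era syntax, pre-THROW)
    DECLARE @msg NVARCHAR(2048);
    SELECT @msg = ERROR_MESSAGE();
    RAISERROR(@msg, 16, 1);
END CATCH;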

November 14, 2007 9:08 PM
 

Dmitry said:

This post was made in 2006... It's January 2008 and two people next to me are talking about this topic right this second :).

I find SSIS to be a good way of doing data imports but like anything else its a tool and many approaches might work. The best approach is always arguable and depends on the details.

What I don't like is switching technologies in the middle of a project when the differences are arguable details.

January 17, 2008 4:59 PM
 

djedgar said:

T-SQL rules!

Change management: it's not easy to check the impact of changes with an SSIS implementation. In a very complex SSIS project (many packages), even one that is well documented, it is hard to identify all the dependent tasks. In T-SQL I only need to query the dictionary tables to find the potentially affected objects - it's beautiful!
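
As an illustration of the dictionary-table approach djedgar describes, a minimal sketch of an impact-analysis query against the SQL Server 2005 catalog views; the table name is hypothetical:

-- Which objects (e.g. stored procedures, views) reference a given table?
SELECT DISTINCT OBJECT_NAME(d.object_id) AS referencing_object
FROM   sys.sql_dependencies AS d
WHERE  d.referenced_major_id = OBJECT_ID('dbo.DimCustomer');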

February 4, 2008 3:16 PM
 

vivekananda said:

Hi,

If possible, can anyone please give me some ideas on this?

I have an Oracle 8i data source, and I have to migrate data from some 100 tables - some containing around 1 million and some around 1 billion records - to a SQL Server 2005 database. How should I proceed with this? Would using a linked server and some T-SQL statements solve this, or would an SSIS package be more useful?

Please give me some details of how each approach would affect performance and minimise the time taken.

April 24, 2008 1:26 PM
 

jamie.thomson said:

vivek,

Really the only person that can tell you is you. Test and measure is the only advice I can give.

-Jamie

April 24, 2008 2:21 PM
 

IanA said:

One question - does SSIS need linked databases to enable data transfer directly between one SQL 2005 database and another? We are trying to migrate PROD data back to TEST and the DBAs won't allow linked databases outside PROD. We had written T-SQL to do the migration but it relies on a linked DB.

May 7, 2008 3:49 PM
 

jamie.thomson said:

IanA,

No, SSIS doesn't require you to define linked servers (I presume that's what you mean).

-Jamie

May 7, 2008 3:56 PM
 

Malcolm Waring said:

After facing the task of setting up an SSIS project for 350 pipe delimited flat files, I may have found a way to make setting up Flat File Connections slightly less painful.

Start by doing it in reverse: create an OLE DB Source on the SQL table and a Flat File Destination. That way the Flat File Connection will be based on the table instead of on a guess at the file contents (and setting all the strings to 50!).

So just save your Flat File connections and use them with your real Flat File Sources.  Throw away the rest.

You still may have to tweak a few things but it's much faster.

June 16, 2008 9:59 PM
 

Dee said:

I have to import some data from Oracle to SQL Server and I know I can do this using SSIS. Once the data is imported I need to do some cleanup of the data. Currently I do it using a stored proc which has a bunch of UPDATE statements. I know I can execute a stored procedure from SSIS. So which method is better/faster? 1) Execute the stored proc from SSIS after importing the data, or 2) do some transformations in the data flow task to clean up the data?

July 7, 2008 9:12 PM
 

jamie.thomson said:

Dee,

The answer, as with so many things in this business, is "it depends". It's impossible to say which will be faster without knowing things like data volumes and the actual operations that you will be doing.

My best answer is to try it and find out. Test and measure, test and measure, test and measure!!!

-Jamie

July 8, 2008 9:36 AM
 

Dan_B said:

Jamie,

First off, thanks for this excellent source of SSIS info you've created; it's helped me loads over the past few months as I've come to learn more about SSIS.

I do, however, have a question for you regarding consuming XML data and importing it into a SQL 2005 database. I have no problem importing XML data into SQL 2005 using the XML Source adapter and an OLE DB Destination in a data flow task, but how do you maintain the relational state of the DB when certain fields (such as titles) are the same?

With T-SQL you are able to parameterise SPs to import data without a problem; however, I'm not sure how I should be doing this with XML data. I understand this may be a question for another blog post you have, but after reading this one it seemed the most appropriate to me.

Thanks,

Dan

November 18, 2008 11:11 AM
 

jamie.thomson said:

Dan,

Apologies but I don't really know what you mean by "how do you maintain the relational state of the DB when certain fields (such as titles) are the same? "

Perhaps you could give me a "for instance..."

-Jamie

November 18, 2008 11:43 AM
 

Dan_B said:

Jamie,

Apologies - re-reading my post, I wasn't initially clear. Thanks for the quick response.

Basically, there is a lot of repeated data to consume when importing the XML data to SQL as we are attempting to grab articles of news that may well have the same title, the same source URL etc time and time again. For instance 'Yahoo! Japan' has been supplied as the original source over 844,000 times but only exists in the source table once with an 'appearance' count of ~844,000 to keep entity integrity and avoid duplicates.

In the previous .NET application/SQL DB setup, T-SQL was used to check whether the record already existed in the DB before attempting to insert it, and if it did exist the appearance count and other fields would be updated accordingly. But I am unsure of how to do this in SSIS. I understand you can parameterise the import using the SQL command in the OLE DB Destination, but how do you make sure it references the correct fields from the XML Source adapter? Is there a better way of doing this?

Thanks again,

Dan

November 18, 2008 11:56 AM
 

jamie.thomson said:

Dan,

So basically you want to identify duplicates, right? Moreover, you want to know if an incoming value already exists in the destination?

Take a read of this:

http://blogs.conchango.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx

-Jamie
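
For comparison, the existence check Dan describes looks something like this in plain T-SQL (hypothetical table, column and parameter names); the post linked above shows how the equivalent check is done inside a data flow:

-- Increment the appearance count if the source already exists, otherwise insert it
UPDATE dbo.NewsSource
SET    AppearanceCount = AppearanceCount + 1
WHERE  SourceName = @SourceName;

IF @@ROWCOUNT = 0
    INSERT INTO dbo.NewsSource (SourceName, AppearanceCount)
    VALUES (@SourceName, 1);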

November 18, 2008 12:03 PM
 

Dan_B said:

Jamie,

It's not really about identifying duplicates, but about keeping the data as normalised as possible so that the tables do not become monstrous when they don't need to be, as well as giving us an idea of which particular sources/countries/regions are cited more regularly than others.

Thanks for the link on checking if a row exists!

Dan

November 18, 2008 12:16 PM
 

jamie.thomson said:

Dan,

I guess I'm still not understanding. If you need to find out how many times a particular value occurs then will the Aggregate component not help you?

Your use of the word "normalised" means I'm obligated to point you to my Normaliser component :)

http://blogs.conchango.com/jamiethomson/archive/2007/09/25/SSIS_3A00_-Normaliser-Component-bug-fix.aspx

-Jamie

November 18, 2008 12:48 PM
 

Dan_B said:

Your understanding is probably based on my poor explanation!

You've been most helpful and provided lots of info on how I could potentially normalise the data after inserting it. But I need to get the data from the original XML source file into the SQL DB before I can normalise it, and all the examples I have seen so far have been based on a SQL table as the source. Initially I thought I could normalise the XML data upon insert; it doesn't look like that is possible any more.

I think what I need to do is to stage the data in a DB first. I've just watched the video on normalising the data and think I could maybe use this down the line :-) - that, or re-use the T-SQL code that has previously been used, and then use that in SSIS.

Thanks again,

Dan

November 18, 2008 2:25 PM
 

jamie.thomson said:

Dan,

I fear you may be a bit confused. Once the data is in the SSIS pipeline (i.e. in the data flow) then it's irrelevant whether the source is an XML file or a SQL table.

The whole point of the SSIS pipeline is to give you the ability to manipulate/transform the data in memory prior to inserting it. As this blog entry explains that is one option or, as you correctly observe, you can simply move the data from source to destination and then transform it after loading it into the DB. The choice is yours.

Perhaps this will help: http://www.sqlservercentral.com/articles/SQL+Server+2005+-+SSIS/thenewetlparadigm/1719/

-Jamie

November 18, 2008 2:34 PM
 

Dan_B said:

Ah, well in that case I was confused. I've recently started learning SSIS so a lot of this is new ground to me!

Thanks for your patience to explain,

Dan

November 18, 2008 3:26 PM
