Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

Informatica -vs- SSIS pipeline throughput

A long time ago my colleague The Galloping Data Architect and I wrote a paper entitled "A Comparison of Enterprise Class Data Integration Tools. SQL Server Integration Services for the Informatica Customer". You can get it from here and read what I had to say about it on this blog here.

I guess that at least one person has read it because today I got asked the following question via email:

I'm working on a conversion project and I'm trying to compare performance of SSIS with Other ETL Tools, especially Informatica PowerCenter. Which one do you think is better ETL performer, when source and destination being SQL Server databases?  Is there any benchmark available?

I'll presume that the questioner was referring to the throughput performance of an Informatica Mapping versus a SSIS Dataflow Task - for the purposes of this blog entry I'll refer to this metric as "pipeline throughput". The short answers to the two questions posed there are "I don't know" & "Not that I know of". That's not very helpful though so its only fair that I expand and explain a bit more.

As anyone that has worked on benchmarking in this industry will tell you there are many many factors that may influence benchmark tests and hence producing a single definitive answer is a difficult thing to do and in a lot of cases proving that your benchmark tests are trustworthy is more difficult than actually performing the tests. Also bear in mind that there are so many "knobs to tweak" these days I suspect that the "winner" in a performance comparison often comes down to the skill of the person performing the test rather than the abilities of the software. The scenario in which you are testing is a factor; some tools may perform better in some scenarios than others. Furthermore, it is rare that a published benchmark will be truly unbiased - usually you will find that a benchmark is affiliated with the "winner" in some way so I generally take those benchmarks with a proverbial grain of salt. The old adage that you can prove anything with statistics is one that tends to hold true.* 

Anyway, I digress. Back to talking about ETL tools. I stated earlier that I didn't know whether Informatica or SSIS had quicker pipeline throughput and to that I would add "...and I don't really care either". The main contributors to pipeline throughput will be the hardware that you run it on and after that you may be able to tweak the software for extra performance gains. At the heart of most (any???) performance metrics on a computer system is the number of CPU cycles that you can eek out in a given time period and that holds true no matter what test you happen to be running and which software tools you happen to be testing. Hence, its eminently possible that I could take the same software, same operating system, same tests and run them on two different hardware configurations and derive completely conflicting conclusions.

I'm still no closer to helping the guy who asked the question above to decide which is quicker. And I'm not going to get any closer either - partly because I don't want to say something that someone could later prove to be false (remember - you can prove anything with statistics). I will say this though, I suspect that the difference in performance will be so minutely negligible that it does not even warrant consideration. There are bigger differentiating factors to be taken into account when evaluating ETL tools, primarily cost. For example, Ab Initio routinely claim that their ETL tool is quicker than any other out there and given that none of their competitors ever challenges them on that claim I suspect that its true for the scenarios in which Ab Initio want to be quickest. That's why companies like Vodafone use Ab Initio to process their call records and why Capital One use it to process their credit card transactions. Those companies need the raw speed that Ab Initio can provide; they can also afford to pay the big BIG bucks to get it. Other factors that should influence purchasing decisions include existing hardware/software infrastructure, existing skills base, ease of administration, feature set, ongoing support costs, developer productivity and many many more.

I suspect I haven't much helped the guy who asked the questions above but I hope I've explained WHY I haven't helped him. The simple truth is that I don't know whether SSIS or Informatica is faster and, more than that, I suspect I'll never know.

-Jamie

*Note that not all benchmarks are subjective, the Transaction Processing Performance Council benchmarks are the most objective that I know of in the database world and Kevin Kline has talked about those a lot if you are interested in knowing more about them from a SQL Server perspective.

Published Tuesday, August 21, 2007 3:23 AM by jamie.thomson

Comments

 

Shiva said:

Here is my opinion on this topic -

Assuming that Informatica has an inbuilt component to use SQL Native Client to connect to SQL Server 2005 databases much like SSIS to pass data in and out, we are left wondering if the transformation components in SSIS outperform the ones in Informatica. My gut feeling would be that the components in Informatica are more mature and perform faster right now. The gap may then be bridged in the subsequent releases of SQL Server. SSIS components are written in .NET framework. I have also heard that Perl compiler outperforms the .NET compiler on things such as transformation that are based on string manipulation for instance even on Windows O/S. Lets for argument sake say that Informatica uses C++ to write its transformation components, we are now pitting C# against C++ assuming that the quality of code on both sides are equivalent. Not sure if this argument is heading somewhere, but I guess you get the picture.

Once again, there may not be a clear cut answer to the question - Is Informatica faster than SSIS or vice versa?

Thanks,

Shiva

August 21, 2007 5:40 AM
 

jamie.thomson said:

Shiva,

The stock SSIS components (i.e. those provided in the box) are NOT written in .Net. Neither is the pipeline engine.

-Jamie

August 21, 2007 5:51 AM
 

Shiva said:

Thanks for providing the correction there Jamie.

I now seem to recall Erik mentioning to me that the SSIS components were written in C++. Sounds about right? How about the pipeline engine though?

August 21, 2007 7:12 PM
 

jamie.thomson said:

Shiva,

Pipeline engine is the same. Its all native (i.e. not managed) code.

-Jamie

August 21, 2007 8:28 PM
 

James Chen said:

I have done comparison between Informatica and DTS. We did the test with straight loading of a few tables. Informatica is several times slower than DTS. I think part of the reason is as suggested - there's no proper driver/native client for SQL Server from Informatica. The other part is due to the way Informatica uses bulk insert mode. I could never get Informatica to do bulk insert as fast as we want so we end up using SQL command like "bulk insert" in Informatica.

BTW, I'm another person who read your white page :-)

August 23, 2007 6:23 PM
 

jamie.thomson said:

Hi James,

Thanks for the comments.

I think this kinda proves the point that its entirely dependant on the scenario that you happen to bes testing. Its fair to expect that SSIS will do a better job of inserting to SQL Server than any other tool will. Would the same be true of Oracle? i suspect not.

-Jamie

August 23, 2007 8:48 PM
 

vs ten pipeline said:

June 29, 2008 6:45 AM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems