Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

SSIS: SELECT *... or select from a dropdown in an OLE DB Source component?

The OLE DB Source component allows a number of methods for extracting data from an OLE DB Source. The two most obvious are:

1) Write a SQL statement to extract data

2) Select a required table from a drop down box.

I have previously recommended that using a SQL statement is the preferred option because selecting from the drop down box of tables is akin to doing a SELECT * FROM tablename - and that is readily recognised as bad practice. If you are selecting all columns anyway though then it doesn't matter which option you pick because they are both the same, right? Wrong! Let me explain.

I received a call from a colleague today enquiring about a strange occurence he was seeing when using the OLE DB Source component. He was extracting data from a view that, when run in Management Studio, returned approximately 100000 rows in about 60seconds.

The problem was that when he used the view in the OLE DB Source component SSIS appeared to hang. He waited 15 minutes before crashing out of the job.

When I spoke to him it transpired that he had chosen option 2 - selecting the view from the dropdown box. We changed it to option 1-using a SQL statement, using SELECT *...- and that fixed the problem. The data-flow executed in about 60seconds as previously expected.

 

This really interested me. Up to now I had assumed that selecting a table from the dropdown box and issuing SELECT * FROM table would result in exactly the same behaviour - obviously this is not the case. I've been looking into the differences between the two by using SQL Server profiler to capture the SQL that is fired at the data source.

Here's the output in Profiler when selecting the table from the dropdown box:

20060221dropdown.JPG

 

And here's the output in Profiler when using a SQL statement:

20060221sql.JPG

 

As you can see the Profiler output is radically different in both cases so my earlier assumption that both options would result in the same actions in these circumstance was completely wrong. We can observe that when a SQL statement is issued the execution plan uses sp_prepare & sp_executesql but this is not the case when the alternative option is taken.

 

This observation helps to explain why my colleague was seeing such disparate execution times. Thankfully we found the problem and applied a simple fix. I can't explain why the execution times were so far apart - that requires someone who knows much more about SQL performance tuning than I do. I've no doubt that the fact that the object being selected from was a view (and a very computationally intensive one at that) contributed to the problem.

This whole episode merely serves to reiterate my recommendation (and Simon's) to always use a SQL statement in your OLE DB Source components rather than just selecting a column.

-Jamie

UPDATE 20th June 2006: Donald Farmer explained this in his recent webcast "SQL Server 2005 Integration Services: Performance and Scale (Level 400)". In there he stated that if a table is selected then SSIS issues an OPENROWSET, if a SQL statement is usedit issues sp_executesql. So there's your answer!

 

UPDATE 2006-10-17: If you want another reason why it is a good idea to always use a SQL statement then just read this.

UPDATE 2007-01-24: Here is another instance of someone experiencing the same problem with selecting a table or view from the dropdown box: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1149085&SiteID=1

Published Tuesday, February 21, 2006 9:22 PM by jamie.thomson

Comments

 

Professional Association for SQL Server (PASS) SIG said:

February 22, 2006 9:12 PM
 

Jamie Thomson - Life, the universe and SSIS! : SSIS: Development Best Practices and naming conventions said:

March 28, 2006 1:21 PM
 

Steve said:

Just a thought about the caveat to not use "select *" (or table option) in SSIS packages.  Sometimes DB Schemas change and not everybody is alerted.  So, would it be better that the first time a column is added, the developer learns about it -or- the data is not updated until some customer alerts the developer that the data has not been updated for weeks?  I realize that in a perfect world communication is perfect, but....Just a thought.

December 6, 2007 10:40 PM
 

jamie.thomson said:

Steve,

That's why you SHOULDN'T use SELECT *. If that were to happen then your package would error due to unexpected metadata.

-Jamie

December 7, 2007 12:24 AM
 

BI Thoughts and Theories said:

It’s generally accepted* that when you are using an OLE DB Source or an ADO.NET Source, you should specify

June 30, 2009 12:02 AM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems