I recently had cause to look into the various ways that I could load data from the SSIS pipeline into SQL Server. There are three destination adapters for doing this:
- OLE DB Destination
- OLE DB Destination with FastLoad
- SQL Server Destination
[OK, the first two are actually the same destination adapter, just with differing configurations, but for the purposes of this post we'll treat them as two different ones.]
We all hopefully know that in a straight side-by-side comparison OLE DB Destination is the slowest option and SQL Server Destination is the quickest.
Hopefully we also all know by now about the SQL Server Destination - it's an adapter for getting data into SQL Server incredibly quickly, but it has the disadvantage that it can only be used when the SSIS package is running on the same server that you are inserting into (by the end of this post you should understand a little about why).
I wanted to understand a little bit more about what happens under the covers with these three adapters. To that end I knocked up an SSIS package containing three data-flows. Each data-flow inserted the same data into the same table in SQL Server using the same SSIS Connection Manager; the only difference was the destination adapter being used.
The "Setup" sequence container puts my source data into a raw file - Konesans' Data Generator is used to generate my data - and creates my destination table if it doesn't already exist. I initially generated 1000 rows of data.
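If you want to reproduce the setup by hand, the table-creation step can be sketched as below. The column names and types are taken from the INSERT BULK statement captured later in this post; the actual demo package may differ in detail.

```sql
-- Create the destination table only if it doesn't already exist
IF OBJECT_ID(N'dbo.Destination', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.Destination
    (
        GeneratedStr_1 varchar(5) COLLATE SQL_Latin1_General_CP1_CI_AS,
        GeneratedStr_2 varchar(5) COLLATE SQL_Latin1_General_CP1_CI_AS,
        GeneratedStr_3 varchar(5) COLLATE SQL_Latin1_General_CP1_CI_AS,
        GeneratedStr_4 varchar(5) COLLATE SQL_Latin1_General_CP1_CI_AS,
        GeneratedStr_5 varchar(5) COLLATE SQL_Latin1_General_CP1_CI_AS
    );
END
```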
The other three sequence containers just separate out my three data-flows, along with any annotations that I attach to each.
I used the ever-brilliant SQL Server Profiler to capture what commands were being sent through to SQL Server in order to realise the insert.
The OLE DB Provider used in the Connection Manager was SQL Native Client (SNAC).
OLE DB Destination
When using OLE DB Destination, the access mode for getting data into the table is OLE DB OPENROWSET. But how does that manifest itself when doing the insert? Well, here are the commands that I caught with Profiler:
exec sp_cursoropen @p1 output,N'select * from [dbo].[Destination]',@p3 output,@p4 output,@p5 output
exec sp_cursorclose 180150003
There were 1000 exec sp_cursor commands - one for each row of data. Here is that Profiler trace.
I had never heard of sp_cursoropen, sp_cursor, sp_cursorclose before so I looked them up in BOL and I found this page. It turns out that they are provided only for internal use by SNAC and are not supported if you call them directly.
Well OK, that makes sense because it's SNAC that is doing the work. Straightaway we can see why this is the slowest option though, right? It executes a statement for every row of data in the pipeline.
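Putting the captured commands together, the overall call pattern looks something like the sketch below. The cursor handle and parameter values here are illustrative, not taken from my actual trace; sp_cursor's second argument is the operation code, where 4 means INSERT.

```sql
-- Open a server-side cursor over the destination table (handle returned via @p1)
exec sp_cursoropen @p1 output, N'select * from [dbo].[Destination]', @p3 output, @p4 output, @p5 output

-- One sp_cursor call per pipeline row: optype 4 = INSERT (values are made up for illustration)
exec sp_cursor 180150003, 4, 0, N'[dbo].[Destination]', @GeneratedStr_1 = 'AAAAA', @GeneratedStr_2 = 'BBBBB'
-- ... 998 more calls just like it, one per row ...
exec sp_cursor 180150003, 4, 0, N'[dbo].[Destination]', @GeneratedStr_1 = 'ZZZZZ', @GeneratedStr_2 = 'YYYYY'

-- Close the cursor
exec sp_cursorclose 180150003
```

A thousand rows means a thousand round trips through sp_cursor, which is exactly why this access mode lags so far behind the bulk options.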
OLE DB Destination with FastLoad
Where OLE DB Destination uses an access mode of OPENROWSET, OLE DB Destination with FastLoad uses OPENROWSET with FastLoad.
FastLoad means that the adapter bulk loads the data rather than issuing a statement per row. Here is the start of the insert command as captured in Profiler:
insert bulk [dbo].[Destination] (
[GeneratedStr_1] varchar(5) collate SQL_Latin1_General_CP1_CI_AS,
[GeneratedStr_2] varchar(5) collate SQL_Latin1_General_CP1_CI_AS,
[GeneratedStr_3] varchar(5) collate SQL_Latin1_General_CP1_CI_AS,
[GeneratedStr_4] varchar(5) collate SQL_Latin1_General_CP1_CI_AS,
[GeneratedStr_5] varchar(5) collate SQL_Latin1_General_CP1_CI_AS,
There was only one insert command issued because I had FastLoadMaxInsertCommitSize=0 on the destination adapter, which means everything is committed in a single batch. Changing this property results in more insert commands.
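To illustrate what that means (this is a sketch, not a captured trace): with 1000 rows in the pipeline and FastLoadMaxInsertCommitSize=500, you would expect Profiler to show two insert commands, each committing 500 rows.

```sql
-- FastLoadMaxInsertCommitSize = 0   : one INSERT BULK covering all 1000 rows
insert bulk [dbo].[Destination] ([GeneratedStr_1] varchar(5) collate SQL_Latin1_General_CP1_CI_AS, ...)

-- FastLoadMaxInsertCommitSize = 500 : two INSERT BULK commands, 500 rows committed by each
insert bulk [dbo].[Destination] ([GeneratedStr_1] varchar(5) collate SQL_Latin1_General_CP1_CI_AS, ...)
insert bulk [dbo].[Destination] ([GeneratedStr_1] varchar(5) collate SQL_Latin1_General_CP1_CI_AS, ...)
```

Smaller commit sizes mean more, smaller transactions - useful for constraining transaction log growth at the cost of a little throughput.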
Here is the Profiler trace:
It's pretty clear to see from the Profiler trace that using the OPENROWSET with FastLoad option results in a bulk insertion, but I wanted to know more about it and, after hunting around in BOL, I got a surprise. The BULK INSERT command is there but INSERT BULK is undocumented. Not only that, I wanted to know why the Profiler trace didn't show where the data was coming from - all we get is a column list.
I turned the question over to the SQL Server community at Conchango and my colleague Joe Pollock found this article by Ken Henderson which makes mention of it. Not a lot though; all it says is that it is a "special TDS packet type" for bulk loading data. That didn't help too much, but another one of our guys (and fellow MVP), David Portas, was able to provide a bit more information. TDS stands for Tabular Data Stream and is the protocol that SQL Server clients - including OLE DB providers - use to talk to SQL Server.
So the upshot of all that is that the OLE DB Destination with FastLoad uses OLE DB's bulk-load interface - surfaced in Profiler as INSERT BULK - to get data into SQL Server. That also answers the question of why the trace shows only a column list: the data itself travels to the server inside those special TDS packets rather than as part of a T-SQL statement.
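For contrast, here is the shape of the documented BULK INSERT statement (the file path and options below are made up for illustration). The key difference is where the data comes from: BULK INSERT reads rows from a file that the server can see, whereas INSERT BULK receives its rows in-band from the client connection.

```sql
-- Documented T-SQL BULK INSERT: the server reads rows from a data file
-- (path and options are illustrative, not from the demo package)
BULK INSERT [dbo].[Destination]
FROM 'C:\data\destination.dat'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR = ',', TABLOCK);
```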
SQL Server Destination
Reading BOL will tell you that the SQL Server Destination also uses BULK INSERT, just as the OLE DB Destination with FastLoad option does (as we have just learned). So why is it any better? Well perhaps there is a clue in what we see in the Profiler trace:
BULK INSERT [dbo].[Destination]
FROM 'Global\DTSQLIMPORT 0000000000000c1c000000000096c5a8'
WITH (DATAFILETYPE = 'DTS_Buffers', CODEPAGE = 'RAW', CHECK_CONSTRAINTS, TABLOCK)
Here’s the full screenshot:
BOL is very light on what goes on under the covers with the SQL Server Destination (and I have requested that the BOL article be heavily expanded) but as far as I can discern an in-memory object called Global\DTSQLIMPORT is created and the data is passed to SQL Server through it. The fact that it is an in-memory object helps to explain why SQL Server Destination is so quick - the data is handed over in memory rather than being written to a file or shipped across the network first.
The DATAFILETYPE = 'DTS_Buffers' part is also very interesting. Again, it isn't documented in BOL (the documented options are 'char' | 'native' | 'widechar' | 'widenative') and again I have requested that it is in the future. I am assuming that this undocumented DATAFILETYPE was created specifically for the SSIS SQL Server Destination adapter - I hope this will be clarified in the future.
One thing I do know about the SQL Server Destination from reading BOL is that the user executing the SSIS package requires permission to create global objects. This permission is defined in Windows' "Local Security Policy" tool. In the screenshot here I have highlighted the permission that needs to be set.
This is more interesting. This is a setting within Windows itself, not within SQL Server. It looks as though the SQL Server Destination is creating an in-memory object in the memory space of the destination server. That would explain why it has to be run on the same server as the destination: BULK INSERT cannot access the memory space of a remote machine.
Clearly, understanding how data moves between SSIS and SQL Server is integral to debugging and performance-tuning our applications, so hopefully this post has given you some useful insights. I have been wanting to write this blog post for a long, long time because I am a real advocate of understanding the components that are available so you can make them work effectively. And aside from that, I just wanted to know some of the answers herein for myself!
There is still a lot more to learn, I'm sure, as for the purposes of this post I have only investigated use of SNAC. There are a number of other OLE DB Providers available to you, depending on your source system.
If there is anything here that you would like to follow up on then please get in touch. General comments, criticisms, corrections or additional information are all very much welcomed.
I have made the demo package that I built for this available here in case any of you want to try this out for yourself.
UPDATE 2007-12-19: Lara Rubbelke provides some useful supplementary information here: http://blogs.digineer.com/blogs/larar/archive/2006/08/20/385.aspx. If you want to read more of Lara's missives, she now blogs here: http://sqlblog.com/blogs/lara_rubbelke/archive/2007/10/16/we-are-leaders.aspx