Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

SSIS Nugget: Engine Threads

There is a property of each data-flow task called EngineThreads which dictates, quite simply, the number of threads that run in the data-flow pipeline. But what does that mean exactly and how can it affect your data-flow? Well BOL doesn't have much on the subject simply saying "An integer that specifies the number of threads that the data flow task can use during execution". Well that doesn't help much does it? It doesn't tell you what an engine thread actually is so by way of clarification I set about trying to find out more about them.

So I think we can make some simple assumptions. Like, a thread can execute in parallel with other threads and the maximum number of threads that can execute in parallel is determined by the EngineThreads property. Let's show that. here's a data-flow at design-time.

20051002SeperateThreadsDesignTime.JPG

You can see that there are 8 completely independant data paths here right? Each data path has an arbitrary number of rows produced by the source script component so that they all execute for varying lengths of time. So what happens when you run this thing? As you've probably guessed by now the EngineThreads property determines how many of these run in parallel. On the next screenshot you can see what happens at run-time when EngineThreads=5.

20051002SeperateThreadsRunTime.JPG

There are only 5 data paths executing in parallel. Which is what I expected to be honest.

So then I started to wonder if you can have more than 1 thread when the data paths are not independant. In other words, can you have multiple threads in the same execution tree. Here's my first attempt at looking into that (shown at run-time):

20051002CombinedThreadsRunTime.JPG

Here I've got a single source adapter and a multicast component that splits that into 8 seperate data paths. I have EngineThreads=4. As you can see from the annotated rowcounts the value of EngineThreads does not affect the 8 data paths executing in parallel so obviously these are all still in the same thread.

So what do we know at this point? Well, if I have 1 source component that that results in 1 thread, no matter what you do with it downstream. So then I thought, what happens if we have more than 1 source component and downstream we combine them into 1 data-path. How many threads are executing then? Well here's an attempt to investigate exactly that:

20051002MergeThreadsRunTime.JPG

On this data-flow I had EngineThreads=2.

Now OK, there's alot of yellow boxes on here but the important thing to look at is the rowcount annotations. There are 5 source adapters here but only 2 of them are producing rows. 2 you say? Well hey, that's how many EngineThreads I've specified this thing to have!!! From this data-flow we can deduce that each source adapter produces a seperate thread regardless of whether the data paths are combined downstream or not.

 

So what have we learned here? Well quite simply every source adapter results in a seperate thread, nothing more that that. This is kind of backed up by the following statement that I have just found in BOL as well: "The source threads in an execution plan represent the source components in the data flow". This article actually talks about source threads and work threads as seperate things in the data-flow so I think there's more to be discovered here - but I'll save that for another day.

So the ultimate question to be answered here is "What should I set EngineThreads property to?" Well as usual the answer is "It depends" so the only advice I have is to test and measure, test and measure, test and measure... see what works for you.

You can download the package that I built to demo all this from here: http://blogs.conchango.com/Admin/ImageGallery/blogs.conchango.com/jamie.thomson/20051002EngineThreads.zip

-Jamie

 

 

 

 

Published Sunday, October 02, 2005 4:26 PM by jamie.thomson

Comments

 

Professional Association for SQL Server (PASS) SIG said:

October 3, 2005 5:17 AM
 

Scott Barrett said:

Jamie...

Thanks for that great piece of information! I was wondering what this could be used for, but didn't have time to test. In one of my SSIS packages, I have 22 sources from Excel. This change has greatly increased the performance.

Scott Barrett
October 3, 2005 3:32 PM
 

jamie.thomson said:

Scott,
What change did you make? Increase or decrease EngineThreads?

-Jamie
October 5, 2005 8:21 PM
 

SimonS' SQL Server Stuff said:

No this isn't about sewing and nature but about how SSIS processes packages. Jamie Thomson has a post...
October 15, 2005 11:18 PM
 

Simon said:

A bit more on worker threads
October 15, 2005 11:19 PM
 

simonsabin said:

No this isn't about sewing and nature but about how SSIS processes packages. Jamie Thomson has a post...
December 2, 2005 8:41 PM
 

Simon Sabin SQL Server Blog said:

No this isn't about sewing and nature but about how SSIS processes packages. Jamie Thomson has a post...
December 4, 2005 9:26 PM
 

fcuppens said:

Jamie,

I found this a verify usefull thread. However, I ran into a new parallelism issue. I have a package with a number of parallel data flow tasks. For each task, I created a post execute event handler for logging purposes, as you described in one of your other articles. However, data written to the Log - tables is incorrect. Not for each data flow task a record is written to the database table, and for those records that are logged, the source name and number of records affected are not correct.

Regards
January 3, 2006 8:49 AM
 

SimonS said:

No this isn't about sewing and nature but about how SSIS processes packages. Jamie Thomson has a post...
February 9, 2006 12:57 AM
 

SimonS said:

No this isn't about sewing and nature but about how SSIS processes packages. Jamie Thomson has a post...
February 9, 2006 1:01 AM
 

Geof said:

Good article.  

I've run into an odd problem with EngineThreads...

Simple package, A ForEach loop with a single data flow...
XML data source, 2 branches, one side synchronous transformations, the other branch doing an asynch script which does (essentially) an conditional aggregate.  
The branches brought back together via a merge join and sent to DB
Processes under ten rows per data flow execute.

On the workstation machine (2 x 2.6GHz, full VS.NET 2005 installed) it works fine against the dev SQL box.
When installed/run directly on the SQL box (1 x PIII 550, only SQL 2005 + tools installed), it will hang on the Merge Join step, UNLESS I set EngineThreads to 2.  3 or more, and it just stops at the Merge Join step without exiting.

Any Thoughts?
April 4, 2006 4:03 AM
 

SimonS' SQL Server Stuff said:

No this isn't about sewing and nature but about how SSIS processes packages. Jamie Thomson has a post...
May 23, 2006 10:27 AM
 

TJ said:

How do these threads relate to TCP port connections?  I have a package that loops a configurable amount of times bringing in customer data via a check out stored proc.  When this package is run it creates thousands of TCP port connections and eventually gives a "Login time expired" error since the maximum number of port connections is used up.  I cannot figure out if this is a package configuration problem or a SQL Server SSIS problem.

Any thoughts?

July 29, 2008 8:05 PM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems