Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

Once upon a time this blog was a hive of activity. Now however its pretty lifeless as you can probably tell so if are pining for more of the same you can find me over at http://sqlblog.com/blogs/jamie_thomson. I look forward to seeing you there!

SSIS Nugget: The difference between synchronous and asynchronous script components

UPDATE: Check out kdog's useful comments at the bottom as well!

Script components can either be synchronous or asynchronous. There are a number of definitions about the difference between the two including a description that I posted back in February.

I thought it would be useful to demonstrate the difference between synchronous and asynchronous script components. I have built a data-flow that contains 2 script components. They both do nothing more than pass the input data to the next component adding one extra column using a simple calculation, the difference being that one does it synchronously, the other does it asynchronously.

The data I'm using to demo this comes from the AdventureWorks.Sales.SalesOrderDetail table. I want to multiply OrderQty by UnitPrice to get the total price of the OrderDetail. [N.B. This information is already available in the table as an explicit column but for demo purposes it doesn't matter.] Here's a screenshot from SQL Server Management Studio showing, in raw SQL, what I'm going to do in my data-flow:

20050722RawData.jpg

Here's my data-flow. As you can see I have used a multicast component to send identical data to the two script components.

20050722DataFlow.JPG

Here's the code in the synchronous script component:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
   Inherits UserComponent

   'This method is executed for every row in the input buffer
   Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
      'Calculate the total
      Row.OrderDetailTotal = Row.OrderQty * Row.UnitPrice
   End Sub
End
Class

And here's the code in asynchronous script component. Remember that they both do exactly the same thing, just in different ways:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
   Inherits UserComponent

   'This method is executed for every row in the input buffer
   Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

      'Explicitly add a row to the output buffer
      Output0Buffer.AddRow()

      'Add rows from the input buffer
      Output0Buffer.OrderQty = Row.OrderQty
      Output0Buffer.UnitPrice = Row.UnitPrice

      'Calculate the total
      Output0Buffer.OrderDetailTotal = Row.OrderQty * Row.UnitPrice
   End Sub
End
Class

So what's the difference?
Basically the synchronous code does not have to explicitly reference an output buffer. That's because in a synchronous script component only one buffer is in use, we simply have to add values into that buffer. Notice that in the synchronous code there is no requirement to manipulate the incoming columns - they will naturally flow out of the component onto the next downstream component. An asynchronous script component requires us to explicitly place values into an output buffer including any that are simply being passed from the input buffer.

It may also be of interest that I selected 5823216 rows from the source (by use of many UNION ALL statements). Due to the multicast component this meant that 11646432 rows were flowing into a destination. This executed in 111seconds which I think is fairly impressive considering we have 2 script components in here which are managed, not native, components. I then ran it without debugging and it was even quicker...82seconds. ~11million rows processed in 82 seconds is impressive by any stretch. Try and imagine doing THAT in a DTS ActiveX script!!!

I have made the package used herein available here: http://blogs.conchango.com/Admin/ImageGallery/blogs.conchango.com/jamie.thomson/20050722Synchronous_v_Asynchronous.zip. You will need to have AdventureWorks installed and will also need to install Konesans' Trash Destination Adapter which you can get from here: http://www.sqlis.com/default.aspx?56. It is very very easy to install and full instructions are provided.

-Jamie

 

Published 25 July 2005 12:24 by jamie.thomson

Comments

 

KDog said:

You know all this Jamie, but possibly useful comments for others:

* Selecting only the 2 columns actually used (instead of select *) increases performance by several times since both the database engine & the SSIS package has less to do

* By running the sync & async as two tasks sequentially, you can see that the sync task is ~3 times faster than the async task - makes sense since the async task has to do more work by creating new rows

Here's a package with this (and the quite important DefaultBufferMaxRows = 100000) Defaulthttp://erectum.dyndns.org/files/Synchronous-Asynchronous-Speedo.dtsx configured.

Cheers,
July 25, 2005 16:09
 

Jamie Thomson's Blog said:

I hate commuting. There's no two ways of saying it, I despise it. Every day I face the 70 minute grind...
July 25, 2005 16:10
 

Jamie Thomson - Life, the universe and SSIS! said:

I hate commuting. There's no two ways of saying it, I despise it. Every day I face the 70 minute grind...
July 27, 2005 16:04
 

SSIS Junkie said:

December 13, 2006 23:52
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems