
Dave Morris' Blog

Zombies and Convoys

Just so you know, because I wouldn't want to lead anyone on, I’m talking about BizTalk 2004 here and not Oasis of the Zombies, the somewhat dodgy 1980s horror film by Jesus Franco.

I’ve recently spent some time looking into a Singleton pattern for orchestrations to apply to the project I am currently working on.  Basically we have a lot of interfaces that are scheduled in particular time slots but must act as singletons: only one instance of each may be running at any time.

This initially sounds very easy to do with a sequential convoy, but that approach is prone to zombies – i.e. the orchestration completes with discarded messages.  This occurs when messages that are part of the convoy set are queued for the orchestration but not consumed by it before it finishes.

The topic of zombies itself is much wider than this narrow scenario I am looking at here but there are some very good details on the subject in the BizTalk Core Engine’s WebLog.

Since the length of processing time for any particular singleton is indeterminate the challenge has been to make sure all messages are consumed in every case.  Take the following example:

An interface is “kicked off” by its schedule every 15 minutes and its typical processing is within one minute so it’s never likely to receive multiple “kick off” messages in a single instance.  However, at month end, the volume is much higher and it can take as much as 4 hours to process, maybe a bit longer, maybe a bit shorter.  Here all subsequent “kick off” messages need to be consumed while it is running to stop multiple instances from running.

Here’s the basic solution that uses a parallel action in its processing, one branch consuming convoy messages and the other performing the singleton’s processing.

It looks pretty complicated but is actually very straightforward and involves synchronised scopes and a completed flag.  The processing branch sets a flag to say it is finished and the next time the receive branch times out it spots this and breaks out of its loop.
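For readers without the orchestration designer to hand, the shape of the idea can be sketched outside BizTalk.  The following is only a rough Python analogy, not the orchestration itself: a thread stands in for the processing branch, a `queue.Queue` for the convoy's queued “kick off” messages and a `threading.Lock` for the synchronised scope.  Every name in it (`run_singleton`, `kickoffs`, `work`, `receive_timeout`) is made up for the illustration.

```python
import queue
import threading


def run_singleton(kickoffs, work, receive_timeout=0.05):
    """Simulate one singleton instance; return how many kick-offs it swallowed."""
    lock = threading.Lock()        # stands in for the synchronised scope
    state = {"finished": False}    # completed flag, set by the processing branch
    swallowed = 0

    def processing_branch():
        work()                     # the singleton's real processing
        with lock:                 # small synchronised scope: just set the flag
            state["finished"] = True

    worker = threading.Thread(target=processing_branch)
    worker.start()

    keep_receiving = True          # loop condition, private to the receive branch
    while keep_receiving:
        try:
            kickoffs.get(timeout=receive_timeout)
            swallowed += 1         # consume and discard the extra kick-off
        except queue.Empty:
            # The finished flag is only inspected when the receive times out.
            with lock:
                if state["finished"]:
                    keep_receiving = False

    worker.join()
    return swallowed
```

Because the flag is only read after a receive has timed out, anything already queued is drained before the loop can exit, even if the processing branch finished long ago.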

Why synchronised scopes?  Basically they are needed to allow parallel branches to update the same variables without trampling all over each other - they make the orchestration thread safe.  The BizTalk development environment is good enough not to allow you to be unsafe: if you try to update the same variable from multiple branches without a synchronised scope you will get a build error.  Essentially the scope serialises access to the data.

So why shout about something so simple?  Well there are a few gotchas here. 

First of all, by only checking for completion in a timeout situation, you guarantee not to leave zombies if several messages are queued up, because the loop keeps receiving until there is nothing left to receive.  Obviously your timeout needs to be much shorter than the scheduling frequency.  Here I’ve got a 10 second timeout with a likely scheduling frequency of around 15 minutes.  There is still a race condition, since a message could arrive between the final timeout and the instance completing, but by being sensible about the timeout versus the schedule frequency it should never have an effect on the processing.

Next is the use of two variables, one for the loop condition and one for the finished state.  This is so the synchronised scope in the receive branch can be as small as possible, limiting how long the variables are locked.  If only one variable is used, a scope is needed around the entire loop.  The problem this causes is that it locks the loop condition variable so the processing branch cannot set it.  Essentially everything deadlocks.
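The deadlock is easy to reproduce in miniature.  Again this is a hedged Python analogy rather than BizTalk: a `threading.Lock` plays the synchronised scope, and the hypothetical function `processing_can_set_flag` reports whether the processing side can get hold of the lock while the receive loop is running.  A real orchestration would simply hang; the `attempt_timeout` parameter is there only so that the sketch gives up and returns.

```python
import threading


def processing_can_set_flag(lock_whole_loop, attempt_timeout=0.2):
    """Return True if the processing side can take the lock while the loop runs."""
    lock = threading.Lock()
    started = threading.Event()
    stop = threading.Event()

    def receive_branch():
        if lock_whole_loop:
            with lock:                   # scope wraps the entire loop
                started.set()
                while not stop.is_set():
                    stop.wait(0.01)      # stands in for a receive with timeout
        else:
            started.set()
            while not stop.is_set():
                stop.wait(0.01)          # stands in for a receive with timeout
                with lock:               # scope wraps only the flag check
                    pass

    receiver = threading.Thread(target=receive_branch)
    receiver.start()
    started.wait()

    # The processing branch now tries to enter its own scope to set the flag.
    acquired = lock.acquire(timeout=attempt_timeout)
    if acquired:
        lock.release()

    stop.set()
    receiver.join()
    return acquired
```

With `lock_whole_loop=True` the call returns `False`: the loop can only end once the flag is set, and the flag can only be set once the loop lets go of the lock.  With the narrow scope it returns `True`.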

Also be aware of the scopes themselves.  Here I’ve only got two, one for checking the finished state and one for processing.  Both are non-transactional.  However, if they are transactional, then their locking of variables is moved out to the scope level rather than the individual shape level and again we can block.  The check blocks until the main processing scope completes.  It then immediately sees the finished state set to true and falls out of the loop.  Since it’s been blocking on the check, there may well be queued messages for the orchestration that it has not consumed – zombies again.  I have a scenario where the main processing is in a transaction; there, the setting of the finished flag has been moved out into its own non-transactional scope after the main processing scope.
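The same effect can be imitated with threads.  This is a simplified Python sketch under my own assumptions, not a model of BizTalk's actual locking: holding a `threading.Lock` for the whole of the processing stands in for a transactional processing scope, and the hypothetical `zombies_left` function counts the kick-offs still queued when the instance exits.

```python
import queue
import threading
import time


def zombies_left(lock_held_for_whole_processing, receive_timeout=0.02):
    """Return how many kick-offs are still queued when the instance exits."""
    kickoffs = queue.Queue()
    lock = threading.Lock()
    state = {"finished": False}
    processing_started = threading.Event()
    first_check = threading.Event()
    work_done = threading.Event()

    def processing_branch():
        if lock_held_for_whole_processing:
            with lock:                    # lock held for all of the processing
                processing_started.set()
                work_done.wait()
                state["finished"] = True
        else:
            processing_started.set()
            work_done.wait()
            with lock:                    # lock held only while setting the flag
                state["finished"] = True

    def scheduler():
        first_check.wait()                # the receive branch has timed out once
        kickoffs.put("kick-off 1")
        kickoffs.put("kick-off 2")
        deadline = time.monotonic() + 0.5
        while not kickoffs.empty() and time.monotonic() < deadline:
            time.sleep(0.01)              # give the receive branch time to drain
        work_done.set()                   # only now does the processing finish

    threads = [threading.Thread(target=processing_branch),
               threading.Thread(target=scheduler)]
    for t in threads:
        t.start()
    processing_started.wait()

    keep_receiving = True
    while keep_receiving:
        try:
            kickoffs.get(timeout=receive_timeout)
        except queue.Empty:
            first_check.set()
            with lock:                    # blocks here if processing holds the lock
                if state["finished"]:
                    keep_receiving = False

    for t in threads:
        t.join()
    return kickoffs.qsize()
```

When the lock is held for the whole of the processing, the receive loop is stuck on its check while the two kick-offs arrive, then sees the flag and exits with both still queued.  With the flag set in its own short scope after the processing, the loop stays free to swallow them first.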

The last issue I saw was quite unexpected and occurred only because I was being overly anal about my testing for some reason – must have been bored!  If you leave the orchestration stopped but enlisted and queue up messages for it everything looks just fine.  However once you start up, the instance always completes with discarded messages.  The first message submitted is always received twice – once to activate the instance and then again as the first correlated message.

After a bit of head scratching it was straight onto Microsoft support since this is a blatant bug, although one we are unlikely to hit in this particular scenario.  It is the case for any sequential convoy though.  Here all subsequent messages are just swallowed to prevent further processing but imagine if you were actually processing them and adding up data!  There will be a fix available for this soon - once it has been verified.  Be aware it has not made it into SP1 though.  (See my colleague Matt Hall’s Blog on the Contents of SP1.)

I’ve packaged up the sample solution, which can be downloaded from here.

 

Published 16 February 2005 14:22 by dave.morris

Comments

 

Damir Dobric said:

There many business scenarios, which require parallel processing of the data. Imagine, there are many...
April 21, 2006 11:39