On a project I am currently working on, I've been looking into problems caused by using BizTalk disassembler pipeline components to split out the records of a large flat file for processing, and I have seen some interesting behaviour.
Technically this is a very easy thing to do: both the XML and Flat File disassembler components can be used to publish multiple messages into the Message Box from a single input document. There is a very good discussion of how this is achieved technically on Scott Woodgate's blog. Because it is so easy, it is also a very tempting way of processing the records in an input file. For example, each line of the file is published as a separate message and processed individually by its own orchestration instance.
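To make the idea concrete, here is a minimal Python sketch of what debatching amounts to conceptually. It is not BizTalk code and not how the disassembler is implemented; the function name `debatch`, the message shape and the sample data are all made up for illustration.

```python
from typing import Dict, Iterator


def debatch(flat_file_text: str) -> Iterator[Dict[str, object]]:
    """Yield one message per non-empty line of a flat file.

    Hypothetical stand-in for a disassembler: one input document in,
    many small messages out.
    """
    for line_number, line in enumerate(flat_file_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines rather than publishing empty messages
        yield {"record_number": line_number, "body": line}


# Made-up sample input: three records and one blank line.
sample = "A001,100\nA002,250\n\nA003,75\n"
messages = list(debatch(sample))
```

Each element of `messages` would correspond to one message in the Message Box, and so to one orchestration instance.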
So why blog about this then? Well, what I'm really interested in here is the gotchas around doing this, particularly once the inputs start to get relatively big. They are all worth thinking about, but they are not obvious when you first look at using this sort of processing.
On the project I am working on, the input file contains over 300,000 records; each is published as a separate message and each is processed by a separate orchestration. This particular piece of processing was implemented in a previous phase of the project and has suddenly become an issue, although the volumes involved have not actually changed.
The issues occur at the SQL Server level and in concurrency with other processing running at the same time, particularly processing that requires relatively low latency. When the file is picked up, it is published into the Message Box as a single transaction, and rightly so, since we do not want some of the records being published and others not. Obviously this is a very large transaction and it uses up a lot of SQL Server resources while it is running. The transaction log grows significantly and other processing is blocked too.
The application server running the pipeline also suffered, with its CPU pegged at 100% for some time (over an hour).
Both of these resource issues cause other processing to slow down and even fail. Interestingly, once all the records are published, the 300,000-odd orchestrations process through without any real issue.
There are plenty of patterns around for processing high volumes in BizTalk, so I am not going to go on about them here. For those who are interested, we now BCP the file into a database table and work through the rows in "manageable" batches using BizTalk and the SQL adapter. Both the application and database servers are now very happy!
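For anyone who wants a feel for the batching idea, here is a rough Python sketch. It is not our actual implementation: it uses an in-memory SQLite database as a stand-in for the SQL Server staging table, and the table name `staging`, the `processed` flag and the batch size of 1000 are all assumptions made up for the example. The point is simply that each batch is read and marked as done in its own small transaction, instead of pushing the whole file through as one enormous one.

```python
import sqlite3
from typing import Callable, Iterable

BATCH_SIZE = 1000  # hypothetical "manageable" batch size


def load_staging(conn: sqlite3.Connection, lines: Iterable[str]) -> None:
    """Bulk load the file's records into a staging table (stand-in for BCP)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        "id INTEGER PRIMARY KEY, "
        "body TEXT NOT NULL, "
        "processed INTEGER NOT NULL DEFAULT 0)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO staging (body) VALUES (?)", ((line,) for line in lines)
        )


def process_next_batch(
    conn: sqlite3.Connection,
    handle: Callable[[str], None],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Process up to batch_size unprocessed rows in one small transaction."""
    rows = conn.execute(
        "SELECT id, body FROM staging WHERE processed = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    if not rows:
        return 0
    with conn:  # one short transaction per batch
        for row_id, body in rows:
            handle(body)
            conn.execute("UPDATE staging SET processed = 1 WHERE id = ?", (row_id,))
    return len(rows)


def process_all(conn: sqlite3.Connection, handle: Callable[[str], None]) -> int:
    """Keep pulling batches until none are left; return the number of batches."""
    batches = 0
    while process_next_batch(conn, handle):
        batches += 1
    return batches


# Made-up data: 2,500 records worked through in batches of 1,000.
conn = sqlite3.connect(":memory:")
load_staging(conn, [f"record-{i}" for i in range(2500)])
seen = []
batch_count = process_all(conn, seen.append)
```

In a real solution the polling and the per-row work would be done by BizTalk rather than a Python loop, but the shape is the same: small units of work that leave room for other, lower-latency processing in between.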