Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

Once upon a time this blog was a hive of activity. Now however its pretty lifeless as you can probably tell so if are pining for more of the same you can find me over at http://sqlblog.com/blogs/jamie_thomson. I look forward to seeing you there!

SSIS Nugget: Bitwise Checkpointing

Checkpoint usage in SSIS is a topic that is not well understood at all. The mechanism for defining checkpoint files and configuring containers to use them is "fiddly" at best and isn't helped by a bug in the Sequence Container (of which you can read more about here). I hope to find the time to write a definitive article in the near future that explain all the nuances of checkpoint files.

One thing is true however and I can state this categorically. In order for a checkpoint file to exist after the package has finished executing, the package MUST fail execution. Note that by default, if any task in your package fails then the package will still succeed. If you want to prove this, follow these steps:

  1. Start a new package
  2. Drag on a data-flow task
  3. Set the data flow task property ForceExecutionResult='Failure'
  4. Put a breakpoint on the data flow task.
  5. Set package properties CheckpointFileName='c:\temp\chkpnt.xml', CheckpointUsage='IfExists' & SaveCheckpoints='True'
  6. Execute the package

The package will hit the checkpoint. At this point check 'c:\temp' and you will see that the checkpoint file 'chkpnt.xml' has been created. Run the package through to completion and you will see that the data flow task fails and the checkpoint file disappears. This is because even though a task in the package failed the package did not, hence no checkpoint file is created. You will also see the following messages in the output window:

Note the warning near the end of the output. Even though the task failed the package will continue running. And it ends, exiting with code 0 - meaning success.

This can sometimes be a problem. What if I want my package to always succeed but on the next execution I only want the previously failed containers to execute? Unfortunately, as just proven, SSIS checkpoint files are not an option to us so we need another solution. And here it is - bitwise checkpointing.

The basic premise is simple. Each container in the package is preceded by a conditional precedence constraint that will cause the container to be skipped (or not). Question is, what do we base the expression on?

Well, if we conceptually assign each container a value which is some power of 2 then we can maintain an integer variable (scoped to the package) that the conditional precedence constraint can check, using the bitwise '&' operator, to see if the power of 2 value of the container that follows it has been added to the variable value. A script task following each container adds the container's power of 2 value to the integer. At the end of the package we store the variable value somewhere so that we can use it on the next execution.

That was a heck of a mouthful so I've built a package to demo what I am talking about. Here is the package:

Some things to note from this picture:

  1. Variable called CheckpointVariable that is initialised to zero
  2. Script task Store checkpoint value external to package that stores the value of CheckpointVariable in an operating system environment variable called BitwiseCheckpoint. Note that this container executes last.
  3. Script task Retrieve checkpoint value from environment variable that retrieves the value of BitwiseCheckpoint and stores it in CheckpointVariable. Note that on the first execution BitwiseCheckpoint will be empty so CheckpointVariable will remain initialised at zero.
  4. Seven dataflow tasks. For convenience I have named them according to the power of 2 that I have conceptually assigned to them. These names are only a visual aid and are not used in any way.
  5. Each dataflow tasks is preceded by a conditional precedence constraint. The expression on each precedence constraint is
    (@[User::CheckpointVariable] & X) != X
    where X is the name of the subsequent dataflow task.
  6. Each dataflow task is followed by a script task that adds its power of 2 value to CheckpointVariable.
  7. Script task Store checkpoint value external to package is outside of the sequence container hence it will execute regardless of whether all the dataflow tasks succeed or not.

What you cannot see from the picture above is that the dataflows don't actually do anything (they don't contain any components) and dataflow task "4" has got property ForceExecutionResult='Failure' so as to simulate a failure.

When we execute the package for the first time, here is what happens:

There's nothing out of the ordinary there. Task "4" has failed and the package has run through to completion. I now set property ForceExecutionResult='None' (i.e. the default) on task "4" and execute again. Here is what happens.

Task "4" has executed (successfully this time) but none of the other dataflow tasks have. Remember, there is no checkpoint file at work here - we've done it all through our makeshift bitwise checkpointing mechanism.

That's all there is to it. As ever I've made the demo package available here. The Live SkyDrive team have done an upgrade and annoyingly none of the old links work anymore. The new link is here:

If there are any more problems with this link, please reply with a comment and let me know.

-Jamie

Published 19 July 2007 00:11 by jamie.thomson

Comments

 

Professional Association for SQL Server (PASS) SIG said:

July 27, 2007 15:54
 

Professional Association for SQL Server (PASS) SIG said:

July 27, 2007 15:54
 

SSIS Junkie said:

I've spoken about Live Folders quite alot on here lately and some big news from that product team

August 9, 2007 19:00
 

Stuart Lawrence said:

Hi Jamie

I've been following threads and investigating issues with checkpoints.  Can you tell me, does the example you provide work if you have 2 points of failure.  For example, 4 and 8 both fail.  My interpretation of writing to a single variable is that the variable will only contain one value (either 4 or 8, whichever fails last) and so therefore on the rerun will only execute one of the failed tasks rather than both...is this correct?

thanks, Stuart

December 12, 2008 22:13
 

jamie.thomson said:

Hi Stuart,

Yes, this method will still work if you have 2 points of failure. I'll draw your attention to the statement above:

"A script task following each container adds the container's power of 2 value to the integer"

I perhaps haven't explained it very well so I recommend you download the package and try it for yourself.

-Jamie

December 15, 2008 12:39
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems