What are checkpoints?
SQL Server Integration Services (SSIS) contains a fantastic new feature called Checkpoints which enables us to restart a package from a previous point of failure rather than run the package from the start. Details about a package's execution are stored in a checkpoint file which will be read upon the next execution so that the package can determine which task it should start execution from.
Why should we use them?
The benefits of this are obvious:
- No need to re-run tasks that have already completed - invaluable if these tasks took a long time to complete.
- We can effectively group together tasks into logical units of work that should complete or fail as one. In fact I would recommend this as a best practice.
How should we use them?
Checkpoints should be designed as part of your ETL process - do not assume that a checkpoint can be dropped into any point in your package and the package will automatically recover itself upon failure as you would expect. The use of checkpoints should be planned and should be placed at logical places in the package control-flow.
Here is a "for instance". Your package contains multiple sequence containers that run in parallel. Each container has an Execute SQL Task that truncates a table and then a data-flow that populates that table from a flat file. Each of the sequence containers has an OnSuccess precedence constraint going to a task that logs the fact that all of the data staging has occurred successfully. See the next screenshot:
It wouldn't make sense to put a checkpoint on any of these parallel sequence containers. They execute in parallel so we want them to complete or fail as one logical unit of work so we would put the checkpoint on the "Log Staging Completion" task. This means that:
- If any of the file staging routines fail, they effectively all fail. Next time round we run the package and all files will be staged again (without error this time one would hope). Note that he package has been designed accordingly by ensuring that the destination tables are truncated each time. This ensures that the tables do not not contain the same data twice.
- If any downstream tasks fail, we won't need to re-stage the files
How do we configure a package to use checkpoints?
I have recently had some fun and games trying to get checkpoints to work for me and BOL wasn't too particularly helpful so I thought I'd put the information up here.
3 properties of the package must be set for which the effect is fairly self-explanatory
- CheckpointFileName - Name and location of the checkpoint file
- CheckpointUsage - should the package use a checkpoint file or not
- SaveCheckpoints - Should the package save checkpoints or not.
In my tests I set:
- CheckpointFileName="c:\MyCheckpoint.txt"
- CheckpointUsage=IfExists
- SaveCheckpoints=TRUE
There is a fourth property called FailPackageOnFailure that can affect behaviour and BOL is a little ambiguous about its affects. I'll explain...
I tested behaviour using a simple package containing 3 script tasks executing in series. The first always executes successfully, the second with failure and the third successfully.
[You can download this test package from here]
The FailPackageOnFailure property is a property of the package and of each container (including the TaskHost container) in your package. I tried various combinations of settings of this property on both the package and the "Failure" task. Here are the results.
| Package.FailPackageOnFailure |
"Failure".FailPackageOnFailure |
Resultant behaviour |
| False |
False |
No checkpoint file is created
Subsequent executions begin at task "Success1" |
| False |
True |
The checkpoint file is created
Subsequent executions begin at task "Failure" |
| True |
False |
The checkpoint file is created
Subsequent executions begin at task "Success2" |
| True |
True |
The checkpoint file is created
Subsequent executions begin at task "Failure" |
First thing to realise, if the FailPackageOnFailure property of your package and the failing task are set to FALSE then you won't get a checkpoint file, regardless of CheckpointFileName, CheckpointUsage, SaveCheckpoints.
I especially want to draw your attention to the third row in this table. On first execution the package stopped at task "Failure" (as expected) but on the subsequent execution this task was ignored and execution continued merrily on its way from the next task. Effectively the failed task has been ignored. The hard and fast rule is if you want execution from begin from a task following a failure on that task, set its FailPackageOnFailure property to true. This isn't immediately evident from reading BOL and is the main thing I wanted to draw your attention to here.
I hope I've managed to exhibit what I set out to. That is, ensure you understand exactly how the configuration of your package and the tasks within that package will affect execution in the event of a failure. If you want execution to start from a given failed task, set that task's FailPackageOnFailure property to true (the default is false) or you won't get expected results.
Feel free to download the test package and try for yourself. Checkpoints are wonderfully powerful but must be understood fully before you begin to use them!
-Jamie
P.S. Thanks to Nick Berezansky from the SSIS dev team for the help he has given me in understanding checkpoints.