
SSIS Junkie

SSIS: Suggested Best Practices and naming conventions

I thought it would be worth publishing a list of guidelines that I see as SSIS development best practices. These are my own opinions, based upon my experience of using SSIS over the past 18 months. I am not saying you should take them as gospel, but these are generally tried and tested methods and, if nothing else, they should serve as a basis for developing your own SSIS best practices.

One thing I really would like to see getting adopted is a common naming convention for tasks and components and to that end I have published some suggestions at the bottom of this post.

This list will be added to over time, so if you find it useful keep checking back here to see updates!

  1. If you know that data in a source is sorted, set IsSorted=TRUE on the source adapter output. This may save unnecessary Sorts later in the pipeline, which can be expensive. Setting this value does not perform a sort operation; it only indicates that the data is sorted.
  2. Rename all Name and Description properties from the default. This will help when debugging particularly if the person doing the debugging is not the person that built the package.
  3. Only select columns that you need in the pipeline, to reduce buffer size and reduce OnWarning events at execution time.
  4. Following on from the previous bullet point, always use a SQL statement in an OLE DB Source component or LOOKUP component rather than just selecting a table. Selecting a table is akin to "SELECT *..." which is universally recognised as bad practice. (http://www.sqljunkies.com/WebLog/simons/archive/2006/01/20/17865.aspx). In certain scenarios the approach of using a SQL statement can result in much improved performance as well (http://blogs.conchango.com/jamiethomson/archive/2006/02/21/2930.aspx).
  5. I used to recommend using SQL Server Destination as opposed to OLE DB Destination wherever possible for quicker insertions, but I've changed my mind. Experience from around the community suggests that the difference in performance between SQL Server Destination and OLE DB Destination is negligible and hence, given the flexibility of packages that use OLE DB Destinations, it may be better to go for the latter. It's an "it depends" consideration, so decide what you prefer based on your own testing.
  6. Use Sequence containers to organise package structure into logical units of work. This makes it easier to identify what the package does and also helps to control transactions if they are being implemented.
  7. Where possible, use expressions on the SqlStatementSource property of the Execute SQL Task instead of parameterised SQL statements. This removes ambiguity when different OLE DB providers are being used. It is also easier. (UPDATE: There is a caveat here. Results of expressions are limited to 4000 characters, so be wary of this if using expressions.)
  8. If you are implementing custom functionality try to implement custom tasks/components rather than use the script task or script component. Custom tasks/components are more reusable than scripted tasks/components. Custom components are also less bound to the metadata of the pipeline than script components are.
  9. Use caching in your LOOKUP components where possible. It makes them quicker. Watch that you are not grabbing too many resources when you do this though.
  10. LOOKUP components will generally work quicker than MERGE JOIN components where the 2 can be used for the same task (http://blogs.conchango.com/jamiethomson/archive/2005/10/21/2289.aspx).
  11. Always use DTExec to perf test your packages. This is not the same as executing without debugging from SSIS Designer (http://www.sqlis.com/default.aspx?84).
  12. Use naming conventions for your tasks and components. I suggest using acronyms at the start of the name, and there are some suggestions for these acronyms at the end of this article. This approach does not help a great deal at design-time, where the tasks and components are easily identifiable, but can be invaluable at debug-time and run-time. e.g. My suggested acronym for a Data Flow Task is DFT, so the name of a data flow task that populates a table called MyTable could be "DFT Load MyTable".
  13. If you want to conditionally execute a task at runtime use expressions on your precedence constraints. Do not use an expression on the "Disable" property of the task.
  14. Don't pull all configurations into a single XML configuration file. Instead, put each configuration into a separate XML configuration file. This is a more modular approach and means that configuration files can be reused by different packages more easily.
  15. If you need a dynamic SQL statement in an OLE DB Source component, set AccessMode="SQL Command from variable" and build the SQL statement in a variable that has EvaluateAsExpression=TRUE. (http://blogs.conchango.com/jamiethomson/archive/2005/12/09/2480.aspx)
  16. When using checkpoints, use an expression to populate the CheckpointFileName property, which will allow you to include the value returned from System::PackageName in the checkpoint filename. This will allow you to easily identify which package a checkpoint file is to be used by.
  17. When using raw files and your Raw File Source Component and Raw File Destination Component are in the same package, configure your Raw File Source and Raw File Destination to get the name of the raw file from a variable. This will avoid hardcoding the name of the raw file into the two separate components and running the risk that one may change and not the other.
  18. Variables that contain the name of a raw file should be set using an expression. This will allow you to include the value returned from System::PackageName in the raw file name. This will allow you to easily identify which package a raw file is to be used by. N.B. This approach will only work if the Raw File Source Component and Raw File Destination Component are in the same package.
  19. Use a common folder structure (http://blogs.conchango.com/jamiethomson/archive/2006/01/05/2559.aspx)
  20. Use variables to store your expressions (http://blogs.conchango.com/jamiethomson/archive/2005/12/05/2462.aspx). This allows them to be shared by different objects and also means you can view the values in them at debug-time using the Watch window.
  21. Keep your packages in the dark (http://www.windowsitpro.com/SQLServer/Article/ArticleID/47688/SQLServer_47688.html). In summary, this means that you should make your packages location unaware. This makes it easier to move them across environments.
  22. If you can, filter your data in the Source Adapter rather than filtering it with a Conditional Split transform component. This will make your data flow perform quicker.
  23. When storing information about an OLE DB Connection Manager in a configuration, don't store the individual properties such as Initial Catalog, Username, Password etc... just store the ConnectionString property.
  24. Your variables should only be scoped to the containers in which they are used. Do not scope all your variables to the package container if they don't need to be.
  25. Employ namespaces for your packages
  26. Make log file names dynamic so that you get a new logfile for each execution.
  27. Use ProtectionLevel=DontSaveSensitive. Other developers will not be restricted from opening your packages and you will be forced to use configurations (which is another recommended best practice)
  28. Use annotations wherever possible. At the very least each data-flow should contain an annotation.
  29. Always log to a text file, even if you are logging elsewhere as well. Logging to a text file has less reliance on external factors and is therefore most likely to contain all the information required for debugging.
  30. Create a new solution folder in Visual Studio Solution Explorer in order to store your configuration files. Or, store them in the 'miscellaneous files' section of a project.
  31. Always use template packages to standardise on logging, event handling and configuration.
  32. If your template package contains variables put them in a dedicated namespace called "template" in order to differentiate them from variables that are added later.
  33. Break out all tasks requiring the Jet engine (Excel or Access data sources) into their own packages that do nothing but that data flow task. Load the data into Staging tables if necessary. This will ensure that solutions can be migrated to 64bit with no rework.  (Thanks to Sam Loud for this one. See his comment below for an explanation)
  34. Don't include connection-specific info (such as server names, database names or file locations) in the names of your connection managers. For example, "OrderHistory" is a better name than "Svr123ABC\OrderHist.dbo".
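To make a few of the above concrete, here is roughly what the property expressions behind points 15, 16 and 26 look like in the SSIS expression language. These are sketches only: the User:: variable names and folder paths are invented for illustration, and the // annotations are not part of SSIS expression syntax.

```
// #15: expression on a variable (EvaluateAsExpression=TRUE) that feeds an
// OLE DB Source with AccessMode="SQL Command from variable"
"SELECT OrderID, OrderDate, Amount FROM dbo.Orders WHERE LastModified >= '"
    + (DT_WSTR, 30) @[User::LastRunTime] + "'"

// #16: expression on the CheckpointFileName property
@[User::CheckpointFolder] + "\\" + @[System::PackageName] + ".chk"

// #26: expression on the log file connection string, giving a new file per run
@[User::LogFolder] + "\\" + @[System::PackageName] + "_"
    + (DT_WSTR, 4) YEAR(@[System::StartTime])
    + RIGHT("0" + (DT_WSTR, 2) MONTH(@[System::StartTime]), 2)
    + RIGHT("0" + (DT_WSTR, 2) DAY(@[System::StartTime]), 2) + ".log"
```

Note that the casts and the RIGHT("0" + ...) padding are there to turn the StartTime datetime into filename-safe text.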
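As a sketch of what points 14 and 23 imply in combination, each connection manager gets its own small .dtsConfig file targeting only the ConnectionString property. The connection manager name and server details below are made up:

```xml
<?xml version="1.0"?>
<DTSConfiguration>
  <Configuration ConfiguredType="Property"
                 Path="\Package.Connections[OrderHistory].Properties[ConnectionString]"
                 ValueType="String">
    <ConfiguredValue>Data Source=MyServer;Initial Catalog=MyDb;Provider=SQLNCLI.1;Integrated Security=SSPI;</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
```

Because the file holds a single property, any package containing a connection manager named OrderHistory can reuse it unchanged.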
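And for point 11, a perf test is run from the command line with DTExec rather than from the designer. A minimal invocation (the package path here is hypothetical) might look something like this:

```
dtexec /FILE "C:\SSIS\Packages\LoadMyTable.dtsx" /REPORTING EWP
```

/REPORTING EWP limits console output to errors, warnings and progress; timings gathered this way are much closer to production behaviour than those observed in the debugger.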

 

 


The acronyms below should be used at the beginning of the names of tasks to identify what type of task it is.

Task                               Prefix
---------------------------------  ------
For Loop Container                 FLC
Foreach Loop Container             FELC
Sequence Container                 SEQC
ActiveX Script                     AXS
Analysis Services Execute DDL      ASE
Analysis Services Processing       ASP
Bulk Insert                        BLK
Data Flow                          DFT
Data Mining Query                  DMQ
Execute DTS 2000 Package           EDPT
Execute Package                    EPT
Execute Process                    EPR
Execute SQL                        SQL
File System                        FSYS
FTP                                FTP
Message Queue                      MSMQ
Script                             SCR
Send Mail                          SMT
Transfer Database                  TDB
Transfer Error Messages            TEM
Transfer Jobs                      TJT
Transfer Logins                    TLT
Transfer Master Stored Procedures  TSP
Transfer SQL Server Objects        TSO
Web Service                        WST
WMI Data Reader                    WMID
WMI Event Watcher                  WMIE
XML                                XML

 

These acronyms should be used at the beginning of the names of components to identify what type of component it is.

Component                      Prefix
-----------------------------  --------
DataReader Source              DR_SRC
Excel Source                   EX_SRC
Flat File Source               FF_SRC
OLE DB Source                  OLE_SRC
Raw File Source                RF_SRC
XML Source                     XML_SRC
Aggregate                      AGG
Audit                          AUD
Character Map                  CHM
Conditional Split              CSPL
Copy Column                    CPYC
Data Conversion                DCNV
Data Mining Query              DMQ
Derived Column                 DER
Export Column                  EXPC
Fuzzy Grouping                 FZG
Fuzzy Lookup                   FZL
Import Column                  IMPC
Lookup                         LKP
Merge                          MRG
Merge Join                     MRGJ
Multicast                      MLT
OLE DB Command                 CMD
Percentage Sampling            PSMP
Pivot                          PVT
Row Count                      CNT
Row Sampling                   RSMP
Script Component               SCR
Slowly Changing Dimension      SCD
Sort                           SRT
Term Extraction                TEX
Term Lookup                    TEL
Union All                      ALL
Unpivot                        UPVT
Data Mining Model Training     DMMT_DST
DataReader Destination         DR_DST
Dimension Processing           DP_DST
Excel Destination              EX_DST
Flat File Destination          FF_DST
OLE DB Destination             OLE_DST
Partition Processing           PP_DST
Raw File Destination           RF_DST
Recordset Destination          RS_DST
SQL Server Destination         SS_DST
SQL Server Mobile Destination  SSM_DST

 

 

 

Comments

 

Raymond Lewallen said:

You have to edit the CommunityServer.config file and tell it that you don't want specific attributes removed.
January 5, 2006 3:04 PM
 

Nick Barclay said:

Best gem (er, nugget) yet, Jamie. Thanks!

Just when are they announcing the next round of MVPs?

Cheers,
Nick
January 5, 2006 9:21 PM
 

Professional Association for SQL Server (PASS) SIG said:

January 7, 2006 7:18 AM
 

Martin said:

On the BP, you said use SQL Server Destination.
Is it possible to use SQL Server Destination when i design and run in VS? I get following error when I run/debug the package.

[SQL Server Destination [8775]] Error: An OLE DB error has occurred. Error code: 0x80040E14.
An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80040E14
Description: "Could not bulk load because SSIS file mapping object 'Global\DTSQLIMPORT
' could not be opened. Operating system error code 2(The system cannot find the file specified.).
Make sure you are accessing a local server via Windows security.".
January 9, 2006 11:27 PM
 

jamie.thomson said:

Martin,
Read BOL. There are limitations to using SQL Server Destination which is why I said use it where possible.

-Jamie
January 16, 2006 1:04 PM
 

Jamie Thomson - Life, the universe and SSIS! said:

The OLE DB Source component allows a number of methods for extracting data from an OLE DB Source. The...
February 21, 2006 10:13 PM
 


Professional Association for SQL Server (PASS) SIG said:

February 22, 2006 9:12 PM
 

Darren Gosbell's Random Procrastination said:

April 12, 2006 3:19 PM
 

Darren Gosbell's Random Procrastination said:

April 17, 2006 12:36 AM
 

SSIS Junkie said:

In this blog's history I have suggested naming conventions for tasks and components and just lately

November 9, 2006 5:57 AM
 

CodePosta said:

Just for those that made it here looking for help with the error:

"Microsoft SQL Native Client" Hresult: 0x80040E14 Description: "Could not bulk load because SSIS file mapping object 'Global\DTSQLIMPORT"

I found that you must leave the checkbox checked when you edit the Connection String of your Package's Connection Manager for the local host.  Unchecking will use the default package settings, even though your edits remain in the Connection String of the Connection Manager.

Hope it helps,

Posta

February 8, 2007 6:02 PM
 

subhash said:

Hi Darren,

My DTS Package that has now been migrated to SSIS  has the following tasks:

1) An ActiveX Script

opkg= DTSGlobalVariables.Parent

DTSGlobalVariables("_PackageLogName") =  oPkg.Name & ".dts"

How can I convert this part

2) A Dynamic Properties Task

how I can replace this task to set value of each parameter

3) A Source and Destination connection

It has lost the passwords while migrating, how can I preserve these passwords while migrating

March 6, 2007 4:36 PM
 

jamie.thomson said:

Please send these queries to the SSIS forum.

My name is not Darren.

March 6, 2007 4:56 PM
 

SSIS Junkie said:

Someone recently left a comment on my blog about Package Template Locations asking if I could share my

March 11, 2007 3:36 AM
 

SQL Server tools said:

I was looking for few notes and information for SSIS due to a recent development at a client's site.

March 27, 2007 11:49 PM
 

Professional Association for SQL Server (PASS) SIG said:

April 4, 2007 3:00 PM
 

SSIS Junkie said:

For reasons that I'll save until another post , I always deploy my packages as files rather than

April 20, 2007 6:10 PM
 

AiM said:

Regarding the limitation on the SQL Server Destination go to:

http://msdn2.microsoft.com/en-us/library/ms141095.aspx

April 29, 2007 8:12 PM
 

SSIS Junkie said:

I've been blogging on this site for just over 2 and a half years now and thought now would be a good

June 27, 2007 3:50 AM
 

Bill said:

SSIS sucks

August 7, 2007 9:47 PM
 

Peter said:

Great resource Jamie, thanks.

However, I came into some trouble that may also apply to others, and perhaps warrants a caveat in your best practices. The issue lies in the combination of 5 (using SQL native client as the destination for data transfer tasks) and 21 (keep packages in the dark : location unaware). If you use SQL native client as the destination, the package will only work when being run on that destination server. The primary problem with that is that in development, I cannot develop from my laptop, then test with a quick right click execute package on a test server. I would have to deploy the package, or develop on the server via remote desktop.

In the end, I think I will just use OLEDB as destinations, and retain the flexibility that the dynamic destinations offers me.

Unless I am missing something... (wouldn't be a first) !?

Many greetings, Peter, Bahrain

September 4, 2007 10:11 AM
 

jamie.thomson said:

Peter,

Yes, you are missing something :)

#5 doesn't mention anything about SQL Server native client. it mentions SQL Server Destination - that's something different.

Simple answer is that SQL Server Destination only works if you execute the package on the same server that you are inserting to. There's lots of information elsewhere on this blog that explains why! That is why #5 says "where possible".

-Jamie

September 4, 2007 4:41 PM
 

andrew said:

September 6, 2007 4:33 AM
 


Bill Caruthers said:

Jamie,

I'm having issues using connection strings with the /SET option of the command line DTEXEC utility.  The issue seems to be caused by the excessive use of semicolons to separate the different parts of the connection string.  Semicolons are also used in /SET to separate the name of the property from the value.  I'm including an example here:

DTEXEC /DTS "\MSDB\DBTransfer" /SERVER MYSERVER /MAXCONCURRENT " -1 " /CHECKPOINTING OFF  /REPORTING V /SET \Package.Connections[OLE_DST_ContractMaster].Properties[ConnectionString]

;%SOURCE_SQLSERVER%

where %SOURCE_SQLSERVER% is an environment variable that I have pre-configured in the batch file that calls this DTEXEC.  The connection string gets set to the following prior to this call:

@SET SOURCE_SQLSERVER=Data Source=SRCSERVER;Initial Catalog=master;Provider=SQLNCLI.1;Integrated Security=SSPI;Auto Translate=False;

Is there any way to "escape" the semicolons in a connection string when used in this way?  

Bill

September 27, 2007 2:46 AM
 

SSIS Junkie said:

SQL Server MVP Matthew Roche has been interviewed by Greg Low (another MVP) on SQLDownUnder.com . You

November 23, 2007 7:53 AM
 

Jon said:

You are the man.

Thanks for sharing all of this. You blog continues to be a great resource!

December 3, 2007 9:37 PM
 

Jerry said:

Regarding to 7: Where possible, use expressions on the SQLStatementType property of the Execute SQL Task instead of parameterised SQL statements. This removes ambiguity when different OLE DB providers are being used. It is also easier

There is no SQLStatementType property. Do you mean SQLStatementSource or SQLStatementSourceType? I guess you mean SQLStatementSource. Am I correct? Thanks

December 10, 2007 3:42 AM
 

Tod means Fox | SSIS Debugging ??? Find in Files, Naming Conventions, and a Problem Solved! said:

March 3, 2008 7:22 PM
 

pritesh said:

Hi Jamie, should we have a single config file or table for all the packages running on a server? I want to store values like mailing addresses in the Send Mail task, the log file destination folder path, or all the values which are common to all the packages. I think it makes sense to have it in one place... want to know what you feel about it.

March 29, 2008 2:08 PM
 


jamie.thomson said:

Pritesh,

I agree that it makes sense for something to be defined in only one place. You're in dangerous territory if you do anything else.

-Jamie

March 31, 2008 7:48 AM
 

pritesh said:

Thanks Jamie, I must say that your blogs are fantastic. You have great depth of knowledge about SSIS and I appreciate your efforts in sharing your knowledge with everyone here.

March 31, 2008 10:11 AM
 

pritesh said:

Hi,

I am trying to implement indirect configurations in the packages. I have my configurations stored in a table. I need to add the path to the config table in a config file (XML). At runtime I need my package to read the config file to get the path to the table where the configurations are stored.

I need to know how I can store the config table information in another config file.

Thanks in advance

April 2, 2008 5:33 PM
 

jamie.thomson said:

Pritesh,

You can't.

Think about it. How can you define the location of one configuration using another configuration? That makes no sense at all.

Actually in theory this would work by defining them in a certain order in your package because configurations are evaluated in the order that they are declared. However, NEVER DO this. There is no guarantee that this will be true in the future.

-Jamie

April 3, 2008 1:38 AM
 

pritesh said:

Thanks Jamie,

I had reached the same conclusion, just wanted your expert opinion on it. Definitely doesn't make sense.

April 3, 2008 4:04 AM
 

pritesh said:

What, according to you, is a better way of storing packages: in MSDB or on the file system? What are the advantages and disadvantages of each?

April 5, 2008 5:28 AM
 

jamie.thomson said:

April 5, 2008 11:24 AM
 

Geir Morten Allum's MS application platform hvor, hva, når, hvorfor, osv... said:

It is important that naming standards are considered early in projects and put in place in the project handbooks.

April 30, 2008 3:44 PM
 

DaveH said:

I'd hoped to see some guidelines on best practice for managing SSIS processes...

I want to structure my data loads so that every input table stores a record count and hash total. This should give a base value for controls in the restructured DWH to match. This is something I used to do in DTS2000.

Apart from the fact that it's taking me ages to get this working - do you think this is a good approach, or is there some better method that SSIS offers for these types of record and value controls?

May 20, 2008 3:59 PM
 

jamie.thomson said:

Dave,

Not really sure what you mean. Do you want an audit table that holds information about number of rows loaded?

What is a hash total in this context?

-jamie

May 20, 2008 4:05 PM
 

DaveH said:

I was trying to say I would have a table that I update on each run with, for example..

Source       RecordCount   HashTotal
------------------------------------
Employees    200           150000
Sales        15000         500000
Invoices     45000         3750000
Products     700           22000

After the data has been restructured (say into sales fact tables) I want to check that I still have the same number of employees, value of sales, invoices etc and matching amounts (i.e. no bad joins doubling values) - does this make sense?

I'm looking at this from the point of view of auditing - i.e. ensuring that all that is in the source system makes it into the Data Warehouse.

Do you do this sort of thing?

May 20, 2008 5:19 PM
 

jamie.thomson said:

DaveH,

yeah, I do very similar things. Have you explored the use of OnPostExecute EventHandlers? They are ideal for this scenario.

-Jamie

May 20, 2008 5:27 PM
 

Daily del.icio.us for May 20th through May 24th — Vinny Carpenter’s blog said:

May 24, 2008 11:40 PM
 

Sam Loud said:

Jamie, I've just come across a pretty common scenario that you may wish to add to your very useful list of best practices.

Backstory:

I had designed a fairly straightforward SSIS solution to load a Data Warehouse. As is usual, some of the data was external reference data, held in Excel sheets, that needed to be loaded in.

A typical package would have a connection manager to the Excel file. The Excel file is pulled into a staging table, then later on in the workflow, the contents of that staging table are merged into the dataflow, and end up in the destination table.  Imagine there were several packages using that basic setup.

All works fine. Then I need to port it all to a new x64 server. As there is no 64bit Jet driver, all of the Excel connection managers break. To get around it I created a set of new packages that contained just the tasks that required Jet. Effectively, these packages did nothing but load the staging tables. I also removed all the Jet tasks from the main packages. Now the 'PreLoad' Packages that require Jet can be run with the 32bit DTExec.exe. Then the main packages can be run in 64bit.

Now, all of this is a pretty well trodden path. But sorting it out wasted a fair bit of time, and I was annoyed I didn't think of it in advance. My best practice advice would be something like this:

"Break out all tasks requiring the Jet engine (Excel or Access data sources) into their own packages that do nothing but that data flow task. Load the data into Staging tables if necessary. This will ensure that solutions can be migrated to 64bit with no rework."

Does this make sense?

June 25, 2008 12:18 PM
 

jamie.thomson said:

Sam,

It makes perfect sense. Thank you very much for your input - I've added it to the post.

It's great to see other people contributing to this. Thanks again.

-Jamie

June 25, 2008 12:31 PM
 

Denis Gobo said:

I asked for some names of people who you would like to see interviewed here at Sqlblog and Jamie Thomson's

July 2, 2008 3:57 PM
 

SirClutzz said:

In relation to points 17 and 18 using RAW files, I have been unable to use variables, expressions, or any other variable path with the RAW file source or destination, or have I been able to find anything online about it being supported. Do you have any more information on how to do this?

Thanks

SirClutzz

July 3, 2008 3:58 AM
 

SirClutzz said:

I could slap myself, I just found how to do this, thanks anyway.. ;)

July 3, 2008 4:05 AM
 

bfilppu said:

when using templates, don't forget to generate a new ID for the package along with a new name.  If you record the packageID in your logs and don't change it, all of your packages will have the same ID and you'll never be able to figure things out...

July 22, 2008 1:16 AM
 

bill said:

More of question in getting started.  Do you store the packages on the same server as integration services engine?  Any utility to having them located on a different server?

August 26, 2008 2:31 PM
 

jamie.thomson said:

Bill,

I generally store my packages on the File System.

-Jamie

August 26, 2008 4:04 PM
 

Strate SQL said:

Jamie Thompson posted some best practices and naming conventions for SQL Server Integration Services a bit back. Interesting ideas and for some reason I hadn't ever really thought about naming conventions within SSIS until I read these articles. ...

November 14, 2008 3:56 AM
 

Best practices | keyongtech said:

January 22, 2009 7:44 AM
 

SSIS Resources said:

January 29, 2009 8:40 AM
 

drowned in code said:

Some time ago I made a big mistake with the SSIS project I was working on. The

March 19, 2009 4:22 AM
 

SSIS Junkie said:

In October 2004 I was in Orlando airport returning home from the annual SQL PASS summit and I happened

August 29, 2009 11:30 PM
