Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

SSIS: Data Profiling Task: Part 2 - Column Length Distribution

In this blog entry I am going to outline the function of the Column Length Distribution profile request in the new Data Profiling Task in SQL Server 2008. BOL states about this profile request:

Reports all the distinct lengths of string values in the selected column and the percentage of rows in the table that each length represents.

This profile helps you identify problems in your data, such as values that are not valid. For example, you profile a column of United States state codes that should be two characters and discover values longer than two characters.

 

I am testing this out on the [HumanResources].[Department].[GroupName] column in the AdventureWorks sample database. Here are the contents of that column:

image

Here is how I have configured the task:

image

Note that you have some additional options (highlighted) that let you define the behaviour of this profile request.

 

After running this task, here is the output:

image

OK, there's a few things to notice about the output here:

  • There are 6 distinct lengths of values in the [HumanResources].[Department].[GroupName] column
  • Minimum length and Maximum length are both called out explicitly
  • We can order the results by value length by clicking on a column heading
  • The number of rows that have a particular length for the value in [GroupName] are shown as a percentage of all the rows in the table.

 

This concludes my brief summary of the Column Length Distribution profile request in the Data Profiling Task that is being introduced with SQL Server Integration Services 2008. For more posts in this series please refer to SSIS: Data Profiling Task: Part 1 - Introduction

-Jamie

 

Disclaimer: The information in this blog post is correct at the time of publication. It may have changed before SQL Server 2008 is released.

Comments

 

SSIS Junkie : SSIS: Data Profiling Task: Part 1 - Introduction said:

March 2, 2008 20:20
 

Tod means Fox | ETL Subsystem 1: Data Profiling said:

March 18, 2008 11:57
 

Douglas Laudenschlager said:

Microsoft Research contributed the Data Profiling Task and Data Profile Viewer to Integration Services

January 20, 2009 23:46
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems