Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

Once upon a time this blog was a hive of activity. Now however its pretty lifeless as you can probably tell so if are pining for more of the same you can find me over at http://sqlblog.com/blogs/jamie_thomson. I look forward to seeing you there!

SSIS: Data Profiling Task: Part 9 - Value Inclusion

In this blog entry I am going to outline the function of the Value Inclusion profile request in the new Data Profiling Task in SQL Server 2008. BOL states about this profile request:

Computes the overlap in the values between two columns or sets of columns. This profile can determine whether a column or set of columns is appropriate to serve as a foreign key between the selected tables.

This profile also helps you identify problems in your data, such as values that are not valid. For example, you profile the ProductID column of a Sales table and discover that the column contains values that are not found in the ProductID column of the Products table.

Or to put it another way, it tells you if one set of values is a subset of another.

Again, I don't really have any good test data for this so I'm going to create some of my own. If you've read the other blog entries in this thread then you'll know that the [HumanResources].[Department].[GroupName] field has got 6 distinct values in it. I'm going to load 4 of them into a new table:

select    top 4 GroupName
into    ValueInclusionTest
from    HumanResources.Department
group    by GroupName

Here's how I configure the task:

image

I've highlighted the important bits. We're finding out which values in [HumanResources].[Department].[GroupName] do not exist in [dbo].[ValueInclusionTest].[GroupName]. Also note that I've set InclusionThresholdSetting='None' in order to ensure that we get some meaningful data back. You can fine tune that and other properties to make sure that the task works for you. Here's the results:

image

So, we know that not all of the values in [HumanResources].[Department].[GroupName] will exist in [dbo].[ValueInclusionTest].[GroupName] and hence that's why we've got an inclusion strength of 68.75%. We also get an 'Inclusion Violations' section showing us the offending values.

 

This concludes my brief summary of the Value Inclusion profile request in the Data Profiling Task that is being introduced with SQL Server Integration Services 2008. For more posts in this series please refer to SSIS: Data Profiling Task: Part 1 - Introduction

-Jamie

 

Disclaimer: The information in this blog post is correct at the time of publication. It may have changed before SQL Server 2008 is released.

Comments

 

SSIS Junkie : SSIS: Data Profiling Task: Part 1 - Introduction said:

April 16, 2008 16:25
 

Douglas Laudenschlager said:

Microsoft Research contributed the Data Profiling Task and Data Profile Viewer to Integration Services

January 20, 2009 23:46
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems