Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

SSIS Nugget: Extract last word from a sentance using regular expressions

About a month ago I wrote this very simple blog post explaining how one might go about stripping off the last word from a sentance in the pipeline. My Conchango colleague James Saull thought that this would be ripe for a regular expression (and he's right) and asked if we could we apply regular expressions in SSIS.

The answer is yes, regular expressions can be applied easily inside SSIS. it involves writing some .Net code and sometimes that can scare off database type people but I figured it would make a great little demo for two reasons:

  • Show how easy it is to apply .Net functionality into SSIS
  • Demonstrate that there is often more than one way to solve a problem in SSIS.

 

I've used the trusty script component to demo this although you could write your own custom component assembly. Here's the code:

    1 Imports System

    2 Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    3 Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    4 Imports System.Text.RegularExpressions  'We need to reference the Regex namespace

    5 

    6 Public Class ScriptMain

    7     Inherits UserComponent

    8 

    9     Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

   10         Row.LastWord = GetLastWord(Row.Sentance)

   11     End Sub

   12 

   13     Private Function GetLastWord(ByVal sentance As String) As String

   14         'Here's my regex string

   15         Dim regexStr As String = "\b\w*$"

   16 

   17         'Here's where the regular expression is applied

   18         Dim match As Match = Regex.Match(sentance, regexStr)

   19 

   20         Return match.Value

   21     End Function

   22 End Class

There's really not much to it. All I've done is:

  • Referenced System.Text.RegularExpressions
  • Write a function GetLastWord() that takes a sentance and returns the last word in that sentance
  • Defined a regular expression (\b\w*$) that will pick out the last word    (James supplied the expression for me)
  • Passed the returned value from GetLastWord() back into the pipeline

Here's the obligatory screenshot of it working: 

This isn't meant to be a tutorial for writing regular expressions so I'm sure you can come up with something more sophisticated than the one used here. The aim is to demonstrate the extensibility of the core pipeline functionality.

If regular expressions in the SSIS pipeline interests you don't forget to check out Darren and Allan's Regular Expression component add-in for SSIS at http://www.sqlis.com/. I also have an old post here that gives another example of using regular expressions, this time with post codes.

I've attached the demo package that I built for this post here.

-Jamie

Published Tuesday, December 19, 2006 8:45 AM by jamie.thomson
Attachment(s): 20061219lastWordUsingRegex.zip

Comments

 

Justin B said:

I've just plugged this into a dataload of approx 5.5 million rows for which I needed to strip the last word from a row as it holds a package code, unfortunately if the last word is wrapped in brackets etc it's not loaded, also "1.42.5" returns "5".

A very simple way to do it (though poss more load) would be to string split by space character into an array then return the last element of the array, which would guarantee the last word by spacing would be returned.

July 1, 2008 3:54 PM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems