Welcome to EMC Consulting Blogs Sign in | Join | Help

Dave Morris' Blog

Scalar User Defined Functions and Computed Columns

Came across an interesting SQL problem where a customer has a table with a primary key and X numerical fields.  For each record the maximum value in these X fields was needed.

The issue was how to accomplish this efficiently?  Or more accurately the least inefficient way of doing it is.
One option suggested was to have to UNION the table with itself X times and then do a MAX aggregation since there was a lot of data.
 
The more interesting option, which it turns out not many people are aware of, is to use a Scalar User Defined Function (UDF) to calculate the maximum value.
 
Here's an example of this in action:
 
CREATE FUNCTION CubicVolume
-- Input dimensions in centimeters
   (@CubeLength decimal(4,1), @CubeWidth decimal(4,1),
    @CubeHeight decimal(4,1) )
RETURNS decimal(12,3) -- Cubic Centimeters.
AS
BEGIN
   RETURN ( @CubeLength * @CubeWidth * @CubeHeight )
END
CREATE TABLE Bricks
   (
    BrickPartNmbr   int PRIMARY KEY,
    BrickColor      nchar(20),
    BrickHeight     decimal(4,1),
    BrickLength     decimal(4,1),
    BrickWidth      decimal(4,1),
    BrickVolume AS
              (
               dbo.CubicVolume(BrickHeight,
                         BrickLength, BrickWidth)
              )
   )
 
Basically the scalar UDF is called for each row returned from a SELECT.  The column itself is not actually stored in the SQL Server table.  Any INSERT and UPDATE on the table do not calculate the value for the computed column.  Now obviously this is an overhead that may not be acceptable if the table is very static and read a lot.  Here, it may be more efficient to perform the calculation once on INSERT or UPDATE of the table with INSTEAD OF triggers or another similar method.
 
 
 
Published 07 March 2005 15:40 by dave.morris

Comments

 

dave.morris said:

Whilst I would agree that this is a very neat approach in most cases the performance impact will be so great that this approach should be avoided, since the function is called on a row by row basis and unfortunately this comes with a significant overhead.

In more complex examples you will be surprised what can be achieved with the CASE statement. It might look ugly but it almost certainly out performs any other approach.

What will be interesting to see is how a 2005 CLR function will compete against the good old CASE statement for example to determine the largest value across multiple columns.

What will be interesting to see is how a CLR function will compete against the good old CASE statement



March 9, 2005 12:42
 

dave.morris said:

I do appreciate the overhead of performing anything on a row basis vs. a set operation so as with anything in SQL be aware of what its doing and the overhead it introduces.

We did a bit of testing here, trying 3 methods - scalar function, UNION and CASE statement. The first two consistently completed in around 800ms, the CASE statement in around 140ms. As we would expect.

I will try this using a 2005 CLR but I suspect it will be similar to the scalar function since its a row based operation again. The interesting option in SQL 2005 will be the T-SQL enhancement for CTE (common table expressions).
March 10, 2005 08:56
 

KyawAM said:

Hi Dave,

I am not sure whether you considered the "Is Persisted" option in computed column spec. I hope this will improve the table performance, by rather computing the value upon SELECT, it might compute the value ahead in UPDATE or INSERT.

Tell me if I am wrong.

February 3, 2007 05:49
Anonymous comments are disabled
Powered by Community Server (Personal Edition), by Telligent Systems