Welcome to EMC Consulting Blogs Sign in | Join | Help

Christian Wade's Blog

SQL Server 2005: Recursive Hierarchies to XML - CTEs vs. UDFs

Suppose we have a recursive hierarchy stored in a relational database and we want to write it to XML.  This might be for a variety of reasons – e.g. as a pre-cached input to a UI control, to export to another system using a pre-defined format, etc.

 

In SQL Server 2000, in order to get it straight to XML using FOR XML EXPLICIT, we would have to know the depth of the lowest node beforehand (without doing some very ugly dynamic SQL), so this does not help us.

 

It would be useful to access the data in the same order that it will appear in the XML.  I.e.

  • Node1
    • Node2
      • Node3
      • Node4
    • Node5

 

Getting at the data in this order will allow us to iterate through the nodes in sequential order.  This avoids using the DOM and is significantly quicker and more efficient as it avoids loading the whole structure into memory.

 

We could achieve this in SQL Server 2000 using a recursive table-valued UDF.  In SQL Server 2005, we also have the option of using a recursive Common Table Expression (CTE) to achieve the same functional result.  Let’s compare the two ways of doing it.

 

A CTE is a temporary named resultset referenced by a subsequent “outer query”.  They can provide similar functionality to views and derived tables, but their real value is in recursive queries.  Recursive CTE’s contain an “anchor member” and a “recursive member”, which are connected by a UNION ALL operator.  They can be encapsulated by UDFs for reusability.

 

 

 

Data Preparation

Let’s create a table and insert hierarchical values.

 

CREATE TABLE Employees

(

  empid   int         NOT NULL,

  mgrid   int         NULL,

  empname varchar(25) NOT NULL,

  CONSTRAINT PK_Employees PRIMARY KEY(empid),

  CONSTRAINT FK_Employees_mgrid_empid

    FOREIGN KEY(mgrid)

    REFERENCES Employees(empid)

)

CREATE INDEX idx_nci_mgrid ON Employees(mgrid)

 

SET NOCOUNT ON

INSERT INTO Employees VALUES(1 , NULL, 'Nancy')

INSERT INTO Employees VALUES(2 , 1   , 'Andrew')

INSERT INTO Employees VALUES(3 , 1   , 'Janet')

INSERT INTO Employees VALUES(4 , 1   , 'Margaret')

INSERT INTO Employees VALUES(5 , 2   , 'Steven')

INSERT INTO Employees VALUES(6 , 2   , 'Michael')

INSERT INTO Employees VALUES(7 , 3   , 'Robert')

INSERT INTO Employees VALUES(8 , 3   , 'Laura')

INSERT INTO Employees VALUES(9 , 3   , 'Ann')

INSERT INTO Employees VALUES(10, 4   , 'Ina')

INSERT INTO Employees VALUES(11, 7   , 'David')

INSERT INTO Employees VALUES(12, 7   , 'Ron')

INSERT INTO Employees VALUES(13, 7   , 'Dan')

INSERT INTO Employees VALUES(14, 11  , 'James')

 

 

Recursive Table-Valued UDF Example

  

CREATE FUNCTION dbo.fn_EmpRec

(

      @empid      int,

      @depth      int

)

RETURNS      @Emps TABLE

(

      empid       int,

      empname      varchar(25),

      mgrid       int,

      depth       int

)

AS

BEGIN

      -- insert current employee into working table

      INSERT INTO @Emps

            SELECT empid,

                     empname,

                     mgrid,

                     @depth

            FROM Employees

            WHERE empid = @empid

     

      -- holding variable to keep track of current child employee

      DECLARE @curempid int

     

      -- get the first child

      SELECT @curempid = MIN(empid)

      FROM Employees

      WHERE mgrid = @empid

     

      -- iterate each child and make the recursive call

      WHILE @curempid IS NOT NULL

      BEGIN

            INSERT INTO @Emps

                  SELECT *

                  FROM dbo.fn_EmpRec(@curempid, @depth + 1)

                 

            SELECT @curempid = MIN(empid)

                  FROM Employees

                  WHERE empid > @curempid AND

                          mgrid = @empid

      END

      RETURN

END

GO

 

And if we run the following query,

 

SELECT

  REPLICATE('|         ', depth)

    + '(' + (CAST(empid AS VARCHAR(10))) + ') '

    + empname AS empname

FROM dbo.fn_EmpRec(1, 0)

 

The resultset is:

 

Recursive UDF Result

 

Great!  - (until we try it on 25,000 rows or above that is)

 

 

 

Common Table Expression Example

 

Now let’s do it the CTE way.

 

WITH EmpCTE(empid, empname, mgrid, depth, sortcol)

AS

(

  -- anchor member

  SELECT empid, empname, mgrid, 0,

    CAST(empid AS VARBINARY(900))

  FROM employees

  WHERE empid = 1

 

  UNION ALL

 

  -- recursive member

  SELECT E.empid, E.empname, E.mgrid, M.depth+1,

    CAST(sortcol + CAST(E.empid AS BINARY(4)) AS VARBINARY(900))

  FROM Employees AS E

    JOIN EmpCTE AS M

      ON E.mgrid = M.empid

)

-- outer query

SELECT

  REPLICATE('|         ', depth)

    + '(' + (CAST(empid AS VARCHAR(10))) + ') '

    + empname AS empname

FROM EmpCTE

ORDER BY sortcol

 

 

This returns the same resultset as the recursive UDF.

 

Once you get used to CTEs, this is significantly easier to write than the UDF.  The key thing to notice here is the varbinary sortcol column on which the query is sorted.  It is the full hierarchical chain of empids leading to the current node appended together in order.

 

By checking the value of GETDATE() before and after running the UDF/CTE, we can see a significant performance gain with the CTE.  But I actually tried it on a hierarchy with over 50,000 members.  The UDF took over half an hour.  THE CTE SAW A PERFORMANCE GAIN OF ABOUT 90% !

 

For further details, refer to Itzik Ben-Gan's TSQL Enhancements in SQL Server 2005 white paper.

 

 

 

 

 

Published 09 November 2004 13:16 by christian.wade
Attachment(s): RecursiveHierarchiesToXml.zip

Comments

 

Ruben said:

Hello,

I am new to this blog and glad to have found it.  

I am attempting to write a sql script that will generate a Parent-Child Hierarchy report and tried using the script I found on your blog (modified to reference our columns) but i it generates an error which I just can't figure out the cause of it (Msg 245, Level 16, State 1, Line 5

Conversion failed when converting the varchar value 'V5501XX-F' to data type int.).  Below is the modified script and any help would be greatly appreciated.

Thanks!

           WITH EmployeeCTE(MasterKey, masterpartno, partno, Depth, SortCol)

           AS

           (

             SELECT MasterKey, masterpartno, partno,  0, CAST(MasterKey AS varbinary(max))

             FROM Subassemblies

             WHERE MasterKey = 1208

             UNION ALL

             SELECT E.MasterKey, E.masterpartno, E.partno, M.Depth+1, CAST(SortCol + CAST(E.MasterKey AS binary(4)) AS varbinary(max))

             FROM Subassemblies AS E

                       JOIN EmployeeCTE AS M

                         ON E.masterpartno = M.MasterKey

           )

           SELECT

             MasterKey, masterpartno, partno, Depth

           --,SortCol

           FROM EmployeeCTE

           ORDER BY SortCol

October 2, 2009 20:00
Anonymous comments are disabled
Powered by Community Server (Personal Edition), by Telligent Systems