Welcome to EMC Consulting Blogs Sign in | Join | Help

Claypole's World - The SQL Server Side

Concerning development of bespoke applications and database administration.

Calculating the ROI of DRY SQL vs FLY SQL

My colleague Jamie Thomson  posted an intriguing article on DRY SQL which you can read all about here. It raised a number of questions for me so I thought I’d post my own response to it.

Here is a list of some of these questions that sprung to my mind.

  1. Did the code change come up with the same query plan?
  2. Were the tables indexed in the same way? Do both have an index on asset class? Does it cover the query? (You’ll see why I ask below)
  3. Had the performance changed in anyway (for the better or worse)?
  4. Had the compilation time gone up or down?

I hope you can get the general drift of my thoughts.  Ostensibly they could be boiled down to two things.

  1. Was this really a re-write rather than a re-factoring?
  2. Was it worth it?

 Jamie’s illustration was an example based on some work he was doing for a client. Therefore he wasn’t really in a position to really answer all of these questions. I therefore came up with my own AdventureWorks based example which I’d like to take you through.

Before starting to cut any code I thought I’d set up my own hypothesis by answering the my own questions

  1. Yes I am now performing one sort and one aggregation instead of two of each.
  2. Have no idea – possibly both were indexed on asset class but maybe not covered
  3. Depends. Two separate sorts for the group by might be better than a single sort of one big dataset. On the other hand one operation for smaller volumes might be better.
  4. Might have gone down (SQL more easily to shred/understand to come up with the good enough plan)

Scenario

This is the code I used to create the scenario.  Before each run I cleaned out the cache and buffers and issued a check point for any dirty pages.

USE AdventureWorks;

SET STATISTICS TIME ON;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
CHECKPOINT;

/* WET SQL */
SELECT   LastName
        ,FirstName
        ,SUM(ContactID)
FROM    Person.Contact
WHERE   LastName = 'Zhou'
GROUP BY LastName
        ,FirstName
UNION ALL 
SELECT   LastName
        ,FirstName
        ,SUM(ContactID)
FROM    Person.Contact
WHERE   LastName = 'Edwards'
GROUP BY LastName
        ,FirstName;

DBCC FREEPROCCACHE ;
DBCC DROPCLEANBUFFERS;
CHECKPOINT;

/* DRY SQL */
SELECT   LastName
        ,Firstname
        ,SUM(ContactID)
FROM
(   SELECT   LastName
            ,FirstName
            ,ContactID
    FROM    Person.Contact
    WHERE   LastName = 'Zhou'
    UNION ALL
    SELECT   LastName
            ,FirstName
            ,ContactID
    FROM    Person.Contact
    WHERE   LastName = 'Edwards'
) a
GROUP BY LastName
        ,FirstName;

/*

CREATE NONCLUSTERED INDEX JRJ_NNCI_Contact_01 
ON Person.Contact(LastName);
CREATE NONCLUSTERED INDEX JRJ_NNCI_Contact_02 
ON Person.Contact(LastName,FirstName);

DROP INDEX Person.Contact.JRJ_NNCI_Contact_01;
DROP INDEX Person.Contact.JRJ_NNCI_Contact_02;

*/

Run 1

Let’s just run that sql as is to answer a fundamental question.  Is this a refactoring or is this a material change and therefore a re-write?

clip_image002

To me straight away two things are very obvious.

  1. The plans are different
  2. The costs are nigh on identical

Why are the plans different? Well because in the first instance (WET SQL) we asked them to be. We asked sql server to do the group by and then bring the results together.  Now you might see something like this on a distributed platform where the contact table was distributed. An example might be on Madison when it comes out. Processing the Group BY locally would work MUCH better than pulling all the data together and then issuing the group by.

So is this refactoring? That depends on your definition of the word. If your definition is based on whether the inputs and the output are the same only the guts have changed then yes this is a re-factoring. However, if your definition of re-factoring is whether you are making the same request but formatted differently for maintainability then no this is a re-write. Fundamentally this is a different request. The SQL might now be DRY and it does return the same data based on the same inputs but it is achieving this in a very different way.  Whether 1 is better than 2 will depend on data volumes amongst other things.

Irrespective of the change however, the performance is pretty much bang on the same.  At this level the ROI of making the SQL DRY doesn’t look like it’s paying off.  So rather than looking at making the SQL DRY how about making the SQL FLY?

As our baseline the subtree and compile costs of the WET SQL and the DRY SQL were as follows

  WET SQL DRY SQL
Sub tree cost 1.76 1.75
Compile Time
(CPU/Elapsed) ms
16/63 7/7

As we saw the Sub tree costs are nigh-on identical but the compile time is reduced. Tick in the box for the hypothesis step 4.

Let’s look at how investing in indexing might make more of a difference.

Run 2 - Index on Last Name only

Let’s add an index and see what happens

CREATE NONCLUSTERED INDEX JRJ_NNCI_Contact_01 ON Person.Contact(LastName);

This index won’t cover the query but it will help it allowing SQL Server to cut into the data more effectively:

clip_image002[4]

The index above has been added to AdventureWorks and the plan above has clearly picked it up.  The difference between the two is now at least  noticeable (but not much) in favour of the DRY SQL now but we are doing a lookup of the first name. However, what has happened to the actual performance?

  WET SQL DRY SQL
Sub tree cost 0.503 0.492
Compile Time
(CPU/Elapsed) ms
0/40 0/8

A significant drop in both sub tree costs and a drop in compile time for the WET SQL! However, we can take this further by covering the query.

Run 3 – Covered Index

New index this time covering the query by including first name as well as last name.

CREATE NONCLUSTERED INDEX JRJ_NNCI_Contact_02 ON Person.Contact(LastName,FirstName);

clip_image002[10]

Wow! Look at that SQL Server has now chosen the same plan for both queries. By any definition of re-factoring the DRY SQL hasn’t altered the query.  The Covering Index has forced order on the data so the group by is no longer necessary.  All that is left is to aggregate the data.

Notice DRY vs WET is not making a heap of difference to performance again with both Query costs weighing in at 50%. However, the lookups have been eliminated. Methinks a performance optimisation is in the offing….

  WET SQL DRY SQL
Sub tree cost 0.0069 0.0069
Compile Time
(CPU/Elapsed) ms
14/62 16/68

There we have it a truly massive improvement in performance! 0.0069 vs an original baseline of ~1.76.  However, notice that the DRY SQL compile time has now shot up! I ran this query several times just to be sure and every time the compile time was over 60ms.  SQL Server was definitely taking longer to find this plan.

Conclusion

No one can under-estimate the importance of manageable, maintainable code in production. Pages of SQL obscured by layer upon layer of view definitions is horrendous to have to unpick when there is an issue.  However, investing time in making SQL DRY may not give you any real performance benefit.  In fact in so doing you might alter the way in which SQL Server, or any RDBMS for that matter, interprets your request.  This might give you a performance improvement but it might also have little or no impact whatsoever. Just remember that it could and it could also make things worse.  The ROI of attempting to make your SQL DRY will therefore heavily depend on how maintainable the code was in the first place.

However, a little look at making your SQL FLY with a spot of index tuning might be just the ticket! The ROI in the examples above are pretty clear to me…

Cheers, James

Published Wednesday, July 15, 2009 4:53 PM by James.Rowland-Jones

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Andy Clark said:

Looking at the first sets of plans, I'm wondering if a multi-processor machine might process the two sorts in parallel hence decreasing the time to run?

July 16, 2009 1:45 PM
 

Log Buffer #154: a Carnival of the Vanities for DBAs | Pythian Group Blog said:

July 17, 2009 6:01 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About James.Rowland-Jones

James is an Advisory Practice Consultant with EMC Consulting. He works primarily with SQL Server technologies in architecture, development and administration capacities. He also co-ordinates our Internal SQL Server Community.

View James Rowland-Jones's profile on LinkedIn

Powered by Community Server (Personal Edition), by Telligent Systems