Welcome to EMC Consulting Blogs Sign in | Join | Help

Third Abnormal Form

Resolution to SQL Server problems. From the front line.

Forcing Order in an Execution Plan (SQL 2000) and Performance Coding Challenge

We have the following in SQL Server 2000

27 Partition Fact Tables containing Sales Data: Supplier, Product, Store and Day held in a Union All View SupplierProductStoreDay (over 1.35 Billion records)


We have another table that says ‘ProductPromotion’ which has a Start Date and End Date Promotion Period, when a Product is on Promotion


We require a Flag (ProductSaleWithinPeriodFlag that answers the question, this Product is on a Promotion, but did it have any sales YES/NO (1/0) on any Date during the Promotion Period

 

So the question that is needed to be answered is ‘Did we sell this Product in any Store during the Period?’ – The point is we know as SOON as we find the Sale in any Store (and in fact for any Supplier and Date within the Promotion Period), we can STOP searching for anymore.


However, if we do a MIN, MAX we still have to read all the data to determine the MIN or MAX figure, we just want a FIRST or ANY function (to check the existence), in SQL 2005 we could use a CLR Function to do this, but we need it in SQL 2000. And we want it as performant as we can get and we need to do it in a View, so the answer is determined at Run Time for loading overnight into an Analysis Services 2005 Dimension.


If anyone is interested I have a solution that solved this via UDF (however, using Table UDF – But as not as an In-line Function and not in the Where or Join Condition)


The time to return the Flag (Yes/No Sales) data for 86,000 ProductPromotions (Products on Promotion) against 27 Months worth of Sales Data (1,350,000,000 rows) is 25 Minutes – Which  is OK as it will be loaded into an Analysis Services Cube, so we only need it at Dimension Creation Time


The way I did was as follows

 

SELECT

X.ProductID,

            X.PromotionID,

            X.PromotionPeriodStartDateID,

            X.PromotionPeriodEndDateID,

            CASE WHEN MAX(Y.ProductID) IS NULL THEN 0 ELSE 1 END As ProductSaleWithinPeriodFlag

FROM dbo.MyProductPromotionUDF AS X

LEFT OUTER JOIN

            (

            SELECT

ProductID,

                        BusinessDateID

            FROM dbo.SupplierProductStoreDay – Union ALL View

            GROUP BY ProductID, BusinessDateID

            ) AS Y

ON Y.ProductID = X.ProductID

AND Y.BusinessDateID >= X.PromotionPeriodStartDateID AND Y.BusinessDateID <= X.PromotionPeriodEndDateID

GROUP BY

X.ProductID,

            X.PromotionID,

            X.PromotionPeriodStartDateID,

            X.PromotionPeriodEndDateID

 

It is important to use the dbo.MyProductPromotionUDF as a UDF Function to get the ‘working/filtered’ Product Promotion list as if you do not the Query Engine will evaluate ProductPromotion/SupplierProductStoreDay combination and then choose a  ‘really inefficient query engine’ and the time to work it out is INCREASED from 25 minutes to 3 hours.


Note: A key point is that the use of a Table UDF ‘forces’ SQL Server to process/execute that part of the Execution plan first then use the results to generated to process against the LEFT OUTER JOIN against the SupplierProductStoreDay – Union ALL View. Which is a neat way of ‘forcing’ some order into your execution plan!

 

BTW. I needed to use a GETDATE() in my dbo.MyProductPromotionUDF Function and you can’t (it is a non-deterministic function)! – A work around to put the SELECT GetDate() AS CurrentDateTime into a View: vGetDate and call the View in the UDF. I am not worried out any inconsistency as I am only after the current date, and not the time component:

 

See: http://sqlserver2000.databases.aspfaq.com/how-do-i-use-getdate-within-a-user-defined-function-udf.html


Also I have an added filters into dbo.MyProductPromotionUDF as I do not want to process ALL the Promotions that have ever been run, only those that exist for those dates we have data in the SupplierProductStoreDay for (reduces the number of Products and Periods to Join against in the Left Outer Join)

 

To understand, the Business requirement to set this Flag, even though we have around 90,000 ProductPromotions only 86,000 ProductPromotions are valid in the 27 months, and from that we only have sales for around 50% (44,000) of these. Thus we are currently wasting Analysis Services processing time ‘looking for sales data’ which we will do not have. By using this Flag, we can filter out looking for these Products beforehand.

Published 15 May 2007 09:56 by Paul.McMillan

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server (Personal Edition), by Telligent Systems