27 Partition Fact Tables containing Sales Data: Supplier, Product, Store and Day held in a Union All View SupplierProductStoreDay (over 1.35 Billion records)
We have another table that says ‘ProductPromotion’ which has a Start Date and End Date Promotion Period, when a Product is on Promotion
We require a Flag (ProductSaleWithinPeriodFlag that answers the question, this Product is on a Promotion, but did it have any sales YES/NO (1/0) on any Date during the Promotion Period
So the question that is needed to be answered is ‘Did we sell this Product in any Store during the Period?’ – The point is we know as SOON as we find the Sale in any Store (and in fact for any Supplier and Date within the Promotion Period), we can STOP searching for anymore.
However, if we do a MIN, MAX we still have to read all the data to determine the MIN or MAX figure, we just want a FIRST or ANY function (to check the existence), in SQL 2005 we could use a CLR Function to do this, but we need it in SQL 2000. And we want it as performant as we can get and we need to do it in a View, so the answer is determined at Run Time for loading overnight into an Analysis Services 2005 Dimension.
If anyone is interested I have a solution that solved this via UDF (however, using Table UDF – But as not as an In-line Function and not in the Where or Join Condition)
The time to return the Flag (Yes/No Sales) data for 86,000 ProductPromotions (Products on Promotion) against 27 Months worth of Sales Data (1,350,000,000 rows) is 25 Minutes – Which is OK as it will be loaded into an Analysis Services Cube, so we only need it at Dimension Creation Time
The way I did was as follows
SELECT
X.ProductID,
X.PromotionID,
X.PromotionPeriodStartDateID,
X.PromotionPeriodEndDateID,
CASE WHEN MAX(Y.ProductID) IS NULL THEN 0 ELSE 1 END As ProductSaleWithinPeriodFlag
FROM dbo.MyProductPromotionUDF AS X
LEFT OUTER JOIN
(
SELECT
ProductID,
BusinessDateID
FROM dbo.SupplierProductStoreDay – Union ALL View
GROUP BY ProductID, BusinessDateID
) AS Y
ON Y.ProductID = X.ProductID
AND Y.BusinessDateID >= X.PromotionPeriodStartDateID AND Y.BusinessDateID <= X.PromotionPeriodEndDateID
GROUP BY
X.ProductID,
X.PromotionID,
X.PromotionPeriodStartDateID,
X.PromotionPeriodEndDateID
It is important to use the dbo.MyProductPromotionUDF as a UDF Function to get the ‘working/filtered’ Product Promotion list as if you do not the Query Engine will evaluate ProductPromotion/SupplierProductStoreDay combination and then choose a ‘really inefficient query engine’ and the time to work it out is INCREASED from 25 minutes to 3 hours.
Note: A key point is that the use of a Table UDF ‘forces’ SQL Server to process/execute that part of the Execution plan first then use the results to generated to process against the LEFT OUTER JOIN against the SupplierProductStoreDay – Union ALL View. Which is a neat way of ‘forcing’ some order into your execution plan!
BTW. I needed to use a GETDATE() in my dbo.MyProductPromotionUDF Function and you can’t (it is a non-deterministic function)! – A work around to put the SELECT GetDate() AS CurrentDateTime into a View: vGetDate and call the View in the UDF. I am not worried out any inconsistency as I am only after the current date, and not the time component:
See: http://sqlserver2000.databases.aspfaq.com/how-do-i-use-getdate-within-a-user-defined-function-udf.html
Also I have an added filters into dbo.MyProductPromotionUDF as I do not want to process ALL the Promotions that have ever been run, only those that exist for those dates we have data in the SupplierProductStoreDay for (reduces the number of Products and Periods to Join against in the Left Outer Join)
To understand, the Business requirement to set this Flag, even though we have around 90,000 ProductPromotions only 86,000 ProductPromotions are valid in the 27 months, and from that we only have sales for around 50% (44,000) of these. Thus we are currently wasting Analysis Services processing time ‘looking for sales data’ which we will do not have. By using this Flag, we can filter out looking for these Products beforehand.