The Problem!
My colleague Karl Grainger has had a T-SQL problem today that we spent a bit of brain power figuring out so I figured it was worth blogging about.
The situation was that he had a slowly changing dimension that had an effective date on each row, that being the date on which that record becomes the current record for the item. [I'll demo this in terms of "widgets"!!!]
First, create our slowly changing dimension (SCD) table:
create table WidgetSCD
(
widget varchar(15)
, SomeProperty char(1)
, EffectiveDate datetime
)
The EffectiveDate column marks the date that the SomeProperty attribute of the widget changes. In other words the date that the record becomes 'current'. We also need some data:
insert WidgetSCD values ('ThisWidget', 'A', '2005-01-01')
insert WidgetSCD values ('ThisWidget', 'B', '2005-01-02')
insert WidgetSCD values ('AnotherWidget', 'A', '2005-01-03')
insert WidgetSCD values ('AnotherWidget', 'B', '2005-01-04')
insert WidgetSCD values ('AnotherWidget', 'C', '2005-01-05')
We needed to know when a record becomes effective AND when it gets superceded (which we don't explicitly store). In effect we need the following:
Widget SomeProperty StartDate EndDate
ThisWidget A 2005-01-01 2005-01-02
ThisWidget B 2005-01-02 NULL <--This is NULL because its the current record
AnotherWidget A 2005-01-03 2005-01-04
AnotherWidget B 2005-01-04 2005-01-05
AnotherWidget C 2005-01-03 NULL <--This is NULL because its the current record
The Solution
The first method we came up with for solving this was to use a correlated sub-query thus:
select
t1.widget
, t1.SomeProperty
, t1.EffectiveDate StartDate
, ( select min(t2.EffectiveDate)
from WidgetSCD t2
where t2.EffectiveDate > t1.EffectiveDate
and t1.widget = t2.widget
) EndDate
from WidgetSCD t1
This worked fine and produced the results we needed (as above).
It bugged me though. I didn't like the fact that the correlated sub-query was going to execute for every record in t1. So after a bit of to-ing and fro-ing I managed to come up with the following, much cleaner, solution:
select
t1.widget
, t1.SomeProperty
, t1.EffectiveDate StartDate
, min(t2.EffectiveDate) EndDate
from WidgetSCD t1
left outer join WidgetSCD t2
on t1.widget = t2.widget
and t1.EffectiveDate<t2.EffectiveDate
group by
t1.widget
, t1.SomeProperty
, t1.EffectiveDate
Instead of using a correlated sub-query it uses a join, so there's only 1 select statement being issued here. Using a correlated subquery the number of selects would be determined by the number of rows in the table, a huge problem if we had a few thousand records in there (which in this case, we have).
We can deploy this as a view and easily be able to query it to find out what the value of SomeProperty was for a particular widget at any point in time. Very handy!
Hope this is useful to someone! The syntax here is T-SQL but this will work equally as well on any RDBMS platform!
- Jamie