Welcome to EMC Consulting Blogs Sign in | Join | Help

Christian Wade's Blog

MDX Sprocs and Scripting: An Interesting Example

A colleague of mine once asked me to do a “Number of Days since Last Incident” calculated measure (sounds like fun!).  Basically, he had a measure called Incidents and wanted a running count of days since the last incident.

 

E.g.

Date

        Incidents

        Days Since Last Incident

1 Jan

        1

        0

2 Jan

        0

        1

3 Jan

        0

        2

4 Jan

        2

        0

5 Jan

        0

        1

6 Jan

        0

        2

7 Jan

        0

        3

 

Easy peasy, right?  It obviously needs to maintain dimensionality of Incidents, so a good candidate for a calculated measure.

 

Let’s evaluate the options for a simplified version that only works at the day level.  For this purpose, I created a simple cube: one measure group and 2 dimensions (Time and Territory).  Here is a screenshot of the cube DSV.

 

I added a server time dimension to dimension IncidentDate for simplicity.  It has a standard Calendar hierarchy.

 

  

Couldn’t be simpler, right?  The data and the solution with all the code can be downloaded here.

 

Standard recursive MDX

 

The traditional AS2000 approach would be to create a regular recursive MDX calculated member.  Here is the MDX.

 

WITH MEMBER [Measures].[Days Since Last Incident]

AS

    IIf

    (

              //check at day level

        Not [Time].[Calendar].CurrentMember.Level IS [Time].[Calendar].[Date],

              "NA",

        Case

            //check we are not at the beginning of the time dim

            When IsEmpty

                 (

                     [Time].[Calendar].CurrentMember.PrevMember

                 )

            Then 0

            //check if the current time member has incidents measure that is empty or 0

            When IsEmpty

                (

                                  (

                                         [Time].[Calendar].CurrentMember,

                                         [Measures].[Incident Count]

                                  )

                )

            Or  (

                    [Time].[Calendar].CurrentMember,

                    [Measures].[Incident Count]

                ) = 0

            //if so, we need to calculate how many days since the last incident

            Then Sum

                (

                    [Time].[Calendar].CurrentMember.PrevMember,

                    [Measures].[Days Since Last Incident]

                ) + 1

                     //if not, there is at least one incident for the current day, so 0

            Else 0

        End

    )

SELECT

       {

              [Measures].[Incident Count],

              [Measures].[Days Since Last Incident]

       } ON 0

    ,Descendants

    (

        [Time].[Calendar].[Year].[Calendar 2005],

        [Time].[Calendar].[Date]

    ) ON 1

FROM

       [Recursive Mdx Sproc];

 

In most cases, this would probably work fine.  However, in my test cube, the Incident Counts are sparse.  This means that the number of recursive calls for each day is quite large and this has an adverse effect on query performance.  This query takes 19 seconds on my laptop for all the days in 2005.

 

 

MDX Scripting

 

I was of course keen to try this using MDX scripting.  Here is the script.

 

CALCULATE;

CREATE MEMBER CURRENTCUBE.[Measures].[Days Since Last Incident]

    AS "NA";

SCOPE ([Measures].[Days Since Last Incident]);

( [Time].[Date].Members ( 1 ) : Null ) =

( Case

    //check if the current time member has incidents measure that is empty or 0

    When IsEmpty

        (

                  (

                         [Time].[Calendar].CurrentMember,

                         [Measures].[Incident Count]

                  )

        )

    Or  (

            [Time].[Calendar].CurrentMember,

            [Measures].[Incident Count]

        ) = 0

    //if so, calculate how many days since the last incident using a recursive

    //call that increments by 1 for each call

    Then Sum

        (

            [Time].[Calendar].CurrentMember.PrevMember,

            [Measures].[Days Since Last Incident]

        ) + 1

    //if not, there is at least one incident for the current day, so 0

    Else 0

  End);

//when at the beginning of time, evolve

( [Time].[Date].Members ( 0 ) ) = Null;

END SCOPE;

 

I was flabbergasted to see that the MDX scripting approach takes 1 second for the same query!!!  There are obviously some amazing optimizations available through MDX Scripting.

 

MDX Sproc V1

 

A voice in my head told me to try it using an MDX sproc.  Here is the .NET code.  It is by no means production quality; it is purely illustrating a concept.  I certainly was not expecting it to improve on 1 second!

 

// recursive approach

public static MDXValue DaysSinceLastIncident(

    Set days,

    Tuple incidentCountMeasure,

    int currentDayIndex)

{

    try

    {

        // are we at the beginning of time?

        if (currentDayIndex == 0)

            return null;

        return DaysSinceLastIncidentRecursive(

            days,

            incidentCountMeasure,

            currentDayIndex,

            0);

    }

    catch (Exception ex)

    {

        return ex.ToString();

    }

}

private static int DaysSinceLastIncidentRecursive(

    Set days,

    Tuple incidentCountMeasure,

    int currentDayIndex,

    int dayCount)

{

    // have we counted back to the beginning of time?

    if (currentDayIndex == 0)

        return 0;

    //check if the current time member has incidents measure that is empty or 0

    Expression exp = new Expression("[" + days.Tuples[currentDayIndex].Members[0].Caption + "]");

    MDXValue incidentCount = exp.Calculate(incidentCountMeasure);

    if (incidentCount == null || incidentCount.ToInt32() == 0)

    {

        //if so, calculate how many days since the last incident using a recursive

        //call that increments by 1 for each call

        return DaysSinceLastIncidentRecursive(

            days,

            incidentCountMeasure,

            currentDayIndex - 1,

            dayCount + 1);

    }

    else

    {

        //if not, there is at least one incident for the current day, so return

        return dayCount;

    }

}

 

In terms of what this code is doing, it is actually very similar to the standard recursive MDX approach above.  Probably the most significant difference is that, since we haven’t got an equivalent function to PrevMember in the Microsoft.AnalysisServices.AdomdServer namespace, we are passing in currentDayIndex and subtracting 1 instead.  Here is the MDX that uses it.

 

WITH

MEMBER [Measures].[Days Since Last Incident]

AS

    IIf

    (

              //check at day level

        Not [Time].[Calendar].CurrentMember.Level IS [Time].[Calendar].[Date],

              "NA",

        //call the sproc and let it do the work

        ClassLibrary1.DaysSinceLastIncident

        (

                     [Time].[Date].[All].Children,

                     [Measures].[Incident Count],

                     Rank

                     (

                           [Time].[Date].CurrentMember,

                           [Time].[Date].[All].Children

                     ) - 1  // taking away 1 because 1-based

        )

    )

SELECT

       {

              [Measures].[Incident Count],

              [Measures].[Days Since Last Incident]

       } ON 0

    ,Descendants

    (

        [Time].[Calendar].[Year].[Calendar 2005],

        [Time].[Calendar].[Date]

    ) ON 1

FROM

       [Recursive Mdx Sprocs];

 

 

 

As you can see, in order to pass in currentDayIndex, we use the Rank function.  The query takes 52 seconds on my machine.  Why does it take so much longer than the standard recursive MDX approach?  I guess the biggest reason is that we are taking hit for jumping to and fro from managed code to COM interop.

 

However, this code does illustrate a concept. The recursive call has to be made n times for every single day member.  If we look at the required output dataset, should this really be necessary?

 

Date

        Incidents

        Days Since Last Incident

1 Jan

        1

        0

2 Jan

        0

        1

3 Jan

        0

        2

4 Jan

        2

        0

5 Jan

        0

        1

6 Jan

        0

        2

7 Jan

        0

        3

 

Why can’t we just iterate the whole dataset once only taking a running count as we go along (rather than count back from every single day member)?  This would obviously vastly improve the query performance.  This is what I tried to do for MDX Sproc V2.

 

MDX Sproc V2

 

It is not possible for an MDX sproc to populate all the return data values for a set in one foul swoop.  Instead, our stored procedure DaysSinceLastIncident2 has to be called once for every single data value that is returned.  This makes it a little difficult to take our ‘one-iteration’ approach.  We are forced to do the iteration for the first call of the sproc, store the resulting data in memory, and use it for lookup purposes each time the sproc is subsequently called from MDX.  The following code is the only way I managed to do it.  Note: there are problems associated with it and these are discussed below.

 

// keep cached lookup data in memory

private static Hashtable daysSinceLastIncidentLookup;

public static MDXValue DaysSinceLastIncident2(

    Set days,

    Tuple incidentCountMeasure,

    string currentDayCaption)

{

    try

    {

        //initialize daysSinceLastIncidentLookup if necessary

        if (daysSinceLastIncidentLookup == null)

            InitializeDaysSinceLastIncidentLookup(days, incidentCountMeasure);

        return (Int32)daysSinceLastIncidentLookup[currentDayCaption];

    }

    catch (Exception ex)

    {

        return ex.ToString();

    }

}

private static void InitializeDaysSinceLastIncidentLookup(

    Set days,

    Tuple incidentCountMeasure)

{

    daysSinceLastIncidentLookup = new Hashtable();

    Expression exp = new Expression();

    int daysSinceLastIncident = 0;

    bool foundFirstIncident = false;

    foreach (Tuple tpl in days.Tuples)

    {

        exp.ExpressionText = "[" + tpl.Members[0].Caption + "]";

        MDXValue incidentCountValue = exp.Calculate(incidentCountMeasure).ToInt32();

        if (incidentCountValue == null || incidentCountValue.ToInt32() == 0)

        {

            //i.e. no incident found, so count the day (assuming we've already had at least one incident)

            if (foundFirstIncident)

                daysSinceLastIncident += 1;

        }

        else

        {

            foundFirstIncident = true;

            daysSinceLastIncident = 0;

        }

        daysSinceLastIncidentLookup.Add(

            tpl.Members[0].Caption,

            daysSinceLastIncident);

    }

}

 

And here is the MDX that calls it:

 

WITH

MEMBER [Measures].[Days Since Last Incident]

AS

    ClassLibrary1.DaysSinceLastIncident2

    (

              [Time].[Date].[All].Children,

              [Measures].[Incident Count],

              [Time].[Date].CurrentMember.Name

    )

SELECT

       {

              [Measures].[Incident Count],

              [Measures].[Days Since Last Incident]

       } ON 0

    ,Descendants

    (

        [Time].[Calendar].[Year].[Calendar 2005],

        [Time].[Calendar].[Date]

    ) ON 1

FROM

       [Recursive Mdx Sprocs];

 

It takes 4 seconds.  This time, rather than pass in currentDayIndex, we pass in [Time].[Date].CurrentMember.Name, which is used for the lookup having prepared the data in memory.

 

Unfortunately, it is necessary to store a lookup value for every day in the whole time dimension rather than just those days we need data for.  This is because we don’t know, at the point when the sproc is executed for the first member, what other members will follow.

 

This code would not work in production.  It is solely to illustrate the concept.  Here are the reasons why it would not work.

  • There is no way of guaranteeing the lookup cache is re-initialized each time the outer MDX query is run.  Therefore, the second time the query is run, it would use the already existing cache.  If the dimensionality of the second query were different – e.g. sliced by a territory – we would get back incorrect results.  If we could at least determine the dimensionality of the current query from within an MDX sproc, we could re-initialize the cache for different dimensionality.  This would be a nice addition to the Context class.
  • Even if we could re-initialize the lookup cache, it could cause erroneous behaviour in a multi-user environment.  This is because it could be shared by multiple concurrent queries.

So, in summary, MDX sprocs are cool.  They are a vast improvement on COM functions for MDX.  The fact that we can receive multidimensional objects such as Tuples, Sets, Members, etc. is fantastic.  As per my previous post, there is lots of potential in MDX sprocs.

 

However, here is an enhancement illustrated by this article, which would (in my humble opinion) greatly increase the power of MDX sprocs.

User-defined aggregate functions in SQL CLR have a method called Init() that is run for initialization.  Moreover, it is possible to instantiate the class decorated with the SqlUserDefinedAggregate attribute once per aggregate function call (not for each member that is then aggregated).  This means that class level, non-static variables can be reliably accessed by each method invocation of an aggregated member.

Something conceptually similar for MDX sprocs would be fantastic.  This would allow initialization for the MDX query as a whole and then reliable access to a reserved memory space from each method invocation of a calculated data cell.

 

 

 

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server (Personal Edition), by Telligent Systems