Welcome to EMC Consulting Blogs Sign in | Join | Help

Christian Wade's Blog

Measure expressions: how performant are they?

 

Oh, and before the anoraks out there mail me saying “did you know that ‘performant’ is not recognized as an adjective by the major dictionaries?”, yes I do know – but I like the word, so I’m using it!

 

We are modeling the business of the Multi-Dimensional Parcel Service (MDPS).

MDPS Truck

They ship parcels domestically, internationally (and between dimensions).  We are only concerned with the operations of their trucks.  The trucks either

·        Collect parcels from depots to deliver to customers

·        Collect parcels from customers to deliver to depots

 

The customer pays for the delivery/collection, so revenue can be associated with a customer.  The job of collecting the parcels and delivering them has costs associated with it – e.g. petrol.  Costs are therefore associated with a job.

 

DSV

 

It is common for a truck to be part loaded in order to serve multiple customers.  In this case there would be multiple delivery/collection addresses for the same job.  It is therefore conceivable that we could have the following data in our Revenue table.

 

Revenue Table

The parcels for JobID of 1 were delivered to two customers (those with CustomerIDs of 1 and 2).  However, Customer 1 only took up a small amount of space in the truck.  The job took place in December and he was shipping Halloween decorations so there wasn’t much demand.  Customer 2, on the other hand, was shipping Christmas trees, so he used the vast majority of the truck’s space.

How then should we assign costs to a customer?  You have probably already noticed that Customer is a many-to-many dimension with regard to Cost.  A customer can have many jobs performed, and a job can be done for more than one customer at a time.

 

In our example, the total cost of Job 1 (JobID of 1) is £500.  With the default behaviour of many-to-many dimensions (i.e. no measure expression or MDX to split the costs), both customers 1 and 2 would be assigned a cost of £500.  This seems a little unfair – especially for Customer 1 because he only used a small part of the truck space.  It is a business decision as to how to split costs in this type of scenario.  In this hypothetical case, it makes sense to do a proportional split based on revenue.

 

This is what the CostMultiplier field is for.  For a given JobID, the sum of CostMultiplier always equals 1.  So, for the first row, 500 / 5,000 is 0.1.  For the second row, 4,500 / 5,000 is 0.9.  For the third row, 3,000 / 3,000 is 1.

 

Using MDX or a measure expression, we can multiply the cost for a customer by the CostMultiplier.  The following screenshot is therefore how an OLAP client tool would present the costs in our example.

 

Costs by Customer

 

Note: doing the proportional split based on revenue results in the data being aggregated using a regular sum.  The cost for Customer 1 + Customer 2 + Customer 3 is £800.  £800 is also the actual total cost of Job 1 and Job 2.  Summing the individual customers using the many-to-many dimension without the proportional split would have resulted in £1,300 (£500 + £500 + £300).

 

Note: the proportional split has to be done at leaf level.  This is because the calculation is, as George Spofford would say, “non-commutative”.  It involves addition/subtraction and multiplication/division, so we will get different numbers at the aggregated level depending on whether or not the calculation is done at leaf level.

 

So, let’s do it!  I created a relational database containing the above 4 tables and populated it with (quite a lot of) data.  I then created a cube on top of it to allow testing of how performant the different design options are.

 

You can download a backup of the cube that uses calculated members (Option 1 and 2) from here.  For a backup of the cube that uses the measure expression (Option 3), download from here.

 

Here is the metadata of the cube.

 

Cube Metadata

 

Here are the numbers of rows for each table

Customer –        53,186 rows

Revenue –    4,370,206 rows

Job –                979,921 rows

Cost –              334,342 rows

 

To populate the CostMultipler field, we can run the following query.

 

UPDATE Revenue

      SET CostMultiplier =

      CASE

            WHEN dt.Amount = 0 or dt.Amount IS NULL THEN 0

            ELSE r.Amount / dt.Amount

      END

      FROM Revenue r

      INNER JOIN

      (

            SELECT JobID, SUM(Amount) Amount

            FROM Revenue r

            GROUP BY JobID

      ) dt ON r.JobID = dt.JobID

 

Now, when we run the following query, we should get back 1 for every single JobID.

 

SELECT

      JobID,

      SUM(CostMultiplier)

FROM

      Revenue

GROUP BY

      JobID

 

I set the attribute relationship’s Cardinality and RelationshipType properties to improve performance (see this post).  I also of course designed aggregations on the 2 measure group partitions.

 

Various options are evaluated for performance in getting the costs by customer including the proportional split.  It is assumed that we do not have a query cache when testing performance.  The AS query cache can be cleared by running an XMLA command as pointed out by Darren Gosbell here.

 

OPTION 1 – TRADITIONAL STYLE MDX

 

I created a calculated measure called “Cost – Traditional MDX”.  It takes the traditional approach of defining the calculation in the member declaration.  See the MDX below.  Obviously only the calculated measure would be exposed in the perspectives; not the base measure.

 

Note: this will only work if the only exposed hierarchy from the [Customer] dimension is [Category – Customer].

 

CREATE MEMBER CURRENTCUBE.[Measures].[Cost - Traditional MDX]

AS

    IIf

    (

        [Customer].[Category - Customer].CurrentMember.Level IS

        [Customer].[Category - Customer].[(All)],

        // If at the all level of customer, just return the hidden base measure

        [Measures].[Cost],

        // Otherwise split the cost at leaf level proportionally to revenue

        Sum

        (

            Descendants

            (

                [Customer].[Category - Customer].CurrentMember, , LEAVES

            ) *

            Descendants

            (

                [Job].[Reference].CurrentMember, , LEAVES

            ),

            ([Measures].[Cost] * [Measures].[Cost Multiplier])

        )

    ),

FORMAT_STRING = "Currency",

NON_EMPTY_BEHAVIOR = { [Cost] },

VISIBLE = 1;

 

Guess how long the following query takes (on my dev VPC)?  About 1 minute 15 seconds.  There are only about 100 rows returned.  The reason it takes so long is that it has to go down to leaf level at runtime for all the Job/Customer combinations.

 

SELECT

      [Measures].[Cost - Traditional MDX] ON 0,

      {

            [Customer].[Category - Customer].[All],

            [Customer].[Category - Customer].[All].Children

      } ON 1

FROM

      [Measure Expressions]

 

OPTION 2 – MDX SCRIPT

 

We can improve on Option 1 using MDX script assignments.

 

CREATE MEMBER CURRENTCUBE.[Measures].[Cost - MDX Script]

AS

    [Measures].[Cost],

FORMAT_STRING = "Currency",

NON_EMPTY_BEHAVIOR = { [Cost] },

VISIBLE = 1;

SCOPE ([Measures].[Cost - MDX Script]);

   

    // Assign to all cells in the [Customer].[Category - Customer]

    // hierarchy except the [All] member

    ( [Customer].[Category - Customer].Members(1) : Null ) =

    Sum

    (

        Descendants

        (

            [Customer].[Category - Customer].CurrentMember, , LEAVES

        ) *

        Descendants

        (

            [Job].[Reference].CurrentMember, , LEAVES

        ),

        ([Measures].[Cost] * [Measures].[Cost Multiplier])

    );

   

END SCOPE;

 

This brought the query time down from over 1 minute 15 seconds to about 50 seconds.  It does of course return the same data.

 

UPDATE: LEAF-LEVEL MDX SCRIPT ASSIGNMENT 

 

Some readers may be asking why we didn't execute something like the following MDX script assigment.

 

SCOPE (Leaves(Customer), Leaves(Job));

    [Measures].[Cost] = [Measures].[Cost] * ValidMeasure([Measures].[Cost Multiplier]);

END SCOPE;

 

This would be taking a similar approach to what is described by Mosha here.  The Leaves function returns a set of tuples representing the lowest level of granularity across all attributes in the dimension that is passed in as a parameter.  ValidMeasure is used purely as a performance optimisation (see Mosha's post above).  As [Measures].[Cost] is a base measure rather than a calculated measure, it should automatically aggregate up the hierarchy (i.e. to customer category level, which is what our query is interested in) right?  Wrong!  Take a look at Page 388 to 390 of the MDX Solutions book.  It states that the outer dimension in a many-to-many relationship (in this case Customer) will not automatically aggregate up the hierarchy.  This is similar behaviour to MDX script assigments involving calculated measures.  This approach is normally very performant as it doesn't need to use the Descendants function; Analysis Services deals with the aggregation automatically.  Unfortunately, we have to rule it out in this case.

 

OPTION 3 – MEASURE EXPRESSIONS 

 

I created another cube and set up a measure expression for this purpose.  It is simply “[Cost]*[Cost Multiplier]”.

 

Note: when using measure expressions, it is often a good idea to set the DirectSlice property for the dimension relationship (in the Dimension Usage tab).  For the case of MDPS, setting the direct slice to Root([Customer]) (as shown by the following screenshot) means that the All member for Job/Customer attributes shows the actual total of costs – not just those which have a corresponding [Cost Multiplier].  If we do not set DirectSlice, Jobs with no revenue would not contribute to the total cost for all Jobs.

 

The Root(<dimension_expression>) function returns a tuple with the All/Default member of every attribute/hierarchy in the dimension.

 

 

Direct Slice

 

To re-iterate, it makes sense to set the DirectSlice as above for the case of MDPS because we want to report the total cost for all jobs (e.g. [Job].[Reference].[All]) irrespective of whether revenue has been received for them.

 

Using measure expressions brought the query time down to less than 20 seconds.

 

SUMMARY

 

Considering that querying Cost by Customer without any MDX or measure expressions (i.e. without the proportional split) takes about 12 seconds (due to sheer volume of data), I’d say measure expressions are indeed pretty performant!

 

 

 

Published Tuesday, July 25, 2006 5:36 PM by christian.wade

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Christian Wade's Blog said:

I recently investigated currency conversion for a customer.&amp;nbsp; I found that there is not much on the...
August 24, 2006 5:33 PM
 

Christian Wade's Blog said:

I recently investigated currency conversion in SSAS for a customer. I found that there is not much on

September 1, 2006 6:55 PM
 

B said:

A normal person would not use "performant" because it doesn't make sense.

Anoraks are the type of people who make up words (and give IT professionals a bad reputation).

March 16, 2011 2:13 AM
 

Amir said:

My issue is also Multiplication before aggregation. I have a question: I have the case where one of the operands (the measures) is is a calculated measure: OnhandQuantity, summing up all transaction quantities, how can I use it in "Measure Expression"? Knowing that the solution "Measure Expression" suites my need because I need to multiply then aggregate. A third "solution" is  Having an MDX directly but it wouldn't help me since it will aggregate both measures then multiply them. Using Sum (descendants ([dimension1].[hierarchy1].CurrentMember,, leaves), [Measures].[OnhandQuantity]*[Measures].[UnitPrice]) in MDX didn't help since I have about 10 dimensions and I don't know which dimension(s) the report designer will use in their report when using my measure. I also  used that MDX for one dimension ([Product]) for testing, but it again aggregated before mupltiplying. I am in a real trouble! Many thanks for your help in advance.

July 7, 2011 1:19 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server (Personal Edition), by Telligent Systems