It is generally considered data warehousing best practice to split your facts into different tables if the granularity/dimensionality is different. This is a very common question in data warehouse design. Consider the following data model.
The company traditionally sells its products through sales people visiting customer sites. They now have a website which is another channel for sales. The goods are delivered to the customer through Delivery Jobs. A delivery job can deliver to multiple customers in one go.
It might be tempting to have a single measure group for sales (rather than 2 as in the above model). We might then create a UNION query for it in the DSV. But what would we use for SalesPersionID in the OnlineSales SELECT statement? We would have had to introduce a “Not Applicable” member in SalesPerson (possibly with SalesPersionID of -2) for this purpose.
Although it might make sense in a transactional system to have a single Sales table, it is generally considered best practice to split out the facts. I think even Kimball and Inmon might agree with each other on this one!
Having split out the facts, we can of course still have a Total Sales measure that sums OnsiteSales and OnlineSales. This works fine in the cube. Assuming we have IgnoreUnrelatedDimensions set to false on both measure groups, if we were to query Total Sales by Sales Person, it would only ever return sales numbers from Onsite Sales. In most cases, this works fine and it is what the user would expect. We also then do not need to explain what on earth this “Not Applicable” member is all about.
But here is the question we’ve been building up to: what if we want to query DeliveryJob count by Customer or by Product? If we had taken the single sales fact table approach, this would have been a “walk in the park” (I was going to use the English rather than American equivalent of this analogy – the one whose acronym is POP – but I decided not to). This would have allowed simply creating a many-to-many dimension between
Customer => AllSales => DeliveryJob,
SalesPerson => AllSales => DeliveryJob,
Product => AllSales => DeliveryJob
The Sales measure group would be used as the many-to-many mapping measure group.
Having split out the facts, we cannot do this because there are effectively 2 many-to-many mapping measure groups. Analysis Services 2005, despite is fantastically enhanced modelling capabilities, does not allow this.
Here is a method, which in my opinion gets round this problem well. Introduce the AllSales measure group that does UNION OnsiteSales and OnlineSales.
Note: querying the OnsiteSales and OnlineSales measures are unaffected. They still query the respective measure groups and have nothing to do with AllSales.
The only purpose of the AllSales measure group is to act as a many-to-many mapping table against the DeliveryJob fact from Customer and Product. We don’t need a “Not Applicable” member in SalesPerson because it is not related to AllSales.
Here are the many-to-many relationships to query DeliveryJob Count:
AS2005 forces us to have at least one measure per measure group. I therefore created an AllSales count, but it is not likely to ever be used by the end user. Given this, I don’t believe it is even necessary to design aggregations against the AllSales measure group. Designing aggregations for DeliveryJob alone should do it.
I was recently faced with this modelling challenge although for completely different data. I solved it using this method. I think this could be a very common nut to crack when modelling complex relationships between entities with AS2005. Many readers, when faced with this challenge, might have come up with the same solution. If that is the case, hopefully it will take you less time to arrive at the solution as a result of this post!