I’m approaching the end of an enjoyable data warehousing project and we have collected some size/volume statistics around our Analysis Services 2005 cube. We have:
* 48 Dimensions. This includes role-playing dimensions.
* 751 Attributes and user hierarchies (!). So if a dimension has 5 attributes and 2 user hierarchies, this counts as 7.
* 21 Measure groups
* 68 Measures (including calculated ones)
This is all in one mother cube.
It currently only holds 6 months' worth of data, and the fact table with the most data has 2,195,236 rows.
The dimension with the most members has 32,789 leaf-level members.
The fact tables are likely to grow considerably over the next few years. It may require partitioning in the future, but we don’t feel this will be an issue for at least a couple of years.
The cube generally performs very well for user queries. Users have been working it in anger for a while now without any performance problems of note, although the production server is admittedly quite beefy. Obviously, doing a regular cross join on thousands of members isn't a good idea (as with any Analysis Services implementation).
However, one thing that did make a noticeable difference (especially on dev/test, where the servers are less beefy) was setting the attribute relationship properties, Cardinality and RelationshipType. I had of course already configured the attribute relationships to define aggregation paths for natural hierarchies, but I had also heard that setting the Cardinality and RelationshipType properties on those relationships improves performance. We are now able to push more obscure buttons that were previously out of bounds.
When going through (all!) the attribute relationships, it became evident that about a quarter of them did not require one-to-many, the default Cardinality setting. An example is a dimension with a surrogate key as one attribute (which might be required for reference dimensions) and the business key as another attribute: there is obviously a one-to-one relationship between the two.
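Before changing Cardinality from Many to One, it is worth sanity-checking that the two attributes really do map one-to-one in the dimension table. A minimal sketch in Python with SQLite (the `dim_customer` table and its column names are hypothetical, not from our warehouse):

```python
import sqlite3

# Hypothetical dimension table: surrogate key alongside business key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_sk INTEGER, customer_bk TEXT)")
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?)",
    [(1, "C001"), (2, "C002"), (3, "C003")],
)

def is_one_to_one(conn, table, col_a, col_b):
    """True if every value of col_a pairs with exactly one value of col_b
    and vice versa -- the condition for setting Cardinality to One."""
    max_b_per_a = conn.execute(
        f"SELECT MAX(cnt) FROM (SELECT COUNT(DISTINCT {col_b}) AS cnt "
        f"FROM {table} GROUP BY {col_a})"
    ).fetchone()[0]
    max_a_per_b = conn.execute(
        f"SELECT MAX(cnt) FROM (SELECT COUNT(DISTINCT {col_a}) AS cnt "
        f"FROM {table} GROUP BY {col_b})"
    ).fetchone()[0]
    return max_b_per_a == 1 and max_a_per_b == 1

print(is_one_to_one(conn, "dim_customer", "customer_sk", "customer_bk"))  # True
```

The same two GROUP BY queries can be run directly against the warehouse; if either side ever exceeds 1, the relationship is genuinely one-to-many and the default Cardinality should stay.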
We are doing full cube processing, so I set all the RelationshipType properties to Rigid; you only need Flexible if you are doing incremental processing and dimension members change. The documentation out there (e.g. http://msdn2.microsoft.com/en-us/library/ms166553.aspx ) is unclear on whether this improves query time, but it stands to reason that it improves processing time.
As I said above, these two changes to the attribute relationship properties did seem to improve query performance. I made both changes in one go, so I'm not sure which was the more effective.
The whole cube now takes between 15 and 20 minutes to do a full process. There was a time when it was taking many hours. We traced it to a query that Analysis Services was running against the warehouse relational database: a join between two views of about a quarter of a million rows. We have a views metadata layer on top of the warehouse to allow reusability of business logic across reporting/analytical tools. The view on the right-hand side was joining back to the one on the left-hand side to pick up a derived column, and the execution plan showed a Cartesian-type operation that was exploding the row count. All we had to do was change the ETL to calculate the derived column instead and, hey presto, cube processing was reduced to 15 minutes.
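To illustrate the failure mode and the fix, here is a minimal sketch in Python with SQLite (the staging table, column names, and derived-column logic are all hypothetical): joining a view back to itself on a non-unique key multiplies rows Cartesian-style, whereas computing the derived column during the ETL keeps one row per fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact staging table; net_amount is the derived column that
# the original view computed by joining back to the other view.
conn.execute(
    "CREATE TABLE stg_sales (sale_id INTEGER, order_no TEXT, amount REAL, tax REAL)"
)
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?, ?, ?)",
    [(1, "ORD1", 100.0, 20.0), (2, "ORD1", 50.0, 10.0), (3, "ORD2", 80.0, 16.0)],
)

# Join-back on a non-unique key (order_no) multiplies rows: the two ORD1
# rows each match both ORD1 rows (4 combinations), plus 1 for ORD2 = 5.
exploded = conn.execute(
    "SELECT COUNT(*) FROM stg_sales a JOIN stg_sales b ON a.order_no = b.order_no"
).fetchone()[0]
print(exploded)  # 5 rows from a 3-row table

# ETL fix: compute the derived column in a single pass -- no join, no blow-up.
rows = conn.execute(
    "SELECT sale_id, amount - tax AS net_amount FROM stg_sales"
).fetchall()
print(len(rows))  # 3
```

With a quarter of a million rows and a key far less selective than this toy example, that multiplication is exactly the kind of operation that turns a 15-minute process into a many-hour one.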