A colleague of mine, Mick Horne, has recently been working on migrating one of our existing service offerings from Analysis Services 2000 to Analysis Services 2005 and found some interesting things that are worth sharing. Mick doesn't have a blog of his own so he's let me waffle on about it instead.
The existing AS2000 application contained a virtual cube (called PointOfSale) that contains, not surprisingly, sales data. The VCube incorporated 3 cubes holding PoS data for the client's 3 types of sales outlets. The 3 cubes were partitioned by year. This splitting and partitioning was done to improve query performance as the fact table has 60million rows of PoS data.
The cube contained 30+ dimensions which presented problems when aggregating.
- The SKU level in all-but-one of the product-related dimensions has been disabled to reduce the potential number of aggregations. This also means the users have to use Drill-Across to get information at SKU level as the natural Drill-Down path has been broken.
- Intermediate levels in many of the dimensions have had aggregations disabled to reduce the number of aggregations. This improves processing performance but reduces query performance.
Even with these changes, the partitions have been aggregated to around 60% optimization resulting in several hundred aggregations being designed in the Aggregation Builder for each partition.
Each partition takes about 20 minutes to process (3 cubes x 3 partitions per cube x 20 minutes per partition = 3 hours to fully process the 60 million rows of POS data).
Mick spent 3 days moving these cubes (and others) to AS2005. I assume he neglected to use the migration wizard as previous experience has taught him that the cubes this produces are far from efficient and don't realise AS2005 best practise cubes.
In Mick's own words: "I’ve implemented the 2000 cubes as separate 2005 cubes (if I had followed the 2000 design exactly then this would have resulted in a single 2005 cube with multiple measure groups…the design of the front end may require this at a later date but it is not difficult to implement."
- The new ability in AS2005 to have multiple hierarchies per dimension meant that Mick could implement the natural drill path in all required dimensions, therefore not having to eliminate SKU levels and therefore not having to use Drill-Across.
- All cubes are aggregated to 100% optimization and because 2005 is hierarchy-aware, this results in < 50 aggregations (as opposed to several hundred before) even in the POS cube with its 30-or-so dimensions
- The POS data has been migrated into a single cube partitioned by year (i.e. 3 partitions). Without any other optimizations at server, cube or database level, the POS cube FULLY processed in 18 minutes (compared with 3 hours previously)
Mick wasn't able to do much query performance comparisons as we don't yet have access to a 3rd party beta query tool for AS2005...that will come in time.
This all looks like pretty compelling stuff and a major shot in the arm for AS2005. Now I'm just looking forward to implementing the new technology on a live project.
- Jamie
UPDATE
There looks to have been a fair bit of activity around Mick's work so he's provided some more detail here:
These two solutions work on the same source database; browsing the cubes results in the same numbers. The 2005 cube is much much more flexible in terms of the user’s ability to drill through the data.
2000 database:
Shared Dimensions: 40
Cubes: 10
Virtual Cubes: 5
Largest cubes (POS 1, 2, & 3) have 3 partitions each.
Each POS cube has 21 dimensions, > 400 aggregations (@ 30% optimization)
Took 2 days to optimize this database to get a decent trade off between processing time and query performance.
2005 database:
Shared Dimensions: 16
Cubes: 5 (7 Measure Groups)
Largest cube (POS) has 3 partitions. It’s a combination of the three POS cubes in the 2000 solution.
POS cube has 5 dimensions (21 user hierarchies) < 50 aggregations (@100% optimization)
Took 5 days to build this database from scratch. No specific optimizations.
And in answer to the comment about hardware specs:
OK, well we weren't comparing like with like in terms of hardware but the performance results were far enough apart to suggest there was more to it than this.
The A/S 2000 Server is a P4 1700 with 4GB RAM The A/S 2005 Server is a P4 3000 with 3GB RAM
So in answer to the question, no, it doesn't look like you need a bigger server to run A/S 2005. Basically, the A/S 2005 architecture is better than A/S 2000, its much more enterprise ready and so a complex (set of) cubes in A/S 2000 can be implemented easily in A/S 2005.
Also, the development focus is on functionality and fixes, so it's likely that further performance improvements will be apparent in beta 3.
We're delighted to see there's been alot of external interest in this post. Keep the comments coming, we're keen to share our experiences.
cheers
Jamie