Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

SSAS: Migrating Analysis Services 2000 to Analysis Services 2005

A colleague of mine, Mick Horne, has recently been working on migrating one of our existing service offerings from Analysis Services 2000 to Analysis Services 2005 and found some interesting things that are worth sharing. Mick doesn't have a blog of his own so he's let me waffle on about it instead.

The existing AS2000 application contained a virtual cube (called PointOfSale) that contains, not surprisingly, sales data. The VCube incorporated 3 cubes holding PoS data for the client's 3 types of sales outlets. The 3 cubes were partitioned by year. This splitting and partitioning was done to improve query performance as the fact table has 60million rows of PoS data.

The cube contained 30+ dimensions which presented problems when aggregating.

  • The SKU level in all-but-one of the product-related dimensions has been disabled to reduce the potential number of aggregations.  This also means the users have to use Drill-Across to get information at SKU level as the natural Drill-Down path has been broken.
  • Intermediate levels in many of the dimensions have had aggregations disabled to reduce the number of aggregations.  This improves processing performance but reduces query performance.

Even with these changes, the partitions have been aggregated to around 60% optimization resulting in several hundred aggregations being designed in the Aggregation Builder for each partition.

Each partition takes about 20 minutes to process (3 cubes x 3 partitions per cube x 20 minutes per partition = 3 hours to fully process the 60 million rows of POS data).

Mick spent 3 days moving these cubes (and others) to AS2005. I assume he neglected to use the migration wizard as previous experience has taught him that the cubes this produces are far from efficient and don't realise AS2005 best practise cubes.

In Mick's own words: "I’ve implemented the 2000 cubes as separate 2005 cubes (if I had followed the 2000 design exactly then this would have resulted in a single 2005 cube with multiple measure groups…the design of the front end may require this at a later date but it is not difficult to implement."

  • The new ability in AS2005 to have multiple hierarchies per dimension meant that Mick could implement the natural drill path in all required dimensions, therefore not having to eliminate SKU levels and therefore not having to use Drill-Across.
  • All cubes are aggregated to 100% optimization and because 2005 is hierarchy-aware, this results in < 50 aggregations (as opposed to several hundred before) even in the POS cube with its 30-or-so dimensions
  • The POS data has been migrated into a single cube partitioned by year (i.e. 3 partitions).  Without any other optimizations at server, cube or database level, the POS cube FULLY processed in 18 minutes (compared with 3 hours previously)

Mick wasn't able to do much query performance comparisons as we don't yet have access to a 3rd party beta query tool for AS2005...that will come in time.

This all looks like pretty compelling stuff and a major shot in the arm for AS2005. Now I'm just looking forward to implementing the new technology on a live project.

- Jamie



UPDATE

There looks to have been a fair bit of activity around Mick's work so he's provided some more detail here:


These two solutions work on the same source database; browsing the cubes results in the same numbers.  The 2005 cube is much much more flexible in terms of the user’s ability to drill through the data.

2000 database: 
Shared Dimensions: 40
Cubes: 10
Virtual Cubes: 5
Largest cubes (POS 1, 2, & 3) have 3 partitions each.
Each POS cube has 21 dimensions, > 400 aggregations (@ 30% optimization)

Took 2 days to optimize this database to get a decent trade off between processing time and query performance.

 

2005 database:
Shared Dimensions: 16
Cubes: 5 (7 Measure Groups)
Largest cube (POS) has 3 partitions.  It’s a combination of the three POS cubes in the 2000 solution.
POS cube has 5 dimensions (21 user hierarchies) < 50 aggregations (@100% optimization)

Took 5 days to build this database from scratch.  No specific optimizations.


And in answer to the comment about hardware specs:

OK, well we weren't comparing like with like in terms of hardware but the performance results were far enough apart to suggest there was more to it than this.

The A/S 2000 Server is a P4 1700 with 4GB RAM The A/S 2005 Server is a P4 3000 with 3GB RAM

So in answer to the question, no, it doesn't look like you need a bigger server to run A/S 2005. Basically, the A/S 2005 architecture is better than A/S 2000, its much more enterprise ready and so a complex (set of) cubes in A/S 2000 can be implemented easily in A/S 2005.

Also, the development focus is on functionality and fixes, so it's likely that further performance improvements will be apparent in beta 3.


We're delighted to see there's been alot of external interest in this post. Keep the comments coming, we're keen to share our experiences.

cheers

Jamie

Published Thursday, January 06, 2005 10:40 AM by jamie.thomson
Filed under:

Comments

 

jamie.thomson said:

Thank you Jamie for sharing these migration impressions.
You mentioned that Mick had problems using the Migration Wizard, can you please provide more details ? The cubes were inefficient when processing, when querying ? Was it caused by the generated data source views (or source SQL queries) or by something else ?

Thank you,
Adrian.
January 13, 2005 7:05 PM
 

jamie.thomson said:

Adrian,
I believe Mick said the cubes that the migration wizard produced were not 2005 best practise cubes. e.g. They resulted in seperate cubes rather than 1 cube with many measure groups.

I've emailed him to ask him more but he hasn't replied yet. We're workign on different sites at the moment.
January 18, 2005 10:08 AM
 

jamie.thomson said:

Adrian,
I've pasted Mick's comments below:
=============================================
There's nothing wrong with the way the migration wizard works - we just chose not to use it because what it produces isn't Best Practice for A/S 2005.

The new architecture for building A/S solutions is great, a massive step forward from 2000. The solution we were working with was right on the limit of what 2000 can reasonably be expected to do due to some of the architectural weaknesses in 2000.

Based on what we learned at PASS 2004 and on the Ascend BI course, I decided it would be more efficient to build the cubes from scratch than to unpick the results of the migration wizard.
=============================================

Hope that helps!

Jamie
January 19, 2005 8:56 AM
 

TrackBack said:

January 24, 2005 4:29 PM
 

jamie.thomson said:

Could you please also include more details on your hardware setup (memory capacity, CPU, etc) ?
Is there a need for hardware upgrade when moving from AS2K to AS2K5 to obtain similar results?
Thanks for your postings.
January 24, 2005 4:50 PM
 

TrackBack said:

January 24, 2005 10:21 PM
 

TrackBack said:

Great reference from a UK based Ascend facilitator from microsoft. Thanks Eric.
January 31, 2005 10:42 PM
 

jamie.thomson said:

Seems I've got some reading/playing to do! I am very keen on these multiple heirarchies, as I am currently managing an AS2000 database (that I designed and built - perhaps not the best design, but it works - just) with many cubes, several of which make use of up to 50 shared dimensions. These cubes are then combined with a virtual cube or two. Of these 50 shared dims, probably 25 all (eventually) drill down to SKU, another 10 drill to Location, more again are related to dates. Aggregations are basically unusable, due to the processing time required, and the memory required to properly process them is well above the 3Gb AS2000 limit. Are you saying that in my case, I could have one 'product' hierarchy, and any number (practically infinite??) of 'product attribute' drill downs (such as Brand --> SKU, Brand --> Supplier --> SKU, Colour --> Size --> SKU etc...) and there would be little or no query/processing overhead? So, if a user comes to me and says, "I'd really like to drill down from X to Y to Z" and I would no longer have to come up with new and varied ways to explain to them that if I did that then the cubes might not even process, as they are at breaking point already?
Thanks,
Matt.
February 15, 2005 11:34 AM
 

jamie.thomson said:

In answer to Matt K's comment...yes, this is absolutely the case.

In AS2005 you define attributes of a dimension and build any number of hierarchies that you want out of those attributes.
Attributes themselves are somewhat akin to member properties in AS2000 except that they all have equal "importance" as the attribute that defines the member in AS2000. In AS2005 the value displayed for a member is not defined by a certain field in a table then then has more "importance" than the member properties. The member is defined by the key of the table and then all other fields are available to be displayed as the value (exposed as attributes).

Hope that makes sense. The improvement over AS2000 in this regard is immense.

Feel free to fire more questions this way!

By the way Matt, in AS2000 you may want to explore building dimensions off of your member properties of a single Product dimension. This
should decrease your processing time.

-Jamie
February 15, 2005 11:45 AM
 

jamie.thomson said:

Matt,
That's exactly what I'm saying and that's why the new A/S architecture is so much better than 2000. You are spot on that your Product "dimension" can contain any number of related attributes - basically any field in the database can now be made available for analysis.

Your dimension and attributes are defined in your attribute hierarchy. You can define how levels in this hierarchy relate to each other which will determine how your aggregates work - but note you will only have a single SKU level in your attribute hierarchy.

On top of that you can now define User Hierarchies which are effectively drill paths through the data. User Hierarchies map back onto the Attribute Hierarchy but have no physical presence....i.e. you can create 200 User Hierarchies in your Product dimension that all drill down to SKU level and it will have no impact on your aggregations as your Attribute Hierarchy still has just the single SKU level defined.
Mick
February 15, 2005 7:16 PM
 

jamie.thomson said:

Jamie, Mick, great work.

Jamie, originally we went down the path of building only about 10 'normal' dimensions, and everything else was a virtual dimension. Due to the large number of virtual dimensions, and the fact that many of them had up to 5 levels, and as many as 100,000+ members at the lowest level, the query performance was, well, let's just say sub-optimal. This led us to convert almost all of the virtual dimensions to standard dimensions. The main AS2000 issue we have now is that with so many large shared dimensions, almost all of AS2000's 3Gb limit (minus query cache, minus process buffer etc) is consumed when the service starts - before any queries are even executed. This in turn has led to many AS2000 service restarts related to running out of memory - either during the build, or even during querying. The only way we could combat this was to implement some 'tough love' and cull many dimensions - or at least cut many of them back to single level dimensions with maybe 100 members, rather than drilling through the entire heirarchy. With AS2005, it sounds like the users can have their cake and eat it too. (I'm sure there's a "slice" related pun I could've thrown in there, but I'll let it go.)

This (mutiple related dimensions in AS2005) has got me intrigued. There has to be a catch? Let's look at a reasonably simple situation. I have a 'product' heirarchy with levels like (with # of members in []) Group [10], Department [20], Class [100], SubClass [250], Style [7000], Item [100000]. Let's say we get all this from DimProduct in the DB. Also in DimProduct are the following attributes, Brand, Colour, Supplier, Size, Fabrication, Material, Country of Origin, TargetMarket - and anywhere up to say 30 more. In AS2000 (due to memory limits and users insane requirements for queries to be processed quickly) we have to be very careful about what dimensions we create using these attributes, how far to let users drill down and such. In AS2005 - we don't have to be careful any more? Someone wants a crazy 20 level dimension, drilling down to SKU, then they can have it? And it'll consume no more memory, nor hog any aggregation time/space/memory. I certainly do hope this is the case, as that will offer my clients (whoever they may be at the time) virtually limitless analysis capabilities.

All in all, it seems very exciting. Now, I just need to investigate things like 'is there a concept of shared calculated measures' as I hate having to open Analysis Manager twice re-creating measures, then copying and pasting the MDX code into the new one. Why can't I import calculated measures into 'standard' cubes, in the same way I can import them into a virtual cube. But perhaps that's for another day.

Thanks for your replies.
Matt.
February 15, 2005 10:54 PM
 

jamie.thomson said:

Matt,
As far as I understand it (Mick knows more about it than me), yes, you can do that (i.e. have a crazy 20 level dimension). It'd be a bit unweidly but it would be OK. I would logically expect that the number of levels in your hierarchy would gradually impair performance but not by a great deal.
Theoretically you could build a nonsensical hierarchy sich as Colour-->Country Of Origin-->Department and AS2005 wouldn't have a roblem with it.

Just to introduce you to a bit of terminology...a dimension is still called just that - a dimension. You were using the term "dimension" a bit more loosely in your last post and using it to refer to hierarchies. Remember, 1 dimension, multiple hierachies per dimension.

To touch on your issue of shared calculated measures...in AS2005 the convention is to create only 1 cube and have what we call different "measure groups" in that 1 cube which all operate at matching dimensionality. I am not sure but I would guess that you would be able to share calculated measures between measure groups (for they are in the same cube). You need to ask someone that has had more exposure to it than me (Mick???)

Hope this helps Matt. Please keep checking back.

Regards
Jamie
February 15, 2005 11:13 PM
 

jamie.thomson said:

OK, I'll take these one at a time.

1) Yes, you do appear to get your cake and eat it (though please read back through the original post - we have done only limited query performance testing). This is because A/S 2000 isn't clever enough to realise that your 10 dimensions all with SKU at the bottom are actually one dimension with 10 hierarchies. It treats them as 10 distinct dimensions, and so all the levels in all these dimensions are available to be aggregated, so you will get SKU being aggregated against SKU, Colour against colour etc. which is why the aggregation designer tends to melt when you get up to about 20 dimensions.

The Dev team appears to have spotted this fundamental flaw and A/S 2005 is clever enough to know that actually SKU only needs to be stored once and all the other stuff relates to it in some way (as defined by you when you set
up the Attribute Hierarchy). True dimensional hierarchies!

User hierarchies are just virtual paths through the data that can be exposed to the users through a front end. A/S 2005 just maps them back to the attribute hierarchy you defined, so even though you have 200 User Hierarchies drilling down to SKU, they all map back to the single SKU level that gets used to build the cube aggregates.

2) Dimensions are stored on disk in 2005 so some of your memory issues for large dimensions should go away.

3) The measure group concept is a bit trickier to explain. Basically, what was a "database" in 2000 is a "cube" in 2005. "Cubes" in 2000 are "Measure Groups" in 2005. So, Jamie is right, your multiple cubes in 2000 would be implemented in measure groups within a single 2005 cube. You define the dimensionality of each of your measure groups as part of your overall cube definition (i.e. you tell A/S that Sales are SKU/Store/Day level but Forecasts are Category/Store/Week level based on the shared dimensions you have available).

Note, if you Upgrade a 2000 implementation using the migration wizard, it will (unless things have changed in beta 3) give you a solution that looks a lot like a 2000 solution (i.e. lots of cubes). We found it quicker to start again and build our solution to best-practice 2005 standards than to unpick the results of the migration wizard.

4) Lots of levels in a 2000 dimension might not be a bad thing. Since when you select more than one dimension member on the front end, A/S returns all siblings to the client, having lots of levels reduces the number of siblings per level and therefore the amount of traffic being passed up and down the pipe between client and server. You can sometimes improve performance by adding a grouping level into a dimension and making it invisible.
February 15, 2005 11:37 PM
 

surya said:

I want some basic information regarding migration process of analysis services.

In how many ways we can do migration?

How to decide which way/method to be followed?

February 9, 2007 5:27 AM
 

ssas many to many no intermediate measure group said:

April 28, 2008 12:45 AM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems