Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

Once upon a time this blog was a hive of activity. Now however its pretty lifeless as you can probably tell so if are pining for more of the same you can find me over at http://sqlblog.com/blogs/jamie_thomson. I look forward to seeing you there!

SSAS: Experiences of processing in Analysis Services 2005

In the last few days I've been doing alot of work on processing SQL Server Analysis Services (SSAS) 2005 cubes, measure groups and partitions and I thought I'd share some of my experiences here.

1. First up - I'm a huge fan of XMLA. The ease with which one can now carry out and customise SSAS administration tasks (including processing) has increased tenfold since SSAS2000. I did encounter one or two problems though - more on this later.

2. The multitude of processing options now available provides far greater flexibility for processing your cubes. Hopefully I'll find time to put together a more detailed synopsis of this later.

3. ascmd.exe - A fantastic tool for querying SSAS cubes and issuing XMLA commands from the command-line which negates the previous reliance on the feature-blown alternative (i.e. SSIS). I've been using ascmd.exe to automate processing of cubes from DTS 2000 - more to come on that in the very near future. Kudos to Dave Wickert and his cohorts for coming up with ascmd.exe. Download the project for ascmd.exe from here.

4. All design-time information about a cube is stored in a .cube file. All, that is, except for information about partitions which is stored in a seperate .partitions file. The reason being that putting all the partition info into the .cube file would cause it to take longer to open up in the designer.
There is a <Partitions> collection element within the <MeasureGroup> element in the .cube file so as Adrian Dumitrascu says here don't rule out the possibility of this behaviour changing in the future.

5. So that's the good, informative and useful stuff...now onto the not-so-good stuff. The cube I am working on contains multiple measure groups and within those measure groups, multiple partitions. My partitioning strategy is to have a partition for every year's activity up to 2015 - the reason being that I don't want to process the whole cube every night - I only want to process data for the current year.
Here is an illustration of how one of my measure groups is partitioned up:

I now have a requirement to dynamically build an XMLA command at runtime that will process the partition for the current year. On the surface it seemed like a fairly easy thing to achieve but there were a few traps lying in wait for me.

As you can see from the illustration above I followed a strict naming convention for my partitions i.e. My partition name is the name of the measure group (spaces replaced by underscores) concatenated with the year for which the partition holds data. Knowing that, it should be pretty easy to construct the XMLA command because its just a matter of concatenating the current year onto the name of the measure group right??? Wrong!

It turns out that the XMLA Process command (which is used to process SSAS objects) does not accept partition names as arguments. In fact, it doesn't accept ANY object names. All object references have to be made via the immutable object IDs that are assigned to the object when it is created.

I have to be honest and say I think this is a bit amateurish. When do we ever reference entities by ID rather than name in ANY discipline such as this? We use the name because that's the mutable property so that we can make it something meaningful. My requirement to dynamically generate the XMLA command is a legitimate one but I am prevented from doing it because I can't use the object names in the Process command. I hope that this changes in Katmai!

In the end my workaround was a nasty one - I hacked the underlying XML in my .cube and .partition files so that the object IDs followed the same naming convention as the object names. This is dangerous because
a) You have to be aware of dependancies on the objects that you are changing
b) There is much more scope to make mistakes. I made one myself - a copy and paste mistake meant that I was no longer able to see my partitions within BIDS. Thankfully I had backed up my project previously so I was able to rollback
c) You are putting real-world meaning into your IDs and that is not what they are for.

I've complained about this issue here and I'm hoping that people will click-through and vote, comment and agree with me. I really raelly don't like this behaviour.


One aside to this - the default object IDs usually bear a resemblance to the underlying table/column in the DSV that the object was initially created from. Remember that IDs are immutable but there is nothing to stop a cube developer from changing the object to use a different table/column from the DSV. This means that someone who is not au fait with this peculiarity could easily look at the deployed cube objects IDs and make false assumptions about where the data is coming from. I was guilty of this myself only recently. For this reason I would much prefer object IDs to be GUIDs just as they are in SSIS - that way there is no ambiguity.

I've suggested this at Microsoft Connect here.

 

6. CTRL-F5 does not parse XMLA commands in a SSMS query window like it does SQL and MDX queries. I have requested this be changed here.

7. Its not really possible to see certain pertinent information about a deployed cube from within SSMS. For example:

  • There is no indication in the UI as to whether an object is processed or not. This is a bit of a step back from SSAS2000.
  • We don't know the ID of an object without scripting it out and as I mentioned above, you need the ID in order to write a Process command
  • We don't know when an object was last processed

Personally I think the criminally underused 'Summary' tab in SSMS could be used to display this information and it seems as though someone agrees with me.

 

Any comments on any of this? I'd like to see some feedback - see if people agree or not.

-Jamie

 

Published 20 June 2006 19:59 by jamie.thomson

Comments

 

furmangg said:

Good post, Jamie. Love your blog, and I'm real glad you're doing some SSAS related posts recently!

My opinion on the matter is that you should only design a single partition when designing the cube. (Really, if you had monthly partitions for 10 years, designing them manually is out of the question.) That one designed partition should be treated as your "template" partition. Then you can write code to programmatically create the required partitions by cloning the "template" partition then changing the binding so it's got the right slice of data. I think that all this code should be in an SSIS package which is responsible for maintaining partitions and processing the cube/dimensions. The part of the code which maintains the partitions should be smart enough to create the partitions if they don't exist (i.e. the first time you process the cube or when a new month/year arrives). So the package will use AMO to connect to the cube and interrogate it's partition status. When you can programmatically create the partition, you can control the ID of the partition and avoid the problem you discuss above. If this strategy is used, having the ID as a string (rather than guid) is helpful. This strategy works pretty well. Project REAL used a very similar strategy.

Love your comment on using the summary tab in SSMS. I voted for it.

ascmd.exe is a great tool, but you can get all that functionality by just using the Analysis Services Execute DDL task in SSIS. The only feature of ascmd I can think of off the top of my head that would be difficult/complex to duplicate in a pure SSIS solution would be the tracing features of ascmd. (Let me know if I'm forgetting about some features.)
June 21, 2006 00:51
 

jamie.thomson said:

Hello furmangg (not your real name I assume :),
Interesting thoughts - I reckon dynamically creating partitions is another blog post all of its own. I wonder if Microsoft provide any guidance on teh recommended approach here.
I've actually previously enquired about automatically creating new partitions - i.e. When data appears for a new member of the 'Year' attribute - automatically create a new partition for it with a pre-defined name/ID. I think this has really got legs but I doubt they'll ever go with it.

I realise that everything we're using ascmd for can be accomplished in SSIS but the reason I have to use ascmd is because our overnight ETL is still driven by DTS - we're upgrading everything piecemeal and AS is first. I've got a detailed blog post coming up about how to process SSAS2005 from DTS.

Thanks for the comments!!

-Jamie
June 21, 2006 09:03
 

Mary Bullock said:

"I realise that everything we're using ascmd for can be accomplished in SSIS ..."

How would you implement user defined scripting variables in SSIS SSAS DDL task?

Just wondering....thanks!

October 11, 2007 16:39
 

Leandro Tubia said:

Hi Jamie

I only want to add some little tricky code to what Furmangg explained.

I see that you've avoided partition creation programmatically by including future years, and I agree with that approach as you've to balance cost-benefit and it's no sense to build code for such an event that occurs once a year.

However if partition creation is an event that occurs very frequently (ie, every day, once a week or once a month), not only I'm used to align Name and ID partition, but other properties as well. I mean that if you have to create monthly partition MyPartition_200809, the partition key is spreaded on ID, Name and QueryDef definition:

ID: MyPartition_200809

Name: MyPartition_200809

QueryDef: Select <Fields> From <Query> Where Year = 200809

KeyErrorLogFile: C:\MyLogs\MyCatalog\MyCube\MyParititon_200809.Log

As Furmangg explained, with this approach you can easely clone the 'Template' partition programmatically, by doing the following steps:

1) Clone partition

2) Replace Template key occurrences by new Key string.

3) Save partition.

I pasted a little code that despite it is made with AMO, it uses xmla advantages. Note that CaptureLog method is used to set an xmla script capture mode:

...cube and Measure group initialization and so on

...

objPartition = MyMeasureGroup.Partitions.FindByName('TemplatePartition')

objserver.CaptureXml = True

objParticion.Clone()

objParticion.Update()

objserver.CaptureXml = False

objserver.CaptureLog(0) = Replace(objserver.CaptureLog(0), <TemplateKey>, <NewKey>)

results = objserver.ExecuteCaptureLog(True, False)

....

.... Error capture

....

September 8, 2008 13:31
 

Karthik said:

Hi Jamie,

Could you please let me know which is the best approach to process the cubes?

1. ASCMD - SQL Server Agent scheduled using ASCMD

2. SSIS - Creating XMLA in a script task and use AMO for executing it?

Please let me know.

Thanks,

Karthik

November 12, 2008 03:13
 

jamie.thomson said:

Karthik,

Depends on the scenario but if you have the ability to do (2) then do that.

-Jamie

November 12, 2008 09:25
 

Karthik said:

Hi Jamie,

Many Thanks for your reply.

How come the following are different?

1. Using an XMLA in a SQL Server Agent of step type SQL Server Analysis Process

2. Using a scrip task to generate an XMLA and executing it through SSIS.

Thanks,

Karthik

November 12, 2008 10:08
 

jamie.thomson said:

They're not really. The end result is the same, its up to you if you'd rather use a SSIS package or an Agent job.

-Jamie

November 12, 2008 10:12
 

Karthik said:

Thanks very much Jamie. Was wondering which will be the best with regards to performance? Which will have a better performance (time) ?

Karthik

November 12, 2008 10:17
 

jamie.thomson said:

Karthik,

The actual time taken to process the SSAS objects will be exactly the same. Running in a SSIS package will take slightly longer because of the overhead of spinning up a SSIS package. To be honest though the difference is negligible so it shouldn't be your main criteria for decding which to use.

-Jamie

November 12, 2008 10:23
 

Karthik said:

Thanks a lot Jamie. You are a genius!!!

November 12, 2008 10:34
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems