Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

Once upon a time this blog was a hive of activity. Now however its pretty lifeless as you can probably tell so if are pining for more of the same you can find me over at http://sqlblog.com/blogs/jamie_thomson. I look forward to seeing you there!

Programmable in-memory aggregations

Just recently I was doodling in Visual Studio, writing some .Net code (dangerous, I know). The stuff I was doing wasn’t particularly process intensive but it WAS data intensive; by that I mean I was looping over a collection of entities (let’s pretend that the entities were vehicles sales) and producing lots of statistics about those entities. For each vehicle I built an object VehicleSale each of which was stored in a List(T), hence I ended up with a List<VehicleSale>.

Huh? A what? Ok, I’ll rephrase that in a way that you data guys can understand. I was basically building a denormalised list of vehicle sales and it looked something like this:

Type Make Model OwnerName OwnerHomeCity Colour Date PurchasePrice
Car Ford Focus John James London Red 16/03/2009 14000
Motorbike Honda 599 John James London Blue 02/03/2008 10000
Car Reliant Robin Simon Smith Hull Blue 26/01/2009 6500
Bicycle Emmelle Phantom Simon Smith Hull Black 14/08/2008 50
Van Ford Transit Henry Hodge Bristol Blue 26/09/2008 18000

You get the idea!

[N.B. By the way the data, and indeed the scenario, are figments of my imagination. This is purely for demo purposes]

 

I find myself querying this data structure to ask a number of questions:

  • How many cars have been sold in London?
  • What is the most popular type of vehicle sold?
  • What is the least popular colour?
  • How many cars were sold in March 2009 versus March 2008?
  • What was the value of sales in 2008?

This is a classic Business Intelligence (BI) scenario. In the SQL Server BI circles in which I hang out we typically put this data into a multidimensional database (aka Analysis Services) which pre-aggregates the data for us and hence enables us to answer those types of questions a lot quicker than it would ordinarily take if the data simply resided in a relational database. Analysis Services is after all nothing much more than an aggregation engine (not that I want to belittle that capability, its very very valuable).

In this case however the use of Analysis Services or any other multidimensional database product is not an option. My data isn’t in a database table, in fact it isn’t even physically stored anywhere. Its purely an in-memory data structure (effectively an unindexed heap) that I am building and querying on-the-fly and which will cease to exist once my application stops running. If the amount of data that I am storing in this data structure becomes suitably large then I would expect query time to increase exponentially, eventually rendering my application unusable.

 

And so that brings me onto the point of this blog entry. I started to consider whether there was a gap in the BI market; is there a need for a .Net object into which we can add data (just like we can a List(T)) but that will automatically aggregate the data for us as we add data into it? I envisage such a utility could be incredibly powerful especially if the data aggregations are immediately consistent because then you’re moving into the realm of real-time BI.

Does anyone have any immediate thoughts on this you’d like to share? I’m only throwing some vague ideas out here and I’d like to know if anyone thinks this is an interesting proposition or not, for all I know there might already be something that does this. Either way, share your thoughts in the comments.

-Jamie

Published 08 April 2009 16:57 by jamie.thomson

Comments

 

jwelch said:

Jamie, interesting concept, and I think MS is heading in this direction. Project Gemini uses an in-memory Analysis Services database. Of course, I don't know that there will be any way to use it directly from a .NET app, but it wouldn't surprise me to see that done the road.

April 8, 2009 22:24
 

jamie.thomson said:

Hi John,

I wondered how long it would be until Gemini was mentioned :)

Yes, Gemini is definitely a step in the right direction although there is no developer story for it. We live in hope.

-Jamie

April 9, 2009 09:06
 

SimonS Blog on SQL Server Stuff said:

Very interesting idea brought up by Jamie ( http://blogs.conchango.com/jamiethomson/archive/2009/04/08

April 9, 2009 11:55
 

Dew Drop - April 9, 2009 | Alvin Ashcraft's Morning Dew said:

April 9, 2009 13:34
 

Jason Haley said:

Interesting Finds: April 9, 2009

April 9, 2009 15:12
 

TrackBack said:

April 9, 2009 15:25
 

Internet Marketing Email » Blog Archive » SSIS Junkie : Programmable in-memory aggregations said:

April 9, 2009 18:25
 

john.rayner said:

I haven't played with it, but http://www.codeplex.com/i4o purports to do exactly this.

April 14, 2009 23:19
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems