SSIS Junkie

Dryad

I've been meaning to talk about Dryad on here for a while, but I've been busy elsewhere and probably a tiny bit lazy as well. So what's Dryad, I hear you say?

Dryad is a research project within Microsoft Research and "is an infrastructure which allows a programmer to use the resources of a computer cluster or a data center for running data-parallel programs. A Dryad programmer can use thousands of machines, each of them with multiple processors or cores, without knowing anything about concurrent programming" or so it says here.

Where Dryad gets interesting is when you read about one of its proving grounds. Get a load of this:

  • SSIS on Dryad executes many instances of SQL Server, each in a separate Dryad vertex, taking advantage of Dryad's fault tolerance and scheduling. This system is currently deployed in a live production system as part of one of Microsoft's AdCenter log processing pipelines.

That sounds very interesting. It's hard to decipher what is actually going on here, but it sounds as though these guys have used Dryad to distribute SSIS workload across multiple cluster nodes. Unfortunately detail is a bit lacking, so it's hard to know exactly what has been done. But if they have managed to parallelise an SSIS dataflow across the cluster then this is truly exciting stuff, and it raises SSIS into the echelons of Ab Initio, which leads the way in terms of raw processing power with its highly distributed parallel architecture.

I'll keep my ear to the ground about Dryad and if I find out more I'll let you know. The real question is, will this be available for Microsoft customers (i.e. you and me) to use as well? I'll endeavour to find out.

Just to whet your appetite, here's a little more detail from their research paper, "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks":

SQL Server Integration Services (SSIS) supports workflow-based application programming on a single instance of SQLServer. The AdCenter team in MSN has developed a system that embeds local SSIS computations in a larger, distributed graph with communication, scheduling and fault tolerance provided by Dryad. The SSIS input graph can be built and tested on a single computer using the full range of SQL developer tools. These include a graphical editor for constructing the job topology, and an integrated debugger. When the graph is ready to run on a larger cluster the system automatically partitions it using heuristics and builds a Dryad graph that is then executed in a distributed fashion. Each Dryad vertex is an instance of SQLServer running an SSIS subgraph of the complete job. This system is currently deployed in a live production system as part of one of AdCenter’s log processing pipelines.
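
To make the partitioning idea in that passage a bit more concrete, here is a rough Python sketch of my own. It is not the AdCenter team's code, and the paper does not say which heuristics they use. The function name `partition_dataflow`, the `max_size` parameter, the component names and the "pack consecutive components in topological order" rule are all assumptions made up for illustration. The sketch takes a dataflow graph, splits it into subgraphs (each standing in for one Dryad vertex), and works out which vertices need a channel between them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def partition_dataflow(edges, max_size):
    """Split a dataflow DAG into subgraphs of at most max_size components.

    edges: list of (upstream, downstream) component names.
    Returns (subgraphs, channels). subgraphs maps a vertex id to the list of
    components it would run; channels is the sorted list of
    (from_vertex, to_vertex) pairs that need a data channel.
    """
    # graphlib expects a mapping of node -> set of predecessors.
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(src, set())
        predecessors.setdefault(dst, set()).add(src)

    order = list(TopologicalSorter(predecessors).static_order())

    # Hypothetical heuristic: pack consecutive components (in topological
    # order) into the same vertex. Every edge then points from a lower
    # vertex id to an equal or higher one, so the vertex graph stays acyclic.
    vertex_of = {name: i // max_size for i, name in enumerate(order)}

    subgraphs = {}
    for name in order:
        subgraphs.setdefault(vertex_of[name], []).append(name)

    # Only edges that cross a vertex boundary become channels.
    channels = sorted(
        {(vertex_of[s], vertex_of[d]) for s, d in edges if vertex_of[s] != vertex_of[d]}
    )
    return subgraphs, channels


# A made-up five-step log processing dataflow.
edges = [
    ("read_logs", "parse"),
    ("parse", "filter"),
    ("filter", "aggregate"),
    ("aggregate", "write"),
]
subgraphs, channels = partition_dataflow(edges, max_size=2)
print(subgraphs)
print(channels)
```

In the real system each subgraph would be handed to its own SQL Server instance and Dryad would look after scheduling and fault tolerance; none of that is modelled here.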

Thanks to Howard for the tip about Dryad.

-Jamie

Published Tuesday, November 13, 2007 4:19 AM by jamie.thomson

Comments

 


Matt Masson said:

"... if they have managed to parallelise a SSIS dataflow across the cluster then this is truly exciting stuff ..."

Yes, that's pretty much what they've done. They use a set of heuristics to decide how to split up their dataflow across the grid. We got a demo of it sometime last year, and it was very impressive. It's very cool stuff, but of course, is completely dependent on Dryad. Hopefully the technology progresses to a point where it can be easily deployed... definitely a project worth keeping an eye on.

November 21, 2007 5:53 AM