
Simon Munro

SDS is an Option for Azure Persistence

It was pointed out to me recently that my critical position on SQL Data Services and support for Azure in general are inconsistent, so I thought that a bit of clarification is required.

SQL Server is my database of choice, and I have been using it since version 4.21, long before it became mainstream.  I even spent a couple of years on Oracle to make sure that I had broader database experience, and breathed a sigh of relief when I returned to the SQL Server world.  As a fan of SQL Server I am really supportive of any effort by Microsoft to put SQL Server into the cloud and was very supportive of their SDS initiative; imagine having a database platform that was both cloud capable and on-premise – it would be the only one in the market.  As information has been made available about SDS, it seems that it has hit the scale-out problem of the SQL (relational) model and has no way of managing consistency and partition tolerance easily (at all?).  The cloud idea of scalability by adding additional nodes as required has been lost on the SDS team and they seem to remain quiet on the issue (my question on the forums from April 2009 is still unanswered).

Now I don’t have a problem with the (lack of) scalability of SDS per se, but rather with the misleading marketing and evangelising that SDS is a relational database for the cloud, and with the implication that you develop as you normally would for a SQL database, without clearly outlining the risks associated with the lack of simple scalability.  In a Tech-Ed video earlier this year members of the SDS team jovially discuss the approaches to handling scalability on SDS, which come down to two things: 1) database sharding and 2) Microsoft is working on tools/frameworks to make it easier to shard a database.  In my opinion, (SQL/relational) database sharding as a starting point is bad design (you get pain and suffering without the ACID benefits of SQL), and waiting for Microsoft to develop some tools and frameworks is hopeful and risky.
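To make the sharding concern concrete, here is a minimal sketch in Python of hash-based shard routing over in-memory dictionaries. It is not SDS code and not any Microsoft framework; the names `shard_for`, `ShardedBalances` and the `fail_between` flag are hypothetical and exist only for illustration. It shows how one logical operation becomes two independent writes once the data is split, with nothing to roll the first write back if the second never happens.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard index from a stable hash of the key (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedBalances:
    """Toy account store: each dict stands in for a separate database."""

    def __init__(self, shard_count: int) -> None:
        self.shards = [{} for _ in range(shard_count)]

    def _shard(self, account: str) -> dict:
        return self.shards[shard_for(account, len(self.shards))]

    def deposit(self, account: str, amount: int) -> None:
        shard = self._shard(account)
        shard[account] = shard.get(account, 0) + amount

    def balance(self, account: str) -> int:
        return self._shard(account).get(account, 0)

    def transfer(self, src: str, dst: str, amount: int,
                 fail_between: bool = False) -> None:
        # Two separate writes. A single relational database could wrap both
        # in one transaction; across shards there is no such wrapper here.
        self._shard(src)[src] = self.balance(src) - amount
        if fail_between:
            # Simulated crash: the debit is already stored, the credit is not.
            raise RuntimeError("simulated crash between the two writes")
        self._shard(dst)[dst] = self.balance(dst) + amount
```

If `transfer` is called with `fail_between=True`, the debit stays applied and the credit is lost, so the application has to detect and repair the imbalance itself. That repair logic is the "pain and suffering" that a single transactional database would otherwise absorb.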

Theoretical and academic discussions aside, there is one simple problem with building an Azure app that uses SDS as its primary persistence mechanism.  If your business gets mentioned on Oprah and suddenly makes it big, it will hit the wall.  You can spin up lots of Windows Azure Hosted Services but only one SDS service, and the load on that lonely stressed-out instance of SDS will render the application useless.  Part of the attraction of the cloud in general and Azure in particular is the ability to lower startup costs by hosting on the bare minimum and being able to scale as needed and on demand (almost).  If you do hit the wall on SDS you have two options: 1) a rewrite of the data access layer to use sharding or the mythical framework that is on its way, or 2) moving the entire application on-premise (which may be difficult because of the dependency on the Azure fabric).  Neither option will be quick or easy to implement, and opportunities will be missed.

The Azure platform (hosted services and storage) does not have the same scalability problems and is (seemingly) engineered with regard to scale-out, as has been learned in stateless web farms and message oriented systems over the years.  Azure also forces developers to architect their applications in such a way that scalability is built in, by placing restrictions on access to disk and other processes and removing most of the urge to implement cloud-unfriendly practices.  The focus on worker processes, queues, Azure storage and RESTful storage styles biases implementations towards more service oriented styles which, almost by definition, can handle scalability.  This is what I find most attractive about Azure – provided we can get developers to think a little bit differently about how to process data, Azure provides the platform for massive (and painless) scalability.
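The queue-and-worker style described above can be sketched in a few lines of Python. This is an in-process stand-in, assuming the standard library's `queue.Queue` in place of an Azure queue and a plain dictionary in place of Azure storage; `worker` and `run_demo` are hypothetical names, and nothing here calls a real Azure API. The point it illustrates is that the front end only enqueues small messages, and throughput is raised by adding workers rather than by changing the code.

```python
import queue
import threading


def worker(jobs: queue.Queue, store: dict, lock: threading.Lock) -> None:
    """Drain update messages from the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        key, delta = job
        with lock:  # the dict stands in for a storage service
            store[key] = store.get(key, 0) + delta
        jobs.task_done()


def run_demo(worker_count: int = 3) -> dict:
    """Enqueue 100 page-hit updates and let the workers apply them."""
    jobs: queue.Queue = queue.Queue()
    store: dict = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, store, lock))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for i in range(100):
        jobs.put((f"page-{i % 5}", 1))  # the "web role" only enqueues
    for _ in threads:
        jobs.put(None)  # one sentinel per worker, queued after all the work
    jobs.join()
    for thread in threads:
        thread.join()
    return store
```

Calling `run_demo(worker_count=10)` gives the same totals as `run_demo(worker_count=1)`; only the number of consumers changes, which is the property that makes this style easy to scale out.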

In all the cloud discussions, not only at Microsoft, persistence is the elephant in the room that everyone seems to ignore.  Current applications and architectures have a high dependency on ACID data operations and all of the goodness that comes from using a SQL database.  Although it is generally known that SQL databases don’t scale out very well, it seems that the competitor products are immature (at least in addressing a larger problem space), so their vendors would rather keep quiet and not point fingers at the entrenched database vendors who, in turn, don’t want to highlight scalability problems.  My issue is that these problems are ignored at the cost of the customer, who eventually comes across problems that everybody seemed to know about the whole time.  So it is imperative that Microsoft talk openly and honestly with customers about the architectural considerations that need to be made on Azure and not just target small businesses who don’t have the skills in-house to ask the uncomfortable questions.

I have no doubt that Microsoft has the engineering skills to provide tools, technologies and frameworks to build cloud oriented data storage mechanisms, even if it is on a SQL model (Madison is, after all, a data sharding architecture).  There are clever people like Pat Helland who have a lot of experience in building distributed systems (although he is working more with unstructured data now – I think).  I sincerely hope that Microsoft gives the engineers a chance to build what is needed and doesn’t just leave it up to the marketers and the customers – after all, the potential technical barrier is far greater than whether or not Outlook renders HTML with Word.

So, if you are a small business and evaluating Azure and SDS, take note of the following:

  1. Data consistency (as offered for free by SQL databases) is not always needed.
  2. Any Azure application should make use of a number of persistence mechanisms depending on the need:
    1. SDS – where consistency is required
    2. Azure storage (Tables/Blobs) for high volume and high throughput operations
    3. Cache – to reduce the load on the primary persistence services
    4. Client side data – particularly with fatter clients like Silverlight
  3. Azure applications should make use of worker processes as much as possible and queue updates/requests against the storage mechanism if possible.
  4. Sharding should be considered as a last resort and should not be generally applied.
  5. Considerations should be made for cloud and on-premise storage – stale and historical data can (and should) be moved to less congested storage mechanisms.
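As one illustration of the cache item in the list above, here is a minimal cache-aside sketch in Python. `CountingStore` and `CacheAside` are hypothetical classes: the store is a stand-in for whichever primary persistence service is in use (SDS or Azure storage), and the cache is a plain dictionary rather than any particular caching product. Reads are served from the cache when possible, and writes go to the store and invalidate the cached copy.

```python
class CountingStore:
    """Stand-in for the primary persistence service; counts reads it serves."""

    def __init__(self, rows: dict) -> None:
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key: str):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key: str, value) -> None:
        self.rows[key] = value


class CacheAside:
    """Serve repeat reads from memory; go to the store only on a miss."""

    def __init__(self, store: CountingStore) -> None:
        self.store = store
        self.cache: dict = {}

    def get(self, key: str):
        if key in self.cache:
            return self.cache[key]
        value = self.store.read(key)
        self.cache[key] = value
        return value

    def put(self, key: str, value) -> None:
        # Write to the store first, then drop the cached copy so the next
        # read fetches the fresh value.
        self.store.write(key, value)
        self.cache.pop(key, None)
```

With this in front of the store, three consecutive `get("sku-1")` calls cost the primary service a single read. The trade-off is that other instances holding their own cache may serve stale data for a while, which is acceptable only where consistency is not required (item 1 in the list).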

Disclaimer:  My opinions are formed by experience and publicly accessible information.  I have no access to MVP, partner or private beta materials and Microsoft (for obvious reasons) doesn’t talk to me directly.

Simon Munro

@simonmunro

Published 25 June 2009 16:28 by simon.munro

Comments

 

Simon said:

I tend to find that most discussions comparing SDS and Azure storage just gloss over the transactional differences. What's concerning is that transactions are not very well understood in app design, and in adopting Azure storage developers have no protection; with SQL there is at least a little protection.

I also don't believe businesses really know whether they do or don't need transactionally consistent applications, and if they say they don't need consistency, they don't understand the ramifications of that.

June 25, 2009 16:44
 

simon.munro said:

It's good to see a database guru such as yourself stop by Simon. :)

I agree - all this discussion about transactions is confusing to most and, at least from a vendor's point of view, best kept out of public discussion.

Maybe a year or two from now we will long for good ol' single instance SQL as our database, when things were simpler and easier.  It may be best for MS to offer scaled-up nodes for those that require it, rather than wrestling the consistency dragon.

June 25, 2009 17:05
