There is no doubt that SQL Data Services (SDS) looks, feels and smells very different to the SQL Server that we have grown accustomed to over the years. The model is obviously different but there is little formal and clear description by Microsoft on what that model is – the pros and cons and the reasons for the change within the context of cloud computing. Perhaps it is time to bring in the database theory big guns and crack open SDS for the benefit of the community.
My introduction to SDS was a colleague showing it working. I immediately saw that there was something different in the model, and although I started to understand what SDS was doing, I did not recognise anything relational. Later I was at a presentation where SDS was referred to as ‘the relational database in the cloud’, which was difficult to reconcile with what I was seeing. I don’t mean to come across as a pedantic academic (I am definitely not the latter), but I would like to understand the underlying database model that SDS is trying to implement. Once I grok that model I can begin to understand how it would be positioned against a traditional on-premises SQL database and other cloud offerings.
Granted, my grasp of the maths behind relational theory is a bit rusty, but I cannot fit SDS into a relational model. The relational model defines a relation as a set of tuples, and each tuple has the same finite set of attributes. I could not, in any scratching around on the Internet, find a reference to tuples within a relation having different (and non-finite) sets of attributes. In fact, C.J. Date’s position that the relational model does not permit null values makes me think that creating a superset of attributes for all the entities of a particular ‘kind’ in an SDS container would start breaking relational model rules by introducing way too many null values for attributes that are irrelevant. There is also the implementation issue that large sets of data need to be split across multiple containers in SDS, so a relation is logically split, which I think breaks the relational model. I also think that the functional dependencies between the attributes and the tuple must play a role, meaning that varying functional dependencies within a relation would be invalid. Maybe some relational database gurus could shed some light on this for me, but as far as I am concerned, SDS is not a relational database.
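To make the mismatch concrete, here is a rough sketch in Python. It is not SDS code and uses no SDS API. The heading, the entity data and every function name are hypothetical, invented purely to illustrate the argument: a relation requires each tuple to match one fixed heading, whereas flexible entities of the same ‘kind’ each carry their own attributes, and padding them out to a superset heading fills the result with nulls.

```python
from typing import Any

Row = dict[str, Any]


def is_relation(rows: list[Row], heading: frozenset[str]) -> bool:
    """True if every row has exactly the heading's attributes and no nulls."""
    return all(
        set(row) == heading and None not in row.values()
        for row in rows
    )


def pad_to_superset(rows: list[Row]) -> list[Row]:
    """Force flexible entities under one heading, using None for missing attributes."""
    superset = set().union(*(row.keys() for row in rows))
    return [{attr: row.get(attr) for attr in superset} for row in rows]


def count_nulls(rows: list[Row]) -> int:
    """Count the None values across all rows."""
    return sum(value is None for row in rows for value in row.values())


# Hypothetical relational data: every tuple shares one heading.
uniform = [
    {"id": 1, "title": "A", "author": "X"},
    {"id": 2, "title": "B", "author": "Y"},
]

# Hypothetical flexible entities of one 'kind': each has its own attributes.
flexible = [
    {"id": 1, "kind": "book", "title": "A", "author": "X"},
    {"id": 2, "kind": "book", "title": "B", "isbn": "123"},
    {"id": 3, "kind": "book", "title": "C", "pages": 300, "publisher": "P"},
]

padded = pad_to_superset(flexible)
superset_heading = frozenset(padded[0])

print(is_relation(uniform, frozenset({"id", "title", "author"})))
print(is_relation(flexible, frozenset({"id", "kind", "title"})))
print(is_relation(padded, superset_heading))
print(count_nulls(padded))
```

The uniform rows pass the check, the flexible entities fail it because their attribute sets differ, and the padded version fails it because of the nulls it had to introduce. Whether you treat nulls as disqualifying depends on how strictly you follow Date’s position, so the null check here is an assumption rather than a universally agreed rule.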
So why does Microsoft perpetuate the message that SDS is a ‘relational database in the cloud’? As I see it there are four possible reasons:
- SQL Server is a really good brand in IT and SDS has something to do with data so it can brand ride
- Microsoft believes that most of its customers don’t really know what a relational database is and they can call it what they like
- Microsoft is trying to create a differentiator between SDS and Azure tables
- SDS intends to add more features, such as a schema layer, to the service over time to give it a more relational feel
There are two problems with telling the market that SDS is a relational database. Firstly, the reality of SDS won’t live up to the marketing expectations: IT executives will buy into SDS based on their understanding of and trust in SQL Server, and find themselves a very long way from being able to port their applications to the cloud. Secondly, overemphasising the relational model detracts from the benefits of not using a relational model in the cloud. However, Microsoft seems to be the only big player with both a cloud database offering and a traditional relational database. As far as I am aware, IBM and Oracle don’t have an SDS equivalent, and Amazon and Google don’t play in the RDBMS market. Perhaps the confusion comes out of this unique position, but Microsoft needs to find a way to turn the dual offering into a positive, with correctly positioned products.
The cloud is the first threat to emerge against the relational model in the enterprise. The shortcomings of the relational model with respect to scalability have been overcome by the new breeds of data storage, and they do things fundamentally differently. For example, the problem of the large number of tables that emerge in the relational model begins to fall away (as it does with SDS). The notion of ACID transactions becomes less important, and there is an emphasis on availability instead of consistency, as described in Brewer’s CAP Conjecture. It is the very weaknesses of the relational model that are driving alternative database models in the cloud, so branding SDS as ‘a relational database in the cloud’ is doing it a disservice in the eyes of database practitioners, even if it does work for the marketing folk.
Microsoft has hinted at providing more ‘relational features’ in future versions, such as joins, which makes me question whether they themselves know exactly what they mean by relational features (confusing relation and relationship is something that they do in ADO.NET). There have also been hints at a model for entity schemas, which would appear to be another service on top of SDS that would enforce a schema without fundamentally changing the SDS model. Think, for example, of providing ADO.NET Entity Framework or ADO.NET Data Services on top of SDS. I have also seen suggestions of T-SQL on top of SDS. Again, this would probably be a query service on top of the existing model, rather than embedded into the core. Maybe Microsoft is planning to put a relational facade on top of SDS so that it would be easier for their existing customers and developers to use initially.
I would like to believe that database professionals have a fundamental understanding of the relational model and its shortcomings. I also believe that those same people need to be made aware of the benefits of a non-relational model and how it all hangs together in the context of cloud computing. Instead of vague marketing speak, I would like Microsoft to make definitive statements about the SDS model and get some heavyweight database theorists to comment. Pat Helland works for Microsoft but is strangely quiet on SDS. I would like people like him to weigh in on the discussion.
In terms of the overall Azure platform, the difference between the SDS model and the SQL Server model is probably the biggest obstacle to overcome when trying to port existing applications and existing understanding. The reason for the differences lies at the heart of cloud computing and the architectures required for massively distributed, scalable and available applications are different to those required for the model that SQL Server supports. So we need to speak out about the model so that we can highlight the benefits of the new model and get database professionals to understand that there is more to cloud data than something that is supposed to, but doesn’t, look like the relational model that we are familiar with.
Simon Munro @simonmunro