Some documentation for SQL Azure has been released on MSDN at http://msdn.microsoft.com/en-us/library/ee336279.aspx and upon reading there are some interesting nuggets of information that are worth calling out.
Under a section headed “Key benefits of the service” the following statement appears:
A key advantage of SQL Azure is the ease with which you can scale your solution. After partitioning your data, the service scales as your data grows.
Hmmm…there’s two seemingly contradictory statements in there, “ease” and “partitioning your data”. Last time I looked there was nothing particularly easy about partitioning data, certainly not when it is going to be partitioned across multiple databases. I wonder if the captain of the Titanic proudly declared “Sailing to New York will be easy once we’re past all those meddlesome icebergs”!
Another name for this partitioning technique is sharding for which Dare Obasanjo has a great discussion up at Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes. Dare says database sharding is:
the process of splitting up a database across multiple machines to improve the scalability of an application
which is exactly what SQL Azure proposes except that they use the word “partitioning” rather than “splitting” and its across multiple virtual instances rather than multiple machines.
Dare talks about numerous disadvantages of sharding but the key one is that you lose the benefit of referential integrity (RI) which is, to my mind, THE main reason for using a relational database in the first place. Let’s take a look at a simple example. Suppose (incredibly hypothetically) that Amazon adopted SQL Azure and chose to shard its product catalogue across multiple instances; they then face the very real situation where it is not possible to have full RI between their products, customers and the orders that link them together. An order would most likely contain products from different shards therefore which shard should the order live on? Maybe Amazon could shard the order line items according by product but how do they enforce RI back to the order header and, onward, to the customer? Clearly Amazon can’t enforce RI across shards so they have to use some different data storage mechanism in which case why bother using SQL Azure at all?
This is an extreme case but its one that you have to consider when using SQL Azure and the problem is exacerbated when you consider that the maximum size of a database in SQL Azure is 10GB. “Oh, you want to store your new product in my database? You say it’ll only use up 5KB of storage? No can do, you’re maxed out! Sorry!” (Although arguably the 10GB limit is an advantage because at least your shard’s maximum limit is absolute rather than theoretical.)
There are of course solutions to these problems and they generally involve writing your application to workaround these limitations. Doing that is, however, an expensive and time-consuming undertaking which is something anyone adopting SQL Azure needs to be aware of and you won’t find it written down anywhere in the SQL Azure documentation.
Be aware, that’s all I’m saying! If your database is not going to grow anywhere near 10GB then SQL Azure might well be a good fit for you.