
Simon Munro

Consistency is an Unnecessary Obsession

In order to build massively scalable databases you, as an architect, will have to toss out data consistency. The knee-jerk reaction is that consistency cannot and should not be compromised but the reality is that real-time consistency is less important than you may think. It requires a bit of effort, more talking to users and re-engineering of things that were previously taken for granted but it is possible to build really good and functional systems with data that is, at least for a few seconds, inconsistent.

Have you ever thought that there was milk in the fridge, only to find out that there isn’t any left? Ever been to a store and found an empty place on the shelf? Checked in at the airport only to find that the flight is full? Paid by credit card because there wasn’t enough cash in your wallet? Opened the wardrobe to discover that you don’t have an ironed shirt? Businesses and consumers take for granted that, like life, what you think is the current state is often not – and it doesn’t really matter that much.

IT people, on the other hand, seem obsessed with the notion that things must be accurately represented within their systems, and generally force this mental model into their architectures and designs. When dealing with the business, how often do you ask them about the accuracy of the data that is required? It would seem like a stupid question - obviously business always wants the data to be accurate. What about being accurate eventually, say within a few seconds? In most cases you may be surprised by the answers that you receive. As database professionals we may assume, without ever challenging the assumption, that the amount of stock on hand must be 100% accurate and that every time an item is purchased the available stock should decrease by one until there is nothing left – in which case the item is out of stock. In some businesses that may be the case, but not always. Maybe the distribution point overstocks to allow for shrinkage. Maybe the distribution centre has allocated stock for various stores and can easily ‘borrow’ stock destined for another store to fulfil orders. Maybe the distribution point has some items where more stock is carried by just-in-time suppliers and can be replenished within a few hours. Maybe you should be asking a lot of questions to find out exactly what the case is with that little piece of data.

Apart from general inaccuracy, there is the idea that no matter how accurate the system is, something outside the system will introduce inaccuracies that simply have to be dealt with. ‘Apology Based Computing’ uses this as a basis for system architectures. It asserts that when a system is inconsistent, the resolution of the problem, an apology to the customer, is indistinguishable from the apology that has to be made when something else goes wrong. Pat Helland has an example where a book is purchased from an online store and, on picking the stock, it turns out that the book was run over by a forklift – stuff happens. A customer service person would have to contact the customer and give them the bad news (although maybe not the truth) and offer a resolution. If the system reported that a book was available and two different data nodes sold it at the same time, a similar apology would have to be made to the customer – they could even say it was run over by a forklift and the customer would accept the explanation.

A while back I tried to explain this to a colleague but he kept getting caught up in the availability of stock with customers adding to baskets, removing from baskets, checking out and abandoned baskets - I kept on trying to explain that it doesn’t matter. Assuming that the inconsistency of the data is no more than a few seconds out of date, there would seldom be a contentious situation and if the load was so high on the website that this happened frequently, stock levels would also be high so that sales could always be made. But let us assume that it does matter for a minute and come up with a way to handle apologies. Let us also assume that customers would throw green custard on the boss if they received an apology email hours after they had checked out and paid.
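To put a rough number on "seldom", here is a back-of-envelope sketch. It assumes, purely for illustration, that sales of a given item arrive as a Poisson process; the function name and the figures of 30 sales an hour and a 2 second lag are made up, not measurements from any real store.

```python
import math

def conflict_probability(sales_per_hour: float, lag_seconds: float) -> float:
    """Chance that at least one other sale of the same item lands inside the lag window.

    Assumes (hypothetically) that sales of the item arrive as a Poisson process.
    """
    rate_per_second = sales_per_hour / 3600.0
    return 1.0 - math.exp(-rate_per_second * lag_seconds)

# An item selling 30 times an hour, with nodes up to 2 seconds out of date.
p = conflict_probability(sales_per_hour=30, lag_seconds=2)
print(f"{p:.2%}")
```

Under those assumptions the chance of a competing sale inside the window is under two percent, and it only turns into an apology when the sale in question was for the last unit in stock.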

What you would need to do is keep stock, baskets and checkout in sync across nodes. You would need to make allowances for abandoned baskets and a few other things as well. I reckon that at the database level I could keep these in sync using Service Broker and not be more than a second or two out of sync. Check. Check. Check. But what about the few seconds where two (or more) nodes select the last item in stock? Firstly, you have to know what the other nodes are doing so that you can detect that this conflict has happened. Then you have to fight it out...
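The detection step could be sketched roughly as follows, once each node's sales of an item have been merged in one place. The `Claim` record and `contested_claims` function are hypothetical names for illustration, and the sketch says nothing about how the claims travel between nodes (Service Broker or anything else); it also ignores clock skew between nodes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    node: str          # which data node accepted the sale
    basket_id: str     # the basket that claimed the item
    claimed_at: float  # seconds since a shared epoch (clock skew ignored here)

def contested_claims(stock_on_hand: int, claims: list[Claim]) -> list[Claim]:
    """Return every claim on the item if more was sold than exists, else an empty list."""
    if len(claims) > max(stock_on_hand, 0):
        # Oldest first, so whatever resolves the contest sees a stable order.
        return sorted(claims, key=lambda c: c.claimed_at)
    return []

# Two nodes each sold the "last" item before hearing about the other.
claims = [
    Claim(node="node-b", basket_id="basket-17", claimed_at=100.9),
    Claim(node="node-a", basket_id="basket-42", claimed_at=100.4),
]
print([c.basket_id for c in contested_claims(stock_on_hand=1, claims=claims)])
```

With one item in stock and two claims, both baskets come back as contested; with two in stock the list is empty and nobody needs an apology.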


Although it would be an interesting interactive shopping idea to have the two basket holders fight it out using a boxing (or hair pulling and scratching) game, it may be easier to think up some rules as to who gets the goods. So you can put your apology in up front.  Here are some examples:

You and another shopper grabbed the last item off the shelf at the same time and...

For the winner: You were the first to checkout so it is yours
For the loser: The other person rushed through to checkout and beat you to it

For the winner: You are a regular customer so we decided you should keep it
For the loser: They got it but there is a blue one in stock which we can give to you for a discount

For the winner: You already have the matching item in your basket
For the loser: They got it but we’ll give you free shipping on the items in your basket

You get the idea... the point is that something that is contentious can be resolved quickly, easily and even positively long before it becomes a problem.
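As a minimal sketch of how such rules might be encoded, the example below tries each rule in turn and hands the winner and the loser their messages. The `Shopper` fields, the rule functions and the order in which the rules are tried are all assumptions made for illustration; a real store would choose its own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Shopper:
    name: str
    checked_out_at: float            # earlier means they reached checkout first
    is_regular: bool = False         # hypothetical loyalty flag
    has_matching_item: bool = False  # hypothetical basket check

# Each rule returns the winner, or None if it cannot separate the two shoppers.
def matching_item_rule(a, b):
    if a.has_matching_item != b.has_matching_item:
        return a if a.has_matching_item else b
    return None

def regular_customer_rule(a, b):
    if a.is_regular != b.is_regular:
        return a if a.is_regular else b
    return None

def first_to_checkout_rule(a, b):
    # Final tie-breaker: always picks someone.
    return a if a.checked_out_at <= b.checked_out_at else b

# (rule, message for the winner, message for the loser), tried in this order.
RULES = [
    (matching_item_rule,
     "You already have the matching item in your basket",
     "They got it but we'll give you free shipping on the items in your basket"),
    (regular_customer_rule,
     "You are a regular customer so we decided you should keep it",
     "They got it but there is a blue one in stock which we can give to you for a discount"),
    (first_to_checkout_rule,
     "You were the first to checkout so it is yours",
     "The other person rushed through to checkout and beat you to it"),
]

def resolve(a: Shopper, b: Shopper) -> dict[str, str]:
    """Return {shopper name: message} for the winner and the loser."""
    for rule, winner_message, loser_message in RULES:
        winner = rule(a, b)
        if winner is not None:
            loser = b if winner is a else a
            return {winner.name: winner_message, loser.name: loser_message}
    raise AssertionError("the last rule always picks a winner")

alice = Shopper("alice", checked_out_at=10.2, is_regular=True)
bob = Shopper("bob", checked_out_at=10.0)
print(resolve(alice, bob))
```

Here alice keeps the item because she is a regular, even though bob reached checkout first, and bob is offered the blue one at a discount. Putting the rules in a list keeps the policy in one place, so the business can reorder or replace them without touching the resolution loop.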

This may be all very interesting, but I hear you asking ‘Why do I care? I get consistency for free with the database’. Well that may be true – for now. The advent of systems that have to sustain a massive, low-margin load means that the cost of having a single database to maintain consistency (and partition tolerance) is prohibitive. So naturally business is looking at cheaper options, and the cloud (or at least low-cost distributed databases such as CouchDB) is a solution to the cost problem.

“If you’re concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck. Take that as a given.”

— Werner Vogels, Amazon CTO and Vice President

So while distributed databases that exist in the cloud provide scalability, availability and performance at a low cost, the price that is paid is in terms of real-time data consistency. If you don’t have real-time consistency in your platform, you have to ask the business questions about data consistency that you haven’t asked before, and design apology-based systems, which is hard.

The assumption that real-time data consistency is an absolute imperative is a lazy cop out that biases the architecture towards models that can become prohibitively expensive or underperforming.

There are some cases where consistency is the primary consideration, but there aren’t that many. People often quote banking as an environment that needs real-time consistency but, at least as a banking consumer, I don’t see it. Cheques, although used less often these days, are about the most (data) inconsistent mechanism that you can think of. When money paid from one account into another is only reflected the next day (or a week later), it seems that there is plenty of time to resolve consistency issues. Besides, inter-bank communications are done with SWIFT, which is a message network anyway. The issue of consistency should also not be confused with atomicity, which is required in banking systems.

There are some applications that require real-time data consistency, particularly when apologies cannot be made – you don’t want airport controllers working with inconsistent data, causing two planes to land on top of each other.  Such examples are few and can be specifically engineered into the rest of the system.


So when designing cloud based systems you need to toss out some of your ingrained practices and need to not only master the technologies involved, but also revisit requirements that are either stated or assumed, and make sure that they fit in with the new model.

Simon Munro @simonmunro

Published 06 March 2009 15:56 by simon.munro


Comments

 

Anthony Giwa said:

Geez, I'm glad you pointed out this (almost crazed) obsession with providing exact real-time info. I totally agree we should all be able to look back at the requirements and ask whether this is REALLY needed, rather than just assume, oh well, yeah, we SHOULD definitely consider total consistency.

March 6, 2009 19:42
 

RBarryYoung said:

Excellent post.  I am glad that someone finally had the cojones to say this in public.  I discovered the truth of this many years ago, but anytime that I suggest to another database professional that the Foreign Key or other DRI rule that they want to enforce may have substantial costs for minimal benefits, I get nothing but expressions of horror and outrage.  "OF COURSE we have to have RI rules, how else can we ensure consistency?"  

Never mind that they already have plenty of inconsistency because DRI is only capable of catching a tiny fraction of the many real forms of inconsistency that can exist in a database.  And never mind that many of these same professionals use the NOLOCK option, multi-statement DML without transactions, Data Warehouse ETL and floating-point arithmetic, which are not necessarily wrong, unless you believe in absolute consistency in your data.  In fact, anything less than Serializable isolation level always incorporates some level or chance of inconsistency.

But they don't like to think about that, because it means that they already have traded some consistency for scalability.  Consistency really is the hobgoblin of little minds.

April 4, 2009 17:20
 

Delivery Focus said:

It was pointed out to me recently that my critical position on SQL Data Services and support for Azure

June 25, 2009 16:28
