EMC Consulting Blogs

Simon Munro

Data as a Service

I have recently been speaking about the reasons why NoSQL just won’t die, and a common thread is the cost of keeping data. The basic idea is that SQL databases are inherently expensive places to store data, with costs coming from specialised hardware, software licences, networking and a whole army of associated skills to keep it all running smoothly. It is also the desire to store and process every last bit of data that inhibits scalability, as the processes and infrastructure simply cannot cope, or change quickly enough, to handle growth or peak demand.

Deciding what to do with all our data is not straightforward, so keeping it in a good ol’ SQL database seems to make sense. There is little desire, and few available skills, to assess where each piece of data should be stored, for how long, in what format and at what level of aggregation, so the knee-jerk reaction is to store it at its most granular level in a SQL database with ACID guarantees, failover, backups and everything else that may be part of the standard solution.

I think it is safe to say that most systems store far more data than is necessary. I would also assert that in high-growth environments the inability to scale effectively will inhibit growth. So the question to ask ambitious business units and product teams is this: are you comfortable that your ability to grow your business is inhibited by the large amounts of data sitting around that you never use and will probably never need?

I would think that, given the choice, businesses would not want IT systems inhibiting their growth, and even less so if those systems provide something they don’t need or use. So the basic principle is pretty straightforward, but how do you go about changing behaviours? Do we go and analyse the lifecycle of every piece of data and try to determine how it should be stored, in what format and for how long? I think that this may be a futile exercise, similar in magnitude and success to the ‘Enterprise Data Model Project’. Not only do we not really know how to go about it, but the answers will often be the same: the cognitive dissonance will remain, and data will end up in the same place it has always been.

Recently I was discussing with someone the success that colleagues of mine had in building a retail web site that touched the SQL database only when an order was placed; all other application state, such as the current basket, was stored elsewhere, for example in a distributed in-memory cache. Things such as baskets would normally be considered useful data that belongs in the SQL database for analysis; after all, how can you determine conversion rates (crucial in e-commerce) without basket versus order information? The answer, in this case, lay partly with Google Analytics, where the project had some really smart and highly skilled people who knew how to make it work. Google Analytics performs a very specific service, storing and processing data that serves a particular purpose. Ten years ago we would have collected this data in our own web site, stored it in our own database and spent a fortune trying to write reports and make sense of it. Now we simply put some JavaScript in the web pages, pay a nominal fee and let someone else provide the storage, processing, front end, training, support and documentation for, in the case of web retail, a huge chunk of the data ‘needed’ by the system.
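The basket-in-cache pattern can be sketched roughly as follows. This is a minimal illustration, not the project’s actual code: it assumes a plain Python dict as a stand-in for the distributed in-memory cache and an in-memory SQLite database as the SQL store, and the function names (`add_to_basket`, `place_order`) and table layout are hypothetical. Basket changes touch only the cache; the database is written once, when the order is placed.

```python
import sqlite3
from collections import defaultdict

# Hypothetical stand-in for a distributed in-memory cache, keyed by session id.
# In a real deployment this would be an external cache shared by web servers.
basket_cache = defaultdict(dict)


def add_to_basket(session_id: str, sku: str, qty: int = 1) -> None:
    """Update the basket in the cache only; the SQL database is never touched."""
    basket = basket_cache[session_id]
    basket[sku] = basket.get(sku, 0) + qty


def place_order(db: sqlite3.Connection, session_id: str) -> int:
    """The only operation that writes to SQL: persist the order, then drop the basket."""
    basket = basket_cache.get(session_id)
    if not basket:
        raise ValueError("cannot place an order from an empty basket")
    with db:  # commits on success, rolls back if an insert fails
        cur = db.execute("INSERT INTO orders (session_id) VALUES (?)", (session_id,))
        order_id = cur.lastrowid
        db.executemany(
            "INSERT INTO order_lines (order_id, sku, qty) VALUES (?, ?, ?)",
            [(order_id, sku, qty) for sku, qty in basket.items()],
        )
    # Remove the basket only after the order is safely stored.
    del basket_cache[session_id]
    return order_id


db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, session_id TEXT);
    CREATE TABLE order_lines (order_id INTEGER, sku TEXT, qty INTEGER);
    """
)

# Usage: browsing and basket edits generate no database traffic at all.
add_to_basket("session-42", "sku-123", 2)
add_to_basket("session-42", "sku-456")
order_id = place_order(db, "session-42")
```

Removing the basket from the cache only after the transaction commits means a failed write leaves the customer’s basket intact. In this arrangement the basket-versus-order figures needed for conversion rates would come from the analytics service rather than from these tables.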

Is Google Analytics NoSQL? In the sense that data previously destined for a SQL database is no longer being stored in SQL, yes; but in the sense of the NoSQL movement, no, because it is not a piece of software that needs to be mastered and installed on a bunch of servers. Google Analytics is, from an architectural perspective, a data service: you give it some data and it does a whole lot of things with it, at the very least storing and processing it.

I think that over time we will see the emergence of more specialised data services for the storage and, perhaps more importantly, the processing of distinct chunks of data. We already have data services in cloud computing platforms, such as Azure Storage and Amazon Simple Storage Service (S3), but these are generic storage services that offer little processing, rather than specialised services that provide specific functionality for a particular shape of data. I can imagine data services onto which you could offload a whole list of your system requirements. The Holy Grail is authentication and authorisation, but there are others that may be more niche and less contentious. You could offload product catalogues onto specific services that handle searching for you. How about content management services that take away all the supporting infrastructure and front ends? After all, your content is usually in a cache anyway, so occasional refreshes from a distant server shouldn’t be much of a problem.

Even those examples are generic; how about something more specific? I’m currently developing an appreciation (the hard way) for the complexities of the mapping and geo-aware data world, and can picture a few data services in that domain. I can imagine a data service that stored only routes. I think there are a few mobile applications that could make use of that: something like cloud-based bookmarks, but with complex geographic structures. And if that became a standard that could be shared among applications, the richness of photos, video, searching, directions, bookings and meeting friends could change dramatically. Opportunities arise, functionality increases and customers are happier because we chose to farm out (apparently) crucial data to a service that is better at working with it.
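To make the idea of a route-only data service a little more concrete, here is a hypothetical in-process sketch. Everything in it is an assumption for illustration: `RouteService`, `Route` and their methods are invented names, routes are plain sequences of (latitude, longitude) pairs in degrees, and route length is computed with the haversine great-circle formula using a mean Earth radius of 6371 km. A real service would sit behind a network API shared by many applications; here two "applications" simply share one object.

```python
import math
import uuid
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


@dataclass(frozen=True)
class Route:
    """Hypothetical route record: an owner, a name and (lat, lon) points in degrees."""
    owner: str
    name: str
    points: tuple


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


class RouteService:
    """Hypothetical route-only data service: stores routes and does a little processing."""

    def __init__(self):
        self._routes = {}

    def save(self, route):
        """Store a route and return an id that any application can use to fetch it."""
        route_id = str(uuid.uuid4())
        self._routes[route_id] = route
        return route_id

    def get(self, route_id):
        return self._routes[route_id]

    def routes_for(self, owner):
        return [r for r in self._routes.values() if r.owner == owner]

    def length_km(self, route_id):
        """Processing on the service side: total length along consecutive points."""
        pts = self.get(route_id).points
        return sum(haversine_km(p, q) for p, q in zip(pts, pts[1:]))


# Usage: a fitness app saves a route; a separate photo app reads it by id.
service = RouteService()
rid = service.save(Route("alice", "Morning run", ((51.50, -0.12), (51.51, -0.12))))
shared = service.get(rid)
```

The point of the design is that the route, and operations such as measuring it, live with the service rather than in each application’s own database, which is what would let several applications build on the same data.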

Obviously there is risk associated with giving your data to someone else. Aside from the overall trust issues (what else does Google do with your data?), whenever you pitch the idea there will be a flurry of emails quoting auditing or regulatory reasons why it shouldn’t be done. However, as attitudes change (a lot of individuals are happy with all of their email sitting with Google), more robust SLAs emerge and cloud computing interoperability improves, the reasons not to let someone else store and process your data will begin to be outweighed by the benefits of doing so.

In the next year or two, the market for such niche data services will largely be startups (whose funding may be better spent elsewhere) and offerings that need a short time to market, where plugging in a service that ‘just works’ is the only way to deliver on time.

There is definitely an opportunity for providers of data services, as early entrants can wrap up their little corner of the market immediately. More importantly, for the broader market, there are opportunities for new solutions to take advantage of the services style in order to accelerate development and reduce overall cost.

Provided, of course, that we are prepared to alter our views about data location, format, control, value and lifecycles.

Simon Munro

@simonmunro

Published 21 April 2010 21:40 by simon.munro