Recently I went on record as saying:
everything that travels on the internet is data in one shape or another so I believe that the skills and experience that we as data professionals apply today in hosted on-premise database applications will serve us well when data is stored in the cloud
I figured I should explain that point of view a little more, which I'll attempt to do in this blog entry.
Where to start though? Well, my interest in cloud computing began when I was recommended to read RESTful Web Services by Leonard Richardson and Sam Ruby sometime in spring 2007. Many people whose blogs I read, such as Jon Udell, Pablo Castro and Alex Barnett, were mentioning it in dispatches, but most of all the bloke that I shared a lift to work with every day, Andy Britcliffe, often spent a good part of the journey eulogising about it, so I decided to jump in head first. I learnt a lot from reading that book and it gave me a new perspective on my day-to-day activities, so I highly recommend it. If I had to distil what I learnt from it into one paragraph, it would read like this:
Everything that travels on the web, be it a web page, an instant message, an email, a blog entry or a search query result, is data. The RESTful way of thinking stipulates that every piece of data on the web is something which is uniquely identifiable and which gets created, read, updated and eventually deleted.
That should sound fairly familiar to anyone used to storing data in a database. Every row of data in a database is (or should be) identifiable by a unique key and will get inserted (created), selected (read), updated and deleted. In other words, the web behaves semantically in the way that we as database pros are all used to: CRUD-like.
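To make that parallel concrete, here is a minimal sketch in Python. The `ResourceTable` class, its `resource` table and the `/entries/1` key are all made up for illustration, and the pairing of HTTP methods with SQL statements is the conventional one rather than anything quoted from the book. It is not a web server, just the CRUD mapping on its own:

```python
import sqlite3


class ResourceTable:
    """Hypothetical store where every row is addressed by a unique key."""

    def __init__(self):
        # In-memory database so the sketch needs no files or setup.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE resource (id TEXT PRIMARY KEY, body TEXT)"
        )

    def handle(self, method, key, body=None):
        """Dispatch an HTTP-style method onto the matching SQL statement."""
        if method == "POST":  # create -> INSERT
            self.conn.execute(
                "INSERT INTO resource (id, body) VALUES (?, ?)", (key, body)
            )
        elif method == "GET":  # read -> SELECT
            row = self.conn.execute(
                "SELECT body FROM resource WHERE id = ?", (key,)
            ).fetchone()
            return row[0] if row else None
        elif method == "PUT":  # update -> UPDATE
            self.conn.execute(
                "UPDATE resource SET body = ? WHERE id = ?", (body, key)
            )
        elif method == "DELETE":  # delete -> DELETE
            self.conn.execute("DELETE FROM resource WHERE id = ?", (key,))
        else:
            raise ValueError(f"unsupported method: {method}")
        return None
```

Calling `handle("POST", "/entries/1", "first draft")` and then `handle("GET", "/entries/1")` walks a single resource through its lifecycle in exactly the way a row goes through INSERT and SELECT.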
When you think about the web in CRUD terms, concepts that we are already fairly familiar with start to become apparent: querying, indexing, data storage, replication, aggregation, high availability. This is familiar terminology and it is terminology that increasingly we are seeing applied to the web. Take the following URI:
If you hit that link then you are taken to a web page that provides what Live Search thinks are the 13th and 14th most relevant web sites relating to the search term "jamie thomson". The information may be different tomorrow but we get some results and they are indisputable based on Live Search's algorithms. Similarly we can tweak that URI to provide the same information just in a slightly different format by applying an extra query predicate:
Same information, we just see it as an RSS feed. The fact that all we had to do was tag a predicate onto the end of the URI is significant because it means that all we need to view the data is a web browser. ANY web browser. That sort of simplicity can't be overstated and is a key tenet of RESTful web services.
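Tagging a predicate onto a URI is simple enough to sketch in a few lines of Python. Note that the host `search.example.com` and the parameter names `q`, `first`, `count` and `format` in the usage note are placeholders I have invented for illustration; they are not Live Search's actual parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_predicate(uri, name, value):
    """Return a copy of uri with one extra name=value pair in its query string."""
    parts = urlsplit(uri)
    # Keep the existing predicates in order and append the new one.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_predicate("http://search.example.com/results?q=jamie+thomson&first=13&count=2", "format", "rss")` gives back the same URI with `&format=rss` on the end. The resource being asked for is unchanged; only the representation differs.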
[In case you were wondering, the reason I chose to use Live Search for this example rather than Google is that I don't know offhand how to return Google's results as an RSS feed. Doubtless someone will tell me in the comments.]
Let's think what had to happen in order for Live Search to return those results to us:
- Massive amounts of data are retrieved and stored in a datacentre somewhere
- That data is replicated to different datacentres in order to make sure that it's always available
- The data is indexed in order that it can be returned to us quickly
- We query that data using predicates that limit the returned results
As before, there's nothing here that is alien to a database pro. Storing and moving masses of data, indexing it, querying it and making it highly available are our bread and butter, so from a 10,000ft view I don’t see anything here that is a great departure.
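The four steps listed above can be sketched as a toy in Python. This is emphatically not how Live Search works; `TinySearch` and its methods are hypothetical names, "replication" is just an in-process copy, and ordering by document id stands in for a real relevance ranking. It only shows that store, replicate, index and query are the same moves we make in a database.

```python
from collections import defaultdict


class TinySearch:
    """Toy search store: keeps documents, indexes their words, answers queries."""

    def __init__(self):
        self.docs = {}                 # storage: doc id -> text
        self.index = defaultdict(set)  # inverted index: word -> doc ids

    def store(self, doc_id, text):
        """Store a document and index every word in it."""
        self.docs[doc_id] = text
        for word in text.lower().split():
            self.index[word].add(doc_id)

    def replicate(self):
        """Build an independent copy, standing in for a second datacentre."""
        copy = TinySearch()
        for doc_id, text in self.docs.items():
            copy.store(doc_id, text)
        return copy

    def query(self, terms, first=1, count=10):
        """Return ids of documents containing every term, limited to a window.

        first is the 1-based position of the first result wanted and count
        is how many to return; both act as predicates on the result set.
        """
        words = terms.lower().split()
        if not words:
            return []
        matches = set.intersection(*(self.index.get(w, set()) for w in words))
        ranked = sorted(matches)  # stand-in for relevance ranking
        return ranked[first - 1:first - 1 + count]
```

With twenty documents stored that all mention "jamie thomson", `query("jamie thomson", first=13, count=2)` returns the 13th and 14th matches, and a replica built with `replicate()` answers the same query identically.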
That’s a short synopsis of where my interest in cloud computing comes from and over the next few months you may see a slight diversion from my usual content as I explore RESTful services and related subjects more deeply. Fear not though, I do remember what pays the bills and I’ll still be talking about good old SSIS and SQL Server as and when there’s something to talk about.
-Jamie