You may have noticed that of late I have become very interested in high-brow, generally vaporous disciplines such as RESTful data services, data interoperability and cloud computing. Coincident with that has been the inexorable rise of the term “mashup” in the information technology lexicon.
“Mashup” means different things to different people but to me it’s simply the practice of combining data from multiple places with the aim of discovering or passing on knowledge that wasn’t known before. Well hey, that sounds a lot like what I do in my day job; the main difference being that I don’t hear the term “mashup” bandied about the London meeting rooms that I frequent nearly as much as it is in the funky web 2.0 and swanky startup world. The term I hear (and use) is the considerably less cool “data integration”. Fundamentally though I don’t think there’s that much difference between the two, so maybe enterprise data integration people like myself have something to learn from these so-called mashup players.
One of my favourite mashup tools out there is one I’ve spoken about before – Yahoo Pipes. If you haven’t had a look at it, it really is worth a glance. Yahoo Pipes enables you to extract data from multiple web-based data sources and transform it using a series of operations such as sorting, joining, unioning and filtering, before finally outputting that transformed data in one of a number of different formats; it’s a data pipeline for web-based data. (A pipeline? There’s something else I’ve talked about before – noticing a pattern here?) Here is an example of a Yahoo Pipe: Yahoo Finance Stock Quote Watch List Feed w/Chart
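To make the analogy concrete, here is a toy pipeline in that Yahoo Pipes spirit, sketched in Python. The feed data and operator names are entirely my own invention for illustration; real Pipes operate on live web feeds, of course.

```python
# A toy data pipeline in the spirit of Yahoo Pipes: union -> filter -> sort -> output.
# The feed items below are invented purely for illustration.

feed_a = [{"title": "MSFT up 2%", "score": 5},
          {"title": "Weather report", "score": 1}]
feed_b = [{"title": "GOOG earnings", "score": 4}]

def union(*feeds):
    """Union operator: concatenate several feeds into one stream."""
    for feed in feeds:
        yield from feed

def keep(items, predicate):
    """Filter operator: pass through only the items matching the predicate."""
    return (item for item in items if predicate(item))

def sort_by(items, key):
    """Sort operator: order the stream by the given key, highest first."""
    return sorted(items, key=key, reverse=True)

# Wire the operators together, box-and-line style.
result = sort_by(keep(union(feed_a, feed_b), lambda item: item["score"] > 1),
                 key=lambda item: item["score"])

for item in result:
    print(item["title"])
```

In SSIS terms, `union` plays the role of a Union All, `keep` a Conditional Split and `sort_by` a Sort transformation; wiring small operators together is the same idea in both worlds.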
Extracting. Transforming. Sorting. Unioning. Filtering. Outputting. This Yahoo Pipes thing is starting to sound awfully like ETL tools such as SQL Server Integration Services (SSIS) wouldn’t you say? They even look a little bit like each other with their boxes joined up with lines between them:
I’m now reminded of what my good friend Andy Britcliffe of Sharpcloud once said to me upon reading my blog post (and viewing the embedded video) Consuming web services in SSIS 2008 a full two years ago. I distinctly remember Andy’s words on that occasion: “SSIS is the ultimate mashup tool”! I didn’t disagree!
Most mashup tools share one common characteristic: they invariably require someone with some technical nous to set them up in advance so that they can be used by the less tech-savvy amongst us. The same applies in enterprises; data is distributed by the IT guys to the information workers, and that distribution typically takes months whereas the consumers of the data want it available in hours. In both arenas I sense a shift occurring; the consumers of the data are now being empowered to find and interrogate data for themselves, and in the enterprise this is happening through the adoption of tools such as Qlikview, Omniscope and (in the near future) Microsoft’s Gemini. I find this a fascinating development, not because it means there may be less work for me to do (admittedly that would be nice) but because information workers now have the opportunity to be much more productive in their daily jobs, and I expect those who invest in learning these new technologies to be the cream that rises to the top of enterprises in the near future.
Up until recently I hadn’t been all that interested in Microsoft’s Gemini project; indeed, I was very sceptical of it. But as I started to formulate some of the thoughts that I’m writing about here, I began to realise how important it will be when it gets released, hopefully sometime in early 2010. I earlier described mashups as “the practice of combining data from multiple places with the aim of discovering or passing on knowledge that wasn’t known before” and that description fits Gemini very well. If you don’t know what Gemini is, take a look at this video:
That demo glosses over the main point I’m making which is that here we see data that is originally pulled from multiple sources and combined in a familiar place (Excel) where the end user can consume it. The person speaking in the video is Donald Farmer and he has a blog entry with many other links to Gemini resources at Microsoft Project Gemini links.
At the top of this post I also said that I’m interested in data services, that is, data available over the web that we can consume via an API and use for our own knowledge discovery. I was introduced to just such a data service yesterday while listening to Jon Udell’s “Interviews with Innovators” podcast. In the most recent episode Jon interviewed Stephen Willmott, whose company 3scale Networks has taken it upon itself to make data held by the United Nations freely available via a data service to anyone who would like to consume it [UPDATE: Read Jon's own writeup of the interview at Influencing the production of public data]. For example, if you want to know the United Kingdom population’s annual growth rate since 1991, that data is available, for free, at http://undata-api.appspot.com/data/query/Population%20annual%20growth%20rate%20(percent)/United%20Kingdom?user_key=XXXX (you need to sign up for a free user key and substitute it for XXXX in order for the query to work) and is returned like so:
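As a sketch of how you might build such a query programmatically: the `undata_url` helper below is my own invention, not part of the API; only the URL shape comes from the example above.

```python
from urllib.parse import quote

def undata_url(series, country, user_key):
    """Build a query URL for the undata-api service (helper name is my own)."""
    base = "http://undata-api.appspot.com/data/query"
    # Leave "(" and ")" unescaped so the URL matches the form shown above.
    return "{0}/{1}/{2}?user_key={3}".format(
        base, quote(series, safe="()"), quote(country, safe="()"), user_key)

url = undata_url("Population annual growth rate (percent)",
                 "United Kingdom", "XXXX")
# Fetch it with, say, urllib.request.urlopen(url) once you have a real user key.
```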
“Wouldn’t it be cool,” I thought, “if I could consume that data inside of Excel using Gemini?” Perhaps, in this example, to combine it with birth rates over the same period to discover whether there is a correlation between the two. At the time, though, I didn’t know whether Gemini made it possible to consume data directly from such data sources, so I went straight to the man who would know, the aforementioned Donald Farmer. I contacted Donald over Twitter and here is the conversation that ensued:
- Me: @donalddotfarmer Is there a list of data sources types from which #Gemini can get data? Interested in data from web APIs e.g. undata-api.org (link)
- Donald: @jamiet I'll need to check out that site in particular, but we do support Atom feeds. (link)
- Me: @donalddotfarmer Ahh that's good news. How about POX/RSS? Does Gemini allow us to parse it or use XQuery? (link)
- Donald: @jamiet No we don't support XQuery - we just consume Atom feeds as they come - the users can then filter and sort in Gemini (link)
- Me: @donalddotfarmer OK, so Atom only right now. Looking forward to getting hands dirty, think I know what 1st feature request will be :) (link)
Lots of techy abbreviations in there, so let me summarise. Gemini will be able to consume data from web services that deliver it in the popular Atom XML dialect (more on Wikipedia), which is great news and no great surprise given that Microsoft announced in February 2008 that Atom would be their XML syndication format of choice going forward (see my blog post Windows Live Dev announcements for a more complete commentary). I happen to know that the United Nations data provided by Stephen Willmott is not currently delivered in Atom format, but no matter; at least things are moving in the right direction and, as I alluded to in my last tweet to Donald, I’ll be asking for support for other syndication formats in the future.
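For the curious, consuming an Atom feed of the kind Gemini will accept needs nothing exotic. Here is a minimal sketch in Python using only the standard library; the feed document is invented by me, standing in for a real data service’s output.

```python
import xml.etree.ElementTree as ET

# An invented Atom feed standing in for a real data service's output.
atom = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>UK population annual growth rate</title>
  <entry><title>1991</title><content>0.3</content></entry>
  <entry><title>1992</title><content>0.2</content></entry>
</feed>"""

# Atom elements live in a namespace, so map a prefix for the queries below.
NS = {"a": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(atom)

# Pull (year, value) pairs out of the feed's entries.
rows = [(entry.find("a:title", NS).text,
         float(entry.find("a:content", NS).text))
        for entry in root.findall("a:entry", NS)]
```

Once the feed is parsed into rows like these, the filtering and sorting Donald mentions is exactly the kind of thing an end user could then do inside Gemini.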
This has turned into a rather rambling blog post, so I’ll call a halt here. As always, though, I’d be interested to know other people’s thoughts on data services, the usage of that data in enterprises, or anything else I’ve mentioned herein, so please leave a comment in the space below!