|
|
It's not a hobby, it's therapy.
-
Well, it was comedy all the way at the start of day two. I arrived in the main hall for the keynote speeches to be greeted by a 12-foot image of our glorious MD and serial blogger, Dorf, and his glamorous assistant David Portas in a video presentation of the work Conchango has done with McLaren F1 using the FILESTREAM component of SQL Server 2008.
Following that we had a speech from Ben Stein; you probably know him as the boring teacher in Ferris Bueller's Day Off..."Bueller.....Bueller...". He's actually a respected economist and he gave a great speech about the state of the world's economies at the moment, throwing in a few jokes and discussing some of the major social and economic problems facing the US today. Not what you expect from a keynote but compelling stuff.
Having sat through three keynotes, I'm now concerned that I haven't heard any mention of two areas that are important to the work I've been doing for the last 3 or 4 years, one of which I thought would be fundamental to Microsoft's strategy for BI going forward.
The glaring omission is Master Data Management. In 2007 Microsoft purchased Stratature, one of the leading MDM tools on the SQL Server platform which looked at the time incredibly sensible since it filled a big gap in Microsoft's portfolio for BI and data integration.
I'm left wondering where Microsoft see Stratature (now "Microsoft MDM") fitting into their vision for BI; clearly it’s not near the top of the priority list. There have been several posts on these blog pages about what MDM is and where it fits into data integration architecture, and one of its key roles is in the enablement of data services as part of an SOA. This is the second omission from any of the keynotes so far and I'll come back to it.
Microsoft seems to be playing safe with the data warehouse as the future of BI. With Project Madison on the horizon finally bringing the ability to scale, that's not surprising. It’s a steady strategy that is proven and popular and the components that make up an MDM solution have historically been buried in traditional data warehouse projects. You can't do BI or build a data warehouse without using components of MDM - data mapping, cross reference, hierarchy management, data quality - but taking these concepts and building them out into a separate application promotes the ability to build more agile applications and integration technologies that can reuse the services that are captured in that separate MDM application.
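For the code-minded, here is a minimal sketch of what pulling one MDM component, the cross reference, out into its own reusable service might look like. Everything in it is hypothetical: the `CrossReference` class, the system names and the codes are invented for illustration and have nothing to do with how Stratature or Microsoft MDM actually work.

```python
class CrossReference:
    """Maps each source system's local code to one shared master identifier."""

    def __init__(self):
        self._to_master = {}  # (system, local_code) -> master_id

    def add(self, system, local_code, master_id):
        self._to_master[(system, local_code)] = master_id

    def master_id(self, system, local_code):
        """Return the master id, or None when the code has not been mapped yet."""
        return self._to_master.get((system, local_code))


def conform_for_warehouse(rows, system, xref):
    """Batch consumer: swap local codes for master ids before loading."""
    return [{**row, "customer": xref.master_id(system, row["customer"])} for row in rows]


def same_customer(xref, system_a, code_a, system_b, code_b):
    """Service-style consumer: do two systems mean the same customer?"""
    master_a = xref.master_id(system_a, code_a)
    return master_a is not None and master_a == xref.master_id(system_b, code_b)


xref = CrossReference()
xref.add("billing", "C-001", "CUST-1")  # hypothetical systems and codes
xref.add("crm", "ACME LTD", "CUST-1")

print(conform_for_warehouse([{"customer": "C-001", "amount": 250}], "billing", xref))
print(same_customer(xref, "billing", "C-001", "crm", "ACME LTD"))
```

The point of the sketch is only that the warehouse load and the on-demand lookup share one mapping, rather than each burying its own copy inside an ETL package.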
I went to the MDM Customer round-table and got the first two questions in. OK they were probably a bit out of context for a customer session but worth throwing in there. The first was basically "where is MDM in Microsoft's strategy?" - The only response was from Stratature founder and compere for the session, John McAllister who acknowledged that "...we didn't get top billing". My second question asked where the panel thought MDM lived in the world of Gemini where users were being encouraged to do their data integration on the desktop. They looked at me like a taxi driver might look at someone running towards them at five-thirty on a Sunday morning in the rain wearing only a pair of underpants. The panel went mute but John moved us along by answering that if there was a need to map between different reference data, then Microsoft MDM would be a good solution for that. The problem there of course is that that doesn't quite fit the vision of "self service BI". Anyway I was left with the distinct impression that we have two teams driving forward but not speaking to each other a great deal.
Many of Microsoft's competitors are branching out into areas like EII. Forrester paint a good picture of how EII works via the provision of an "information fabric" - a virtual layer that sits above a set of integrated data services exposing data through a SQL (for bulk loading or performance requirements) or web service interface. Behind the fabric sits more traditional looking data integration where data is extracted from systems of record, integrated through reference to an MDM system and finally published to the interface. There may be a database behind the scenes holding data (the cache or warehouse) but equally there may be data services that issue pass through queries to systems of record. The point is that the data that is presented to consuming applications is consistent, clean, integrated and the single version of the truth, and the consumers don't need to know the actual pattern that is used to get their data. The added benefit of the realization of more real time access to data is also important.
This seems so far away from the DW-centric strategy coming from Microsoft that I'm wondering if it’s even on their radar.
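To make the "information fabric" idea a little more concrete, here is a rough sketch under stated assumptions. The class names (`DataService`, `WarehouseService`, `PassThroughService`, `InformationFabric`), the entities and the region codes are all made up for illustration; this is not Forrester's model or any vendor's API, just one way to show a consumer asking for data without knowing whether it came from a stored copy or a pass-through query.

```python
from abc import ABC, abstractmethod


class DataService(ABC):
    """One way of getting at an entity's rows; consumers never see which one."""

    @abstractmethod
    def fetch(self, region):
        """Return conformed rows for the given region."""


class WarehouseService(DataService):
    """Serves rows from an already-integrated copy (the cache or warehouse)."""

    def __init__(self, rows):
        self._rows = rows

    def fetch(self, region):
        return [row for row in self._rows if row["region"] == region]


class PassThroughService(DataService):
    """Queries a system of record on demand and conforms its codes on the way out."""

    def __init__(self, query_source, region_map):
        self._query_source = query_source  # callable returning raw rows
        self._region_map = region_map      # local code -> conformed code (the MDM part)

    def fetch(self, region):
        conformed = [
            {**row, "region": self._region_map.get(row["region"], row["region"])}
            for row in self._query_source()
        ]
        return [row for row in conformed if row["region"] == region]


class InformationFabric:
    """The virtual layer: one query interface, whatever sits behind it."""

    def __init__(self):
        self._services = {}

    def register(self, entity, service):
        self._services[entity] = service

    def query(self, entity, region):
        return self._services[entity].fetch(region)


fabric = InformationFabric()
fabric.register("orders", WarehouseService([{"id": 1, "region": "EMEA"}]))
fabric.register(
    "invoices",
    PassThroughService(lambda: [{"id": 7, "region": "EU"}], {"EU": "EMEA"}),
)

print(fabric.query("orders", "EMEA"))    # served from the stored copy
print(fabric.query("invoices", "EMEA"))  # served by a pass-through query
```

Both calls look identical to the consumer, which is the whole argument: the pattern behind the interface can change without anyone upstream noticing.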
I've heard people mention Project Velocity and Project Astoria as potential ways to enable data services. I'm not going to comment on them here but neither was mentioned in any of the keynotes or sessions that I attended so I think if you're looking for them on the BI Strategy priority list, I'd start at the bottom and work upward.
Things took an unexpected turn when I went to Steve Walker's session on Building a BI Competency Centre. Steve is a data architect in the Database Consulting group at Chevron so our paths cross frequently. At the end of the slides, the audience woke up and it turned into an impromptu question and answer session and eventually Steve started fending questions in my direction. This started when someone in the audience asked about the work done in Aberdeen on Project Seer to enable BI through Service Orientation. "This is your lucky day, we have the author of that paper in the room....." and so it started. It ended with Steve changing the deck on the fly to put my email address up in lights. I'm claiming that as a speaking slot and will expect my Microsoft BI branded denim shirt in the post.
I was also approached in the Chevron session by someone who'd spotted my badge. I was expecting another question on Data Services but all I got was "...do you work with Jamie....I'm a fan of his..?"
OK. Timeout. For those of you out there missing Jamie's blog, I can ease the pain slightly by referring you to an interview with the great man. In summary, he “likes nothing more than getting the laptop out and hammering away", which I think we've all done in our time, however given his recent marriage his hammering is clearly taking priority over his blogging. Don't worry, he'll be back as soon as his two week honeymoon in a caravan outside the main gate of the Redmond campus is over.
The conference party was at the Qwest stadium, home of the Seattle Seahawks. It was good to get to walk around the pitch and have a few beers with some fellow-Brits from Contemporary, but events that encourage 3000 overweight men to queue up to play rock band karaoke aren’t usually on my to-do list and all the free beer in the world sometimes isn't enough to banish the crushing sense of despair I feel working in this industry.
Back to the conference. Amir Netz gave us more details about Gemini, in a presentation focused around "That Guy". Expect this to be coming to a conference near you, repeatedly over the next two years. The spin was slightly different to the first day's keynote in that it focused on the overwhelming odds against the BI practitioner when facing "Those guys" that propagate Excel Hell throughout the organisation. The premise of Gemini is that it’s much better to bring them into the BI effort in a controlled manner than to try and stop them, which given their numbers is impossible.
So some more technical details for you.
In this demo, Amir pulled 100 million rows into Excel. Data is stored in memory in a SSAS instance tied to the spreadsheet. In memory will be a new SSAS storage mode. There will be a lightweight set of ETL-like tools available within Gemini to transform the data on the way into Excel. ETL on the desktop, folks. Any defined transformations will be re-applied when the data is reloaded. Relationships between data will be determined by a heuristic engine in the background that will look for column name matches or similarities and might even look at the data to see if relationships can be inferred from the values of the data elements themselves. Where the system isn't sure, the user will be prompted.
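No details of the heuristic engine were given, so the following is purely a guess at the general shape of such a thing, not Gemini's actual logic. It scores every pair of columns on name similarity (using Python's standard `difflib.SequenceMatcher`) and on how many values they share, accepts high scores automatically and flags middling ones for the user. The function names, the equal weighting and the thresholds are all assumptions.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a column name and drop underscores before comparing."""
    return name.lower().replace("_", "")


def name_similarity(a, b):
    """Similarity of two column names, between 0 and 1."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def value_overlap(left, right):
    """Fraction of distinct left-hand values that also appear on the right."""
    left_set, right_set = set(left), set(right)
    if not left_set:
        return 0.0
    return len(left_set & right_set) / len(left_set)


def suggest_relationships(table_a, table_b, sure=0.9, maybe=0.6):
    """Tables are dicts of column name -> list of values.

    Returns (column_a, column_b, verdict) for every pair scoring at least
    `maybe`; the verdict is "auto" at or above `sure`, otherwise "ask user".
    """
    suggestions = []
    for col_a, vals_a in table_a.items():
        for col_b, vals_b in table_b.items():
            # Assumed weighting: names and values count equally.
            score = 0.5 * name_similarity(col_a, col_b) + 0.5 * value_overlap(vals_a, vals_b)
            if score >= sure:
                suggestions.append((col_a, col_b, "auto"))
            elif score >= maybe:
                suggestions.append((col_a, col_b, "ask user"))
    return suggestions


# Hypothetical sample data
sales = {"CustomerID": [1, 2, 2, 3], "Amount": [10.0, 20.0, 5.0, 7.5]}
customers = {"customer_id": [1, 2, 3, 4], "Name": ["Ann", "Bob", "Cy", "Di"]}

print(suggest_relationships(sales, customers))
```

Even this toy version shows where the worry lies: a column called `CustID` that happens to overlap with `customer_id` lands in "ask user" territory, and it is the desktop user, not a data architect, who gets to answer.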
When published, the "sandbox” is posted back to the SharePoint server and the cube from then on is hosted in memory on the server, along with any other sandboxes that have been posted back. The server will automatically configure multiple SSAS instances behind SharePoint and connectivity between the UI and its cube will be enabled through SSAS web services. You'll be able to see who is using your stuff through some "socialisation features", such as the names of people using each spreadsheet, its popularity etc.
Amir also demonstrated a nice looking admin UI - the operations dashboard - which showed a summary of all the sandboxes by popularity, size, queries etc and enabled the ability to drill into specific areas to show how trends for each sandbox had changed over time. There is also the capability to monitor the performance of the server in terms of both memory and CPU to diagnose potential problems before they happen.
Security will be in the file through SharePoint, not the data - so if you do pull those salary figures out of the HR database, do remember to lock it down in SharePoint.
Finally there will be an option to take popular sandboxes and hit an "upgrade to performance point" button and move them across to a regular SSAS instance.
The session was packed out and it seems to be universally popular among the attendees and I can't deny that we have seen some very slick demos.
So that’s the end of the conference for me and the last of my briefs. The headline act was undoubtedly Gemini which stole the show.
There is going to be a lot of hype around Gemini and don’t forget this is two years away so maybe I’m expecting too much of a complete vision at this stage. I have concerns around governance – without which we’re just automating bad practice; the push for integration and transformation on the desktop is a worry and if I’m sat in the audience next year I’d want to see Gemini plugged into something much more realistic to see how the UI works with several large dimensions as filters. The stuff we’ve seen so far looks like EIS for the 21st century.
I’d like to see some slightly deeper BI questions thrown at the tool that make the SSAS engine on the desktop have to think beyond SUM(Sales) reports. We also need to see the realities of server requirements to cope with the creation of dozens of user generated sandboxes.
The lack of clarity of where MDM fits in to Microsoft’s strategy has been a disappointing omission this week so I would like to see that given higher priority next year and hopefully by then we’ll know more about how Zoomix will be integrated into the stack.
I’ll end on the most interesting piece of news from a Conchango/EMC perspective – the DATAllegro acquisition is bearing fruit and with EMC hardware part of the reference architecture, Conchango’s history in implementing large enterprise data warehouses and our gold partner status with Microsoft in BI, we’re in a pretty unique position to help people take advantage of the advances in the technology.
Finally, using all my powers of investigative journalism, I found out the correct name of the cover band on stage on Tuesday morning. Wait for it.....The Dudley Manlove Quartet. I couldn't make this stuff up. It’s not a great name from a merchandising perspective is it? I'm still looking round for a tour shirt though - "Manlove!" on the front and "Sleepless in Seattle 2008" on the back.
Preferably in denim.
Maybe sleeveless.
|
-
It was nice to start the day drinking Starbucks in Seattle. It has that feeling of authenticity about it like drinking Boddingtons in Manchester or getting head-butted in Glasgow. Except of course, that Boddingtons is no longer brewed in Manchester since the Belgian owners moved production from Strangeways, where it had been for 200 years, to South Wales. That’s why Boddies isn’t advertised as the Cream of Manchester anymore, merely The Cream. Not that I’m bitter (boom boom).
It was equally exciting to walk into the main hall for the first keynote this morning to be hit by an ageing guitar band belting out cover versions. The PA was so loud that the distortion made hearing the between-songs banter difficult, but I think they were called The Deadly Hanjdob Quartet. They finished with a rousing rendition of Rick Astley’s “Never Gonna Give You Up”, which was bad when it was released in 1987, hasn’t aged well and suffers further when butchered by an old bloke sweating in a suit. 0830. Slightly better than an 0530 semi-naked rain filled taxi encounter, but not much. Mornings just aren’t for me.
So onto the keynote and the first hour was a little disappointing. Stephen Elop, president of Microsoft’s Business Division spoke about Microsoft’s vision for BI.
This started by stating that 10 years ago, BI was a relatively immature discipline.
Eh? Microsoft’s offering (SQL Server 6.5) may have been immature but the rest of the world was moving on apace. There were multiple ROLAP technologies out there and several OLAP vendors like Essbase, Oracle Express, Holos, Cognos and BusinessObjects, though at the time BO had this crazy idea that pulling data back to the desktop, building a micro-cube on the fly and using that to render reports was the way forward – it was called Desktop OLAP (DOLAP).
I’d only just stopped coughing when he went on to say that Microsoft had transformed the market (fair enough) and with Analysis Services 2008 they had a product which in scalability terms was “virtually unlimited”. Hello? I’m not even going to rise to that one.
So having let myself get wound up by the marketing pitch, I was knocked sideways by the announcements from Ted Kummert, VP of Data and Storage platform. First off we had a demo of how the DatAllegro purchase might work with SQL Server. A 24-server shared-nothing SQL Server cluster with DatAllegro orchestrating things was demoed against a partitioned 150TB, one trillion row fact table database using some simple Reporting Services reports to return data in seconds. OK, so it was a sanitised demo and the queries were selected to run quickly but 150TB in SQL Server? Better news for us is that Microsoft is working to select standard hardware for a reference architecture and the storage aspect is covered by EMC. It brings a warm glow to the Galloping Data Architect's cockles - if only I knew the company song.
It was suggested that this would be licensed as a separate SQL Server SKU and if you’re looking for more information, it’s called Project Madison. Of course a cynic would ask what the point of this expensive acquisition was if they already have “virtually unlimited” scalability in SSAS 2008 – 150TB cube anyone?
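If "shared-nothing" is a new term, the general pattern is easy enough to sketch. The toy below is emphatically not how DATAllegro or Project Madison work internally (no such detail was given); it just shows the idea of hash-partitioning a fact table across nodes, letting each node aggregate only its own rows, and having a coordinator merge the partial results. Threads stand in for servers, and the table, column names and node count are all invented.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, node_count, key="customer_id"):
    """Hash-distribute rows so that each node owns a disjoint slice of the table."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row[key]) % node_count].append(row)
    return nodes


def local_totals(node_rows):
    """What each node does on its own: aggregate the rows it holds."""
    totals = defaultdict(float)
    for row in node_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def cluster_totals(nodes):
    """Coordinator: run the local query on every node, then merge the partials.

    Assumes at least one node.
    """
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_totals, nodes))
    merged = defaultdict(float)
    for partial in partials:
        for region, amount in partial.items():
            merged[region] += amount
    return dict(merged)


# Hypothetical fact table: 1,000 rows over 3 regions
facts = [
    {"customer_id": i, "region": ["North", "South", "West"][i % 3], "amount": i % 7}
    for i in range(1000)
]

nodes = partition_rows(facts, node_count=24)
print(cluster_totals(nodes))
```

Because no node needs anyone else's rows to compute its partial sums, adding servers adds capacity, which is presumably why a carefully chosen aggregate query can come back from 150TB in seconds.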
Last but not least was Microsoft’s new vision with Project Gemini, the "unique vision" that would address the needs of those 80% of users that currently wanted, but didn’t have access to, BI at their desktop.
Unique vision…errr, ok, maybe I misheard Michael Saylor, CEO of MicroStrategy announce a vision to put “a crystal ball on every desktop” back in 1996. That would be 2 years before Microsoft released OLAP Server with SQL Server 7 in that “immature” BI market.
We got a demo of Project Gemini from Donald Farmer; this is the Microsoft vision of “Self Service BI”, where users can load data from varying sources into Gemini using an Excel Add-in and then create pivot tables based on the data sourced by Gemini. Results/reports/applications can be published back to SharePoint for use by other users. In his demo, Donald pulled back a 20 million row fact table into Gemini (which sorted and filtered the data almost instantaneously), combined it with an “external” data source, showed the integrated data model created dynamically in the background, created a pivot table report and simple drill down dimension structures in Excel and finally formatted and published the results as a reasonably tidy looking dashboard application back to SharePoint in a matter of minutes.
The UI that is created can include dimensions as slicers for basic report filtering and any selection within the slicer propagates to all pivots and charts on the sheet. Simple drill down is also supported (within the context of your Gemini data set).
Apparently SSAS is the engine doing the work behind the scenes so there must be some sort of dynamic cube created which attaches itself to the workbook to serve up the data so quickly. It was a blinding demo, typical of Donald who is clearly excited about the capabilities and power that it can bring to the desktop – the 80% of users that Stephen Elop targeted. Microsoft is putting their subprime mortgage on Excel.
The Microsoft view is that it is better for users to go out and get their own data than for that overworked IT department to source it for them and add it to the data warehouse, which would take forever to prioritise and implement.
After a few hours reflecting on this I began to have a few doubts.
The Microsoft BI vision is one of guided analytics, lightweight dashboards with Excel as the tool of choice for "power users" and Reporting Services becoming more of a user-oriented tool for self service reporting. I get the impression that Microsoft is targeting Business Objects here and form is definitely taking precedence over function. ProClarity is sadly dead and train of thought or high-end analytics doesn’t appear to get a look in.
I’m not sure I want all my users going out and dragging multi-million row result sets into Excel to work on locally. If I have 500 users in my company, how many times would the same result set have to be stored into multiple Excel reports for things to become unmanageable? Repeat that for the set of available data. And all this posted back to SharePoint?
Where is the single version of the truth in this architecture? I’ve just spent 4 years of my life trying to convince users to stop using Excel as a data store and here are Microsoft positively encouraging it. Hell will freeze over before this capability is used responsibly in most organisations.
Is it really too much to ask for external data that is useful to the business to be included in the data warehouse? Surely we have gone past the point where we spend 3 years building monolithic databases that don’t have the flexibility to incorporate new requirements as they are discovered?
I also have severe doubts about having users integrate data at the UI. Again, how many users will have to pull the same external data source into their spreadsheet before that has cost more than having it sourced once centrally. Where is the ability to share or reuse that integration? Where do the metadata mappings take place – because in the real world external reference data isn’t going to exist in the same context as the stuff inside the organisation, someone will have to provide mapping. I have this horrible vision of Stratature being served up in Excel for users to do this on the desktop. Personal MDM.
The other flaw in the argument is that the IT department is supposed to monitor what the users are up to and identify reports that are being shared by multiple users, have become business critical or which are becoming too large and bring them back into the IT department. This is obviously a different IT department to the one that was too busy to bring that external data source into the corporate data warehouse in the first place but has time to watch chaos unfold on SharePoint.
All the BI focussed updates will be packaged into a SQL Server update code named Kilimanjaro. This is separate to the 24-36 month cycles of major SQL Server releases and the date given for release is first half 2010.
In other news, there will be an October feature pack featuring Attunity connectors for Oracle and Teradata and the release of Report Builder 2.0, which makes several significant enhancements to the usability of the product. These include a rather nifty looking shared component library which allows users to select components developed by others and also highlights any changes made back to the original author to give them the option of pulling the revised version into their report. All the handling of data sources and fields is done automatically in the background. This functionality comes from Microsoft’s acquisition of 90 Degree Software.
Microsoft have also bought Zoomix which will appear in the next major release as SQL Server Data Quality Services, part of the SSIS stream. This sounded interesting and fills an obvious gap in the portfolio but no further details were given today.
So, as I strolled back from the Microsoft BI partner event last night, where it's worth saying I was in a minority of one with most of these concerns, I reflected on what seems to be the revival of DOLAP as the future of Microsoft BI (I'm really hoping to be proved wrong on this one), the dumbing down of analytical capabilities in the front end and a world where 150TB data warehouses are fronted with Reporting Services. Time to dig out my old MicroStrategy 4 certification perhaps?
The really scary thing though – other than the demographic that walk Seattle’s streets in an evening which seems to be 30% business people, 30% geeks and 30% tramps and hobos - is that most of this stuff is at least two years away. By then we will be 13 years into Microsoft’s strategy for BI, the competition will have had plenty of time to react to these latest announcements anyway and there will be a 60% chance that I have been bored or bludgeoned to death on my way back to the hotel.
|
-
I’m at the Microsoft BI Conference in Seattle and thought it might be worthwhile trying to keep a blog of all the hot news as it unfolds. My record in this area isn’t great – I only managed part one of two parts when I tried to blog about the 3-peaks walk in 2007 so the chances of seeing another blog on this subject are about the same as seeing James Pipe in Houston in a strong wind.
My day started badly. At 0440 the phone rang. There are only two possible thoughts when the phone rings at that time. The first one is “where am I”, which lasts until the second ring. Then you switch to “who’s dead?”
I was actually due to get up anyway to get ready for an 0515 taxi to the airport. The phone showed “international” in the caller display so I ignored it, figuring it was someone trying to sell me something to enhance specific parts of my anatomy…but they did leave a message, which disturbingly, when I rang back was from KLM telling me that my flight from Manchester to Amsterdam had been cancelled and they’d reassigned me to a Delta flight leaving at lunch time with a one hour connection in JFK on the way to a 2200 arrival in Seattle.
Now, I don’t know if you’ve had the misfortune to try and make a short connection at JFK, but there are a lot of planets that have to come into alignment before you can get even close. Assuming your flight is actually on time and they get the luggage off in time you still have to get through customs without having to settle down for the usual three year encampment in a refugee village at the homeland security queue and make it through the terminal to your connecting flight.
Helpfully, the lady from KLM didn’t leave a contact number so I had to unpack my laptop, boot it up and scour their website for a useful looking number. By the time I’d got through the automated enquiry system, spoken to someone human and convinced them to move me onto a Leeds Bradford flight which would still make my original connection in Amsterdam, my taxi was waiting outside and I had to run out into the street in the rain wearing only a pair of underpants to find out if the driver could go to Leeds Bradford airport rather than Manchester. It was 0520 on a Sunday morning. Does this kind of stuff happen to anyone else or is it just me?
There is something quite demoralising about getting on a plane in a wet, grey, cold city; spending 14 hours crunched up in a seat at the back, then getting off in a wet, grey, cold city.
This is the first time I’d flown Northwest and I was in seat 38G – an aisle seat at least but so far towards the back that I had to cross an international date line to get to it. On top of that, or rather underneath it, there was some sort of metal box which must have been integral to the operation of the aircraft as I put a significant amount of effort into trying to move it, without success. The result was that I could stretch my right leg out into the aisle but my left was scrunched up underneath me. After six hours, pain had been replaced by numbness but when we landed after 10 and a half hours, I found that my left leg was 3 inches shorter than my right, which made the sprint to the homeland security refugee camp painful for me and disturbing for those I hobbled past.
Washington State Trade & Convention Center doesn’t look to have changed much at all. I took a walk up there to register and to check out the welcome reception full of optimism since the brochure in my hotel room promised “millions of dollars worth of improvements”, and it’s been about five years since I was last here. Well, the T-shirt shop in the lobby is still there, looking like it would be more at home in Blackpool – who buys anything there? In fact nothing appeared to have changed at all including the army of elderly women that they get to police these events. These girls make Dad’s Army look like a bunch of pre-pubescent youths. I’m sure I actually recognised some of them from the last conference I was at here, but many of them seemed close to death 5 years ago. Perhaps they are actually undead and the only way to get a job with them is to sacrifice a DBA at every Microsoft conference that takes place there and drink his blood. Anyone missing any DBAs?
Registration was easy and the welcome reception offered free beer which helps. The clientele were the usual set of uber-geeks huddled round in circles excitedly discussing technology. What was particularly disturbing on this occasion was that there were people there filming it. Has the world gone mad? Who in their right mind is going to watch a feature length documentary of overweight geeks chatting to each other about SQL Server? It would be some kind of snuff-movie-in-reverse whereby anyone unfortunate enough to watch it would be so bored that they would be unable to get out of their seat and eventually die as their neural functions shut down at the sheer horror of it all.
I avoided the cameras and made my way to the Microsoft store. It’s a bit like the T-Shirt shop downstairs in that I wonder who does their clothes shopping here to keep it going. Who wears Microsoft embroidered denim shirts? They break fashion rules in about 8 dimensions. If it’s you, please let me know, my brother-in-law is a doctor and he might be able to refer you to someone who can help.
I nearly fell over in the bookshop – there it was, glistening in a shiny white cover in front of me – I suspended disbelief for a moment to run my hand across its polished dust jacket. Wow. “Data Modelling for Developers”. It was an understandably short book and I resisted the urge to open it, satisfying myself that all it could possibly contain were instructions on how to start SQL Server Management Studio, documentation of the CREATE TABLE script and an appendix picturing the developers’ top 100 all time favourite do-nuts.
I did consider buying Kimball’s book to tape to the sole of my left shoe to even up my leg-length but eventually decided against it on the grounds of cost. For the record, they were also displaying Inmon’s DW2.0 book – I would say “selling” but of course this is a Microsoft conference so the chances of anyone buying it are remote to say the least. Several people seemed to be displaying burns just from touching it.

Back in the welcome reception, the vampires were serving the food which appeared to have gone through Wonka-Vision on the way to the dining table. The combination of nibble-sized bites, small plates, free beer and the single long table stretched out in the middle of the room with the food groups repeated along its length was asking for trouble.
It took over 10,000 scientists from 100 countries, 25 years and 3.2 billion Euros for CERN to construct the Large Hadron Collider in Geneva. They could have saved their money and come to this conference. As the Developers got the scent of food they moved in towards the table and filled up their plates – however being unable to get enough food to satisfy their hunger, they kept refilling as the next group of miniature Alaskan crab cakes came by, eventually looping back on the other side of the table for a second pass. As more Developers joined the procession, the loop had to speed up, giving each person less time to select their favourite nibble, meaning they had to spend longer looping round the table at ever increasing velocities. It was fascinating for a while until the appearance of a group of hungry DBAs in the reception area making for the food table. I didn’t want to stick around when they joined the line – although it would have been nice to take the credit for the discovery of the Higgs boson, I didn’t want to be there if the collision caused the creation of a black hole, the explosion of a room full of overweight technologists and ultimately the end of the world.
|
-
I'm back!
It’s been a while since my last blog post but reader-pressure has forced me to come out of retirement.
Firstly I received an email from Tammy Freeman asking me the following:
“One of my tasks at work is to come up with documentation on cost-efficient training plans for … data architecture. We are looking for training through such mechanisms as CBTs, books, and websites. … After reading your blog, I feel that you may have an opinion in this area”

Whilst thinking that one over, I attended one of Conchango’s regular Community days where a number of people asked me to revive these posts.
One devotee was friend-of-the-Galloping-Data-Architect, David Seymour, who happened to be sporting a clean shaven visage and definite signs of a back-wax under his crimplene shirt, leading me to the conclusion that he’s courting (he’s the one on the left by the way).
Which reminds me that when I refer to Conchango, I really mean EMC; much like Mr. Seymour, we have finally been approached, romanced and gobbled by a much bigger admirer.
Back to the question, what’s the career path to becoming a DA? It sounds straight forward and as someone who claims to be one and also who mentors others in that direction you’d think I’d have a ready-made answer. I don’t. In fact my arrival at this particular role was more a result of luck than judgement.
I’m not going to go over what I think a Data Architect is as I’ve covered that here.
What I will say is that I think there are two types of data architect – on smaller projects the work of a DA is usually picked up by one or more of the team members; data modelling, data integration design, source to target mapping, data quality investigation, system of record investigation - the kind of stuff that will be done as a matter of course by somebody already there to do something else, usually a DBA or one of the senior BI developers. This is technical DA work and happens every day on every BI or DI project.
There comes a point however when someone has to oversee the end to end data flow and impact of technical decisions across multiple similar data integration or BI initiatives, and that means stepping away from the technology.
This is a weird moment in your career; for me it happened when the work we’d done in one business unit was adopted as the standard for the rest of the organisation. Suddenly, ideas you were happy to espouse in the comfort of a project room in Aberdeen are exposed to lots of new people around the world. And by nature, it seems, people are sceptics.
Your usually empty calendar fills up with meeting requests, life begins to revolve around the production of documentation and you engage in heated discussions about why a supportable compromise is actually better for the customer than a technically brilliant solution that only Stephen Hawking could offer level-3 support for.
Teams sift through your strategy like ferrets looking for holes, yet somehow don’t seem to read any of it. And then suddenly you notice that SQL Server 2005 has become SQL Server 2008, you have lost any deep technical knowledge of any of the products that are being implemented around you and your hair goes grey from worrying about what you’re going to do when all this comes to an end.
Consolation comes in stolen moments building a data model or seeing if you can still deploy a Reporting Services report.
I guess this is the same for architects everywhere. There’s nothing quite like interloping into a meeting of Solution Architects with a high level, strategic agenda. It won’t take long before someone locks the door, draws down the window blinds and fires up a projection of their home C# development project whilst the rest of the room emit deep throated growls of pleasure and rub their thighs frantically as thousands of lines of lovely code scroll upwards before their wide eyes.
Looking for a more formal career path isn’t as easy as you would think. Go to Amazon and search for Data Architect. You’ll get lots of related books on data modeling, solution architecture, data quality, XML, distributed systems, data warehousing, ETL, data mining, metadata, EII, and oddly enough a highly recommended book about world peace; but nothing that takes you through the essential disciplines required to call yourself a DA. No Data Architecture for Dummies.
So my answer is to try and get an understanding of all these things.
Reading textbooks bores the life out of me, but I have a few stalwarts on my bookshelf including “Building Enterprise Information Architectures: Re-engineering Information Systems” and “Enterprise Integration: The Essential Guide to Integration Solutions” which I'd recommend.
There are others up there but I can't see the titles for dust and given the cost, I’m loath to recommend too many since much of the information you need can be found on the web.
On that note, I tend to keep up to date by scouring useful websites like EbizQ, The DA Newsletter, TDWI, B-Eye Network, Kimball etc.
The analysts are also pretty useful; Gartner and Forrester do some excellent research across this subject area (if you’re subscribed of course) and run some good seminars and courses (the blogaholic and myself went to Gartner’s MDM event last year, and the next one is coming up in November).
In terms of formal education, I think an understanding of data modeling (logical, physical, OLTP and OLAP) is essential. The role of MDM in SOA seems to be gaining traction. A course giving an overview of enterprise BI can’t do any harm and if you’re working with a company aligned to a particular vendor, getting some detailed training in the technologies they sell in each area of the BI architecture is always going to be useful.
I can’t recommend any specific courses because a) I haven’t been on one for so long that I’d be guessing as to their quality and b) one that I recommend being run out of Colne might not be appropriate if you’re living in Oklahoma.
Having gone through all those alternatives, it turns out that I’ve not been able to see the wood for the trees. Tammy asked me what I thought of DAMA International. Initially I thought this was in reference to the Israeli transsexual that won the 1998 Eurovision Song Contest. It would have been an unusual request for discussion in a technical blog, but certainly one worth exploring. However it turns out that she is in fact referring to the Network for Data Professionals. 
Discussing data architecture might not have the wow-factor of male-to-female sex reassignment surgery but DAMA is an excellent resource for DAs and I’m a bit annoyed that I haven’t come across it before now. I’d recommend you go and take a look round, and if you’re based in the UK, consider attending their Data Management and Information Quality conference in London on 3-6 November. The speaker list looks pretty good, and for the record Rick Van der Lans took me through an excellent logical data modeling refresher course about ten years ago just after I'd joined Conchango from IBM.
So I've blogged a lot on this subject, but not told you a great deal, and certainly not answered the question, which is a useful skill for a consultant. My next stop is the Microsoft BI Conference in Seattle next week, so maybe I'll have more to offer when I get back from there.
I was going to write this during a visit to Houston but I flew in on the Monday following Hurricane Ike. Some areas of the city were in quite a mess and some of the team out there were still without power when I left 2 weeks later. The impact on the downtown area forced the office to close for the week, so I resorted to stealing bandwidth from Toby de Belder, a colleague who has a moulting cat and no vacuum cleaner. I’ve been coughing up fur-balls ever since I got back to the UK and my wife is using them to stuff cushions.
From a personal perspective, conditions in the JW Marriott were pretty grim, forcing me to live a quite basic existence for the week. The concierge lounge was closed, meaning I had to pay for breakfast, the bar was closed until further notice and most disturbing of all is that Starbucks were offering a vastly reduced selection of cakes and pastries.
The threat of the hurricane was all too much for James Pipe; he felt a stiff breeze the weekend before the hurricane made landfall and immediately climbed into his felt-lined snake skin cowboy boots, packed up his Louis Vuitton man-bag and was last heard emitting a cry of relief as he slipped quietly into the rear entrance of the Gaylord Texan.
|
-
The NFL came to the UK the other week, when the New York Dreadnoughts took on the Miami Flange Brackets at the “Wembley Bowl” in London (England). Apparently one million people applied for tickets to see 120 fat men rolling around in the mud, a claim not quite backed up by the 9000 empty seats in the stadium. Still, it was nice to see “football coming home”. And not only that, we even saw Eli (Bernard's lad?) Manning score a spectacular touchdown (which needn't involve the ball actually touching down, obviously).
Sports fans will of course know that America invented the oval-ball game, and this purest version of the sport was only corrupted when in 1876 William Webb Ellis in a break with tradition, shed six stones, threw off his Power Rangers fancy dress costume and managed to run with the ball for more than 35 seconds without needing a 12 minute time-out. Thus the game of Rugby Union was born.
Fans of the SSISmeister will know that we recently attended the Gartner MDM Summit in Florida. This was an excellent conference where there was a strong push on the importance of data quality, data architecture and data governance, particularly where MDM was incorporated as part of a wider SOA. That was music to my ears.
One of the fundamental principles established early in the Seer project in Aberdeen was that data quality should be addressed as close to the SoR (system of record) as possible. In fact, we can state that much more firmly – data quality and data governance HAVE to happen as close to the SoR as possible. Easy to say, not so easy to implement.
The SoBI architecture requires an MDM application to be resident within the services layer where the mappings between systems and data within those systems can be defined and maintained. As data is requested from a SoR, the façade takes the source data, applies mappings and key matching from the MDM application and returns the results to the calling application. Some minor transformation and data cleansing may also take place at the façade layer where it makes sense to do so.
Data services are intended to shield consumers from details such as where the data resides, which physical data model and query mechanisms the data source supports, and how the data access processes (in particular, data integrity) are implemented. When data services are a key component of the SOA, projects that use them are unlikely to succeed without a complementary focus on data. With the development of data services, MDM becomes an integral part of SOA rather than merely a discipline that supports it.
Performing MDM translations within the façade effectively means the SoRs are the “single version of the truth”. The façade becomes the interface into the SoR and all data emerging from that façade is clean, consistent, reliable and mapped to common data structures and primary keys.
It also has to be trustworthy, which puts data quality high on the agenda. In this architecture there is no data warehouse. This would traditionally be the place that you can clean up the data before presentation to the user. In ETL, the Transform comes after the Extract. In SoBI it has to come first.
There are a number of reasons why this is difficult:
- Identifying a data owner in the business can be problematic. I was told that Tom owns the data in System X. I spoke to Tom, he introduced me to Tony, but Tony hasn’t looked after the data for 5 years. He suggested Ted might be able to help. I emailed Ted but he doesn’t work in that department any more. Ted forwarded my email to Pete who phoned me to tell me that the best person to speak to is Gary. Gary identified Tom as the most likely data owner.
- Getting the business to change the business process necessary to improve data quality can grind a project to a halt. Hey Bob, I know you never wanted System X in the first place and it’s taken you nearly 10 years to find the best way of working with it, but there are a bunch of consultants who have never been near an oilfield that want you to do things slightly differently going forward.
- Application owners can be territorial about their data and their domain. Hey Sam, I know you’ve looked after drilling data for 20 years and that anybody that wants to get access to it has to come through you, but these consultants are here to open your system up, clean up the data and make it real easy for anyone to get to the data they need. Will you help them?
- There isn’t time or budget to support data quality initiatives for other projects as well as maintaining a Business As Usual service. Sorry Tarquin, we’re pretty busy at the moment supporting the teams out in the field, but we’ll put your data quality improvement requests on the agenda at our next Steering Committee meeting in March 2008 and see if we can’t get something done before the end of next year.
- There is an ongoing conflict between the need to address data quality and data governance early and the need for projects to be seen to be delivering. I’ve spoken to the project sponsor and although he sees the longer term advantages of cleaning up System X, he really wants his Oil Production report ready at the end of the month. Is there something we can do short term to fix the data before we publish it?
- The fundamental one: often, the quality of the data is sufficiently good to support the application for which it is stored. It’s only when you expose that data to a wider audience that the flaws begin to look ugly. In this case, the response from the system owners when you ask them to fix data that will give them few perceived benefits is often one of reluctance. Roger, I know you’re reasonably happy with System X, but there’s an application back in the office that needs your data and the bits they need are in pretty poor shape – could you spend the next three months cleaning up the data that they require, even though it won’t make any difference to the way the system works for you?
This is where a data warehouse begins to look attractive. We don’t really need to identify a business owner for the data (though it helps) because once the data gets to the warehouse, to all intents and purposes we own it ourselves (as long as the numbers the business sees are accurate). We can also build the required business process changes into the ETL streams rather than work the change into the organisation. The net result is that the application owners don’t feel like their territory has been encroached upon, nobody’s power base is significantly eroded and everyone is happy because the data that has been so difficult to get access to historically is now available.
Unfortunately for us, the whole point of data services in SoBI architectures is to have all the hard work done before the data is published by the service.
That means any consuming application should be able to integrate data directly from the services. In a SoBI world, if you end up with a traditional-looking data warehouse, you’ve pretty much failed, and that failure is very likely to be a direct result of not addressing the fundamental problems of data quality and data governance at source. A side effect is a confused picture about where to go to get data (data service, warehouse, either, both?) and the small matter of having spent more than you needed to if a data warehouse was your ultimate aim.
It’s worth re-iterating some of the principles that need to be followed if a SoBI project is to be successful:
- No SoBI service should be developed until appropriate data quality and data governance processes have been put in place.
- The architects (solution and data) must place as much emphasis on the importance of data quality as they do on the overall architecture of the solution.
- Data issues have to be addressed as close to the SoR as possible.
Without this focus, SoBI projects will end up building data warehouses.
There are many factors that contribute to the complexity of SOA, and usually, one of the last ones to be considered is the data. If data were given higher importance when SoBI architectures are being conceived, a more pragmatic, reusable set of data services would be developed. Technology is the simple piece of the puzzle; what is more complex, difficult and time consuming is engaging the business to effectively manage their data. Without this happening, the development of services will expose poor data to an increasing number of applications, people and processes and the service layer will not deliver the stated aim of a single, consistent version of the truth.
I’ll end with a mention that today is Thanksgiving Day in the US. It started out with the leader of a group of hopeful Englishmen praying to god, saw them being massacred a couple of years later and is now the day when people sit down and watch a bunch of overweight turkeys get stuffed. In England we've introduced a similar festival called “Steve McClaren Day”.
PS. Whoever it was that bet me a fiver I couldn't go a full blog post without mentioning Jon George's hair, please pay up.
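For anyone who prefers code to prose, here is a rough Python sketch of the façade idea from earlier in this post: take rows from a SoR, swap local keys for the common keys held by the MDM application, do some minor cleansing, and refuse to publish anything that can't be mapped. Everything in it is an assumption made up for illustration (the class names `MdmCrossReference` and `SorFacade`, the `well_key` field, the idea that a SoR hands back a list of dictionaries); none of it comes from the Seer project or from any real MDM product.

```python
class MdmCrossReference:
    """Hypothetical stand-in for the key mappings held in an MDM application."""

    def __init__(self, key_map):
        # key_map: {(source_system, local_key): common_key}
        self._key_map = dict(key_map)

    def to_common_key(self, source_system, local_key):
        """Return the common key, or None when no mapping exists."""
        return self._key_map.get((source_system, local_key))


class SorFacade:
    """Hypothetical data service facade in front of one system of record."""

    def __init__(self, source_system, source_rows, mdm):
        self.source_system = source_system
        self.source_rows = source_rows  # stands in for a query against the SoR
        self.mdm = mdm

    def fetch(self):
        """Return (published, rejected) rows.

        Published rows carry the common key and a tidied name. Rows with no
        MDM mapping are held back rather than exposed to consumers.
        """
        published, rejected = [], []
        for row in self.source_rows:
            common_key = self.mdm.to_common_key(self.source_system, row["id"])
            if common_key is None:
                rejected.append(row)
                continue
            published.append({
                "well_key": common_key,
                "name": row["name"].strip().title(),  # minor cleansing only
            })
        return published, rejected
```

The rejected list is the interesting part. In this sketch an unmapped row goes back to whoever owns the data instead of being quietly patched up downstream, which is the "fix it close to the SoR" principle in miniature.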
|
-
….I couldn't believe I was only at the half way point. After bitter arguments, stress, rain lashing into my face, arms aching and sweat running into my eyes due to being overburdened by the amount of gear I was carrying, I started the long trudge back to the agreed meeting point….
This was my journey back to rain soaked Preston train station after finally managing to locate an off-licence in the near vicinity. I'd received emergency in-transit telephone instructions from Dan Perrin and 24 cans of Stella were duly acquired and somehow forced into the non-existent space in my hold-all and rucksack. The purchase almost caused a diplomatic incident when the shopkeeper refused to let me take all 24 from the cold-bin, insisting "you can't take them all, they are for customers", and then made me unpack them all to double check I wasn't stealing any.
Lesson 1: Get the beers in, in advance of departure.
Half an hour later a big white van pulled up at the other end of the car park. My eyesight isn't what it used to be but I could see the word "Budget" emblazoned down the sides. People I didn't recognize started pouring out, holding their backs and stretching. "Can't be them", I thought, "I'll wait for the van with the 'Luxury' signage". Unfortunately they turned and headed my way.
The journey from Egham had taken almost 8 hours. It's about 240 miles, so they would probably have made better progress if they'd cycled. Unfortunately Michael Knight had insisted on stopping at every service station so he could search out a McDonald's Vanilla Thick Shake, and the vans were limited to 60mph. They also had really uncomfortable seats in the back.
I'd been collected by Conchango Team 2 (call sign "Asia" for the rest of the challenge). My team ("Africa") had continued ahead up the M6 in search of a service station with the widest selection of fast food outlets. Michael Knight still needed a McDonald's. Dan Perrin needed a beer. After introductions, we rejoined the M6 and eventually caught up with Africa at the next services. Unfortunately there were no McDonald's but plenty of other unhealthy options presented themselves, which were gratefully consumed. At this point I swapped vans to join my team, leaving a few cans of Stella behind to help numb the pain of the van seats for team Asia.
Beer consumption on the move is a delicate balance between the brain and the bladder. I managed a beer in each van before eventually dropping off…..for about 3 minutes, by which time the rigid, upright, solid seats and badly positioned headrests meant I woke up with a crick in the back of my neck. It's difficult to imagine the moment when the person in charge of designing those seats stood back, looked at his handiwork and announced to the world that he'd achieved what he set out to do. Unless of course the Torquemada family have gone into the van refitting business.
Lesson 2: Make sure the vehicle wasn't assembled by a sadist. You're going to spend 3 days living in it.
The journey up the M6 was fairly quiet. I'm not sure whether it was boredom, fear or just the fact that nobody liked me, but conversation was scarce. Fortunately Jon George has had the decency to grow a comedy hair style, so staring at the back of his mullet kept me entertained for several hours as it swung majestically from side to side in tune with the bumps in the road.
We passed an hour playing Trivial Pursuit. Saz had brought along the "family fun" version so we had 150 questions along the lines of "What colour is mentioned in the title of the children's TV series Blue Peter?" and "Who is the Queen of England?" I think I came third. Team Asia played the "yellow car" game for three hours, which involved looking for yellow cars on the motorway and shouting "yellow car". Are you getting a good sense of the boredom yet?
We did at one stage manage to connect an MP3 player up to the van's sound system. The speaker (singular) sounded like it had been assembled by stretching cling-film over an empty margarine tub but we did manage to enjoy something other than the sound of a diesel engine for a while. The MP3 player clearly had it in for us as it served up "Misty Mountain Hop" (Led Zep), "Rain" (The Cult), "King of Pain" (Police), "Rain When I Die" (Alice in Chains) and the sing-a-long classic that is Cradle of Filth's "Thank God for the Suffering". Eventually Mel took control of the music at which point we got an hour of only hearing the first seven seconds of each song before she skipped to the next track. Why do women do that?
The last hour driving into Stirling was through torrential rain. I was sat on the first row of seats behind The Mullet and the rain was so heavy that I couldn't see the road. At around 2330 we did finally make it to Stirling Travel Tavern. Alan Partridge eat your heart out.
As I staggered in clutching the remnants of a warm can of Stella, soaking wet from the rain and feeling queasy from the excess of deep fried food I'd consumed over the last seven hours, I was starting to feel decidedly Scottish. The receptionist informed us that the local pub would be open until midnight. A hopeful look around revealed no takers so I consoled myself with a can of Stella in bed before getting the last few comfortable hours of sleep for the rest of the weekend.
Saturday morning started with breakfast at the local services. All the boys had a fry-up and all the girls had exotic forms of Muesli. And I include Michael Knight in the latter group. It was then back to the Travel Tavern, pack up and head back to the van. Now, at this point I decided not to squeeze the pillow I'd been carrying with me back into my bag, I just carried it to the van (there wasn't really room with all the remaining Stella in there). This led to accusations of "Travel Tavern pillow theft" for the remainder of the weekend, though I suspect much of this was born out of jealousy for what was a spectacularly good idea given the lack of comfort offered by the van.
Lesson 3: Take something with you that will make sleep easier and more comfortable. Don't worry about the smell.
The proof of this being that as we drove into Fort William, the ladies insisted on an impromptu shopping expedition and Saz came back with enough sleeping accessories to run a B&B out of the back of the Africa van; for team Asia, it looked like Lorraine and Linda had bought a king sized travel water bed with duvet and pillow set. I'm sure it was only the lack of available time that saved the vans from being decked out with floral curtains, scatter cushions and a host of scented soft furnishings.
Just for the record, the pillow came from the Tardis (yes we have a Tardis) in the playroom at home. Amongst other things, it's the emergency store for spare bedding whilst the kids are going through potty training.
My wife wouldn't let me take a decent one and she sent me off towards the playroom with the words "Sod off. You can take one of the old ones but it might smell of p***". For those in team Africa that borrowed the pillow during the course of the weekend – I should probably have told you that earlier. 
We had one more stop on the way to the meeting point - the excitement of an endless supply of Mrs Perrin's lemon drizzle cake meant Team Asia couldn't contain themselves for the whole journey so we had to stop for them to relieve themselves. The girls set off on a two mile hike to find an appropriate level of seclusion, unfortunately to no avail - other than for Heidi who could wait no longer and so provided brief roadside entertainment for the traffic heading towards the Scottish Open Golf tournament. I believe the Scottish Tourist Board have set up a hotline number for anyone travelling on the A82 that day who now needs counselling.
Lesson 4: Make use of all available facilities before you get in the van.
And so we finally arrived at the Glen Nevis visitor centre at the foot of Ben Nevis. I don't know what the Nevis brothers did to get so famous, but their mother must have been very proud. However if I was Glen, I would be somewhat distressed to see my brother get a mountain named after him whilst all I got was a small car park and some public toilets.
All that remained was to avoid eating Lozza's protein balls, strap on the equipment - which included a pair of red Y fronts (to be worn on the outside of the trousers obviously), a gold cape and a mask - get our kit checked by the marshals, clock in at the start point, pick up a radio and we were off!
To be continued.....
-----------------------------------------------------------------------------------------------------------------------------
Find out what we were doing here. Find out why we were doing it here.
|
-
It's becoming increasingly common for me to walk into the office and be asked if I would be able to spare half an hour to install 250GB of RAID 5 disk in a development server.
Like many of our clients, my co-workers are confused by the roles and increasingly grand titles we consultants have given ourselves and it's an easy mistake to make - try sorting out DA, DBA (everyone else), DBA (Microsoft), DBAA and IA as potential data related roles on your next project.
Personally I blame developers for this - if they had been satisfied being called Computer Programmers, this confusion would never have arisen. Instead they promoted themselves to Software Engineers, which meant the Systems Analysts had to become Data Architects. The Tech Leads declared themselves Solution Architects and the dustbinmen became Refuse Collection Engineers. We're in a spiral of self promotion and it's quite comical sitting in project meetings where everyone declares themself an Architect. Soon there'll be nobody left to do any work.
Anyway, let's start trying to unravel some of this, starting with the inevitable confusion between a DA and a DBA. For the record, DBA and DA are completely different roles.
DBA = Database Administrator. Primarily a technical role, a DBA is responsible for the build, maintenance, performance, scalability and reliability of a database application or a database domain. For example, you might get a SQL Server DBA or an Oracle DBA but rarely someone who does both well. A good DBA knows the internals of a particular platform inside out and should be a key reference point for anyone developing code against a database. The DBA should also be intimately involved in the process of turning a logical data model into a physical data model. A SQL Server DBA would likely understand how to build a server for the SQL Server RDBMS, Integration Services, Analysis Services, Notification Services and Service Broker. They would know how to install and configure the software; map software to hardware in the best configuration given the hardware resources available; configure storage and memory on each server for the application(s) being run; set up and configure backup and recovery processes for each server and for each type of database being hosted; configure the hardware for high availability and the software for replication; and understand the data being hosted by the set of applications and performance tune the applications, databases and code accessing that data.
Conchango doesn’t really do “pure” DBAs. They are difficult to sell to clients, who invariably have their own. That means that the work a Conchango DBA would end up doing is very piecemeal – e.g. a short SQL Server health check, or a day performance tuning or a couple of days problem solving here and there. This was frustrating for the DBA and ultimately unsustainable in terms of billability.
Whilst someone picks the resourcing manager up off the floor, and the MD is looking for the blank P45s, let's look at the DA role. I'll come back to DBA.
DA = Data Architect. This role is primarily non-technical. Your DA is the person on the project who has overall vision of the flow of data from source to target. They know where to get data from, where it’s going to, how the movement of that data will be accomplished. They’ll understand the project architecture and how data flows between the architectural layers. They will be able to advise on issues of data quality, should be very customer facing, should be able to understand the relationship between data elements across any number of backend systems, understand how that data will be integrated, know how to apply MDM, advise and direct data related development on the project, design data models from conceptual model, through logical data model, through to physical data model. They will use CASE tools to help with database design. All this can be accomplished with little or no overlap on the role of the DBA.
Because Business Intelligence projects are fundamentally data driven, the importance of these roles is fairly well understood within the BI domain. I don't think the difference is quite as well understood in the development domain and as we move towards increasingly non-technical parts of the company, the two acronyms become pretty much interchangeable.
It's the same with many of our clients. When a requirement for a DA comes up, the tendency is to push a DBA forward for the role. That's not a great recipe for success, and what tends to happen is a large chunk of analysis and design work gets missed and has to be picked up by other members of the team during development.
Having cleared that up, allow me to complicate it again.
Jeffrey Yao, a regular columnist for SQLServerCentral.com, discusses the role of the DBAA in a recent article. You can guess what the additional "A" stands for. And whilst I cringe at the arrival of yet another architect, the role he discusses is perfectly valid. If you can't or won't follow the link, I'll quote a snippet of his definition:
DBAA = Database Administrator Architect. A DBAA is a professional who is responsible for designing a solution framework that maximizes the efficiency of the resources dedicated to the data system administration to meet the business challenges, such as cost, performance, security and regulatory compliance requirements etc.
The main responsibility of a DBAA is to achieve the highest possible ROI with the available resource in the context of the various business requirements. The details of this responsibility may include:
- Define the administration scope in terms of targets and risks / costs
- Build up an optimized processes model which can maximize the ROI for the current resources
- Pioneer in evaluating / choosing the right mix of technology
- Explore / create innovative methodology to adapt to business environment.
- Act as a facilitator / advisor for the stakeholders to best use the data system / asset.
This is the kind of person you'll get if you request a DBA from Conchango. You can still have your two-day SQL Server health check or an exhaustive examination of your SQL Server 6.5 clustered index strategy but the economics of consultancy requires the DBA to have embraced a more enterprise view of the world, usually within a particular domain (BI being the example I know best). Hopefully that puts the P45 back in the drawer for a while.
OK, nearly there. Things wouldn't be complete unless Microsoft put their own spin on things. So you can now become a Microsoft Certified Architect: Database. Or, as those that have qualified call themselves, DBA. That's Database Architect. Now looking at the curriculum, it suggests really techy DBA to me, but I guess with a $25,000 enrollment fee you're only going to put your best people through this and for that kind of money you're going to need a title to match, so I guess this falls between DBA and DBAA.
The other role mentioned in the opening paragraph is IA - the Information Architect. Information Architects are what Data Architects call themselves when there is more than one on the team and one wants to sound more important than the others.
However, in a last-minute twist, it seems Web Designers aren't happy with their self-promotion to Interactive Media Consultants. They can now also be an Information Architect - defined here as "the process of organising and presenting data to the user in a meaningful, clear and intuitive manner". That could be really confusing. I guess it won't be long before I walk into the office and someone with strange facial hair hands me a pritt-stick, a marker pen, and asks me to arrange pictures on a big piece of card.
|
-
For the last 20 years, the “largest, heaviest and most unusual object” ever found at the summit of Ben Nevis has been a piano. On July 14th, that record will be broken not once, but twice by the 2007 Conchango 3-Peaks Challenge team*.
The challenge consists of climbing the UK’s three highest mountains – Snowdon, Scafell Pike and Ben Nevis – in a 24 hour period. The time objectives are as follows:
- Ben Nevis – 5hrs 30 mins
- Drive 6 hours to Scafell Pike
- Scafell Pike - 4hrs 30 mins
- Drive 4hrs 30mins to Snowdon
- Snowdon – 4hrs
Conchango will enter two teams of five people. The concept seems to be simple. We drive to the middle of nowhere at some ungodly hour, spend 5 or so hours climbing up a hill, then back down it, then spend 5 hours travel-sick in the back of a transit van, before repeating. Twice. What’s worse is that two of the climbs will be in the dark and one will be in bad weather so we won’t actually be able to see where we’ve been. The proof will be the seeping blisters, aching limbs and the slightly warm feeling that results from an exhaustion-based loss of bladder and bowel control. If I have the misfortune to suffer a Paula Radcliffe moment myself, I’m hoping we’ve at least made it across the border into Wales**.
Why anyone would volunteer for this is something of a mystery. The mystery is compounded when you consider that most of the people who will be involved don’t know each other. Or had any idea who else might volunteer before they volunteered themselves. Here are the rest of the team: Shane Collins, Dan Perrin, David Höhn, Michael Knight, Peter Hay, Sarita Ward, Michael Jones, Heidi Lin, Peter Murphy. I know one of them. I’m guessing that amongst the rest there is at least one Australian, a sci-fi buff, some quirky facial hair that would get you killed in Burnley, the biggest geek in the company and someone with no interest in football. And probably the same amongst the men.
We’ve been advised to start training 12 weeks before the event, with some basic groundwork eventually building up to hard training to build up stamina. Since we’ve only just signed up for the challenge, there are only 8 weeks left. Personally I think it’s important to get the basics right, so I’ve opted to miss out the last 4 weeks of training rather than skip the early essentials. I’m now jogging to the pub on a Thursday evening (50 yards) and because I live in the North, we have stairs on the inside of our house, which I now climb two at a time.
Eight weeks of that should see me in peak physical condition. Those of you familiar with Conchango will have spotted the distinct lack of involvement from the Conchango board of directors. Advancing years, bulging waistlines and 30 years of sedentary lifestyle rule out some. Others would struggle to make the distance in platform shoes. Some would find it impossible to travel in such working class conditions and are awaiting the release of a Porsche-Transit before volunteering. Chris Saul did volunteer at one point but later withdrew when the promise of an M&S retail outlet at the top of Snowdon turned out to be a hoax. I’m sure in their absence they will at least show their support by sponsoring the team and that each of them will individually match the current single highest donation we have received. Let’s not forget the dedicated support team who will accompany the hikers on the challenge. The volunteers are Jon George, Saqib Barlas, Lozza and Guy Sturgess. If you think spending 24 hours within six feet of nine people you have nothing in common with, halfway up a mountain in the dark, whilst someone hammers nails into your legs doesn’t sound like much fun, spare a thought for these four. Their job is to drive the rest of us between locations – that’s 10 or so hours of ten people whining – interspersed with 15 hours of waiting in a cold transit van at the bottom of a foggy mountain and cooking a curry for 12 (thanks Saqib). Unless of course the romance of the bleak landscape becomes overpowering, in which case, for Lozza at least, it could be like jumping into a barrel of warm play-dough. Our reward for completing the challenge is a disco.
I’m sure after 24 hours of no sleep, climbing up (and down) three mountains, travel sickness, incontinence, sprains, boils, blisters, aching limbs and an unending stream of fascinating conversation about C# development, we will all be looking forward with relish to spending four hours dancing to 70’s classics with a bunch of tree-hugging bearded 50-somethings with bad dress sense and a strong opinion on green issues. I suppose it’s for times like these that beer was invented. And so on to the point of this post – sponsorship. We are raising money for CARE International. CARE works in 70 countries with more than 48 million poor and marginalized people each year to help them find a way out of poverty. It is one of the largest development agencies in the world, with 91% of its donations going directly to help fight poverty. It’s marvelous, heartwarming stuff. So, if you think CARE are great, please send them some money via the link below. For those of you who couldn’t care less about CARE, please spare a thought for your colleagues who have signed up for this. Whether trudging up a mountain in the rain or rolling about in the back of a transit van, they would appreciate your support so please do sponsor them. Finally, if you don’t care about CARE and have never heard of any of the people taking part in the challenge, the sponsorship page allows you to enter a name and a line of text along with your donation, so please use this as an opportunity to spend a couple of quid to anonymously insult someone you don’t like at Conchango, safe in the knowledge that the whole of the company will see it. http://www.justgiving.com/conchango2007 *Dan Perrin's beer gut and my backside. **This is not a slur on Wales. It is an acknowledgement that if an unexpected deposit of natural fertilizer occurs, where better to distribute it than the fabulous Welsh countryside where it can help enhance the already stunning natural beauty of the landscape.
Alternatively, take your mind off it by reflecting that Italy finished above you in last season’s Six Nations.
|
-
I've just returned from my local supermarket where I went for a bit of shopping and to buy some cold and flu remedy for the family. The supermarket is unfortunately situated adjacent to a slaughter house - if the wind is blowing in the wrong direction, as it was today, the walk from the car to the store is one filled with the putrid smell of rotting flesh, though on the positive side, it is an opportunity to meet other people sharing the retching experience in the foyer of the store.** The aforementioned cold and flu remedy consisted of 2 boxes of Calpol (paracetamol for kids), some Nurofen, a Lemsip soluble (for me) and one of those new Lemsip snortable (for the wife). Imagine my surprise when I got to the till and was refused the Calpol - apparently one is only allowed to purchase three paracetamol based products at any one time due to risk of abuse. A number of issues leaped to mind here - firstly, how much Calpol would you need to take to abuse yourself? If it's OK for a 2 month old to take 4 sachets a day, how much would need to be consumed to take down an overweight data architect? Secondly, why, if I had decided that the general poor state of enterprise data quality was just too much to bear and had come to the conclusion that a self-induced Calpol-based demise was the only option left to me, would I have bought the rest of the week's groceries at the same time?*** Thirdly, was this draconian measure in response to a spate of attempted Calpol based self-abuse incidents in North East Lancashire - I know Burnley lost at home on Saturday but things can't be that bad? Lastly, who decided the limit of 3 items - surely the delineation should be based on the total quantity of paracetamol rather than the number of paracetamol based medicinal remedies in my shopping basket?
Anyway, I'm not usually one to get flustered and start shouting about these things; rules are rules after all, so I left the Calpol with the check-out assistant, paid for the rest of my groceries, dropped the shopping off in the car, walked back into the store, picked 2 new boxes of Calpol from the shelf and went to a different till. It's at this point we put the store's principles to the test. If they are really concerned about Calpol abuse, surely measures would be in place to prevent me making the purchase....? ...But make the purchase I did, same card, same me, same products. No alarm, no crisis, no refusal, easy as pie. Presumably I could have gone round and round all day buying 3 paracetamol based products at a time. What is needed here is a bit of Business Intelligence. As my shopping is scanned at the POS terminal, the items in my basket can be analysed. We can identify that I have several items in there that have a product category of "paracetamol based home medicine". If I have (and have used) a loyalty card the store knows everything about me. What I've bought, when I've bought it - so there is ample data there to deduce that I have overstocked on Calpol in my last two visits to the store (this could be done in real time of course as soon as the transaction goes through the till, but contrary to the marketing hype this is as much about having the right architecture and infrastructure in place as it is about having the latest and greatest BI software - and Infrastructure and Architecture don't usually come cheap). As a loyalty card holder the store could make a more informed decision based on my profile - I have kids therefore buying cold remedies in family-sized quantities isn't exactly out of the ordinary. A quick key search on my loyalty card ID could bring up basic details about who I am which could help the cashier make a more informed decision.
Even without a loyalty card, in this instance I used the same card to make the payment, so again a bit of real time BI could deduce that I'm trying to bend the rules and could alert the cashier. If real time analytics is out of the question, a bit of general historical profiling wouldn't be too difficult. How about a plot of sales of paracetamol based products by store by month? Maybe not that useful to see sales rising during the winter months. Or how about some basket analysis? What products are usually bought when paracetamol based products are also in the basket? That lets you spot a "normal" basket (and also allows the supermarket to position these items close together on the shelves). Going off at a tangent slightly, this correlation of products won't necessarily return sensible results. A few years ago we worked on a huge SQL Server BI proof of concept in New York for one of the big book retailers. We pushed 4 billion transactions from an Oracle legacy system into a SQL Server data warehouse, squeezed an Analysis Services cube on top of it and were tasked to build four Reporting Services reports that were previously impossible to create. One was "Category Correlations" - quite simply, when someone in a particular US state buys one category of book, what is the next most likely category for them to buy? We proved the structure and functionality of the report in Excel using the correlation coefficient function. I then built the code in T-SQL which pushed the 4 billion rows through the logic and the results into a dedicated fact table, and was quite pleased it ran in about 6 hours. Former Conchango stalwart Pete "The Daddy" Spencer then did his usual which was to look at my code, roll his eyes, smile sympathetically and spend 5 minutes rewriting it. His version ran in 20 minutes. The results were so odd that the report was dropped.
Instead of getting confirmation that people who bought books on IT also bought Sci Fi, we got results like people in Texas who bought books on Guns also bought books on Women's Studies. Maybe they do, but I'm not going to be the one to ask. We could have a report that plotted sales of paracetamol based products against Burnley's league position. Or sales by day of week or even time of day. The time of day may not seem relevant but when we've built BI systems to identify potential fraud, quite a bit of it can happen late in the day, when the store manager has left for the day and the dodgy cashier can refund a few "return no sale" items onto their credit card. These wouldn't help the situation now, but they might help shape the store's policy to be less draconian. I could go on listing potential reports and analysis that could be produced to back up this policy but I'm sure your fascination levels are already maxed out. I'm also feeling slightly light headed as this is my second post in a week. If I'm not careful the blogging disease will bite and before I know it I'll find myself browsing through technical manuals in spare moments, scouring the internet for Airfix models I haven't built yet and buying comics. **Did you follow the link? You didn't believe me did you!? I don't make this stuff up you know. ***Note - in the interests of responsible blogging, I did make enquiries into the effects of a paracetamol based suicide attempt and apparently you can cling on for several days whilst your liver gradually shuts down, suffering only a bit of sweating, vomiting and abdominal pain in the meantime (difficult to spot if you've recently enjoyed a double strength Chicken Jalfrezi with Chilli Paratha from my local curry house).
|
-
It has been a while since my last post. There are a number of reasons for this - lack of time, the fact that blogging comes quite low on my list of fun things to do and mainly because the blog posts I had lined up on my laptop were dealt a severe blow following a United Airlines flight from LAX to Oakland (where my laptop bag was taken care of by their Valet Service) and a taxi ride from Oakland to San Ramon (where my laptop went in the boot). I can't actually prove when or where something speared a hole through the external casing, however what I can say is that I sent emails immediately before I boarded the flight at LAX and the next morning the laptop was dead. I'm also fairly sure that there wasn't a man hidden in the boot of the taxi with a portable hammer drill or someone jumping up and down on it in my hotel room while I slept. So, I've been back to Bakersfield. Another quite eventful trip, starting with me arriving in San Ramon at midnight and being unable to get into the flat I was supposed to be borrowing. This meant a lengthy cruise round San Ramon by taxi discovering that all the hotels over 1 star were full, and eventually being dropped off at hobo-motel, checking into a room for the night, pushing my way past the local chavs hovering round reception and trying desperately to get some sleep despite the hip-hop festival that was unfolding three rooms down the corridor. That was the first night, the second, I got an earthquake, but at least I had managed to break into the flat by this stage and so was able to cower under an overturned settee until the shaking stopped. From San Ramon, work had kindly booked me on the 7am flight down to Bakersfield - so a 4am start on a Saturday morning. The payback was the fact that I had the whole plane to myself on the flight down. As the stewardess said, "nobody wants to go to Bakersfield on a weekend but we have to fly down there because there are quite a lot waiting to get out". 
Imagine what you'd do with a private jet....well, I slept. As I casually mentioned the fact I was cruising around in a Mustang on my last visit, it's only fair to report that awaiting me in the Bakersfield Avis car pool was a cream Chrysler PT Cruiser. It looked like a mobile cream puff and with the high driver's seat, tinted windows and hugely underpowered engine felt like I was driving round in a clown's car. The early arrival meant I couldn't check into the hotel, so there was nothing for it but to hop in the fun-mobile and cruise Bakersfield. I would have felt less self-conscious wearing a clown's costume. Fortunately for me, David Seymour - stalwart of Conchango's Business Team and winner of the "Conchango's Hairiest Man" competition at the last Christmas party - has joined the team in Bakersfield. Size 16 shoes, garish shirts and oversized suits are a core part of his business-casual collection, so once David had lent me the necessary outfit I was good to go.
This post introduces one of our integration patterns. We have lots of these, and this one shows how a legacy application, or any application that expects to receive data using a locally defined reference data dialect rather than the common reference definitions that our MDM solution contains, takes data from a service which communicates with a legacy system which itself stores data with a locally defined reference data dialect. I'll use an Oil & Gas example – well production data (the amount of oil produced by a well). We have a production facade which will return data in a common dialect and our MDM service will provide a translation and cross-reference service that is available to data producers and consumers who may identify oil wells differently to the common definition in our MDM application.
This is in line with service oriented principles which encourage a common dialect to describe common entities – this is the basis of the ability to easily integrate SO services because the service (or façade) does not know about the legacy dialects. To overcome the fact that it does not understand the common definition of wells, our consuming application may need to build an adaptor (a facade in reverse) to consume the common format and determine how it maps to the local client dialect in conjunction with the cross referencing provided by our MDM solution. In this way, only the consuming application is concerned with how to translate the common dialect into its own legacy or internal dialect – again this is in line with service oriented principles of encapsulation, where the translation functionality is kept as close to the required service as possible and not spread around to the possibly multiple service facades. Such distributed transformation functionality would increase the management overhead and reduce the agility of the overall solution. In scenarios where an application may request data of the same type from multiple different systems, the transformation logic would need to be present in every system that the application might use. Similarly, the service façade will only accept requests in common dialect so the client application may need to go through an adaptor to translate the client request from local to common dialect before it reaches the façade. Adaptors are specific to the consuming system and are not part of the service façade. They care only about translating between the calling application's dialect and the common dialect. MDM sits in the centre of the architecture and, by being aware of all dialects, can offer services to both adaptors and facades. If a SoR speaks common dialect, its facade doesn't have to worry about MDM; it can just issue a straight pass-through query to the SoR and return the result to the calling application.
In our example, this isn't the case – the SoR has its own way of identifying wells which also doesn't match the commonly agreed definition, so MDM is required to translate transactions from the SoR into the common dialect as they are exposed by the service facade. The consequence of these patterns is that our MDM solution will need to be aware of the dialects available within the business but the facades will not. Here's the pattern. Underneath it are descriptions of the numbered steps.
- Forecast application issues request for data – "give me yesterday's production for the well I know as W01"
- Adaptor captures the request and issues call to MDM service to translate local well identifier into common dialect – "what's the common name for the well I know as W01?"
- MDM service returns well identifier in common dialect – "It's called WELL001"
- Adaptor forwards query to Production service façade – "give me yesterday's production for the well WELL001"
- Production Service Façade issues call to MDM service to translate common well identifier into SoR dialect – "What does the allocation system call WELL001?"
- MDM service returns result in SoR dialect – "it's called PRDWL1"
- Façade issues query against SoR – "give me yesterday's production for the well I know as PRDWL1"
- SoR Returns results to facade – "PRDWL1 – 1000 barrels"
- Façade issues call to MDM service to translate SoR well identifier into common dialect – "what's the common name for the well I know as PRDWL1?"
- MDM service returns results to façade - "It's called WELL001".
- Façade returns results to adaptor - "WELL001 – 1000 barrels"
- Adaptor issues call to MDM service to translate common well identifier into local client dialect – "what does the forecasting application call WELL001?"
- MDM service returns results to adaptor – "it's called W01"
- Adaptor returns the results to the forecasting application "W01 – 1000 barrels"
On the face of it, it looks complicated, and for a single application calling a single system of record, it is - in fact there is no point building services if no other applications are interested in the data you expose; you may as well stick to the simple point-to-point approach. Where the value of this pattern becomes apparent is when there are multiple consumers of the data, each potentially with its own set of reference data, combined with multiple SoRs behind the facade serving up the data from different systems on different assets in different business units. In this case, you have eliminated a complex lattice of point-to-point integration solutions and, by isolating producers from consumers, made a brittle architecture much more resilient.
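For anyone who prefers code to diagrams, here is a minimal Python sketch of the fourteen steps listed above (counting the bullets in order). It is an illustration only, not the code from the project: the class names (`MdmService`, `ProductionFacade`, `ForecastAdaptor`, `AllocationSystem`), the dialect labels "forecast" and "allocation", and the in-memory dictionaries are all assumptions made for the example. Only the well identifiers and the 1000 barrels come from the walkthrough.

```python
class MdmService:
    """Hypothetical MDM cross-reference service: knows every dialect."""

    def __init__(self, xref):
        # xref maps dialect name -> {local identifier: common identifier}
        self._to_common = xref
        # Build the reverse lookup (common identifier -> local identifier)
        self._from_common = {
            dialect: {common: local for local, common in mapping.items()}
            for dialect, mapping in xref.items()
        }

    def to_common(self, dialect, local_id):
        return self._to_common[dialect][local_id]

    def from_common(self, dialect, common_id):
        return self._from_common[dialect][common_id]


class AllocationSystem:
    """Stand-in system of record that only knows its own well identifiers."""

    def __init__(self, production):
        self._production = production

    def yesterdays_production(self, sor_id):
        return sor_id, self._production[sor_id]


class ProductionFacade:
    """Accepts and returns the common dialect only (steps 5 to 11)."""

    def __init__(self, mdm, sor, sor_dialect):
        self._mdm, self._sor, self._sor_dialect = mdm, sor, sor_dialect

    def yesterdays_production(self, common_id):
        # Steps 5-6: common dialect -> SoR dialect
        sor_id = self._mdm.from_common(self._sor_dialect, common_id)
        # Steps 7-8: query the system of record
        returned_id, barrels = self._sor.yesterdays_production(sor_id)
        # Steps 9-11: SoR dialect -> common dialect, return to caller
        return self._mdm.to_common(self._sor_dialect, returned_id), barrels


class ForecastAdaptor:
    """Belongs to the consuming application (steps 2 to 4 and 12 to 14)."""

    def __init__(self, mdm, facade, dialect):
        self._mdm, self._facade, self._dialect = mdm, facade, dialect

    def yesterdays_production(self, local_id):
        # Steps 2-4: local dialect -> common dialect, forward the request
        common_id = self._mdm.to_common(self._dialect, local_id)
        returned_id, barrels = self._facade.yesterdays_production(common_id)
        # Steps 12-14: common dialect -> local dialect, return the result
        return self._mdm.from_common(self._dialect, returned_id), barrels


mdm = MdmService({
    "forecast": {"W01": "WELL001"},       # forecasting application's dialect
    "allocation": {"PRDWL1": "WELL001"},  # system of record's dialect
})
sor = AllocationSystem({"PRDWL1": 1000})
facade = ProductionFacade(mdm, sor, "allocation")
adaptor = ForecastAdaptor(mdm, facade, "forecast")

# Step 1: the forecast application asks about the well it knows as W01
print(adaptor.yesterdays_production("W01"))  # ('W01', 1000)
```

Note that the facade never sees "W01" and the adaptor never sees "PRDWL1". Adding a second consumer with its own well names would mean one more adaptor and one more entry in the cross-reference, with no change to the facade or the SoR.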
|
-
Conchango is making strenuous efforts to reduce its carbon footprint. Initiatives include an energy efficient kettle, low-carb light bulbs, recycling bins, and a scheme to ensure all non essential electrical items are switched off in the evening. In a more radical step, plans are afoot to plug all the developers into a giant fart-catcher* – the obvious results of a sedentary lifestyle fuelled by cakes and coffee means developers are probably second only to cows in terms of methane production. The collective emissions will be processed, converted into bio-fuel and used to power the central heating system at Conchango-Central during the winter months. The UK’s efforts to reduce the effect of the greenhouse effect came to mind whilst I was sat in my 4-litre V6 Ford Mustang rental car, nose to bumper on the 12 lane 405 interstate freeway that heads north from Los Angeles towards Bakersfield. When the booking was made I specified my meagre budget, the fact that there would be only me and a single suitcase and that I wouldn’t need anything too big. I was a little surprised at the result, however when I look out of the driver's window and all I can see are the wheel-nuts of the army of 12 litre monster trucks, maybe the assistant at Avis got it right. This is America after all. Anyway, enough about the serious environmental challenges facing the planet, let's talk about data. As the number and diversity of data sources and potential systems of record increases, the greater the integration challenge. It’s also increasingly unlikely that any new system will make any effort to solve the wider data integration challenge – this is an enterprise problem and not usually something that can be addressed within the budget of an individual project – hence without a data architecture strategy, systems are implemented in isolation and any required integration is done point-to-point. 
Hence the need for a well defined data architecture; as interdependencies between systems grow, so does the importance of managing and maintaining these interdependencies in a structured and strategic way. For a SoBI project to work a structured approach is required; SoBI demands common data exchange formats, master data definitions, official and trusted Systems of Record (SoR) for critical data, common and rationalized application tools and common interfaces to data and performance metrics. Data architects need to understand and plan for information flows through the IT environment much the same way that process engineers plan for the fluid flow through a plant or logistics specialists plan for movement of goods and services through a supply chain. Above all SoBI demands a huge investment in data governance. Without a willingness on the part of the customer to clean up their SoR’s and to have a roadmap in place to migrate to applications that can support SOA, SoBI can’t work. The SoBI architecture includes a project database but this is a data cache – for business intelligence data or for data that simply cannot be accessed on demand from the SoR (for which there may be any number of reasons) – but definitely NOT a traditional data warehouse where data can be scrubbed, de-duped and cleansed. In SoBI, the data owners can’t hide a sloppy implementation or poor data quality behind a corporate data warehouse. Data integrity starts at the SoR. Most companies that are yet to embrace an Information Architecture focus on a point-to-point type of integration. The problem with such an approach (even in systems which currently operate successfully) is that the participating systems are susceptible to the development of a complex lattice of inter-application connections, resulting in a large number of inter-application dependencies. 
In such scenarios it is not uncommon for the scale and the complexity of the inter-connections to become a barrier to the addition of new business functions. In such scenarios, the nature and complexity of the interconnections, coupled with the fact that in many cases not all of the dependencies are well understood or well documented, can lead to a situation where the architecture becomes highly brittle; although the system continues to function in its present (stable) state, changes to any of the component applications can often have unforeseen and unpredictable consequences on other applications in the enterprise. The point-to-point approach usually results in similar data being stored in multiple applications within the business. Applications broaden their scope from their intended purpose for the convenience of their users and because different logic may be applied to the data during the integration process, it is not unusual for different systems to hold different values for data that should be the same. These symptoms inevitably result in a do-it-yourself approach to data integration within the business community. If data cannot be made available in a timely fashion, or data sources cannot be trusted, it is natural for people to create their own data stores to hold and process the data they need. In some cases, departmental databases are used as data stores. Desktop databases such as Access are also convenient, but for most business users the data is stored in a semi-structured format in applications such as Excel. It’s not unusual for a complex network of linked Excel spreadsheets to evolve alongside the point-to-point integration between applications. At this point the reliability or effectiveness of the data and any control over its integration, format or quality has effectively been lost. Web services and related internet standards are enabling companies to utilize legacy applications and create new web-enabled workflow structures.
These technologies are ready for use and provide an opportunity for companies to deploy them as part of the “plumbing” of their architecture for data integration. You may have guessed that I’m back in Bakersfield for a couple of weeks. Thommo fans will be pleased to learn that he is adapting well to his new home. I happened to be behind him in the queue for sandwiches when he ordered his foot-long lard & cheese Subway, complete with “toe-may-doe” and “bay-zil”. No, really. A Yorkshireman putting on an American accent - it almost made the trip worthwhile. Happy New Year and I note that none of you sent me a birthday card. *for those of you who have young children, I can heartily recommend the Walter series of books.
|
-
The group I’m working with is responsible for defining strategy, principles, practices and re-usable components for upstream data integration projects globally. Currently I spend half my time in the UK and half in California. It’s not as glamorous as it sounds. Nice as it would be to have an oilfield on the outskirts of San Francisco, unfortunately it’s in the middle of the Californian desert, close to Death Valley and just far enough away from Vegas, San Fran or LA to make them weekend-only destinations. And on top of that, it seems I’m flying myself to an early grave! In the UK I spend my time on delivering stuff for the global program that I work on. When I’m in California it’s working on the latest project to adopt the SoBI approach and principles. If you’ve read the first two blog posts and managed to wade through the SoBI white paper, you may be wondering what we have been doing in the 12 months since it was conceived, written and published. The short answer is implementing it. A longer answer is that this is a big project – the project in Aberdeen is approaching two years since conception with around 25 people staffing it at peak times. The architecture is new, and fundamentally delivering an architecture is in itself quite a challenge in Upstream O&G. Upstream is the part of the Oil business that delivers the dollars. Basically, the more hydrocarbons that can be extracted, the more revenue the company makes. Historically this has led to quite a tactical approach to integration – the relative wealth of Oil & Gas companies ensures that best of breed applications are bought, but they are implemented to solve specific business problems, often without a holistic view on data integration between applications (“Give me the shiny things NOW!!”).
It also means that in comparison to companies in other verticals I've encountered, upstream O&G has lots of applications out there, many of them storing similar data and accessing data from other applications in a very tactical, point-to-point manner. Upstream Oil and Gas is also a very complex environment in which to work. In almost 10 years of working in the area of data warehousing and business intelligence, I’ve never come across a combination of:
- an environment so littered with candidate data sources
- systems implemented so myopically
- with such a reliance on Excel as a data store
- supporting such a complex business
Add to that a new architecture and throw in agile as a delivery mechanism and you have a significant challenge. "Upstream" refers to the process of exploration and production of oil up to the point where it is transferred on for sale. I’m surprised by the number of people who think that producing oil involves drilling a straight hole down to a conveniently located pool of oil and catching it as pressure forces it to the surface, before selling it on at huge profit. Firstly, there is no magical pool of oil. What you have are layers of sedimentary rock into which the oil flows and sits, along with water and gas and lots of other chemicals in pores between the molecules of rock. Think of it as the process of sucking water out of a sponge. The amount of stuff that the rock can hold is its porosity. The ease with which the stuff can flow through the rock is its permeability. Skipping over how a potential oil reservoir is identified, the next thing to do is to drill. Depending on where you are in the world, this can be an expensive exercise. In the Californian desert where the oil reservoir isn’t too far below the surface, things aren’t too bad, but if you are working in deepwater exploration, things are significantly more complex and expensive. Take a look at the Jack #2 well announced by Chevron in deepwater Gulf of Mexico where the well was drilled to a depth of 28,000 feet. To put that in perspective, Mount Everest is 29,028 feet high. This is space age technology, and it doesn’t come cheap. Cost estimates for field development in the vicinity of Jack #2 are about $80-120 million per well drilled, with an additional $1.3 to $1.5 billion for subsea facilities. Upstream Oil and Gas is very much a risk/reward business and we tend only to hear about the rewards (which admittedly can be substantial). It’s also surprising to find out that wells don’t have to be sunk vertically.
The drills are directional so it’s possible for a well to take a circuitous route to the oil bearing formation. In some circumstances wells are drilled horizontally – in this case the well starts off vertical but curves as it goes down eventually flattening out to a horizontal line. It’s also possible that each surface well could have a number of wellbores within it – i.e. multiple paths leading off from the main well to different parts of a reservoir. Each wellbore also needs to be completed – completing a well consists of a number of steps; The process of completion usually involves blowing holes in the casing at the bottom of the wellbore to expose it to the oil bearing rock. Each wellbore can be completed multiple times and you may wish to complete the well such that hydrocarbons from two or more formations may be produced simultaneously, without mixing with each other. So what’s left? Stand at the top of the well with a barrel and collect the oil as it flows to the surface? If you’re lucky, the pressure in the reservoir will force fluid to the surface. Usually however this isn't the case, and even when oil does flow under its own pressure, this reduces over time as the fluid is extracted, reducing the pressure in the reservoir. This is your bottomhole pressure and it’s something you have to keep an eye on. It’s checked regularly by performing a well test which gives an indication of the pressures and temperatures at the bottom of the wellbore and gives an indication of the theoretical production rate of the well. A good reference for O&G buzzwords is the Schlumberger Oilfield Glossary.
I think that’s enough for one post. Next time I’ll cover all the stuff that happens once the fluid is actually out of the ground – and what on earth any of this has to do with data integration.
Some news to finish. The head of the Technology Team, Iyas AlQasem, has just informed us that Conchango will end 2006 larger than it’s ever been. I presume this relates to the number of employees now in the company rather than the average weight of its consultants, though it should be noted that with so many developers working in Egham during the day, the man who runs the donut shop is now driving round in a Bentley Continental.
|
-
Allow me to be the second to congratulate Jamie Thomson on reaching #2 in the Google search results for SQL Server Integration Services (SSIS). Jamie is Conchango's celebrity blogger, and has developed something of a cult following in cyberspace. He sees himself as the David Beckham of the blogging world and has taken to wearing those big 70s-style sunglasses and, if we’re really lucky, a sarong around the office.
Anyway, back to San Francisco, where we did our pitch to the client, which went rather well. I tend only to speak when I think there is something worth saying – I put this down to the fact that I was an only child and had nobody around to listen to me when I was growing up. If this theory is accurate, Simon must be the youngest of fourteen, but between us we did a convincing job of letting the customer know that we knew what SOA and BI were all about.
At Conchango the EAI and ETL folks had had a number of discussions around where the two disciplines overlapped (mainly focused around the Microsoft technologies, BizTalk and DTS), and the market was converging, with the major ETL players gaining EAI capability through acquisition. However, much of the talk was idealistic – yes, you can use BizTalk to load data into a data warehouse in an event-driven world, but it’s rare to come across an application that has been built to support that. You are more likely to find a system that doesn’t even record when data has changed, in an environment where the users can change any of the data in the system. In those circumstances you are definitely in the heavy-lifting ETL world. However, there is a grey area between ETL and EAI products – as a message gets longer and less frequent, it begins to look like a scheduled data stream. As an ETL data stream gets shorter and more ad-hoc, it begins to look like a message. There’s a pretty good overview on Microsoft’s integration technologies here.
So, we had spent time discussing the theory, but without a real customer requirement to prove or disprove our ideas. After two days of workshops, Simon and I retired to a bar in Union Square and over beer started to piece together what eventually became Service Oriented Business Intelligence (SoBI). The key shift in our thinking was not to focus on where the disciplines of Service Orientation and Business Intelligence are mutually exclusive, but on the key strengths of each discipline and the areas where there is a synergy between them that can be exploited. From the outset we tried to be pragmatic – for example, acknowledging that there will always be a case for pure ETL where large data volumes need to be moved in batch.
We brought the ideas back to the UK, where the brains of Rob Grigg, legendary socialite and leading architect from Conchango, and Sean Gordon, architect in the Microsoft Scotland office and a very handy man to have in a quiz team (excluding sport), were added to the mix. The results of our efforts have been published as Service Oriented Business Intelligence in the Microsoft Architecture Journal. Be warned, it’s a touch on the dry side. We’ve also had the opportunity to present a couple of times: Rob and I at the Connected Systems conference, and Rob and Sean at the Microsoft Architect Insight Conference. Have a look; we're always interested in feedback on this stuff.
That’s enough for now. I write this in the KLM lounge at LAX airport and my Heineken is getting warm.
|
-
Hello and welcome to my blog. It’s been a war of attrition between me and the Conchango establishment to get me to write a blog. Clearly I’ve lost. I’m currently sat in a hotel room in Bakersfield and since I have no interest in Country & Western music or guns, this seems to be the most constructive way of wasting time until the jet-lag kicks in and I can go to bed. I have to declare in advance that I am no longer involved in development; hence this blog will be 100% fluff. If it’s geekery you’re after, I’m afraid you’ve come to the wrong place – however, help is at hand: there are literally dozens of uber-techy types writing all sorts of stuff in strange languages on this very site.
I’m currently data architect on a global upstream program for one of the super-major oil companies. It all started about two years ago when they came up with the strange idea of implementing an information architecture that enabled Business Intelligence (BI) but used Services as the primary means of data integration. It was a Microsoft account, but we’d been put forward as implementation partners and I guess someone in Resourcing saw “BI” in the description of the opportunity and I got the gig. We flew out to San Francisco for a couple of days of workshops to go through requirements, principles, objectives, roles and the scoping out of the initial architecture.
It was over breakfast in the Fairmont Hotel that I met Simon Thurman from Microsoft. He introduced himself as the SOA expert for the project and he almost choked on his eggs Benedict when I said I was there to talk about the data warehouse we were about to build.
It goes like this – when you build a data warehouse, you connect to the most readily available source of the data you need, you understand the schema of the system you are connecting to, you drag out as much data as you can, you clean and transform it and push a nice shiny new version of it into the data warehouse, all done in the shortest time possible. Oh, and if the source of that data changes, that’s OK, we just rebuild the ETL against the new system. There isn’t much room for services, abstraction, systems of record or contracts in there. In fact, as I’ve found to my cost, ETL developers tend to growl at you if you even mention these words in the context of data integration.
The flip side of course is that in an SOA, the true source of data is identified as a system of record (SoR), and that data is owned by that system. The schema belongs to the SoR and shouldn’t be visible to consumers of that data. The SoR should be abstracted from data consumers to ensure consumers are insulated from change. Data is exchanged via services, and changes to the SoR should have no impact on those services or the consumers of the data they provide.
So even though there are aspects of BI and Service Orientation (SO) that appear to be mutually exclusive, what the customer was asking for sounded sensible: provision of a single interface into an application that insulated consumers from changes to the system; BI without the need to store data that had already been stored multiple times around the business unit; and integration of data from a diverse application landscape.
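For anyone who prefers code to eggs Benedict, the contrast above can be sketched roughly as follows. This is only an illustration, not anything from the actual project: every class, field and function name here (WellSummary, WellService, OldSoR, NewSoR, etl_extract, daily_report) is made up, as are the well id and the production figure. The ETL-style function reads the source schema directly, while the service-style consumer only ever sees the contract.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class WellSummary:
    """The contract: the only shape a consumer ever sees (hypothetical fields)."""
    well_id: str
    daily_oil_bbl: float


class WellService(Protocol):
    """Any system of record can sit behind this, as long as it honours the contract."""
    def get_well_summary(self, well_id: str) -> WellSummary: ...


class OldSoR:
    """Hypothetical original system of record, with its own private schema."""
    _rows = {"W-1": {"WELL_NO": "W-1", "OIL_BBL_D": 1200.0}}

    def get_well_summary(self, well_id: str) -> WellSummary:
        row = self._rows[well_id]
        return WellSummary(row["WELL_NO"], row["OIL_BBL_D"])


class NewSoR:
    """Hypothetical replacement system: different schema, same contract."""
    _docs = {"W-1": {"id": "W-1", "production": {"oil": 1200.0}}}

    def get_well_summary(self, well_id: str) -> WellSummary:
        doc = self._docs[well_id]
        return WellSummary(doc["id"], doc["production"]["oil"])


def etl_extract(rows: list) -> list:
    """ETL style: tied to the old source schema, so a schema change breaks it."""
    return [(r["WELL_NO"], r["OIL_BBL_D"]) for r in rows]


def daily_report(service: WellService, well_id: str) -> str:
    """Service style: depends only on the contract, not on any source schema."""
    summary = service.get_well_summary(well_id)
    return f"{summary.well_id}: {summary.daily_oil_bbl:.0f} bbl/day"


if __name__ == "__main__":
    # The consumer gets the same answer whichever system sits behind the service.
    print(daily_report(OldSoR(), "W-1"))
    print(daily_report(NewSoR(), "W-1"))
```

Swapping OldSoR for NewSoR leaves daily_report untouched, whereas etl_extract raises a KeyError the moment it is pointed at the new schema and has to be rebuilt. That is the trade-off in miniature; the sketch deliberately ignores the large batch volumes that make ETL attractive in the first place.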
I’ll talk more about how we resolved this fundamental issue (it involved beer), Bakersfield and occasionally data architecture in future posts.
|