Large enterprise databases that form the backbone of business are not going to move to a cloud computing platform any time soon. But more and more data will originate and be processed in the cloud, and data professionals need to understand how to work with it there.
Non-Transactional Data Processing
The delay between transactions being processed and business intelligence (in the form of analysis and data mining) being available is largely down to the ability to process data. Processing hundreds of millions of rows into a cube is processor intensive and is generally left to an overnight job on a machine that is usually underpowered because it spends most of its time idle. While the big cloud computing vendors are not providing BI services yet, the prospect of having huge amounts of processing power available to analyse data has to be compelling. Cloud computing based BI can provide near real-time data analysis for one business while its competitors wait until the next day or the end of the month.
Scattered Data
As much as DBAs insist that their large, well-run database is the master of all transactions and the single version of the truth, the reality is that data is becoming decentralised and is scattered across the network, home computers and the Internet. Application architects have been making use of read-only data stores and caches by design, but over time the rigidity of the centralised database has also forced application developers to store some data elsewhere ‘temporarily’. The centralised database has lost its way in a sea of data, and traditional DBAs will be paddling aimlessly around with it. Regardless of whether it has to do with cloud computing or not, the data professional needs to take more responsibility for the data, not by locking down the database, but by understanding how it is used and where it is stored.
NoSQL
For most data professionals the NoSQL movement is seen as an aggressive attack on their fiefdom, which, in a way, it is. The SQL model has become a bottleneck and does not support the dynamic nature of cloud computing architectures. Data professionals should at least understand, if not embrace, the NoSQL movement and the patterns that it represents. They need to see that in some cases NoSQL is a good fit for a particular requirement (such as search), and they need to work with NoSQL practitioners so that the data is meaningful, accurate and secure, and ultimately finds its way back to the centralised data store if necessary.
Data Risk
Understanding the risk of data loss or compromise is a big part of cloud computing, and the answers are complex because different types of data at varying levels of granularity represent different risks. The enterprise data professional is more intimate with every table and row in the database than anyone else in the organisation and needs to help the business understand what atomic pieces of data are for so that the risk can be managed. Over time this will affect how databases are designed (optimised for distribution and risk rather than for performance or the most logical scheme), and data professionals are key to making this happen.
Biggest Change
While cloud computing will change application architectures and user interfaces, by far the biggest impact will be on databases. After all, much of the high performance storage consumed by enterprises is dedicated to data, and the economic benefits of cloud computing will be applied to optimising the cost of data storage, processing and operations, yet it is data that is the most exposed to risk.
While most of this series has been dedicated to less technical roles, data professionals deserve a special mention: the gap between their interest in cloud computing and the role that they need to play is, as I see it, greater than for any other group of people who need to understand cloud computing.
Disclaimer: This is not a complete list, and seasoned DBAs know to keep developers away from their stuff.
Simon Munro
@simonmunro
The ‘Who Should Know About Cloud Computing’ Series
This post is part of a series of posts for non-technical roles, which you can follow from the links below.