In future, more and more solutions will have data that is not stored directly in a SQL database but rather in cloud EAV (Entity Attribute Value) data stores, caches, client storage, document databases and so on. Usually it is the existing DBA’s and other SQL biased database experts that are the custodians of corporate data and, in general, do a better job at understanding, controlling, backing up, cleaning, authorizing and generally keeping data in a good, secure and reasonable state. So when presenting to the London SQL User Group a few weeks ago I tried to come up with something that was obviously cloud based, but was all about data and had elements that would be familiar to SQL experts.
I came up with CatFax, a contrived example where audit rows of tables are stored, not in the originating database, but in the cloud. The idea is that audit history, which is seldom accessed, tends to clutter up a production database and would be a good candidate to move elsewhere – off the production database and, in this example, into the cloud. The primary requirement that I set myself was that it should be accessible from SQL and even though the data is stored in a completely different format in a completely different location, it should look like SQL data.
The naming of CatFax is based on the lolcat ‘You can’t fax a cat’ where something is neither possible nor desirable. This project demonstrates that you can indeed fax a cat – you can store SQL (relational) data in the cloud in a meaningful and accessible way and, in particular cases, is desirable.
The project is comprised of the following components
- A SQL Server Database
- A SQL CLR procedure - Catfax.Sql.Client
- An Azure Hosted Service
- An Azure Storage Service
To add a row, the application works something like this…
- In T-SQL data is passed to the SQL-CLR stored procedure
- The SQL-CLR stored procedure reads the entire row out of the database and serializes it to an xml fragment
- The xml is sent to the wcf service hosted in Azure
- The Azure service looks for the previous instance of the row in an Azure table and increments the row counter by one
- The xml row is added as an xml fragment to an Azure table
To retrieve a row, the reverse happens…
- In T-SQL data is passed to the SQL-CLR stored procedure
- The SQL-CLR stored procedure passes the key information the WCF service hosted in Azure
- The Azure service locates the last added row in the index Azure table
- The Azure service returns the last stored row from the Azure table and passes the xml fragment back to the SQL-CLR sproc
- The SQL-CLR sproc parses the xml fragment and pushes it to the local sql pipe
- The row, with the original rows and columns is accessible in T-SQL
So lets look at this in more detail…
The AddAuditRow SQL-CLR stored procedure takes a piece of xml as a string and contains the tableName, schema and the fields and values for the keys.
DECLARE @rowSpecification nvarchar(max)
SET @rowSpecification='
<rowSpecification tableName="ProductCategory" tableSchema="dbo">
<keys>
<key fieldName="productId" value="2" />
<key fieldName="categoryId" value="1" />
</keys>
</rowSpecification>'
EXEC AddAuditRow @rowSpecification
This is used by the RenderRowAsXElement function. This function essentially builds up a ‘SELECT *’ query using the values for the primary key and takes all the rows and creates some xml out of them...
for (int t = 0; t < sqlDataReader.VisibleFieldCount; t++)
{
rowData.Add(new XElement("fieldValue",
new XAttribute("fieldName", sqlDataReader.GetName(t)),
new XAttribute("dataType", sqlDataReader.GetSqlValue(t).GetType().ToString()),
new XAttribute("value", sqlDataReader.IsDBNull(t) ? "DBNull" : sqlDataReader.GetSqlValue(t).ToString()),
new XAttribute("dataTypeName", sqlDataReader.GetDataTypeName(t).ToString())
));
}
The input xml is also used to generate the RowKey, which is necessary for the rest of the application to store the row in the correct place. The RowKey is a concatenation of the schema, tablename and key fields. In the above example, the RowKey is ‘dbo.ProductCategory-2-1’. The AddAuditRow function then passes the xml fragment, with the RowKey to the CatFax WCF service. Since SQL-CLR doesn’t support WCF, but does support Web Services, the WCF binding is done as basicHttpBinding so that the SQL-CLR function can call it as a vanilla web service.
The CatFax.Entities namespace contains the classes needed for accessing Azure table storage. The classes use the patterns used in the Azure labs, making use of the Microsoft.Samples.ServiceHosting.StorageClient namespace. My initial intention was to have a row identifier that would take the SQL originated RowKey (e.g. dbo.ProductCategory-2-1) and append a sequence number every time a row was added, so that I would land up with row keys in the actual tables something like dbo.ProductCategory-2-1-1, dbo.ProductCategory-2-1-2, dbo.ProductCategory-2-1-3 and so on. So I created a table called AuditRows (implemented by AuditRowEntity and AuditRowEntityDataServiceContext) which just has the RowData property containing the xml. The plan was to search through the row keys in the table using a wildcard and then find the highest sequence number but unfortunately Azure table storage doesn’t support the like(), substring() or similar extension methods meaning that there is no way to search through the table. So I had to create my own indexes in the form of the AuditRowIndexes table (implemented by AuditRowIndexEntity and AuditRowIndexEntityDataServiceContext) which stores the last sequence number for the particular RowKey.
So in order to add a row, the service first looks in the index table and if a row is found increments the sequence by 1. If a row is not found, a new row is inserted with a sequence of 1. This sequence number is then appended to the RowKey to insert into the AuditRows table. Note: This implementation does not handle the locking of the index row if more than one session is trying to add a row to the index – it would probably be better to use a worker role and queues to generate the sequence numbers.
One irritation that I had was that the RESTful nature of Azure Tables means that it is not easy to detect if a row exists already because when requesting a row with a particular key, if the row does not exist a ‘Resource not found’ error is raised which, in a RESTful sense, makes sense, but means that you have to code by exception rather than being able to test for null. There has to be a better approach to this and I’d be interested in hearing about it. The code below is to check for the existence of a row.
public bool Exists(string rowKey)
{
try
{
AuditRowIndexEntity foundAuditRowIndex =
(from auditRowIndexEntity in this.CreateQuery<AuditRowIndexEntity>("AuditRowIndexes")
where auditRowIndexEntity.PartitionKey == "Index"
&& auditRowIndexEntity.RowKey == rowKey
select auditRowIndexEntity).FirstOrDefault();
}
catch (DataServiceQueryException ex)
{
return false; // Not interested in exception - assume that row does not exist
}
return true;
}
The rest of the code in the Azure Web Role simply looks for rows, adds rows and indexes and has been implemented mostly in the DataServiceContext classes. The data stored in the Azure Tables looks something like this (viewed with the Azure Storage Explorer)…
For the index table
and for the row table, with the xml fragment generated by the SQL-CLR procedure
In order to get a row out of the Azure storage, the same row specification xml is passed to the GetAuditRow stored procedure
DECLARE @rowSpecification nvarchar(max)
SET @rowSpecification='
<rowSpecification tableName="ProductCategory" tableSchema="dbo">
<keys>
<key fieldName="productId" value="2" />
<key fieldName="categoryId" value="1" />
</keys>
</rowSpecification>'
EXEC GetAuditRow @rowSpecification
The SQL-CLR sproc calls the WCF Azure service which locates the RowKey in the AuditRowIndexes table to find the last sequence number of the RowKey and retrieves the corresponding AuditRow. The data is returned as xml which is parsed by the SQL-CLR sproc. The sproc creates a SqlDataRecord based on the SqlMetaData that is read from the row data xml and the SqlDataRecord is sent back to SQL using the SqlContext.Pipe. The result is a row that is usable in T-SQL.
Tada!
There is obviously a lot more that can be done before catfax can be turned into anything usable but it was my first non-lab Azure project and took about three days to code, most of the time spent going down blind alleys. It does demonstrate how the cloud and traditional data stores can work together and interoperate relatively seamlessly.
If you want to have a closer look at the code, it is up on codeplex at catfax.codeplex.com.
Simon Munro
@simonmunro