Welcome to EMC Consulting Blogs Sign in | Join | Help

Sanjay Kapoor's Blog

  • Transaction Audits - SQL Server 2008 R2

    One of our projects presented us with an interesting conundrum. The application was browser based, the technology stack being WIF, MVC, EF, and SQL Server 2008 R2 (EE) (read Simon Evans’ blog for more detail on WIF usage and exceptions). Here's the dilemma. How do we capture which user made changes to data in SQL Server, considering that authentication is done through claims based authorization? The service layer is always aware of user credentials; however, connection to SQL Server is made through a common service account. We could percolate the user's credentials to SQL Server, but contemplating that there could be hundreds of users making changes at any time, how could we guarantee that a particular change, or transaction, belonged to a specific user? Add to that the following; 
     
    We wanted to make minimal changes to the data model while ensuring that we maintained a history of changes in data
    The customer specification required transaction audits
    The customer also required financial data comparisons- year on year, quarter on quarter
     
    We looked at CDC (Change Data Capture), a new feature that's only available in Developer, Enterprise and Datacenter editions of SQL Server 2008, and that actually addressed most of our concerns listed above. We did not have to make any changes to the data model since once we enable CDC on a database, we have very granular control over which tables, and attributes, we want to capture. SQL Server provides the stored procedures that build the necessary table valued functions that allow you to retrieve changes. No extra tables to be built. 
     
    A word of caution though; the table structure returned through the mentioned function calls still need a bit of work to make the underlying data user viewable (friendly). Also, there's a limit to how long changes to data are captured by CDC. The default is 3 days, and is configurable, after which capture data starts being recycled. One may tend to think that we could simply increase the number of days for which we are capturing change data, but considering that CDC works by interrogating the transaction log, which is great for guaranteeing that only transactions that are committed will be captured, the flip side is managing  your backup cycles (it will interfere, think carefully about the implications), and importantly, transaction log bloat. 
     
    Write wrapper functions around native function calls; in fact, SQL Server provides a stored procedure sp_cdc_generate_wrapper_function to start you off.  Happily, functions created by this procedure are all named as in the template fn_all_changes_<schema>_<table>. However, depending on the number of tables you intend to capture data changes on, the number of functions you deal with increase by a factor of 2 per table, 3 if you want to capture net changes. Create a change table (ChangeId, TransactionId, TableName, ChangeColumn, OldValue, NewValue, ChangeBy, ChangeDateTime)  that would store changes to all tables, preferably, in another schema and in a separate filegroup. 
     
    Next step then...use SSIS to ring fence the entire lot. 
     
    Here is the sequence;
     
    Specify the interval for which we want to retrieve the data
    Make a call into the database schema to get a list of all tables that are being captured 
    Loop through the list of tables, substituting the appropriate table name in the function  fn_all_changes_<schema>_<table>, make function calls to retrieve data and insert into the change table created earlier
    Clean up transaction log 
     
    What we have now is;
     
    A complete history of changes made to all tables in a single table, in a readable format
    A transactional audit of changes
    We have tackled the log bloat issue referred to above
    Backups can be easily synchronized
    The ability to create history tables for every table in the database schema to cater to your BI requirements. You may be thinking more SSIS work...you're right.
     
    You must have picked up the fact that, so far, I've avoided mentioning the inevitable...which user made the change and how to marry it to a change transaction. The clue came from the word transaction itself; remember that CDC functions interrogate the transaction log? We made a leap of faith around that and assumed that given a series of changes to data in multiple tables that are in a transaction, it should figure under one lsn  (log sequence number) within the transaction log. It did, although internally, SQL Server creates implicit transactions for every table that is being changed, thus affording you greater control over proceedings (not that we require it for our purposes). Now back to the user.
     
    Since we were using EF we wondered whether we could leverage that to achieve our purpose. We had to find a way to coerce EF to include the user that was propagating changes at the point changes were being written. Simon Ince's blog here was a great help. Our solution's slightly different to his, but we got some real traction after reading it. 
     
    We created a simple table (TableName, UserName and ChangeDateTime) in the database to log who made the change to what, and when. This allowed EF to surface the table, along with others, as an entity object. We then created a partial class for the generated object context class and subscribed to the SavingChanges event. In the event handler, we created a new entity to represent the tablename/username/changedatetime table, and added this to the context so that it is persisted at the same time as the saving of the main entity that was modified, and importantly, it occured in the same transaction.  There we had it, true transaction audits in SQL Server.
     
    This blog is the first of a series in which I plan to go into more detail, depending upon responses. Many thanks to Simon Evans, Owain Wragg and Michael Ciba who proved invaluable sounding boards on the WIF, MVC and EF, and to Owain who wrote and tested the code in EF. 

Powered by Community Server (Personal Edition), by Telligent Systems