|
|
Moans, Groans and Scary-tales...
-
Good things come to those who measure! I have been recently asked to investigate ways to reduce bandwidth utilisation of a WCF service and thought that it would not be a bad idea to share the results. The test was a relatively simple method call (search operation) performed on the client which would return ~500 objects from the server side. I tested the app using various WCF bindings and the table below contains the sizes of the payloads for the bindings I have tested. (the results have been obtained using Wireshark): | Binding | Payload size (bytes) | Relative size | | WS HTTP with message security | 2747082 | 100% | | NET-TCP with message security | 2068024 | 75% | | NET-TCP with transport level security and encryption | 393205 | 14% | | NET-TCP with transport level security and no encryption | 392948 | 14% | Your mileage will obviously vary, this example however illustrates that you can get substantial reduction of payload sizes by changing a couple of configuration files. This is what I call a result! :)
|
-
A friend of mine who was once involved in a test role at play.com recently mentioned to me a very interesting story. In case you did not know them, play.com sell a lot of stuff over the internet. At the time my friend worked there, play.com had in their dev centre a big button displayed on an even bigger screen indicating the number of orders they were getting every second. As play.com never sleeps, they knew that when the number of orders/sec dropped below a certain threshold and the button went red, alarms had to be raised, SOS emails had to be sent out and the fat controller had to be woken up as something was clearly going wrong. (OK, I made up the fat controller but it’s a different story).
I have no idea how the whole system was implemented but it clearly illustrates a very good use of software instrumentation. If techies at play.com they were as clever as I think they were, the simplest method to get the “button” to work is to use bog standard performance counters as they may serve at least several purposes:
- As illustrated by creative folk at play.com, performance counters may act as an early warning system. Excessive values may raise automatic alerts through monitoring tools and this is usually highly appreciated by the "ops" people responsible for day to day operation of the software.
- Performance counters allow us and our clients to trace execution times, throughputs etc which may be the only way to verify whether a third party provides appropriate level of service (SLA). This comes especially handy when your app relies heavily on services provided by others.
- As all things related to instrumentation performance counters are often immensely helpful in diagnosing software problems when something goes, or is about to go wrong
- Performance counters give us true picture of behaviour of our software: how many logins, saves, loads does the application perform? How does the workload differ from the requirements and/or expectations?
Knowing that most of us find instrumentation as exciting as having the root canal done, I came up with a plan to use PostSharp AOP to ease some, if not all of this pain. If you are interested in publishing performance data, read on as you may find that the entire process can be made very simple and easy with a couple of AOP tricks.
Anatomy of performance counters
Before we start publishing any performance data we need to "register" our performance counter(s) in the operating system: this can be done in a number of ways but personally I prefer to do it programmatically (if possible). The following snippet illustrates how to do it:
Code Snippet
- var counterCreationDataCollection = new CounterCreationDataCollection();
- var valueCounter = new CounterCreationData("Counter name", "Help text if any?",
- PerformanceCounterType.NumberOfItems32);
- counterCreationDataCollection.Add(valueCounter);
- PerformanceCounterCategory.Create("Name of category",
- "Sample performance counter category.",
- PerformanceCounterCategoryType.SingleInstance,
- counterCreationDataCollection);
The problem with programmatic approach is security: in order to execute the code, we not only need to have admin privileges but on some operating systems (Vista and Windows 7) the program has to be “Run as admin” which may be a bit tricky. To get around the pesky UAC issues you may need to modify application’s manifest file which is described very nicely here. Other than using programmatic approach, performance counters can be installed by a setup program with a bit of help from custom installer class derived from PerformanceCounterInstaller. Unfortunately once created, PerformanceCounterCategory hosting our counters cannot be modified so if new counters were to be added, the entire category has to be recreated. For this very reason during development I usually have a piece of code which reinstalls all required performance counters on application start-up as over the lifecycle of the project both numbers and types of performance counters are very likely to change. When the application reaches it's final stages of development the task of creating performance counters can be delegated to a setup program.
Usage
Once we have our counters in place we have a couple of ways to interact with them but the choice is relatively simple: we can either use thread safe Increment/Decrement/IncrementBy methods or directly manipulate counter's RawValue as the following sample illustrates. According to MSDN using thread safe approach is about 5 times slower than directly accessing RawValue, but you have to be sensible about it: if you are instrumenting a particularly “hot” piece of code, direct access may indeed be an option but I would not go this route unless profiler indicates that it may provide some tangible benefits (measure first!).
Code Snippet
- var counter = new PerformanceCounter("Category", "Counter name", false);
- // We can increment the value in a thread safe maner...
- counter.Increment();
- // ... or set the value directly
- counter.RawValue = 123;
Once you start digging into System.Diagnostics documentation you will surely discover that there are 28 different types of performance counters. The list available on MSDN may seem very daunting at first but from the publishing perspective counters really fall into two categories:
- Performance counters representing "number of items". Examples may include "total number of logins", "sales transactions processed" etc. When publishing this sort of data we usually have to simply capture the number of invocations of a particular operation and use PerformanceCounter.Increment() method. If the number may go down as well as up we can use either Decrement() method or directly set the RawValue property.
- Performance counters related to time of execution. The time has to be expressed in terms of "timer ticks" and we have to measure how long the operation took and then simply call PerformanceCounter.IncrementBy(executionTime)
Now you may ask how on Earth is it possible to get 28 types of performance counters given that we only set either the number or the time of execution? The answer is simple: majority of the counters use some sort of a magic formula involving one or two counters to come up with a final value. The best example is performance counter of type RateOfItems32. Every time performance monitor samples it's value, it checks how much the value has changed since the last sample was collected and then divides the difference by the time elapsed since the last sample. This gives nice Rate/sec and all we have to do to publish it is to simply increment a counter every time an operation takes place. Slightly more complex scenario utilises values of two different performance counters; imagine that we want to expose "cache hit ratio" from a caching component. In order to achieve this we have to create two counters of types RawFraction and RawBase respectively. Whenever we have a "cache hit" we increment RawFraction and every operation (whether it is a hit or a miss) increments RawBase. This way performance monitor will calculate hit ratio as RawFraction/RawBase and we're done. More interesting things happen when we start dealing with times; we can for example expose a counter which represents number of threads executing given method at any time by simply exposing timer of type CounterTimer. Here is how it is done: imagine that an operation takes 200ms and performance monitors checks the value of the counter every second. If between two samples we have executed the operation 20 times each time incrementing the value of the counter by 200ms, at the end of our sampling interval the total accumulated value will increase by 20 executions times 200 ms = 4000ms. Given this increase, performance monitor will divide the delta by sampling interval which tells us that, on average, during our sampling interval we were executing 4000ms/1000ms = 4 operations simultaneously. Genius! Full list of supported counter types is obviously available on MSDN so I won’t be repeating it here, the only thing you have to pay special attention to is whether you are required to increment counter value by number of executions (or other items) or the time of the execution.
Performance of performance counters
If you read my previous posts you may remember the paradox of Schrödinger's cat. The paradox states that by measuring a value we may affect it. This is very true with performance counters as there is indeed overhead associated with using them. Having said that results captured by my favourite profiler indicate that unless we are doing hundreds of threads executing a method thousands of times per second there is nothing to worry about. This of course does not mean that that every single method of every single class has to be instrumented! As with any tool it makes sense only when it is used wisely: monitoring critical operations, 3rd party services etc.
Putting it all together
Knowing that we can cover 90% of performance counter scenarios by capturing either execution time or number of executions, we can wrap it all up in two simple PostSharp aspects. If you are new to PostSharp I would strongly suggest that you have a look at the website, but the idea is to simply decorate a method with a couple of attributes and let the PostSharp interception mechanism do the magic as the following code illustrates:
Code Snippet
- [CountingPerformanceCounter(PerformanceCounters.CATEGORY, PerformanceCounters.NUMBER_OF_CALLS)]
- [CountingPerformanceCounter(PerformanceCounters.CATEGORY, PerformanceCounters.CALL_RATE)]
- [CountingPerformanceCounter(PerformanceCounters.CATEGORY, PerformanceCounters.EXECUTION_TIME_BASE)]
- [TimingPerformanceCounter(PerformanceCounters.CATEGORY, PerformanceCounters.ACTIVE_THREADS)]
- [TimingPerformanceCounter(PerformanceCounters.CATEGORY, PerformanceCounters.EXECUTION_TIME)]
- private void InstrumentedMethod()
- {
- Thread.Sleep(_pause);
- }
As you can imagine there is no rocket science behind the aspects themselves. Here is the code of the TimingPerformanceCounter which is slightly more involved then the other one:
Code Snippet
- using System;
- using System.Diagnostics;
- using PostSharp.Laos;
- namespace EMC.PostSharp.PerformanceCounters
- {
- /// <summary>
- /// Implements aspect publishing method's execution time through performance counters
- /// </summary>
- [Serializable]
- public class TimingPerformanceCounterAttribute : PerformanceCounterAttributeBase
- {
- /// <summary>
- /// Initializes a new instance of the <see cref="CountingPerformanceCounterAttribute"/> class.
- /// </summary>
- /// <param name="categoryName">Name of the category.</param>
- /// <param name="performanceCounterName">Name of the performance counter.</param>
- public TimingPerformanceCounterAttribute(string categoryName, string performanceCounterName)
- : base(categoryName, performanceCounterName)
- {
- }
- /// <summary>
- /// Method executed <b>before</b> the body of methods to which this aspect is applied.
- /// </summary>
- /// <param name="eventArgs">Event arguments specifying which method
- /// is being executed, which are its arguments, and how should the execution continue
- /// after the execution of <see cref="M:PostSharp.Laos.IOnMethodBoundaryAspect.OnEntry(PostSharp.Laos.MethodExecutionEventArgs)"/>.</param>
- /// <remarks>
- /// If the aspect is applied to a constructor, the current method is invoked
- /// after the <b>this</b> pointer has been initialized, that is, after
- /// the base constructor has been called.
- /// </remarks>
- public override void OnEntry(MethodExecutionEventArgs eventArgs)
- {
- // Record the start time here
- eventArgs.MethodExecutionTag = Stopwatch.GetTimestamp();
- base.OnEntry(eventArgs);
- }
- /// <summary>
- /// Method executed <b>after</b> the body of methods to which this aspect is applied,
- /// even when the method exists with an exception (this method is invoked from
- /// the <b>finally</b> block).
- /// </summary>
- /// <param name="eventArgs">Event arguments specifying which method
- /// is being executed and which are its arguments.</param>
- public override void OnExit(MethodExecutionEventArgs eventArgs)
- {
- base.OnExit(eventArgs);
- long start = (long) eventArgs.MethodExecutionTag;
- // Increment the counter by the value of time elapsed
- PerformanceCounter.IncrementBy(Stopwatch.GetTimestamp() - start);
- }
- }
- }
In case you want to take the whole lot for a spin, the file attached to this post contains all the necessary ingredients including some unit tests. To build the app you will need the latest version of PostSharp installed on your machine. The sample app uses multiple threads and throttling to expose some typical scenario of performance counter utilisation (Calls per/sec, Active threads, Total calls and Average execution time).

|
-
On my way to work from London’s Waterloo Station one day I noticed a building on Southwark St which got me intrigued: “Kirkaldy’s Testing and Experimenting Works”. As it turns out there is a rather fascinating industrial history behind the building but it would not be worth mentioning here if it was not for a motto above the entrance: “Facts Not Opinions”. I can hardly imagine an area of software development where the motto would be more applicable than performance engineering. You see, far too many problems with performance come from the fact that we spend our time and resources “optimizing” code in areas which do not need optimization at all. We oftentimes do it because “everybody knows that you should do XYZ” or because we want to mitigate perceived performance risks by taking “proactive” action (aka premature optimization). If we were to follow the mantra of Mr Kirkaldy, we could avoid all of the above by doing just one thing: testing and measuring (and perhaps experimenting). So if you were to stop reading just now, please take this no 1 rule of performance optimization with you: measure first. Measuring is not only important when fixing code: it is also vital if you want to evaluate risk of potential design approach. So instead of doing “XYX because everybody knows we should”, whack a quick prototype and take it for a spin in a profiler One of my favourite performance myths is that you should “always cache WCF service proxies because they are expensive to create” (and of course everybody knows that). As I have heard this technique specifically mentioned in context of ASP .NET web app running in IIS I could immediately hear alarm bells ringing for miles… The problems with sharing proxies between IIS sessions/threads are numerous but I will not bother you with the details here, my main doubt was if WCF proxy can be efficiently shared between multiple threads using it (executing methods on it) at the same time. So I created a simple WCF service with one method simulating 5 sec wait. I set the instance mode “per call” and then started calling the service from 5 threads on the client side using the same proxy shared between all of the them. I used a ManualResetEvent to start the threads simultaneously and expected them to finish 5 seconds later (give or take a millisecond or two). Guess what: they did not, as they blocked each other on some of the WCF internals and the whole process took 20 seconds instead of 5. So now imagine what would have happened if you used this approach on a busy website: your “clients” would effectively be queuing to get access to the WCF service and you would end up with potentially massive scalability issue. To make things worse creating WCF proxies is nowadays relatively cheap (provided that you know how to do it efficiently). The moral of the story is simple: when in doubt – measure. Do not apply performance “optimisations” blindly simply because everybody knows that you should…. As good and beneficial as performance “measuring” can be, when doing so you may often come across a phenomenon known in quantum physics as the paradox of Schrödinger's Cat. To put it simply by measuring you may (and most likely will) influence the value being measured. It is important to mention it here as profiling a live system may become infeasible simply because it would slow it down to an unacceptable level. The level of performance degradation may vary from several percent (in case of tracing SQL being executed using SQL Profiler) to several hundred percent when using code profiler. Keep that in mind when testing your software as this once again illustrates that it is far better to do performance testing in development rather than fight problems in production when your ability to measure may be seriously hampered. On of the funniest performance bugs I have ever come across was caused by a “tracing infrastructure” which strangely enough took extreme amount of time to do its job. As it turned out someone decided that it would be great to produce output in XML so that it can be processed later in a more structured way than a plain text. The only problem was that XmlSerializer used to create this output was created every time anyone tried to produce some trace output. In comparison with WCF proxies XmlSerializers are extremely expensive to create and this obviously had detrimental impact on application using tracing extensively. I find it rather amusing as tracing is one of the basic tools which can help you measure performance, as long of course as it does not influence it too much…:) If there is one thing which is certain about software performance though, it is the fact that you can take pretty much any piece of code and make it run faster. For starters if you do not like managed code and overheads of JIT and garbage collection you can go unmanaged and rewrite the piece in say C/C++. You could take it further and perhaps go down to assembler. Still not fast enough? So how about assembler optimised for a particular processor making use of its unique features? Or maybe offload some of the workload to the GPU? I could go along these lines for quite a while but the truth is that every step you make along this route is exponentially more expensive and at a certain point you will make very little progress for a lot of someone’s money. So the next golden rule of performance optimisation is make it only as fast as it needs to be (keep it cheap). This rule sort of eliminates vague performance requirements along the lines “the site is slow” or “make the app faster please”. In order to tackle any performance problem, the statement of it has to be more precise, more along the lines of “the process of submitting an order is taking 15 sec server-side and we want it to take no more than 3 seconds under average load of 250 orders/minute”. In other words you have to know exactly what the problem is and what is the acceptance criteria before you start any work. I have to admit here that oftentimes I am tasked with "just sorting this out” when performance of a particular part of the application becomes simply unacceptable form user’s perspective. Lack of clear performance expectations in such cases is perhaps understandable: it is quite difficult to expect the end user to state that “opening a document should take 1547ms per MB of content”. Other than this the acceptability will depend on how often the task has to be performed, how quickly the user needs it done etc. So sometimes you just have to take him/her through iterative process which stops when he says “yeah, that’s pretty good, I can live with that”. So say that you have a clear problem statement, agreed expectations, you fire up a profiler and method X() comes up on top of the list consuming 90% of the time. It would be easy to assume that all we have to do now is to somehow optimise X() but surprisingly this would probably be… a mistake! The rule no 4 of code optimisation is to fully understand the call stack before you start optimising anything. Way too many times I have seen developers “jump into action” and try and optimise the code of a method which could be completely eliminated! Elimination is by far the cheapest option: deleting code does not cost much and you immediately improve performance by almost infinite number of percent (I’ll leave it to you to provide a proof for the latter statement:). It may seem as I am not being serious here but you would be surprised how many times I have seen an application execute a piece of code just to discard the results immediately. And last but not least as developers we sometimes fall into a trap of gold plating: it is often tempting to fix issues you may spot here and there while profiling but the first and foremost question you should be asking is what will be the benefit of it? A method may seem to be inefficient (by the looks of the code), say sequential search which could be replaced with a more efficient dictionary-type lookup, but if profiler indicates that the code is responsible for 1% of overall execution time, my advice is simple: do not bother. I have fallen into this trap in the past and before you know it you end up with “small” changes in 50 different source files and suddenly none of the unit tests seem to work. So the last rule is: go for maximum results with minimum changes, even if it means that you have to leave behind some ugly code which you would love to fix. Once your bottleneck has been eliminated, sure as hell another one will pop its ugly head so keep tackling them one by one until you reach acceptable results. And when you reach a situation when making one thing faster slows something else, as it often happens in database optimization, it means that you are “herding the cats” as we call it on my project and you probably have to apply major refactoring exercise. My current project has a dialog box with a tree view which used to take several seconds to open. On closer investigation we realised that the problem lies in how child elements of each tree node are retrieved: the algorithm used sequential search through a list of all elements stored in memory along the lines of var myChildren = allElements.Where( element => element.ParentID == this.ID).ToList(). As the dialog used WPF with hierarchical data template, each element in the list had to perform sequential search for its children which gives not so nice o-n-squared type of algorithm. The performance was bad with ~1000 of elements but when the number of elements increased overnight to 4000, resulting 16 fold increase in execution time was unacceptable. You may think that the solution would be to rework the algorithm and this indeed was considered for a while. But in line with “measure” , “keep it cheap” and “make it as only fast as it needs to be” rules the fix proved to be very simple. As it turned out the major problem was not the algorithm as such but the fact that ParentID property was expensive to evaluate, and even more so if it had to be invoked 16 000 000 times. The final solution was a new 3 lines of code long method IsChildOf(int parentID) which reduced the execution time by a factor of 60. Now that is what I call a result: 6000% improvement for 3 lines of code.
|
-
The very definition of software performance will vary depending on whom you ask. If you asked the end user he would immediately mention the “speed” of the application he has to work with. If you asked the CIO he would probably define performance as “throughput” measured in transactions per seconds. Finally if you asked an IT guy who has to deal with the hardware end of the system he would say that he needs scalability so that his duties are limited to provisioning more hardware when demand increases. All these elements: response time, throughput and scalability are desired components of software performance. I have spent last 12 months working pretty much continuously on performance optimisation and James Saul asked me to share some of my findings with a wider audience. To start somewhere I went on to dig up some resources on wikipedia and came across an interesting article on performance engineering. According to the article one of the objectives of this discipline is to “Increase business revenue by ensuring the system can process transactions within the requisite timeframe”. In other words performance is money and there is probably no better example of how it is lost than total meltdown of the Debenham’s website which took place just before last Christmas. I have to admit here that I have no idea what went wrong at Debenhams but I can easily imagine a number of ways to build a software product which breaks under heavy load. As they say there is more than one way to skin a cat and build poor quality software but this time round I will focus primarily on the “process” issues, rather than particular technical aspects. Small database syndrome (aka SDS) Personally I think that SDS is the major contributor towards building poorly performing programmes: if the development team works against a tiny database, they are very likely to get in serious trouble further down the line and there are a number of reasons for it. The most obvious is the fact that there will be more data (surprise, surprise) so naturally more work will be required to get whatever you want out of the database. Secondly, query plans will be turned upside down in light of larger tables and distribution of data will influence it heavily as well. And last but not least when working against a small datasets it is impossible to spot any potential performance problems as everything will (or at least should) execute rather quickly. The best example of spectacular “volume related” failure I have witnessed not so long ago is an application which when fired for the first time against fully populated database, executed 40 000 SQL queries during its start-up and the whole procedure took a better part of 40 minutes. To add insult to an injury, some of the tables involved in the queries were missing rather crucial indexes while others were indexed incorrectly (not that it matters a lot when you execute 40 000 queries to start one instance of the app). This potential fiasco made everyone involved in the project somewhat embarrassed and steps were taken to avoid such mishaps in the future. Luckily for the team this accident happened early enough in the project lifecycle and fixing it was relatively cheap and easy. But as you can hopefully see from this example SDS is a serious risk and I find it somewhat difficult to understand that people oftentimes try to find all possible excuses not to use properly sized database for development and/or testing. The one I hear most often is related to cost, measured in terms of either time or money; resources which someone has to spend to produce the data. But given the availability of data generation tools like the one provided by Redgate this is indeed a very poor explanation. It is even worse considering that cost of maintaining such a dataset is just a fraction of the total cost of the project. “We will have better hardware in production” This is another one of my favourites which I hear a lot when people testing an application realise that something is not quite right performance-wise. Accepting that the app is sluggish usually means that someone has to admit to a failure of sorts and nobody likes it. So people usually go into denial and try to find excuses not to tackle the problem now and then. If you consider that most of us developers work on single processor machines, it is not hard to see how people may fall into this trap, but even so basic calculations often prove that hoping to kill the problem with hardware may be nothing more than wishful thinking. Let me illustrate it with an example: lets consider a sample operation which takes 10 seconds on a modern single processor PC with plenty of RAM. It is easy to imagine that production hardware may be 20 times more powerful, leading to a false conclusion that in production the same process will take 1/20th of 10 seconds i.e. 500 ms. Job done. The failure in such reasoning is first of all an assumption that the production hardware will be serving one user at a time or that concurrent user load will have no influence on performance. Secondly the more powerful hardware may be indeed 20 times the capacity of the PC, but this capacity will be available only when you are able to parallelise the algorithm! If the original process is sequential (single-threaded) in nature, adding more processors to the server will not change response time at all. So the only conclusion we can draw from running software on inferior hardware is that if it works on a PC, there is a chance that it will work on a big server, provided of course that the software is scalable. On the other hand if it does not work well on your PC, the chances that it will ever work anywhere else under substantially heavier user load are close to zero. “We have no [time|requirement|money|resources] for performance testing” Some wise people say that if you have no time for testing, than you better have time for fixing last minute bugs and patching the app. The same is pretty much true when it comes to performance. When building systems which will potentially face high user load it is absolutely imperative that load and stress testing have to be executed unless you want to face similar fate as the website I have mentioned earlier. I may be biased here because I like to load test software, but load testing the app is probably the best way to make sure that it actually works. Let me give you an example here: about 18 months ago I participated in a POC at Microsoft’s working on a a website for an airline. Together with another guy from EMC we were responsible for doing the back end of the system: the database and WCF based app server. As we finished our job earlier than expected I decided that it would not be a bad idea to actually make use of available resources and take that thing for a spin a see what it can do. The app server was running on an 8-way 64 bit machine with ample amounts of RAM so I whacked some unit tests simulating users’ journeys through the website, plugged the whole lot to a VSTS load testing machine and pressed the green button. As soon as I pressed it we discovered that the whole thing grinds to a rather embarrassing halt within several seconds from being started… After a bit of head scratching we decided that it is a rather good idea to close server connections once you are done with them and repeated the test scoring a rather measly result of 100 method invocation per second. To cut the long story short during the next few days we have discovered that from performance point of view it is actually wiser to use ADO .NET rather than LINQ to SQL, that when building high performance systems it is better to have network cables which work at full capacity of the switch they are connected to, and that SQL Server 2008 rocks and it would take load from 3 app servers before it became fully saturated. In the meantime our load testing machine ran out of puff and we needed to use two more 4-way boxes in order to generate enough load to saturate the app server. The end result of this exercise was 18 fold (sic!) increase in system performance not to mention the fact that it was happy working for hours with no end. And when it came to presentation of the finished website everyone was raving about how quick the whole thing was. The moral of the story is however that things will inevitably break under heavy load. If you load test them before handing them to the users, chances are that they will get much more robust system and you as application developer will save yourself potentially huge embarrassment. PS: I know that this post is barely technical, but I promise to improve next time round :)
|
-
Integration testing WCF services can be right pain and I experienced it firsthand on my current project. If you imagine a hypothetical person repository service exposing 4 CRUD operations, it would make sense to test all four of them. If the operations were to be tested in random order, it would be perfectly feasible that testing update after a delete may fail if the object to be updated has been deleted by a previous test. The same obviously applies to reads following updates etc. In other words tests are dependent on each other and this dependency is really evil for a number of reasons: first of all it usually means that you cannot run tests in isolation as they may depend on modifications made by other tests. Secondly, one failing test may cause a number of failures in the tests which follow and thirdly, by the end of the test run underlying database is in a right mess as it contains modified data, forcing you to redeploy it should you wish to re-run the tests. Considering the fact that running integration tests is usually time consuming exercise, this vicious circle of deploy/run/fix becomes extremely expensive as the project goes on. TransactionScope to the rescue Fortunately for us WCF supports distributed transactions and if there is one place where they make perfect sense it is in the integration testing. Imagine a test class written along the following lines: The idea behind it is that whenever a test starts, a new transaction gets initiated. When the test completes, regardless of its outcome, the changes are rolled back leaving the underlying database in pristine condition. This means that we can break dependency between tests, run them in any order and rerun the whole lot without the need for redeploying the database. Holy grail of integration testing :) To make it work however, the service needs to support distributed transactions (which is usually a not a bad idea anyhow). Having said that have to be aware of various and potentially serious gotchas which I will cover later. To make a service "transaction aware" following changes have to be made (I assume default, out of the box WCF project): first of all the service has to expose an endpoint which uses a binding which in turn supports distributed transactions (e.g. WsHttpBinding) and the binding has to be configured to allow transaction flow. This configuration has to be applied on both client (unit test project) and the server side: Secondly, all operations which are supposed to participate in transaction have to be marked as such in the service contract: The TransactionFlowOptions enumeration includes NotAllowed, Allowed and Required flags which I hope are self explanatory. Using Allowed flag is usually the safest bet as the operation will allow callers to call the service with or without transaction scope. Making the service transaction aware as illustrated above is usually enough to make this whole idea work. The third change which is optional but I highly recommend it, is to decorate all methods which accept inbound transactions with [OperationBehaviorAttribute(TransactionScopeRequired=true, TransactionAutoComplete = true)]. By doing so we state that regardless of the client "flowing" transaction or not, the method will execute within transaction scope. If the scope is not provided by the client, WCF will simply create one for us which means that code remains identical regardless of the client side transaction being provided. The TransactionAutoComplete option means that unless the method throws an exception, the transaction will commit. This also means that we do not have to worry about making calls to BeginTransaction/Commit/Rolback anymore. The default for TransactionAutoComplete is true so strictly speaking it is not necessary to set it but I did it here for illustration purposes. The attached sample solution contains a working example of person repository and may be useful to get you started. The small print Important feature of WCF is the default isolation level for distributed transactions which is Serializable. This means that more often than not, your service is likely to suffer badly from scalability problems should the isolation level remain set to the default value. Luckily for us WCF allows us to adjust it; the service implementation has to simply specify required level using ServiceBehaviorAttribute. Unless you know exactly what you are doing I would strongly recommend setting the isolation level to ReadCommitted. This is the default isolation level in most SQL Server implementations and it also gives you some interesting options. Having done this the caller has to explicitly specify its required isolation level as well when constructing transaction scope. An interesting "feature" of using transaction scope, in testing in particular, is the fact that your test may deadlock on itself if not all operations being executed within the transaction scope participate in it. The main reason for which this may happen is lack of TransactionFlowAttribute decorating the operation in service contract. In the test below if the GetPerson operation was not supporting transactions, yet the DeletePerson was, then an attempt to read the value deleted by another transaction would cause a deadlock. Feel free to modify the code and try it for yourself. Distributed transactions will require MSDTC running on all machines participating in the transaction i.e. the client, the WCF server and the database server. This is usually the first stumbling block as MSDTC may be disabled or may be configured in a way which prevents it from accepting distributed transactions. To configure MSDTC you will have to use "Administrative Tools\Component services" applet from the control panel. MSDTC configuration is hidden in the context menu of "My Computer\Properties". Once you activate this option you will have to navigate to MSDTC tab and make sure that security settings allow "Network access" as well as "Inbound/Outbound transactions". Performance One issue which people usually raise with regards to distributed transactions is performance: these concerns are absolutely valid and have to be given some serious consideration. The first problem is the fact that if the service has to involve transaction managers (MSDTC) in order to get the job done it usually means some overhead. Luckily, the transaction initiated in TransactionScope does not always need to use MSDTC. Microsoft provides Local Transaction Manager which will be used by default as long as the transaction meets some specific criteria: transactions involving just one database will remain local incurring almost no overhead (~1% according to my load tests). As soon as your transaction involves other resources (databases or WCF services) it will be promoted to distributed and will get a performance hit (in my test case it is 25% decrease in performance but your mileage may vary). To check if a method executes within local or distributed transaction you may inspect Transaction.Current.TransactionInfo.DistributedIdentifier: value equal to Guid.Empty means that transaction is local. The second issue affecting performance is the fact that transactions will usually take longer to commit/rollback meaning that database locks will be held for longer. In case of WCF services the commit will happen when the results have been serialized back to the client which can introduce serious scalability issues due to locking. This problem can be usually alleviated by using ReadCommitted isolation level and row versioning in the database. Parting shots The project I am currently working on contains some 2500 integration tests, 600 of which test our WCF repository. In order to make sure that every test obeys the same rules with regard to transactions we have a unit test in place which inspects all test classes in the project and makes sure all of them derive from the common base class which is responsible for setting up and cleaning the transaction. I would strongly recommend to follow this approach in any non trivial project as otherwise you may end up with some misbehaving tests breaking the whole concept. Happy testing!
|
-
Simon Evans some time ago questioned if given the parametric nature of queries generated by ORMs, stored procedures still have a place in developers toolbox? At the end of the day why bother if ORMs seem these days seem to do everything for free? It looks to me that statement that "stored procedures are dead" (although Simon has never said that) has gotten a life of its own and some people made it into a mantra of sorts... Argument, which Simon quotes, that ORM generated parameterised queries are as performant as stored procedures is absolutely valid and from this point of view stored procedures are redundant: the days when we had to implement every single CRUD operation as a sproc are gone. People seem to forget however that using sprocs for performance was not the only reason to use them. The primary reasons why stored procedures are still worthwhile to be considered is the principle of isolation: personally I strongly believe that as much as we can, we should avoid dependencies between business and database layers. Dave Hanson with whom I work on a project for a large executive recruitment agency posted his doubts on his blog and his views are very close to mine, these days we often find ourselves on the receiving end of these dependencies when the DBA decides to change schema of the database and we have to refactor large portions of program logic to deal with the change. Having said that dependencies are not the only problem with ORM based persistence layer. First major problem which is a "feature" of all ORMs is the problem of lost updates, as their default mode of operation goes along the lines of: As any SQL aficionado will immediately recognise, this approach is a potential disaster in a multi-user system. If two of these batches were to execute concurrently, there is a fairly good chance that one would override the changes made by the other and instead of final value of initial balance + 20 we would end up with much less. Surely, this problem can be overcome by implementing optimistic locking, running the entire procedure within a transaction with sufficient isolation level etc but why bother when the problem can be solved easily with running a piece of code along the lines of: Not only is the second approach simpler, it also works well in a multi-user environment and minimises lock contention. Yet another reason why stored procedures are helpful is simply the power of SQL which is extremely efficient tool when it comes to handling large sets of data: when writing a stored procedure a developer has at his disposal all the weird and wonderful inventions of SQL including aggregate functions, non-equi joins, common table expressions etc. and the list goes on. On a recent POC for easyjet.com I used SQL tricks which would be simply awkward if not impossible to implement as operations on table based entities. Strangely enough results of those queries were successfully mapped to a set of WCF data contracts using LINQ to SQL mapping attributes because if anything is certain, it is the fact that manually mapping datasets to objects is a thing of the past (unless of course you are hardly pressed for performance which is another story). Technorati Tags: SQL, ORM, .NET
|
-
First of all I'd like to assure
non-developers accidentally reading this post that it has nothing to do with lysergic
acid diethylamide, an A-class substance commonly known as LSD or "acid". It's
all about ACID properties of database transactions in general (boring stuff),
and the "I" property and scalability in particular. Now that we have the
legalities behind us let's get back to business.
According to Wikipedia the "I" in the
acronym has the following meaning:
Isolation
refers to the ability of the application to make operations in a transaction
appear isolated from all other operations. This means that no operation outside
the transaction can ever see the data in an intermediate state; a bank manager
can see the transferred funds on one account or the other, but never on both -
even if he ran his query while the transfer was still being processed. More
formally, isolation means the transaction history (or schedule) is
serializable. This ability is the constraint which is most frequently relaxed
for performance reasons.
The last sentence is of particular interest
as it implies that isolation comes at a cost (concurrency vs. consistency) and
this basic fact prompted me to do some experiments and in result write this post.
The
Problem
The main problem with maintaining isolation
is that resources which are supposed to be isolated have to be locked. As I am
sure you are aware of it locking in general, and in databases in particular, means
lower concurrency and throughput.
To see exactly how database performance is
affected by concurrent reads and updates I devised a simple "bank" scenario:
while one thread moves money between hypothetical accounts the other tries to
get the total amount of money held in the bank (which has to remain constant).Various
approaches to this problem are the subject of the rest of this post and as
you'll hopefully see they produce very different results. Although the topic
may seem trivial (and rightly so) it represents a wider class of problems where
the same table is read and updated concurrently.
All
that's needed to test the "bank" scenario is a table defined as follows:

With two stored procedures, one for moving
the money and the other which gets the total accumulated:

Once the table has been populated with
10000 rows (and equal amount of money in each account) I ran the test app to
see how many times I could concurrently execute both stored procedures within
20 seconds and what results would I get (see attachment for the test app and
SQL setup script).
First
run
The first run of the program with default
SQL Server settings (isolation level set to READ_COMMITTED) produced following
results:
|
Operation
|
Total
# of executions (average of 3 runs)
|
|
Reads (totals retrieved)
|
1656
|
|
Writes (transfers executed)
|
19300
|
|
Inconsistent results
|
1120
|
The interesting thing in this case is the
number of inconsistent results, i.e. the database reported incorrect total of
all balances held in our "bank". In spite of the fact that the movement of
money happens within a transaction, the "reader" is clearly able to see "half-committed"
data. In case you were wondering why this happens have a look at the following
table which illustrates the cause of the problem.
|
Time
|
Reader
|
Writer
|
|
1
|
Reads row ID=1, gets the balance = 100
and releases the lock.
|
|
|
2
|
|
Updates row ID=1 sets the
balance=balance-10 (90) and exclusively locks the row.
|
|
3
|
|
Updates row ID=2 sets the
balance=balance+10 (110) and exclusively locks the row.
|
|
4
|
|
Commits and releases both locks.
|
|
5
|
Reads the row ID=2, gets the balance of
110 producing total of 210!
|
|
Due to different locking strategies the
reader hits rows updated by the other statement (and quite possibly the other
way around). The reader never reads uncommitted data so in principle the
database obeys the READ_COMMITED isolation level, the end result however may be
far from desired and many people will find it surprising.
Approach
no 1
The first possible approach to produce
consistent results which came to my mind was to set the isolation level for the
GetTotal stored procedure to REPEATABLE READ. In this case the locks are held
on the data for the duration of the transaction in order to prevent updates.
The outcome however was an immediate and somewhat surprising deadlock which the
following table explains:
|
Time
|
Reader
|
Writer
|
|
1
|
Reads row ID=1, gets the balance = 100
and holds shared lock on the row.
|
|
|
2
|
|
Updates row id=2 with balance = balance -
10 and holds the lock
|
|
3
|
Tries to read the row ID = 2 but it's
already exclusively locked by the writer so the statement is waiting for the
lock to be released.
|
|
|
4
|
|
Tries to update row id=1 but the row is
locked by the other statement. Transaction deadlocks.
|
|
5
|
Reader is chosen as the deadlock victim
as there is less work to "rollback".
|
|
Approach
no 2
The reason for the deadlock above is the
"progressive" locking of the rows as the SELECT query continues along the
table. The simple solution is to use the
TABLOCK hint in the query to make sure that the entire table is locked in one
"go".

This approach works exactly as expected (no
inconsistent results) the downside however is an immediate drop in "writer" performance
as the following table illustrates:
|
Operation
|
Total
# of executions (average of 3 runs)
|
|
Reads (totals retrieved)
|
3823
|
|
Writes (transfers executed)
|
3072
|
|
Inconsistent results
|
0
|
Approach
no 3
The third approach was something I was very
keen to test: in SQL Server 2005 there is a magic option (actually a couple of
them) which enables row versioning.
Row versioning is not a new thing (in fact
Oracle RDBMS used it "by definition" for as far as I care to remember) and the
whole concept works (very roughly) as follows:
|
Time
|
Reader
|
Writer
|
|
1
|
|
Writer transaction starts
|
|
2
|
Reader transaction starts
|
|
|
3
|
|
Row ID=1 Updated with balance+10=110, new
version number applied to the row and the
old version of the row gets stored in the "version store"
|
|
4
|
Tries to retrieve row ID=1, the engine
realizes that the row version is different than it was at T=2 so retrieves
previous version of the row from the "version store" with balance = 100.
|
|
|
5
|
|
Row ID=2 Updated with balance-10=90, new
version "number" applied and the old
version of the row gets stored in the "version store"
|
|
6
|
Tires to retrieve row ID = 2, the engine
realizes again that the row has been updated after the statement started so
retrieves previous version from the "version store" with balance = 100.
|
|
|
7
|
Produces correct total of 200 exactly as
it was at the time the statement started.
|
|
The simplest approach to enable statement
level row versioning is to use READ_COMMITTED_SNAPSHOT database option.

Executing the above statement enables row
versioning for statements (not transactions) executing at READ_COMMITED
isolation level which incidentally is the default isolation level. This means
that no changes in the application are usually necessary to benefit from this
type of row versioning. After setting the option the test app produced
following results:
|
Operation
|
Total
# of executions (average of 3 runs)
|
|
Reads (totals retrieved)
|
3103
|
|
Writes (transfers executed)
|
24855
|
|
Inconsistent results
|
0
|
As you can see this approach not only
solves the problem of inconsistent result but also substantially increases
throughput of the app. This is because one of the interesting side effects of row
versioning is the fact that no locks are applied during execution of SELECT
statements. When READ_COMMITTED_SNAPSHOT is active only "writers" block "writers"
which is in stark contrast to default SQL Server behaviour where everybody
locks just about everybody else (only readers do not block readers).
Approach
no 4
READ_COMMITTED_SNAPSHOT works at the
statement level i.e. consistent view of the data is maintained relative to the
start of the statement. In case of our test application this is perfectly
sufficient because GetTotal () stored procedure executes a single select
statement.
If we
wanted to maintain consistency at transaction level, the
ALLOW_SNAPSHOT_ISOLATION option has to be set to ON. The downside is that in
such a case row versioning has to be explicitly requested by setting the transaction
isolation level to SNAPSHOT.

Once the reader procedure has been modified
the test app produced following results:
|
Operation
|
Results
(average)
|
|
Reads (totals retrieved)
|
3156
|
|
Writer (transfers executed)
|
22697
|
|
Inconsistent results
|
0
|
The differences in performance of both
approaches using row versioning should be considered negligible as the
performance varied quite widely between runs.
Wrap
Up
As these results hopefully illustrate row
versioning not only solves some annoying result "consistency" problems but also
substantially increases performance of our sample app (see the following
graph).

I would not recommend blindly applying row
versioning strategies in every case as they put additional stress o the TEMPDB
(all updates have to save old row versions in there) and have some other
surprising properties but it's clearly an option worth considering for
scenarios where concurrency (locking and/or deadlocks) becomes an issue. Following
two graphs illustrate number of lock requests with row versioning disabled and
enabled. As you can clearly see the differences are substantial and so will be
the scalability of a system with frequent and concurrent reads and updates.
Lock requests and waits with READ_COMMITTED isolation level.

Lock requests and waits with READ_COMMITTED_SNAPSHOT isolation level (read line
illustrates processor usage during the test as the number of lock requests and is
minimal)

|
-
SQL Server 2008 is upon us and if there is one feature about which I am somewhat excited (sad geek, I know) is the plethora of new data types designed purely to store dates and times. To explain the reasons for this excitement let me just list the problems with existing SQL types.
The first major issue is that of resolution: Smalldatetime has resolution of one minute which makes it pretty much useless if you want to store in it anything but the date. The Datetime on the other hand offers a fixed resolution of 0.00333 sec which seems to be more than enough for most applications. However it is not the fact the resolution is too high or too low, it is the fact that it is fixed which is the problem. Such relatively high resolution is not required by most apps and because it is given by default, whether we like it or not, leads us directly to problem no two.
The second issue is that of size: 8 bytes long Datetime is an awful lot of space especially when used as a partitioning field of a FACT table in a data warehouse with hundreds of millions of rows. If you think that these days size of the data type is not that important think twice: why do you think Microsoft introduced VARDECIMAL in SP2 for SQL Server 2005?
The third major issue is that of range: the range of values that can be stored in a Datetime column does not match the range of .NET’s System.DateTime. Because the minimum value for Datetime supported in SQL 2005 (01/01/1753) is different from System.DateTime.MinValue (01/01/0001) as soon as you try to hand the latter to SQL Server, the program blows up nicely.
The fourth problem is related to storage of both date and time in the same column. What if someone wants to see all the transactions that took place after 06PM? The only way to achieve this is to use the DATEPART function to extract the relevant portion of the field and this immediately rules out any index usage (lets not get into calculated columns and function based indexes at this point). The end result is that although the query may return only a fraction of all the rows we’re still looking at a full table scan. Now let’s look at the problem of filtering by date: if we want to extract all the rows where date is equal to 13/11/2007 what we really say is 13/11/2007 00:00:00. This means that querying for events that took place on a particular day becomes awkward exercise along the lines of Transaction_Time between ‘13/11/2007 00:00:00’ and ‘13/11/2007 23:59:59’. The common approach to solving this problem is to use two separate fields to store date and time, this however leads to increased storage requirements.
SQL Server 2008 finally offers an elegant answer to all of these problems: there are separate Date and Time data types and to make things even better the Time has varying "precision". Both of them are tiny (3 bytes for the Date and 3-5 bytes for the Time depending on precision) so we can store date and time separately in just 6 bytes (assuming time resolution of 1 sec).
To make things even better there is now a “proper” Datetime2 data type which is perfectly “aligned” with .NET’s System.DateTime. This alignment is important for the reasons I've already mentioned: loss of precision and problems when trying to query/insert/update data using DateTime.MinValue are two major examples.
And last but not least the there is a Datetimeoffset data type which is identical in range and resolution to Datetime2 but stores the time zone information as well. This comes very handy when data stored in central database is accessed from different locations (countries). In such a case users across various time zones will be able to see the values stored in the database converted to their local times: 12:00 in London is 13:00 in Warsaw etc.
I'd love to say that people from SQL Server team did an excellent job as they truly went an extra mile to solve all of the date and time problems, however as soon as I put all those new data types to test I discovered some teething issues. New data types work as expected with ADO .NET when using Dataset or IDataReader: the values returned are sensibly mapped to .NET types (DateTime, TimeSpan or DateTimeOffset). My biggest issue however is that some of them do not work with LINQ to SQL, or at least I was not able to make them work. When mapping Time to either TimeSpan or DateTime, InvalidCastException gets thrown. Things get even more interesting when mapping Datetimeoffset column to a field of equivalent .NET type: all you get is System.Security.VerificationException with a mesage that says: "Operation could destabilize the runtime". The only way (with which I was able to come up so far) to overcome the Time mapping issue is to pass varchars/strings instead of Time as arguments to stored procedures and merge Times with Dates on the way back to produce a Datetime. In spite of all these issues in an "airline project" on which I'm working right now, finally having separate data types and columns for dates and times is a godsend.
|
|
|
|