This post is about a technical issue: an obscure delay when starting up a web service. It's the kind of technical information that is of little interest until you have puzzled over the same issue yourself, at which point it becomes extremely valuable. I'll try to keep it interesting anyway. As with most obscure issues, it is caused by the unexpected interaction of several features.
I can't claim credit for working out what was going on – that was done by some dedicated people on the client site, but here is the whole setup:
We had built a web service in ASP.NET 2.0 for the client. The web service used a third-party component to generate images. This component was a library usable from several environments, including .NET. Technically it was a .NET wrapper library around a COM DLL. The COM DLL was digitally signed, as was the .NET wrapper. Remember that point; it will be important later on.
On the client site they found that once the service was installed on the testing server, the first call to it took over 30 seconds. This was long enough to cause other parts of the system to time out and fail. The cause was not obvious, since the same code returned in 2-3 seconds on the first call on a lowly development laptop. Tracing revealed that the delay was not in the code we had written but in service startup.
More sleuthing exposed something even more worrying: during startup, some code was trying to contact a remote server and send data onto the network. We certainly didn't require the code to do that. The idea of spyware crossed our minds.
It turns out that the cause of all this is an interaction that is known, but not well documented.
When a digitally signed COM DLL is loaded into IIS, it is given higher trust than an unsigned one. Part of the design of digital signing is "certificate revocation": if a signed binary (strictly speaking, the certificate used to sign it) is found to be up to no good, it can be added to a worldwide "Certificate Revocation List" (CRL). This blacklist is stored on the signing authority's server. When the signed binary is loaded, the host contacts this server to see whether the binary's signature is still valid. If it is, the success is recorded so that the blacklist server is not contacted again for that binary, at least until a timeout period of several days expires. After that, another check is done to see whether the binary has been blacklisted in the meantime.
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." – Leslie Lamport
But there is a third possibility: what if the blacklist server cannot be contacted? The server could be down, the network between client and server could be out of action, or the client might not be on a network at all. It is not acceptable for a signed binary to stop working altogether simply because a remote server is down, but it is also not OK for a blacklist to be circumvented by unplugging your own network cable. So in this case, the binary is loaded as if it were not signed, and no success is recorded. Another attempt will be made to contact the blacklist server the next time the binary is loaded into memory.
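The three outcomes described above can be sketched as a small decision function. This is only an illustrative model of the behaviour as described in this post, not the actual Windows implementation: the names `load_decision`, `CrlResult` and `CACHE_SECONDS` are made up for the example, the three-day cache period is an assumption standing in for "several days", and the "blocked" outcome for a revoked signature is likewise assumed.

```python
import time
from enum import Enum


class CrlResult(Enum):
    VALID = "valid"              # server answered: signature still good
    REVOKED = "revoked"          # server answered: signature blacklisted
    UNREACHABLE = "unreachable"  # server could not be contacted


# Assumption: "several days" modelled as three days.
CACHE_SECONDS = 3 * 24 * 3600


def load_decision(binary_id, query_crl, cache, now=None):
    """Return 'signed', 'blocked' or 'unsigned' for a binary being loaded.

    query_crl is a callable returning a CrlResult; cache maps a binary id
    to the time of its last successful check.
    """
    now = time.time() if now is None else now
    checked_at = cache.get(binary_id)
    if checked_at is not None and now - checked_at < CACHE_SECONDS:
        return "signed"  # recent success on record, no network traffic

    result = query_crl(binary_id)
    if result is CrlResult.VALID:
        cache[binary_id] = now  # record the success for next time
        return "signed"
    if result is CrlResult.REVOKED:
        return "blocked"  # assumed outcome for a blacklisted binary
    # Unreachable: load as if unsigned, record nothing, retry on next load.
    return "unsigned"
```

The point to notice is the last branch: because nothing is cached when the server is unreachable, every load of the binary pays the full cost of the failed lookup again.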
In our case, the testing server where our web service was being installed was behind a comprehensive firewall that did not permit the code to connect to arbitrary web servers, and we didn't need that functionality, or so it seemed. So each time the third-party COM DLL was loaded, the API function WinVerifyTrust was called twice, and the whole process spent thirty seconds waiting for the network connection to time out: each call made three tries, each with a five-second timeout. There doesn't seem to be any way to configure the retry count or timeout of WinVerifyTrust.
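The thirty-second figure follows directly from the numbers above. As a quick check, here is the arithmetic written out, assuming that "three" means three attempts in total per call; the constant names are just labels for this example.

```python
# Figures observed in our case; names are illustrative only.
CALLS_PER_LOAD = 2      # WinVerifyTrust was called twice per DLL load
ATTEMPTS_PER_CALL = 3   # assumption: three attempts in total per call
TIMEOUT_SECONDS = 5     # each attempt waited five seconds


def worst_case_delay(calls=CALLS_PER_LOAD,
                     attempts=ATTEMPTS_PER_CALL,
                     timeout=TIMEOUT_SECONDS):
    """Seconds spent waiting when every attempt times out."""
    return calls * attempts * timeout


print(worst_case_delay())  # 30
```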
The attempted network connections that we were worried about turned out to be to crl.verisign.com and crl.thawte.com, the revocation-list servers of two certificate authorities.
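If you suspect the same problem, a quick way to confirm it is to check from the affected server whether those hosts can be reached at all. The following is a minimal sketch, not something we used at the time. It assumes the lists are fetched over plain HTTP on port 80, and the helper names `can_reach` and `report` are invented for the example.

```python
import socket

# The two hosts seen in our case; yours will depend on who signed the binary.
CRL_HOSTS = ["crl.verisign.com", "crl.thawte.com"]


def can_reach(host, port=80, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within timeout.

    port=80 is an assumption (CRLs are commonly served over HTTP).
    connect is injectable so the function can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:  # covers refused connections, DNS failures and timeouts
        return False
    conn.close()
    return True


def report(hosts, probe=can_reach):
    """Map each host name to whether it could be reached."""
    return {host: probe(host) for host in hosts}
```

Running `report(CRL_HOSTS)` on the server and getting `False` for a host, after a pause of about the timeout length, suggests that a firewall is silently dropping the connection.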
The solution
There are three possible ways to extract yourself from this situation, should you find yourself in it:
1) Open up the server's firewall enough to let the blacklist server query succeed. In our case, this wasn't going to be an easy option since we had no direct control over the administration of the testing and production servers.
2) Turn off certificate revocation checking for the whole server (links below). I haven't tried this, but it looks like it might work. This is, however, a drastic step that might compromise the security of the server. We didn't do this since it wasn't our server to configure. It wasn't just one server either.
3) Get versions of the binary files that are not digitally signed. Fortunately the vendor was by this time as pleased to find out what was going on as we were, and happily supplied us with these files. It seems odd that disabling the security features is necessary to get it to run, but there it is.
Links:
A discussion of the same issue with a different vendor's control:
http://blogs.xceedsoft.com/plantem/PermaLink,guid,3dde0262-1b7f-45d3-9a6e-164c842e422d.aspx
How to turn off certificate revocation checking:
http://digital.ni.com/public.nsf/allkb/18e25101f0839c6286256f960061b282
http://forums.ni.com/ni/board/message?board.id=232&message.id=827&page=3
WinVerifyTrust API function:
http://msdn2.microsoft.com/en-us/library/aa388208.aspx