When the SQL CLR, a technology enabling .net assemblies to be called from within SQL Server, was released in 2005 a collective gasp went up from DBAs all around the world. And rightly so, the idea that developers would be writing rogue little .net apps than run inside the database is worrying - DBAs wouldn't be able to scrutinise the code and (apparently) developers know precious little about databases to start with, so letting them loose using their favourite language was (and is) worrying.
SQL's solution to this concern was to lock the SQL CLR down as much as possible and they did this by writing a specialised CLR - a process virtual machine that was specifically for SQL. The idea of process VM's is not new and is the foundation of modern interpreted languages and frameworks such as .net (the CLR is the VM) and Java (the Java Virtual Machine is the VM). These VM's then expose certain functionality that has been abstracted away from the host hardware and OS - allowing the code to play in its own sandbox without messing directly with the disk, memory, network or anything else on the host environment that it is not supposed to.
.Net developers are forgiven for forgetting that they write code that runs in a VM, the CLR that runs on Windows is transparent, seamless and is not too restrictive - developers are still, for example, able to do some disk IO with the host machine. SQL CLR reminds developers very quickly that their apps are running in a VM - if the normal windows VM that we are used to has a big sandbox, the size of a large beach, then the CLR is a little cat litter box with very little space to play.
SQL CLR Sandbox | Windows CLR Sandbox |
| |
Other databases support the execution of other languages from the database - Oracle, for example, allows the execution of Java code in the database and Oracle spins up a Java VM on the host machine to execute it. In Oracle's case, if that Java process gets out of hand, then the database has no control because the VM is running outside of the database process (in the host OS), and the potential exists that a rogue Java application can bring the database server to its knees. This is not the case with SQL CLR.
The SQL CLR is hosted within the SQL process and the database has full control over memory utilization, security, processing cycles and everything else that a .net app may need. The result is a tightly managed environment that is tuned to database performance and prioritisation of database operations before .net execution. So if the database server is under load, the .net apps running in the CLR will be throttled back to provide priority to the database and if a .net app has gone Rambo in the CLR or seem to have fallen asleep, SQL will swing its axe and take out the .net app - allowing the database to operate normally.
| Rogue .net Assemblies |
|
The upside of all of this is obvious, the downside is that you can't just take any old .net library and execute it within the CLR. CLR .net libraries need to be designed to work within the CLR - the library has to be lean on memory, not access external resources, not do any external threading and, of course, not have a dependency on any other assembly that may do this. In order to demonstrate this, let me explain the steps I needed to go through to get my app working.
The requirement is to call a memory intensive pure .net API that reads binary data and produces tables from that data. The first thing I tried to do was create a small assembly that referenced the API. So I registered the assembly with CREATE ASSEMBLY and got an error something like this:
Warning: The Microsoft .Net frameworks assembly 'AssemblyName' you are registering is not fully tested in SQL Server hosted environment.
where AssemblyName referred to some Enterprise Library assembly. Just to get it working I thought I would try registering the Enterprise Library assembly, but then received a similar message relating to 'System.Management'. I've never been a big fan of the Enterprise Data Access Block but I am even less so now - having a dependency on System.Management is a bit smelly. Never mind, I thought, let me at least get this working, so I registered the System.Management assembly and everything went fine.
I managed to get the code running in the SQL CLR but very quickly ran into memory problems, as expected, and tried to allocate more memory to the SQL CLR - but that seems as close to impossible as makes no difference. SQL CLR is, for very good reason, a mean and stingy beast. So I thought that if I have something that is memory intensive and SQL CLR is only giving me a litter box to play in, why not run my stuff out of the SQL CLR process where I have a sandbox the size of a large beach that I can romp around in? Simple enough, I can create a WCF service and call out to that service.
Not so simple.
It seems that SQL CLR doesn't support WCF - unofficially it is possible but the official list of supported .net assemblies does not have WCF anywhere. Apparently it has to do with the use that WCF makes of threads and SQL CLR is very particular about who uses its threads.
It seems, if you look at the list of supported assemblies, that 'System.Web.Services' is supported. So I created a asmx service that did all the heavy lifting and called it from SQL CLR. So I created a SQL CLR assembly that referenced the web service and registered the assembly in SQL, only to receive an error message something along these lines:
System.InvalidOperationException: Cannot load dynamically generated serialization assembly. In some hosting environments assembly load functionality is restricted, consider using pre-generated serializer.
It seems that SQL CLR is not happy with doing the serialization from the web service in the litterbox all by itself so you have to create a serialization assembly using:
sgen /a:<some assembly>.dll
Which creates another dll, which can be created as an assembly in the SQL CLR:
CREATE ASSEMBLY [<some assembly>.XmlSerializers] from '<somepath>\<some assembly>.XmlSerializers.dll'
It is very clever to load the dependant assembly as well (your original assembly) but if you execute
select * from sys.assemblies
you will see that the assembly is marked as is_visible = 0, so you have to make it visible:
ALTER ASSEMBLY [<some assembly>] WITH Visibility=ON
...and then it all (hopefully) works. Note: All of this is done with WITH PERMISSION_SET = UNSAFE to get around other restrictions. In a production environment this is not what you want so you need to make sure that your assemblies can run in safe mode as much as possible.
So the SQL CLR satisfies the requirement of enabling .net assemblies to be 'written in SQL' and has some great features for doing so, providing fast access to the database context amongst other things. The CLR itself seems to have been engineered to satisfy other design goals of creating an execution platform that will not break, trip up or create vulnerabilities in the database - the last place that you want problems in any architecture. I think that the right balance has been found although I would like a little more control as to how much memory to make available to the litterbox or to set the priority of SQL CLR over/under other database operations (which SQL2008 has intrinsic support for)
From a .net perspective you need to architect, design and engineer you assemblies to function within the restricted environment, and not force functionality, or assemblies that can go rogue into the CLR. That way developers and DBAs should be able to maintain the fragile truce that they currently do.
Simon Munro