One of the many reasons I love consulting is that you get the opportunity to work in diverse businesses, review and inspect various implementations and experience unique challenges that come with each client visit. My latest engagement has involved working with one of the largest Teradata installations in the world, which acts as the primary data source to our application. To give you an idea of the scale; this system has over 40 Billion rows in just one of the fact tables… 40. Billion. That's a lot of rows. During this engagement we encountered a rather bizarre issue that doesn't seem to be well understood so I thought I'd blog about it. Get ready for a mind job...
The Scenario
We had been pulling data from this monster into our System. The table in question had a paltry 900 million rows in it which we had been incrementally loading into for a while. All was well. Then suddenly out of nowhere we got a deadlock during the load process. We re-ran the load and it went through successfully the second time. Odd. We didn't get another deadlock for a while and then at about 1 Billion rows we were getting them all the time. What on earth was going on? Time to put away the violin, stick on the deer stalker and investigate...
Before proceeding it's probably worth sharing a bit about the implementation (without actually divulging any industry details of course) so that you can understand the context of these deadlocks and why they were so strange. Remember though this is a locking problem not an ETL problem. The load pattern caused the issue to manifest but in itself isn’t responsible for its occurrence. The data itself was completely clean and when the process was executed sequentially this issue did not arise.
The Table
| Column Name | Data Type | Key |
| Date | date time | Clustered |
| CountryID | tinyint | Clustered |
| GroupID | smallint | Clustered |
| CodeID | int | Clustered |
| Other Columns | ... | no additional indexes |
The Data
To illustrate the point of the uniqueness of the data below are two data sets. You can see that each row is in itself a unique set of values. These four rows can and will cause a deadlock based on row locks. I promise.
Data Loaded by Parameters 2009-05-19 & CountryID 2
| Date | CountryID | GroupID | CodeID |
| 2009-05-19 | 2 | 4271 | 5835066 |
| 2009-05-19 | 2 | 4619 | 2546652 |
…..
Data Loaded by Parameters 2009-05-19 & CountryID 3
| Date | CountryID | GroupID | CodeID |
| 2009-05-19 | 3 | 4245 | 2651987 |
| 2009-05-19 | 3 | 4657 | 5744053 |
The Load Pattern
Stored Proc Pseudo Code
INSERT INTO BillionRowTable
( [Date] DATETIME
,CountryID TINYINT
,GroupID SMALLINT
,CodeID INT
)
SELECT [Date]
,[CountryID]
,[GroupID]
,[CodeID]
FROM dbo.25MillionRowTable
/*
Where Clause filters rows from 25 Million for the day
to 5 million for the Date/CountryID Combination
*/
WHERE [Date] = @Date
AND CountryID = @CountryID
In short then we were loading 5 sets of data in parallel through a parameterised stored procedure. The data set used in each stored procedure execution was for a different country on a given day. A country could have as many as 5 million rows in it making for a total of 25 million rows in each parallel load. Given that there was no other processing on this server it was therefore more than a little odd that we would get a deadlock. At the risk of repeating myself, each procedure was wholly responsible for its own set of data. How on earth could a deadlock occur? The data was clean and the logic was fine so what could have been the problem? We even ran the stored procedures sequentially without a problem. The issue only arose when running the same set of data in parallel.
Deadlocking Recap
So what then is a deadlock? This is when two processes each hold a lock or set of locks that is incompatible with a lock that the other process is trying to acquire. As neither one process can complete the two processes are said to be deadlocked. Rather than look on in morbid fascination SQL Server chooses a victim based on which of the two processes is easiest to roll back based on the amount of resources it has consumed. The victim is then rolled back.
If you have trace flags 1204, 1222 running or are using profiler/SQL Trace to capture deadlock events you can gather information on what processes were involved in the deadlock and which one or ones were chosen as the victim. This normally makes identifying the deadlock and why it occurred a relatively straightforward process. Resolving the deadlock can be another matter entirely but perhaps that's a subject for another post.
However, in this case the deadlock info was not so straightforward. First though let’s look at a ‘normal’ scenario.
A Simple Graph – A Community Example
Below is a deadlock generated by code from fellow MVP Peter Ward’s blog. The code to reproduce this example can be found here. I have used profiler to reveal the deadlock graph.
![clip_image001[7] clip_image001[7]](http://blogs.conchango.com/blogs/jamesrowlandjones/clip_image0017_thumb_5910F4CF.png)
In this instance we can see that SPID 64 requested an Update lock on a row that was exclusively locked by SPID 65. However, SPID 65 needed to update a row on associated object 3602879703904288768 (this is the objects hobt ID). Unfortunately for SPID 65 this was already locked exclusively by SPID 64. Neither can therefore proceed. SPID 65 has been chosen as the victim.
By hovering over the graph (see above) you can also see the last statement in the batch for each SPID and by querying sys.partitions (see below) you can see how the associated objid maps back to the table in question.
USE tempdb;
SELECT OBJECT_NAME(p.[object_id]) AS ObjectName
,p.[object_id] AS ObjectID
,i.name AS IndexName
,hobt_id AS HoBTID
,partition_id AS PartitionID
,partition_number AS PartitionNo
FROM sys.partitions p
JOIN sys.indexes i
ON p.index_id = i.index_id
AND p.[object_id] = i.[object_id]
WHERE hobt_id IN (3602879703904288768,3530822109825400832);
The result set for these two queries is as follows:
My Graph – Not So Obvious
So now we have re-capped the basics and can understand the essential deadlock information we are likely to be presented with it’s worth looking at the graph that was making my eyes bleed. Perhaps you can test yourself to see if you can see what, on the face of it, is so wrong about this. N.B. I have had to change some of the names for security and privacy reasons.
So what is this graph saying? It is telling me that SPID 82 is deadlocked against SPID 85. It is saying that they were both trying to get exclusive locks for a ROW in hobt 72057594693222400.
How was this possible? Why should SPID 82 have a lock on any data being loaded by SPID 85? There was no overlapping data that would allow this to happen. Why was this only starting to happen now? It just didn’t make any sense. Initially you might think that this is due to page splits or some other aspect of the storage where the data ranges may overlap but you will see these are both “Key Locks”.
I ran this query hoping for some clarification.
SELECT OBJECT_NAME(p.[object_id]) AS ObjectName
,p.[object_id] AS ObjectID
,i.name AS IndexName
,hobt_id AS HoBTID
,partition_id AS PartitionID
,partition_number AS PartitionNo
FROM sys.partitions p
JOIN sys.indexes i
ON p.index_id = i.index_id
AND p.[object_id] = i.[object_id]
WHERE hobt_id = 72057594693222400;
This just returned the name of the target table and the primary key, so not much help to be found there. What I can tell you was that both queries were loading into the same partition in the same hobt_id on the same index.
What could we conclude at the point? Well we knew the deadlock was occurring on the same clustered index of the same table. What could we do next?
First things first we needed to find out what lock resources were involved in the deadlock. If you look at the community sample and / or my graph you will not see the lock resource hash value in the graphical depiction. We had to find another way to get this information. We decided to crack open the deadlock graphs to see what we could see…
Did you know you can export your deadlock graphs from profiler and save them as xml files (with a .xdl file extension)? This is helpful for a number of reasons but the biggest reason I have found is that you can then open the file in notepad, management studio, or even your favourite XML editor (mine is XML Spy from Altova) and get even more information!
My Deadlock Graph – in text format
<deadlock-list>
<deadlock victim="process389e718">
<process-list>
<process id="process384e478" taskpriority="0" logused="500719924" waitresource="KEY: 7:72057594693222400 (0801bf3924dc)" waittime="60718" transactionname="INSERT" XDES="0x80028c80" lockMode="X" schedulerid="8" kpid="360" status="suspended" spid="85" sbid="0" ecid="0" priority="0" transcount="2" isolationlevel="read committed (2)" lockTimeout="4294967295">
<executionStack>
...
</executionStack>
<inputbuf>INPUTBUFFER</inputbuf>
</process>
<process id="process389e718" taskpriority="0" logused="350193272" waitresource="KEY: 7:72057594693222400 (0801a4a3d7f1)" waittime="4000" transactionname="INSERT" XDES="0x800ce8f0" lockMode="X" schedulerid="16" kpid="4552" status="suspended" spid="82" sbid="0" ecid="0" priority="0" transcount="2" isolationlevel="read committed (2)" lockTimeout="4294967295">
<executionStack>
</executionStack>
<inputbuf>INPUTBUFFER</inputbuf>
</process>
</process-list>
<resource-list>
<keylock hobtid="72057594693222400" dbid="7" objectname="DB.DBO.TABLE" indexname="TABLE_PK" id="lock59b304700" mode="X" associatedObjectId="72057594693222400">
<owner-list>
<owner id="process389e718" mode="X"/>
</owner-list>
<waiter-list>
<waiter id="process384e478" mode="X" requestType="wait"/>
</waiter-list>
</keylock>
<keylock hobtid="72057594693222400" dbid="7" objectname="DB.DBO.TABLE" indexname="TABLE_PK" id="lock5df249400" mode="X" associatedObjectId="72057594693222400">
<owner-list>
<owner id="process384e478" mode="X"/>
</owner-list>
<waiter-list>
<waiter id="process389e718" mode="X" requestType="wait"/>
</waiter-list>
</keylock>
</resource-list>
</deadlock>
</deadlock-list>
I have highlighted the two key pieces of information for this diagnosis and have copied them below for ease of reference. The values between the parenthesises are the lock resource hash values.
waitresource="KEY: 7:72057594693222400 (0801a4a3d7f1)
waitresource="KEY: 7:72057594693222400 (0801bf3924dc)
The question is what keys do these values relate to? My data is working on unique keys in each insert so there must be something else going on. But what?
When Locks Get Physical :–
Using %%lockres%%
I was struggling to come up with an answer. So I thought I’d run the scenario past my good friend Simon Sabin to see if he had any ideas. Sure enough he came up with an undocumented feature in SQL Server 2005 and above (courtesy of the late Ken Henderson) that helped us understand this particular problem. The feature is called %%lockres%%.
When people think about a row lock they often think about the DDL of their table. In my case I had a composite clustered index that was telling me I had no dupes across Date, CountryID, GroupID and CodeID. It was logical therefore to assume that SQL Server was using this key when saying that a lock has occurred. Well… The answer is that it is … and it isn’t.
Locks in SQL Server are often said to be a logical concept whereas Latches are seen as physical. After being bitten by this issue I don’t think it’s wholly fair to conclude this. My reasoning is as follows. Locking may be a logical concept but it has a physical implementation. It’s true that Latches look after the physical movement of data between addresses but I don’t think it’s fair to draw this parallel. The Lock physical implementation is based on how we, the designers, design our table. There is an internal mechanism involved here that translates my DDL definition into something that works for SQL Server – and most notably of all performs for SQL Server irrespective of the number of columns I specify in my key.
When SQL Server needs to lock a row it creates a hash value that is based on the key values of the table. It is this hash value that is used by the lock manager and means it has a single value to look at when checking to see if a row is already locked. The values in my key then (which was 16 bytes in length) were being distilled to a 6 byte hash. So you see Locking has a physical side as well – to think of Locking as a purely logical concept would be to miss an important underlying concept that translates the definition of our DDL into information that can be consumed by the lock manager.
This was the missing link (and I do not mean in the Ida context http://news.bbc.co.uk/1/hi/sci/tech/8057465.stm). Perhaps the really perceptive people can already see where this is going. The hypothesis then was that SQL Server was generating a duplicate hash value across the countries I was loading.
Now we needed to prove it; and %%lockres%% was the key. %%lockres%% always returns the appropriate waitresource no matter if you are scanning a clustered index or a heap. If it’s a heap then %%lockres%% will return the waitresource for the RID lock. Conversely, if it is a clustered index being scanned then the waitresource will be for the key lock. (You can also use it to get the hashes for non clustered indexes by using an index hint. However, Simon is going to blog about this in a separate post.)
We could have queried the entire day’s processing and done a group by to see if this gave us anything but there was another way to filter out the data and I’d rather not do huge queries scanning a billion rows if I can help it. (Table was much larger than available RAM and so would have resulted in a huge amount of I/O).
By filtering our query for the day and also by the lock resource hashes that had given us this problem we could really cut into this dataset. As the Date field is the first key of the clustered index this meant a clustered index seek was used and so %%lockres%% returned the key hashes for our clustered index which was want we wanted.
select *,%%lockres%%
from dbo.Table(nolock)
where %%lockres%% IN('(0801a4a3d7f1)','(0801bf3924dc)')
and Date >= '2009-05-18'
Sure enough we had the same hash value appearing in different countries for the same data. Even though the data itself was unique SQL Server’s hashing algorithm was not able to achieve a unique result. When this occurs it is called a collision.
One clash though wouldn’t have been enough for a deadlock to occur. It would have slowed the process down as one process would have had to finish before the waiting process continued – a classic blocking scenario. We needed to have two. This is exactly what we got – how unlucky was that! Now normally you need to be at serious scale to hit this. Remember we had loaded happily to 900 million rows before we encountered this problem. With a 6 byte hash there are 281,474,976,710,656 permutations of hashes. EDIT : Please see Remus Rusanu’s blog for the statistics and probability on this clash. He’s written a great follow up post that did a far better job on explaining it!
However, once you have the values you can see the problem with only 4 rows of data.
Check this out.
CREATE TABLE #tmp
( [Date] DATETIME
,CountryID TINYINT
,GroupID SMALLINT
,CodeID INT
)
INSERT INTO #tmp
SELECT '20090519'
,2
,4271
,5835066
UNION ALL
SELECT
'20090519'
,2
,4619
,2546652
UNION ALL
SELECT
'20090519'
,3
,4245
,2651987
UNION ALL
SELECT
'20090519'
,3
,4657
,5744053;
CREATE UNIQUE CLUSTERED INDEX tmp_PK
ON #tmp
( [Date]
,CountryID
,GroupID
,CodeID
)
SELECT %%lockres%% as LockHash
,*
FROM #tmp
ORDER BY %%lockres%%
,CountryID
DROP TABLE #tmp
The results are there for all to see:
CountryID 2 could deadlock with CountryID 3!
If you want to get this to actually happen for yourself just follow the steps below
1) Run this script to create the table and the clustered index
CREATE TABLE LockResCheck
( [Date] DATETIME
,CountryID TINYINT
,GroupID SMALLINT
,CodeID INT
)
CREATE UNIQUE CLUSTERED INDEX LockResCheck_PK
ON LockResCheck
( [date]
,CountryID
,GroupID
,CodeID
)
2) Open two connections to the database in which you created LockResCheck
3) In Connection 1 Execute the following (N.B. Only Data in CountryID 3 is inserted in Connection 1)
BEGIN TRANSACTION
INSERT INTO LockResCheck
SELECT
'20090519'
,3
,4245
,2651987
4) In Connection 2 Execute the following (N.B. Only Data in CountryID 2 is inserted in Connection 2)
BEGIN TRAN
INSERT INTO LockResCheck
SELECT '20090519'
,2
,4271
,5835066
UNION ALL
SELECT
'20090519'
,2
,4619
,2546652
5) In Connection 1 Execute the following (N.B. Still Only Data from CountryID 3 is inserted in Connection 1)
INSERT INTO LockResCheck
SELECT
'20090519'
,3
,4657
,5744053;
Voila – Deadlock!!
Mitigation – If this happens to you what are your options
So we have seen the problem but what about the resolution. The truth is that when you get into big volumes and you have composite keys then there is no “complete” answer. However, you can take steps to make this much less likely.
Change your key to a surrogate integer based key
By having a unique integer based key then you can guarantee this won’t happen to you. However, if you need a BIGINT one must assume that this might start happening to you (but only if you are loading an obscene amount of data in parallel). Remember the hash value is 6 bytes and whilst an INT is 4 bytes a BIGINT is 8.
Additional Value into the Key
We could also add a value into the composite key to mix things up a bit in the hashing algorithm and add a big more diversity into the data. If you look at those initial results I don’t think I am really helping SQL Server in terms of unique values. Saying that values of this nature are very common at scale where intelligent keys are prevalent.
In the example below I have extended the original example by adding on an extra id field to the data. You will see from the results below that this has had the desired effect but remember it has worked for the four rows in question. There is no guarantee that this will eliminate all collisions.
CREATE TABLE #tmp
( [Date] DATETIME
,CountryID TINYINT
,GroupID SMALLINT
,CodeID INT
,ID INT
)
INSERT INTO #tmp
SELECT '20090519'
,2
,4271
,5835066
,1
UNION ALL
SELECT
'20090519'
,2
,4619
,2546652
,2
UNION ALL
SELECT
'20090519'
,3
,4245
,2651987
,3
UNION ALL
SELECT
'20090519'
,3
,4657
,5744053
,4;
CREATE UNIQUE CLUSTERED INDEX tmp_PK
ON #tmp
( [Date]
,CountryID
,GroupID
,CodeID
,ID
)
SELECT %%lockres%% as LockHash
,*
FROM #tmp
ORDER BY %%lockres%%
,CountryID
DROP TABLE #tmp
Partitioning
If partitioning is an option for you then this is possibly the one to go for.
It is worth remembering that partitioning is an “Enterprise Only” feature and so is not always an available option.
By partitioning the data you will get one hobt_ID per partition. SQL Server guarantees that locks in different partitions will never collide. As part of the process of taking a key lock SQL Server stores two pieces of information:
- 6 byte hobt_id
- 6 byte hash key value.
This ensures that the hash is scoped to the hobt_id to which it belongs. However, that wouldn’t necessarily mean you would be 100% out of the woods as you’d still have a chance of hitting this issue inside the scope of a hobt. Having said that, by keeping the partitions reasonably tight you would be greatly minimising your chances. Remember we had successfully loaded 900 million records before this problem surfaced!
To complete the series of examples and repro’s please find below an example that shows how partitioning can deliver unique lock resource hash values.
CREATE DATABASE LockResCheck
GO
Use LockResCheck
CREATE Partition Function tmp_PF1(int) as Range Right for values (1,2,3,4,5)
CREATE Partition Scheme tmp_PS1 AS PARTITION tmp_PF1 ALL TO ([PRIMARY])
CREATE TABLE tmp
( [Date] DATETIME
,CountryID TINYINT
,GroupID SMALLINT
,CodeID INT
,PartitionID INT
) ON tmp_PS1 (PartitionID)
INSERT INTO tmp
SELECT '20090519'
,2
,4271
,5835066
,2 %15
UNION ALL
SELECT
'20090519'
,2
,4619
,2546652
,2 %15
UNION ALL
SELECT
'20090519'
,3
,4245
,2651987
,3 %15
UNION ALL
SELECT
'20090519'
,3
,4657
,5744053
,3 %15;
CREATE UNIQUE CLUSTERED INDEX tmp_PK
ON tmp
( [Date]
,CountryID
,GroupID
,CodeID
,PartitionID
) on tmp_PS1(PartitionID)
SELECT *
FROM sys.partitions
WHERE [object_id] = OBJECT_ID('LockResCheck.dbo.tmp')
SELECT %%lockres%% as LockHash
,*
FROM tmp
ORDER BY %%lockres%%
,CountryID
Change your insert batch size
Our issue occurred because we have a single insert statement which is processed as a single transaction. If we changed this to use a batched approach the likelihood of having the collisions wouldn’t have been completely removed but we could have drastically reduced the likelihood of the collisions being locked at the same time.
In our scenario each process/procedure was inserting 5 million rows. This could have been split into x batches of 10,000 rows, each taking seconds to complete. The likelihood of deadlocks is small. In the event of rows actually deadlocking we could use SSIS to resubmit the failed data, using the error output of the OLEDB Destination. One limitation here though is that the MICS parameter is not able to be set via an expression so changing it would mean making a code, as opposed to a configuration, change. You would also need to implement either a recovery process or a retry process in the event of a partial load to back out the data or compensate for the failures.
Wrap up
At the end of the day what can I say about this issue. Whilst challenging and fun I think it’s fair to say that diagnosing this issue wasn’t easy. I’d like to see the following things improved in the next version.
- Make %%lockres%% a documented feature – without this functionality I don’t think it’d have been possible to diagnose this issue.
- Make the deadlock graph more intuitive. At least show the lock resource in the GUI – Ideally it’d show the key values.
- Extend Books online to cover deadlocking more comprehensively. Writing this post I have looked at the examples in there and they are based on custom code that isn’t part of the sample database AdventureWorks or AdventureWorks2008.
- Make this less likely to happen in the first place!
If you think that any of this is as important as I do then please head over to connect and vote for the feedback Item I have posted.
- %%LockRes%% a documented feature - https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=458076
- Deadlock graph more intuitive - https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=458080
- Improve BOL coverage of deadlocks - https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=458084
- Reduce likelihood of collisions - https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=458091
Finally
I think now would be an appropriate moment to say that I am co-authoring a SQL Server Internals book with Christian Bolton, Justin Langford & Brent Ozar with contributions from Cindy Gross and Jonathan Kehayias. You might say that this issue hit me at the perfect time as I am about to start work on a Locking and Latching Chapter. If you are interested I will be posting updates on my authoring adventures here and on twitter. Hash tag #NewSQLBook ID @jrowlandjones
Thanks again to Simon Sabin and Marcel van der Holst for their kind help resolving this issue and to Paul, Peter and Kevin for providing the quality control on the editorial!
Cheers, James
@jrowlandjones