I have a requirement to move data between two databases, but before sending all the data we want to compare the rows that are needed first. So the implementation calls for the generation of a hash of important (but not necessarily key) fields in one database and comparing them with another. Without going into the whys and wherefores, my .net app receives an XML document of the hashed values from the messaging system and needs to do the comparison against the current database. Also, being a more generic solution, I do not have the luxury of writing sprocs or using table valued parameters (which are typed) and need some efficient T-SQL that can be generated.
--Create and populate a table
DECLARE @TargetTable TABLE(
Id int,
Name nvarchar(50)
)
INSERT INTO @TargetTable VALUES(1,'Joe')
INSERT INTO @TargetTable VALUES(2,'Jack')
--This is the xml document to compare. The second hash is different
DECLARE @HashKeys XML
SET @HashKeys='<hashKeys>
<hashKey><id>1</id><hash>0x0807E6F86C0EAFADCA390091FDF6</hash></hashKey>
<hashKey><id>2</id><hash>0x5741920EB59DA289F36A868B50D6</hash></hashKey>
</hashKeys>'
--This is the single query that does the comparison
SELECT dest.Id,
src.HashKey
FROM @TargetTable dest
LEFT JOIN
(SELECT t.c.value('./id[1]','int') Id, t.c.value('./hash[1]','nvarchar(max)') HashKey
FROM @HashKeys.nodes('//hashKeys/hashKey') as t(c)) src ON dest.Id = src.Id
AND CONVERT(nvarchar,HASHBYTES('MD5',CONVERT(VARBINARY(MAX),dest.Id)+0x0+CONVERT(VARBINARY(MAX),dest.Name)),1) = src.HashKey
WHERE src.HashKey is null
-- Output
-- Id HashKey
-- ----------- ------------
-- 2 NULL
Some highlights of the above query
- The HashBytes function hashes a string concatenation of the fields I am interested in
- New in SQL 2008 is the binary style of convert. This allows the bytes returned by the hash function to be easily converted to ascii characters so that they can be compared with the string in the XML document
- @HashKeys.nodes and t.c.value('.. shreds the XML document using XQuery
- The LEFT JOIN gives me the rows on the left that I am interested in
Performance, design and other issues aside, you can't argue that T-SQL is not up to the task of being a good language for set based operations.
Update
After feedback from David Portas I converted the fields to binary instead of strings (HASHBYTES performs an implicit cast to VARBINARY so it would be more efficient to cast each value to VARBINARY rather than NVARCHAR - David). I also put in a delimeter of 0x0 in the concatenation (you need to delimit the concatenated values otherwise “AB”+”C” will get the same hash as “A”+”BC” - David).
Simon Munro