Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

SS: Calculate running totals

SQL Server guru Adam Machanic posted a really interesting post on his blog yesterday in regard to calculating running totals in SQL Server. I'm sure Adam won't mind me saying he seemed a little embarrassed that his own logic had been outperformed by that most resource voracious of SQL Server constructs - a cursor.

Adam's own set-based solution had to be cancelled at 5 minutes whilst the cursor based solution executed in 14 seconds. I actually ran the same on my machine and it took 8.5 seconds (yes, I cleared the caches first). Not bad at all for a cursor over 100000+ rows.

I reckon I've come up with an even quicker solution though and Adam will be delighted to know that there isn't a cursor in sight. Here's my code:

DBCC DROPCLEANBUFFERS -- Clears the data cache
DBCC
FREEPROCCACHE -- Clears the procedure cache
GO

declare @start datetime
set
@start = getdate()
 
 
declare @runningtotal money
set
@runningtotal=0.0


DECLARE @Results TABLE
(
    TransactionID INT NOT NULL PRIMARY KEY,
    ActualCost MONEY,
    RunningTotal MONEY
)


insert into @Results (TransactionID, ActualCost)
select TransactionID, ActualCost
from Production.TransactionHistory
order by TransactionID

update @Results
set @runningtotal = RunningTotal = @runningtotal + ActualCost

select *
from @Results
ORDER BY TransactionID

print 'time taken = ' + cast(datediff(ms, @start, getdate()) as varchar(20))

 

As you can probably see, the real power here is in the UPDATE statement which updates @runningtotal at the same time as updating the table. Its a little-known feature of SQL Server but one that can be very powerful. In my tests this executed in 4 seconds, less than half the time of cursor solution.

5 minutes and counting down to 4seconds. Very handy indeed!

-Jamie

P.S. You'll need the AdventureWorks sample DB installed somewhere in order to make it work!

 

Published Tuesday, February 28, 2006 1:36 PM by jamie.thomson
Filed under:

Comments

 

Adam Machanic said:

Hi Jamie,

Thanks for the followup. That UPDATE syntax can be very powerful, but the problem is that you're relying on it to process the data based on the order of the clustered index on the temporary table. Unfortunately, the order in which it visits rows is not documented--and therefore, is unsupported--and may not be deterministic. I'm fairly sure that in this case it will always work, but I'm not 100% sure. For instance, what happens if the update goes parallel (can the update go parallel in this case? I'm not sure of that, either.) I would be wary of using this in a production app due to the risk of incorrect data in some of these edge cases...
February 28, 2006 3:44 PM
 

Sal Terillo said:

Jamie,

The first time I ran your query it took 45 seconds to complete. Subsequent runs took 5 seconds. I toggled my SQL Server service and I was back to 45 seconds for the initial run. I definitely included the DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE in all the tests.

Sal Terillo
February 28, 2006 9:01 PM
 

adam machanic :: data manipulation for fun and profit said:

Back again!  Fourth post for the month of February, making this my best
posting month in, well,...
March 1, 2006 3:08 AM
 

adam machanic :: sql server programming said:

Back again!  Fourth post for the month of February, making this my best posting month in, well,...
March 1, 2006 3:08 AM
 

Jamie Thomson - Life, the universe and SSIS! said:

You may have read my post yeterday talking about a method of calculating running totals that was based...
March 1, 2006 8:19 AM
 

Bruno Z said:

Just a comment :
Some SQLDBs support UPDATE ... ORDER BY ..., which guaranties that the update will run in the proper order.
March 10, 2006 4:01 PM
 

SimonS' SQL Server Stuff said:

If you've been following the trillogy (Adams posts 1,2 Jamies post ) of attempts at getting a running...
March 12, 2006 10:53 PM
 

Paul McMillan said:

BTW - I had understood that the undocumented UPDATE statement was added to SQL Server to allow it to speed up some of the processing in the TPC benchmarking tests... Can anyone confirm or deny this?... Or knows the TPC tests one way or another to say "ah yes it would speed up the results"... Anyway.. I would agrue if it IS in their to support TPC tests it should not be a undocumented feature....
March 14, 2006 12:24 PM
 

Mirko Marovic said:

Hello,

I discovered this discussion today. I wrote an article (http://www.sql-server-performance.com/mm_cursor_friendly_problem.asp) about the running total in sql problem and a few possible solutions including one posted here. On sql-server-performance.com there was small discussion if even solutions I offered guaranties update order. I'm pretty convinced it does and I would like you people to take a look and let me know your opinion.
April 21, 2006 6:14 PM
 

SimonS' SQL Server Stuff said:

If you've been following the trillogy (Adams posts 1,2 Jamies post ) of attempts at getting a running...
May 23, 2006 10:16 AM
 

Adam Machanic said:

Originally posted here.

Back again!  Fourth post for the month of February, making this my best...
July 13, 2006 5:53 AM
 

Adam Machanic said:

Originally posted here . Back again! Fourth post for the month of February, making this my best posting

January 8, 2007 7:26 PM
 

SimonS SQL Server Stuff said:

If you've been following the trillogy (Adams posts 1 , 2 Jamies post ) of attempts at getting a running

August 12, 2007 12:54 AM
 

GRM said:

I have tried to come up with a solution that didn't require a cursor to calculate a running total,the following solution ran in 57 secs on my machine and does not use a cursor.

I have used the Production.TransactionHistory table in the AdventureWorks database.

Declare @iCounter int

Declare @i int

Declare @mintrans int, @TransId int, @Count1 int

Set @i = 1

Select @iCounter = Count(*) From production.TransactionHistory

Select @mintrans = Min(transactionId) From production.TransactionHistory

While @i <= @iCounter

Begin

Select @transId = TransactionId From production.TransactionHistory  where Transactionid = @mintrans

Update th1

Set th1.RT = ISNULL(th1.ActualCost + th2.RT, th1.ActualCost)

From production.TransactionHistory th1

Join production.TransactionHistory th2 on th2.TransactionId = th1.TransactionId - 1

Where th1.TransactionId = @transId

Set @mintrans = @mintrans + 1

Set @i = @i+1

End

Select th1.TransactionId, th1.ActualCost, ISNULL(th1.ActualCost + th2.RT, th1.ActualCost) AS RT

From production.TransactionHistory th1

Join production.TransactionHistory th2 on th2.TransactionId = th1.TransactionId - 1

There are improvements that can be made to this - but this was my first attempt at trying to do this without a cursor, even though a look is used to perform the update statement, which is the first stage of this task.

November 27, 2007 4:38 PM
 

GRM said:

Forgot to mention, that you will have to add a new column to the table, i have called it RT and it's type is Money.

November 27, 2007 11:05 PM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems