|
|
Resolution to SQL Server problems. From the front line.
-
Don't get me wrong I love SQL Server, I have been using it for many years in all its Windows favours...
SQL Server 4.21a, 6.0, 6.5, 7, 2000, 2005, 2008, 2008 R2 and 2012
If fact I trained under someone who used SQL Server 1 on OS/2 (there you go revealing my age)
However, there are number of things that get under my skin, and here are my top 3 favourites
1) Actual Execution Plan
Why is this only displayed once the query has finished executing and returned all the results. This is particularly annoying if you have a never ending query, and the reason you are displaying the Actual Execution Plan is to endeavor to find the reason for your poor performance in the first place! A classic Catch 222 situation.
I know you can display the Estimated Execution Plan, but this is only an estimation of what is going to happen not 100% what actually is happening or has happened. I don't know the internal mechanics of SQL Server enough, but there must be a point when the query engine has finished all its work and passing on its findings to the execution engine. Why can't the execution plan be displayed then!
2) Inconsistent (I believe) DDL Commands
While you can create a Table and an Index in another database. Syntax is
CREATE TABLE AnyDataBase.AnySchema.AnyTable
You can not create a View (or Procedure or Function) in another database!
CREATE VIEW AnySchema.AnyView CREATE PROCEDURE AnySchema.AnyProcedure CREATE FUNCTION AnySchema.AnyFunction
I can't see any reason for the limit on this restriction other than maybe working out the metadata across databases to ensure all objects bind correctly
Also while you always CREATE TABLE and DROP TABLE, CREATE VIEW DROP VIEW etc you
ALTER TABLE ADD COLUMN rather than ALTER TABLE CREATE COLUMN but you still ALTER TABLE DROP COLUMN
To be fair this is probably an ANSI Standard but it is inconsistent!
3) Inconsistent USE of the Command Termination ';' Semicolon Command BEGIN; PRINT 'Hello World'; END;
is OK! However if you run:
USE AdventureWorks2008R2; GO
BEGIN TRY -- Table does not exist; object name resolution -- error not caught. SELECT * FROM NonexistentTable; END TRY; BEGIN CATCH SELECT ERROR_NUMBER() AS ErrorNumber ,ERROR_MESSAGE() AS ErrorMessage; END CATCH; Results in
Msg 102, Level 15, State 1, Line 6 Incorrect syntax near ';'. Msg 102, Level 15, State 1, Line 11 Incorrect syntax near 'CATCH'.
But
USE AdventureWorks2008R2; GO
BEGIN TRY; -- Table does not exist; object name resolution -- error not caught. SELECT * FROM NonexistentTable; END TRY BEGIN CATCH; SELECT ERROR_NUMBER() AS ErrorNumber ,ERROR_MESSAGE() AS ErrorMessage; END CATCH;
works OK!
If you read this posting here this gives some explanation. It is partly a BUG, but also partly by design. http://connect.microsoft.com/SQLServer/feedback/details/498085/msft-mso-end-try-throws-an-error-if-terminated-with-a-semicolon
But I just want consistency!
Another one is that CTE insist that the previous line is correctly terminated, but (AFAIK) NO other function, command does! And before someone says it been rolled out incrementally via newer functionality, the even newer MERGE statement does NOT insist that the Termination character is present at the end of the previous statement!
I maybe returning to this issue again in the future
|
-
The full error message is SQL Server Error 18452, "Login failed. The login is from an untrusted domain and cannot be used with Windows Authentication"
I have just got this error message again on a Windows 7 Workstation, that I am using SQL Server Management Studio on.
It reminded me that I wanted to blog about the resolution to this and also add how to solve this for Named Instances as well.
Firstly some background to the issue:
I have a Windows 7 Workstation that I use SQL Server Management Studio to connect to SQL Servers in both a Production (PROD) and TEST Domains. However, I log on to the Workstation using a Local Windows Account. I have to do this as the PROD and Test Domains are predominantly XP, and the Group Policy that was developed for XP is different to that used in Vista and Windows 7. Currently, we are not able to operate in a mixed mode domain as a result the current roaming profiles we have DO NOT WORK for Window 7. This is a major problem because it means whenever you log off you have to re-assign Outlook Exchange connections and you lose all your internet favourites etc.
So I have 3 Accounts all with the same Username
LOCAL\MyUser
PROD\MyUser
TEST\MyUser
I log onto my Windows WorkStation using LOCAL\MyUser and then want to connect seamlessly to both PROD and TEST SQL Servers using SQL Management Studio using Windows Authentication (Not SQL Authentication). I can either do one of three things:
1) I can attempt to use the Shift Key and then Right Click on SQL Management Studio and 'Run as Different User' and use either PROD or TEST Accounts
However, I cannot then intermix in SSMS Local SQL Server and different Domain SQL Servers - I am stuck with just those SQL Servers in the Domain I logged in on.....
2) I can attempt to use a Line Command as detailed by Jason Follas here:
http://www.jasonfollas.com/wiki/(X(1)S(zucey4jfr2sr4maocepgrn55))/Default.aspx?Page=SQL%20Management%20Studio%20on%20Vista&AspxAutoDetectCookieSupport=1
However, again I am limited to using just those SQL Servers in the one Domain, and secondly also I am prompted for the password each time as /savecred does not work with /netonly at the same time.
3) Finally I can attempt to use Windows Credential Manager to store the various Domain Credentials for me. However, when I did this (the first time) it did not work and I still had the SQL Server Error 18452 Message. I could not find out the answer until Google popped up the following link and Arsine's answer (thanks again Arsine!)
http://social.msdn.microsoft.com/Forums/hu-HU/sqlsecurity/thread/c05a90e4-cb16-46f6-9072-37083c65696d
I have to create a Windows Credential and add the SQL Server Port (1443) to the end of the Internet or Network Address in the form :<SQL_Port_Number>
So to seamlessly connect to SQL Servers in Any Domain from Windows 7 in SQL Management Studio you need to • Use Control Panel, User Accounts, Manage Your Credentials, Add a Windows Credential • Add all the SQL Servers you want by Fully Qualified Name but you must include the PORT Number as say :1433 • Example: • ANYSERVER.ANYCOM.TEST:1433 • Username: TEST\AnyUser • Password: TEST Password • Press Add Button Note: Each SQL Server will have to be added separately Note: If there are any Password changes then all references in Windows Credentials will need to be updated
This way you can control all SQL Servers in ALL Domains seamlessly from one instance of SQL Server Management Studio I would like to add two additional points to Arsine's excellent posting
A) I find that it is best to Fully Qualified Name the ServerName in Windows Credentials and SQL Management Studio rather than just the Servername . So use ANYSERVER.ANYCOM.TEST rather than just ANYSERVER
B) If you have a NAMED INSTANCE of SQL Server you must create multiple entries in Windows Credentials, one using the default SQL Server Port 1433 and one for each Port allocated for the Named Instance such as 2335 so use
ANYSERVER.ANYCOM.TEST:1433
as well as
ANYSERVER.ANYCOM.TEST:2335
Note: Of course this will have to be defined as ANYSERVER.ANYCOM.TEST\NamedInstanceName in SQL Management Studio
To find out the Port that SQL is running on (there are many methods) I suggest you read the link below by Colin Stasiuk which contains by far the most comprehensive solutions I can find. I didn't like hunting throug Registry Entries and looking at the SQL Server Logs was by far the quickest solution (noting the 'possible gotcha' detailed )
http://sqlserverpedia.com/blog/sql-server-bloggers/what-tcp-port-is-sql-server-running-under/
Finally I know I could have used SQL Server Authentication without a problem to achive the same thing but I did not want to go through this less secure method.
PS. To avoid having to use the Shift Key when Right-Clicking for 'Run As Different User" in Windows 7:
-
Just go to registry run regedit.exe, search for “runasuser”, then under it you’ll find empty value REG_SZ named “Extended”.
-
Delete every “Extended” (there should be around 5 of them
-
Result is that there is no need to press shift key anymore, the option will ALWAYS appear when you Right-Click
|
-
Version 2
In my previous post
http://consultingblogs.emc.com/paulmcmillan/archive/2012/01/03/find-the-total-lines-of-t-sql-code-in-a-sql-database.aspx
I said that the code was not perfect because it just counts the number of Line Feeds and you could have a number of lines which do not have any text and as such these will be added to the count line total. Therefore you need to eliminate any lines with just "whitespace" to get the "real" line count value from the actual line count. I said that perhaps I would have a go at doing this in the future...
Well I realised that this was quiet easy to do, by extending the logic
The logic of the code is to get the total length of text and also the total length of text with all the Line Feeds removed NCHAR(10). The difference minus 1 is the Number of Lines in the Text.
To eliminate all Whitespace lines (those with no code in) which could consist of any or all of the following
Spaces Characters NCHAR (32) Tab Characters NCHAR (9) Carriage Returns NCHAR (13) Corrupted Carriage Return and Line Feed Lines NCHAR (13) + NCHAR (10) + NCHAR (10) - I have seen instances of this I would need to remove these instances of 'Noise' from the text first and then find the total length after removing the Line Feeds from what remained after the noise was removed.
I have also added in the Schema Name to the Object Name as the initial version did not have this information and I noticed it was possible to skew the results if Objects had the same name but in different Schemas
The Version 2 of this code is below....
CREATE PROCEDURE [dbo].[esp_GetObjectNumberOfRows]
(
@DatabaseName SYSNAME = '',
@GroupingType NCHAR (1) = 'N', -- N = Object Name, T = Object Type, D = Database
@ExecuteSQL BIT = 1, -- Default Execute SQL On
@PrintSQL BIT = 0 -- Default Debug SQL Off
)
/*
Paul McMillan
EMC Consulting 2012
Version 1
My Project Manager asked me to find out 'How many lines of code have we in the System'. I could not
believe that no one has ever had to do this before so I googled 'sql server stored procedure number of lines' and found
the following posting from Gordon Bell (many thanks!) on Stack Overflow in November 2008. See link below
http://stackoverflow.com/questions/291574/query-to-list-sql-server-stored-procedures-along-with-lines-of-code-for-each-pro
However, the solution Gordon provided did not allow you to pass in a DatabaseName as a Parameter (and I had to find this
out for all the databases in a system) - I could have run it in each database individually but I thought I would take
the base code and allow a DatabaseName to be passed in as a parameter.
Furthermore, I thought I would update the code to use the latest SQL sys.objects and sys.sql_modules system tables
rather than older sysobjects ans syscomments system tables.
Additionally, I thought I would add a Grouping Parameter, so you can return either the
1) Total Lines of Code for each Stored Procedure and each Function for any Database. GroupingType = 'N' - Default
2) Total Lines of Code for all Stored Procedures and all Functions for any Database. GroupingType = 'T' - Default
3) Total Lines of Code for all Stored Procedures and Functions for any Database . GroupingType = 'D'
Finally as this is written using Dynamic SQL I have a Switches for Print SQL (Debug) and Execute SQL
Version 2
I said that the code was not perfect because it just counts the number if newlines and you could have a number of
lines which do not have any text and as such these will be added to the count line total. Therefore you need to
eliminate any lines with just "whitespace" to get the "real" line count value. I said that prehaps I would
have a go at doing this in the future...
Well I realised that this was quiet easy to do, by extending the logic
The logic of the code is to get the total length of text and also the total length of text with all the Line Feeds removed
NCHAR(10). The difference minus 1 is the Number of Lines in the Text.
To eliminate all Whitespace lines (those with no code in) which could consist of any or all of the following
Spaces Characters NCHAR (32)
Tab Characters NCHAR (9)
Carriage Returns NCHAR (13)
Corrupted Carriage Return and Line Feed Lines NCHAR (13) + NCHAR (10) + NCHAR (10) - I have seen instances of this
I would need to remove these instances of 'Noise' from the text first and then find the total length after removing the Line Feeds from what remained after the noise was removed.
I have also added in the Schema Name to the Object Name as the initial version did not have this information and I noticed
it was possible to skew the results if Objects had the same name but in different Schemas
*/
AS
BEGIN ;
IF @DatabaseName = ''
SET @DatabaseName = DB_NAME(); -- Default is current database
DECLARE @SQLMessage NVARCHAR (MAX);
-- Testing Code
--DECLARE @DatabaseName SYSNAME = 'master';
--DECLARE @GroupingType NCHAR (1) = 'T'; -- N = Object Name, T = Object Type, D = Database
SET @SQLMessage = '
SELECT
' + NCHAR (39) + @DatabaseName + NCHAR (39) + ' AS DatabaseName,
' +
CASE WHEN @GroupingType = 'N' THEN 'T.ObjectName, ' ELSE '' END +
CASE WHEN @GroupingType IN ('N', 'T') THEN '
CASE WHEN T.ObjectDescription = ''Stored Procedure''
THEN SUM(T.ProcedureCount)
ELSE SUM(T.FunctionCount)
END AS ObjectCount,' ELSE 'SUM(T.ProcedureCount) + SUM(T.FunctionCount) AS ObjectCount, ' END +
CASE WHEN @GroupingType IN ('N', 'T') THEN 'T.ObjectDescription, ' ELSE '' END + '
SUM(T.ActualLinesOfCode) - 1 AS TotalActualLinesOfCode,
SUM(T.RealLinesOfCode) - 1 AS TotalRealLinesOfCode
FROM
(
SELECT
CASE WHEN o.type = ''P'' THEN 1 ELSE 0 END AS ProcedureCount,
CASE WHEN o.type IN (''FN'', ''IF'', ''TF'') THEN 1 ELSE 0 END AS FunctionCount,
s.name + ''.'' + o.name AS ObjectName,
(LEN(C.definition) - LEN(REPLACE(C.definition, NCHAR(10), ''''))) AS ActualLinesOfCode,
(LEN(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(C.definition, '' '', ''''), NCHAR(9), ''''), NCHAR(13), ''''), NCHAR(10) + NCHAR(10), NCHAR(10)), NCHAR(10) + NCHAR(10), NCHAR(10))) - LEN(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(C.definition, '' '', ''''), NCHAR(9), ''''), NCHAR(13), ''''), NCHAR(10) + NCHAR(10), NCHAR(10)), NCHAR(10) + NCHAR(10), NCHAR(10)), NCHAR(10), ''''))) AS RealLinesOfCode,
CASE WHEN o.type = ''P'' THEN ''Stored Procedure'' -- STORED_PROCEDURE
WHEN o.type IN (''FN'', ''IF'', ''TF'') THEN ''Function'' -- SQL_SCALAR_FUNCTION, SQL_INLINE_TABLE_VALUED_FUNCTION, TABLE_FUNCTION
END AS ObjectDescription
FROM ' + @DatabaseName + '.sys.objects o
INNER JOIN ' + @DatabaseName + '.sys.sql_modules c
ON c.object_id = o.object_id
INNER JOIN ' + @DatabaseName + '.sys.schemas s
ON o.schema_id = s.schema_id
WHERE o.type in (''P'', ''FN'', ''IF'', ''TF'')
) AS T
' +
CASE WHEN @GroupingType IN ('N', 'T') THEN 'GROUP BY ' ELSE '' END +
CASE WHEN @GroupingType = 'N' THEN 'T.ObjectName, ' ELSE '' END +
CASE WHEN @GroupingType IN ('N', 'T') THEN 'T.ObjectDescription' ELSE '' END + '
ORDER BY 1' + CASE WHEN @GroupingType IN ('N', 'T') THEN ', T.ObjectDescription DESC' ELSE '' END
+ CASE WHEN @GroupingType IN ('N') THEN ', T.ObjectName ASC' ELSE '' END + ';';
IF @PrintSQL = 1
PRINT @SQLMessage;
IF @ExecuteSQL = 1
EXEC (@SQLMessage);
END ;
|
-
My Project Manager asked me to find out 'How many lines of code have we in the System'. I could not believe that no one has ever had to do this before so I googled 'sql server stored procedure number of lines' and found the following posting from Gordon Bell (many thanks!) on Stack Overflow in November 2008. See link below
http://stackoverflow.com/questions/291574/query-to-list-sql-server-stored-procedures-along-with-lines-of-code-for-each-pro
However, the solution Gordon provided did not allow you to pass in a DatabaseName as a Parameter (and I had to find this out for all the databases in a system) - I could have run it in each database individually but I thought I would take the base code and allow a DatabaseName to be passed in as a parameter.
Furthermore, I thought I would update the code to use the latest SQL sys.objects and sys.sql_modules system tables rather than older sysobjects ans syscomments system tables.
Additionally, I thought I would add a Grouping Parameter, so you can return either the
1) Total Lines of Code for each Stored Procedure and each Function for any Database. GroupingType = 'N' - Default 2) Total Lines of Code for all Stored Procedures and all Functions for any Database. GroupingType = 'T' - Default 3) Total Lines of Code for all Stored Procedures and Functions for any Database . GroupingType = 'D'
Finally as this is written using Dynamic SQL I have a Switches for Print SQL (Debug) and Execute SQL
Note: The code is not perfect because it counts the number if newlines and you could have a number of these with no text on them and these will be added to the count line total. Therefore you need to eliminate any lines with just whitespace from them to get the "real" line count value. Perhaps I will have a go at doing this in the future.
The code to run this is below
and it is called via
EXEC dbo.esp_GetObjectNumberOfRows
@DatabaseName = 'master',
@GroupingType = 'N',
@ExecuteSQL = 1,
@PrintSQL = 0;
CREATE PROCEDURE dbo.esp_GetObjectNumberOfRows
(
@DatabaseName SYSNAME = '',
@GroupingType NCHAR (1) = 'N', -- N = Object Name, T = Object Type, D = Database
@ExecuteSQL BIT = 1, -- Default Execute SQL On
@PrintSQL BIT = 0 -- Default Debug SQL Off
)
/*
Paul McMillan
EMC Consulting 2012
My Project Manager asked me to find out 'How many lines of code have we in the System'. I could not
believe that no one has ever had to do this before so I googled 'sql server stored procedure number of lines' and found
the following posting from Gordon Bell (many thanks!) on Stack Overflow in November 2008. See link below
http://stackoverflow.com/questions/291574/query-to-list-sql-server-stored-procedures-along-with-lines-of-code-for-each-pro
However, the solution Gordon provided did not allow you to pass in a DatabaseName as a Parameter (and I had to find this
out for all the databases in a system) - I could have run it in each database individually but I thought I would take
the base code and allow a DatabaseName to be passed in as a parameter.
Furthermore, I thought I would update the code to use the latest SQL sys.objects and sys.sql_modules system tables
rather than older sysobjects ans syscomments system tables.
Additionally, I thought I would add a Grouping Parameter, so you can return either the
1) Total Lines of Code for each Stored Procedure and each Function for any Database. GroupingType = 'N' - Default
2) Total Lines of Code for all Stored Procedures and all Functions for any Database. GroupingType = 'T' - Default
3) Total Lines of Code for all Stored Procedures and Functions for any Database . GroupingType = 'D'
Finally as this is written using Dynamic SQL I have a Switches for Print SQL (Debug) and Execute SQL
*/
AS
BEGIN ;
IF @DatabaseName = ''
SET @DatabaseName = DB_NAME(); -- Default is current database
DECLARE @SQLMessage NVARCHAR (MAX);
-- Testing Code
--DECLARE @DatabaseName SYSNAME = 'master';
--DECLARE @GroupingType NCHAR (1) = 'T'; -- N = Object Name, T = Object Type, D = Database
SET @SQLMessage = '
SELECT
' + NCHAR (39) + @DatabaseName + NCHAR (39) + ' AS DatabaseName,
' +
CASE WHEN @GroupingType = 'N' THEN 'T.ObjectName, ' ELSE '' END +
CASE WHEN @GroupingType IN ('N', 'T') THEN '
CASE WHEN T.ObjectDescription = ''Stored Procedure''
THEN SUM(T.ProcedureCount)
ELSE SUM(T.FunctionCount)
END AS ObjectCount,' ELSE 'SUM(T.ProcedureCount) + SUM(T.FunctionCount) AS ObjectCount, ' END +
CASE WHEN @GroupingType IN ('N', 'T') THEN 'T.ObjectDescription, ' ELSE '' END + '
SUM(T.LinesOfCode) - 1 AS TotalLinesOfCode
FROM
(
SELECT
CASE WHEN o.type = ''P'' THEN 1 ELSE 0 END AS ProcedureCount,
CASE WHEN o.type IN (''FN'', ''IF'', ''TF'') THEN 1 ELSE 0 END AS FunctionCount,
o.name AS ObjectName,
(LEN(C.definition) - LEN(REPLACE(C.definition, NCHAR(10), ''''))) AS LinesOfCode,
CASE WHEN o.type = ''P'' THEN ''Stored Procedure'' -- STORED_PROCEDURE
WHEN o.type IN (''FN'', ''IF'', ''TF'') THEN ''Function'' -- SQL_SCALAR_FUNCTION, SQL_INLINE_TABLE_VALUED_FUNCTION, TABLE_FUNCTION
END AS ObjectDescription
FROM ' + @DatabaseName + '.sys.objects o
INNER JOIN ' + @DatabaseName + '.sys.sql_modules c
ON c.object_id = o.object_id
WHERE o.type in (''P'', ''FN'', ''IF'', ''TF'')
) AS T
' +
CASE WHEN @GroupingType IN ('N', 'T') THEN 'GROUP BY ' ELSE '' END +
CASE WHEN @GroupingType = 'N' THEN 'T.ObjectName, ' ELSE '' END +
CASE WHEN @GroupingType IN ('N', 'T') THEN 'T.ObjectDescription' ELSE '' END + '
ORDER BY 1' + CASE WHEN @GroupingType IN ('N', 'T') THEN ', T.ObjectDescription DESC' ELSE '' END
+ CASE WHEN @GroupingType IN ('N') THEN ', T.ObjectName ASC' ELSE '' END + ';';
IF @PrintSQL = 1
PRINT @SQLMessage;
IF @ExecuteSQL = 1
EXEC (@SQLMessage);
END ;
|
-
Previously, I have written a blog post in which a colleague alerted me to a technique for replacing a OR on a INNER JOIN with a UNION ALL - Which solved a performance issue we had with a SQL Server stored procedure http://consultingblogs.emc.com/paulmcmillan/archive/2011/08/01/sql-performance-improvements-talk-to-your-colleagues.aspx Gareth also alerted me to the possibilities of using a SQL Server Filtered Index to further improve the code performance , as we could put an Filtered Index on the TABLE_B_FK_1_ID and TABLE_B_FK_2_ID columns to enable the query engine to evaluate the condition from the Index rather than the base table. http://technet.microsoft.com/en-us/library/cc280372.aspx CREATE NONCLUSTERED INDEX TABLE_B_IX_1 ON dbo.TABLE_B (TABLE_B_FK_1_ID) WHERE TABLE_B_FK_1_ID IS NOT NULL;
CREATE NONCLUSTERED INDEX TABLE_B_IX_2 ON dbo.TABLE_B (TABLE_B_FK_2_ID) WHERE TABLE_B_FK_2_ID IS NOT NULL;
In this case I did not have time to try this concept out but I will store this technique in my head for future use... Especially as I note you can have INCLUDE columns in the Filtered Index.. and this in a neat way of avoiding Key Look-ups However, the fact that I do now know about them, does illustrate the point about using 'accrued knowledge' to improve performance.. As soon as Gareth had mentioned that a OR could be replaced with a UNION ALL and we proved this could work... my thoughts immediately translated to another issue I was having. Originally, the client wanted to roll up a complex business rule in real time. So that as soon as query was made (in this case on Persons record') the result of a complex data query would be returned to the user. However, this data query was so complex that it could not be resolved in satisfactory time to be undertaken in real time. In the end, the design evolved to wrap the Business rule in Summary State table that was maintained via a Job that ran every minute, which meant that because of the time lag and the fact that the data was not closely coupled with underlying transactions... that there was a risk that it was not 100% accurate all the time. I looked at the code and a number of technique, single code path execution, materialised views, one case of functional reduction identified by an expert user (to eliminate a un-necessary join) and using OUTER APPLY - I managed to get the complex rule down to time where it could be moved from a Job into a Trigger. Thus the Summary State table became more closely coupled to the underlying transaction data. The only downside was that for a small number of users, on a small number of updates the time to commit data would be increased, but at least the data was 100% accurate for everyone.. However, the trouble with the Complex Business rule was that it involved a LEFT SELF JOIN to a table and that was on a OR condition join and whatever we tried it still spent 70 to 80% of the code duration in this part of the querty.. By the way you may know that Materialised Views do not allow LEFT JOINS but you may not also realise that they do not allow SELF JOINS either. Anyway, as soon as Gareth mentioned his get around for replacing an OR with a UNION ALL - I immediately realised we could use the same technique on this code block.. We tried it and it worked we now have the duration of complex business rule down to the point where it could be considered an option to calculate it back in Real time and remove the Summary State table completely.. What has happened is I used knowledge accrued on X and realised that it could be applied it Y.....
|
-
Last week I was looking at a performance issue with a piece of code that had never been identified as having an issue before. In these cases you say 'whats changed' well the answer in this case was nothing with the code.. But a lot with the data we were running through it.. Basically because we were running a one-off data clean up exercise though the code we were processing thousands of rows per run rather than tens or rows all this in SQL Server 2008 Anyway, because we are highly trained IT professionals we tried the data fix in our test lab first, and that's where we found that the code while finishing OK was hanging at one step.. The actual step was found using the code that Merrick Chaffer has made available here: http://merrickchaffer.blogspot.com/2011/07/currently-executing-sql-view.html Anyway, it was a basically an Insert into a temporary table which had an OR in the join logic for one table.. I will not use the actual code but the issue can be illustrated with the example below INSERT INTO #Temp_Update ( TABLE_B_ID, TABLE_B_Column ) SELECT B.TABLE_B_ID, B.TABLE_B_Column FROM dbo.TABLE_A A INNER JOIN dbo.TABLE_C B ON A.TABLE_A_ID = B.TABLE_A_FK_ID INNER JOIN dbo.TABLE_C C ON ( B.TABLE_B_ID = C.TABLE_B_FK_1_ID OR B.TABLE_B_ID = C.TABLE_B_FK_2_ID ) WHERE A.TABLE_A_Column = 1 AND ( C.TABLE_B_FK_1_ID IS NOT NULL OR C.TABLE_B_FK_2_ID IS NOT NULL ); On its own with no much data throughput on driver TABLE_A when the join was very quick.. When we had thousands on the driver TABLE A the querty took an age to resolve even with ZERO rows to return. Bascially because the Query Parser could not resolve the OR statement well enough it was generating a complex execution plan with very poor execution time.. I was chewing the cud with this with a colleague Gareth Keen, and he said.. I have seen an article somewhere where an complex OR join was replaced with a UNION ALL statement... "but I don't think you can use it in this case'.. I said I maybe, because all I am actually doing at a lower level is testing that the data 'exists and I can replace the last join with an EXISTS LEFT SEMI JOIN (which is can be faster anyway). To see what this would look like I have converted the above.. INSERT INTO #Temp_Update ( TABLE_B_ID, TABLE_B_Column ) SELECT B.TABLE_B_ID, B.TABLE_B_Column FROM dbo.TABLE_A A INNER JOIN dbo.TABLE_B B ON A.TABLE_A_ID = B.TABLE_A_FK_ID WHERE EXISTS ( SELECT 1 FROM dbo.TABLE_C C WHERE B.TABLE_B_ID = C.TABLE_B_FK_1_ID AND C.TABLE_B_FK_1_ID IS NOT NULL UNION ALL SELECT 1 FROM dbo.TABLE_C C WHERE B.TABLE_B_ID = C.TABLE_B_FK_2_ID AND C.TABLE_B_FK_2_ID IS NOT NULL ) AS C The result was that the code stormed through this part and we solved the bottleneck.. Thanks Gareth, for dragging this out of your memory and I do note both your comments on this technique 1) "We really need to test it first for a number of conditions because I am sure the article said it may not work in all cases and we may be making things much worse'. - We did test it for different conditions 2) Always put the condition that is more likely to success the top part of your UNION ALL statement, because as we are using a EXISTS SEMI-JOIN the database execution will stop as soon as we find a match.. We want this to happen as quickly as possible so make sure that code block where the strongest chance of a match is placed first. To see a blog entry on how to replace a UNION ALL with an OR see the link below - I have no idea if this was the article Gareth remembered, but it shows you how to do it.. http://www.databasechannel.com/AccessArticles/Article_ORAND_UseUnionInsteadofOR.html So in this case talking to my Colleague has shown me a new way of doing something.. A collective knowledge base is always greater than a single persons... BTW I will be returning to this technique shortly
|
-
Once upon a time, I was sent to publishing company who published many titles, who were having some performance issues with an application that they used to maintain subscription information. This was a form in 'in house' Customer Relationship Management (CRM) System that had a Front End application that talked to a back end SQL Server database.. Anyway, (apparently) there was some 'unspecified' performance issue and they got in an external consultancy to have a look and write a report... Anyway, I got the gig and I rocked up to the office where the client was based. I met the senior management team who were keen for me to get stuck in with a code review etc.. I however, wanted to find out what the main pain point was... They said well, it is when the Customer rings up to talk to us about their subscriptions etc.. There was basically a call centre in the office and there was a small number of staff who would take the customers call and try and help them... But... it was talking ages for them to search and find the actual Customer records.. Oh I said.... have you a copy of the data model (or can I see the database) and can I talk to the end users.. "Why do you want to talk to the users they said'.. well I just want to see how they use the system... Anyway, they said OK and sent me to see a lady on the next floor.., I asked her what happens when a customer calls.. She says.." oh we ask them for their unique client number" and "how many of them know that" I said not many, in fact very rarely do they know that - Ermmnnn.. I said fully knowing I may know by Bank Account number or even my credit card number of pat.. But I would never know my client reference number for a magazine.. Unless I had the magazine covering wrapper handy... Oh I said, what do you do when they don't know their number... "we usually ask them for their name and maybe their address and try and find them from that" she replied.... (of course they did everyone knows their own name and address)... thanks for your time I said.. I had one more look at the database/data model and went to see the management team.. I think I have solved you main problem I said... Your database has been structured to assume that the key search is you Client Reference Number, where actually the main search is by Name.. On your Customer table you have a Clustered Index on the Client Reference Number, and a non clustered index on the Surname. But you almost always search on the latter rather than the former... Can I suggest we change the Clustered Index on the Customer table (explaining that the data would now be sorted on this column now so all the Browns and Smiths were together rather than spread out across the table) and evaluate the impact.. We tried it later in the morning on a test system and it seemed to have worked OK with no degradation elsewhere.. The senior management then changed the live system in the afternoon and the main performance issue went away completely they were finding people in sub-seconds rather than the seconds it was taking before and the client was very happy.. So the moral of this story is always talk to the real users.. whatever you think may be happening, however you believe as system is working, however it is designed to work one way.. in real life it may be something completely different..
|
-
This is a
post (well a series of posts) I have been thinking about for some time.
Basically, you can add Indexes, Covering Indexes. Materialised Views, filtered
Indexes... Partitioning etc.. to your heart’s content... But sometimes it just
helps if you know your Business logic....
I once
worked on the Data Warehouse system for one of the major uk supermarkets. In
this case while the client sold both non-food and food products the food data
was dealt with by its own Data Warehouse system. We had a batch process to
aggregate data at the Product Store Day level. In other words how much of each
Product was sold at each Store on each Day. We also had to work out margin
based on a Cost and Selling Prices which changed according to type of Branch
(differential pricing).
We also had
to deal with the concept of VAT (Value Added Tax) a European Wide form of
Purchase Tax. However, in the UK (unlike many countries has the concept of Zero
rating most food). I say most most because some foodstuffs have VAT applied
(most famously Chocolate Covered Biscuits 'aka the famous Jaffa Cake case),
some have Zero Rated VAT (as I say most foods, but surprising Cakes) - and some
Foods have VAT applied to some of the Product but not all... The example I was
given was that if you but a packet of tea in a box, there is not VAT to pay..
However, if you but the same amount of Tea in a Tin, you have to pay VAT on the
cost of the Tin (i.e. a proportion). I have actually now read the VAT Rules
from the Inland Revenue as part of my research for this post. Available here:
http://customs.hmrc.gov.uk/channelsPortalWebApp/channelsPortalWebApp.portal?_nfpb=true&_pageLabel=pageLibrary_ShowContent&id=HMCE_CL_000118&propertyType=document
See Section 6. Mixed supplies: mixtures, promotional linked items and
packaging
By the way, having
just re-read it I actually think the Tea Tin example does not work - Because
the apportion bit has to more than the more than a certain percentage or cost
(say £1) to kick in... But it will still work for the business concept I am
attempting to describe.
BTW to see
why Cakes (and Jaffa Cakes in particular) do not have VAT applied, but
Chocolate Covered Biscuits do see here:
http://en.wikipedia.org/wiki/Jaffa_Cakes
Anyway, back
to my story...
So to
process VAT we have effectively to store 2 pieces of information. A flag
(Yes/No) to say if VAT is applicable to the Product and a numeric field to hold
the VAT Percentage applicable for those VATable items (100%for Fully VAT, say
10% for our Tea in Tin example or NULL or Zero (Not applicable or Zero).
Note: I accept we could technically hold all this
information in 1 numeric field (100 = YES, 0 = NO and a Percentage in between),
but this is counter intuitive.
Anyway, each
night we had the VAT Process running (via SQL Server Stored Procedures), which
took all our sales for the day (we are talking a couple of million records a
day here) and calculated the VAT figures... The VAT process was taking a significant
portion of our overnight window, and I was asked to look at it to see if I
could speed up. This area had been written before I had been on the project, so
they probably considered a fresh pair of eyes could help. The first concern I
had with the code is that I noticed it was updating the data. Everyone knows
that in SQL Server (as with other RDBMS) that an Update statement is costly. As
it involves both a physical Delete and an Insert, so immediately you have twice
the log file workload. However, while I realised that while I could have
probably re-written an update as an Insert into a new table… what the real
problem was that it was doing to many updates..
The VAT Flag and Percentage Apportion Field for the ProductStoreDay had
been initialised to assume that VAT was applicable (1 and 100%) but we checked
our reference product data we then toggled the field to be either (Off – Set ***
to be Zero, or set the Apportion field).
But as I
said at the top of his article in the UK – MOST
Food is Zero rated for VAT. So what the batch process was doing was updating
around 1.8 Million records setting a Flag to be 0 (from 1) and a field to be
NULL (from 100%) and leaving 200,000 records unchanged.
I realised
that if I actually initiated the data to so that the Flag to be 0 and the
field to be NULL, and only updating those records which had to have VAT
applied, I would cut 80% from the VAT Batch Run… and that is what I did.. It is
quicker to update 200,000 records rather than 1.8 million…
I used my Business
Knowledge (the fact that I knew most Food is Zero rated for VAT) to change the
logic of code and cause a 5 fold increase in performance… and not an Index
change in sight….
|
-
After getting 100,000 rows of Data using the 4 main flavours of Insert Statement I attempted to determine which method was the fastest.. To ensure that the results were not skewed by database re-sizing actions or cached data. I ran each query multiple times and used the following command to clear the CACHE
CHECKPOINT
DBCC FREEPROCCACHE -- to clear the procedure cache
DBCC DROPCLEANBUFFERS -- to test queries with a cold buffer cache
I said 'attempted' because I found out that with Method 3 Insert with Select Values Clause and using 100,000 rows on my laptop I could not get the query to complete - It just 'hung' (I gave up after 40 minutes) I assume because it needed to materialise the 100,000 rows in memory before it could attempt to Insert them..
I decided that to make it a test whereby at least I got a result - I therefore limited the number of rows to 1000, this also mean't that it was within the limits of the Insert Statement with Row Constructor method. I put in a Statistics statements and executed the code a number of times.
The average results (10 runs) were as follows:
Method 1 Classic Insert:
-- 274 ms
Method 2 Insert with row constructor:
-- 1413 ms
Method 3 Insert with Select Values Clause:
-- 1540 ms
Method 4 Merge Statement:
-- 1574 ms
As can be seen the fastest method (reduction of at least 80%) is the original Classic Insert Statement this was really surprisng to me, I wondered why - One obvious change is the locking involved. The Classic Insert Statement was taking Exclusive (X) on Index Key Locks and Intent Exclusive (IX) Locks on Pages and Tables while the other types seem to be taking Schema Stability (Sch-S) Locks on Table and Metadata and Shared (S) Database Locks. Probably because they ALL have to materialise the Insert data in some form before they marshall it to the Insert statement...
Conclusions
1) Many ways to Insert Data in SQL Server
2) Easiest to write probably Insert with Row Constructor - Though this is limited to 1000 rows per batch
3) Best for maintainability and control is the MERGE statement
4) Fastest method is Classic Insert - and Insert with Select Values Clause will probably never finish with a very large number of rows
See also
http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/how-many-different-ways-can-you-write-a-sql-insert-statement-1-2-3-6.aspx
http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/so-now-you-can-write-an-insert-statement-6-different-ways-why-do-you-need-to-know-this-and-why-is-merge-the-best.aspx
http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/a-gotcha-with-the-insert-statement-row-constructor.aspx
|
-
When I worked out there were at least 6 methods of Inserting data into a Table, I wanted to find out which was the fastest method... I will reveal the surprising fact in another posting. But I thought I would test the performance of each method with a large amount of data.. The first problem I had was how do I create a large number of Insert statements in many different forms? - Well writing Dynamic SQL seemed to be the best option, but to do this I still needed to create (I decided 100000) different rows from which I could build up my SQL Statement.
I decided that I needed to create a TALLY table (sometimes called a Numbers table) with 100000 different values in it.. I knew the fastest method do this was via a CROSS JOIN in a CTE - (I think it was in a Blog post by Jamie Thomson - But I can't find it now) so I googled CTE and Tally and came up with this example here..
http://beyondrelational.com/blogs/ozrenkrznaric/archive/2011/06/21/tally-table.aspx
I must admit I thought ot a bit hard to read, even though I knew what it was supposed to do, so I rewrote it slightly, using a 'units' based aliasing system to be understood a bit easier. The code for this is below
IF OBJECT_ID('dbo.Tally') IS NOT NULL
DROP TABLE dbo.Tally;
CREATE TABLE dbo.Tally
(
ID INT NOT NULL,
PRIMARY KEY CLUSTERED
(
ID
)
);
; WITH Ones (N)
AS
(
SELECT 1 AS N
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
)
, Tens (N)
AS
(
SELECT 1
FROM Ones A
CROSS JOIN Ones B
)
, Hundreds (N)
AS
(
SELECT 1
FROM Tens A
CROSS JOIN Tens B
)
, Thousands (N)
AS
(
SELECT 1
FROM Hundreds A
CROSS JOIN Tens B
)
INSERT INTO dbo.Tally (ID)
SELECT TOP 100000
N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM Thousands;
You can extend this concept to the millions and billions of rows if you so wish.. As an example of the speed it took just over 2 seconds on my laptop to generate the data..
I could then use the TALLY table to create my various different Insert Statements
e.g.
SELECT 'INSERT INTO dbo.InsertTest1 (ID, TextString) VALUES (' + CAST(ID AS NVARCHAR) + ' , ' + CHAR(39) + 'Value: ' + + RIGHT(CAST((1000000 + ID) AS NVARCHAR (7)), 6) + + CHAR(39) + ');'
FROM dbo.Tally;
SELECT 'INSERT INTO dbo.InsertTest2 (ID, TextString) VALUES'
UNION ALL
SELECT '(' + CAST(ID AS NVARCHAR) + ' , ' + CHAR(39) + 'Value: ' + + RIGHT(CAST((1000000 + ID) AS NVARCHAR (7)), 6) + CHAR(39) + ')' + CASE WHEN ID <> 100000 THEN ',' ELSE '' END
FROM dbo.Tally
UNION ALL
SELECT ';';
SELECT 'INSERT INTO dbo.InsertTest3 (ID, TextString)
SELECT T.ID, T.TextString
FROM
(
VALUES'
UNION ALL
SELECT '(' + CAST(ID AS NVARCHAR) + ' , ' + CHAR(39) + 'Value: ' + + RIGHT(CAST((1000000 + ID) AS NVARCHAR (7)), 6) + CHAR(39) + ')' + CASE WHEN ID <> 100000 THEN ',' ELSE '' END
FROM dbo.Tally
UNION ALL
SELECT ') AS T (ID, TextString);';
OK Then what is the GOTCHA ?
Well I on the INSERT Statement Row Constructor I got the following error message
Msg 10738, Level 15, State 1, Line 1021
The number of row value expressions in the INSERT statement exceeds the maximum allowed number of 1000 row values.
I had not appreciated that there is a LIMIT (of 1000) to the number of lines you can add to a single INSERT Statement when using the row (also known as table) constructor method...
So to insert 100,000 rows I would need 100 batches of 1000 rows using this method....
See also
http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/how-many-different-ways-can-you-write-a-sql-insert-statement-1-2-3-6.aspx
http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/so-now-you-can-write-an-insert-statement-6-different-ways-why-do-you-need-to-know-this-and-why-is-merge-the-best.aspx
http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/so-you-may-have-many-different-ways-of-inserting-data-but-which-is-the-fastest.aspx
|
|
|
|