Welcome to EMC Consulting Blogs Sign in | Join | Help

Third Abnormal Form

Resolution to SQL Server problems. From the front line.

  • 3 things that really get on my nerves with using SQL Server

    Don't get me wrong I love SQL Server, I have been using it for many years in all its Windows favours...

    SQL Server 4.21a, 6.0, 6.5, 7, 2000, 2005, 2008, 2008 R2 and 2012

    If fact I trained under someone who used SQL Server 1 on OS/2 (there you go revealing my age)

    However, there are number of things that get under my skin, and here are my top 3 favourites

    1) Actual Execution Plan

       Why is this only displayed once the query has finished executing and returned all the results.
     
       This is particularly annoying if you have a never ending query, and the reason you are displaying
       the Actual Execution Plan is to endeavor to find the reason for your poor performance in the first
       place! A classic Catch 222 situation.

       I know you can display the Estimated Execution Plan, but this is only an estimation of what is
       going to happen not 100% what actually is happening or has happened.
     
       I don't know the internal mechanics of SQL Server enough, but there must be a point when the query
       engine has finished all its work and passing on its findings to the execution engine. Why can't
       the execution plan be displayed then!

    2) Inconsistent (I believe) DDL Commands

       While you can create a Table and an Index in another database. Syntax is

       CREATE TABLE AnyDataBase.AnySchema.AnyTable

       You can not create a View (or Procedure or Function) in another database!

       CREATE VIEW AnySchema.AnyView
       CREATE PROCEDURE AnySchema.AnyProcedure
       CREATE FUNCTION AnySchema.AnyFunction

       I can't see any reason for the limit on this restriction other than maybe working out the metadata across databases
       to ensure all objects bind correctly

       Also while you always CREATE TABLE and DROP TABLE, CREATE VIEW DROP VIEW etc you

       ALTER TABLE ADD COLUMN rather than ALTER TABLE CREATE COLUMN but you still ALTER TABLE DROP COLUMN

       To be fair this is probably an ANSI Standard but it is inconsistent!

    3) Inconsistent USE of the Command Termination ';' Semicolon Command
     
       BEGIN;
       PRINT 'Hello World';
       END;

       is OK! However if you run:

      USE AdventureWorks2008R2;
    GO

    BEGIN TRY
        -- Table does not exist; object name resolution
        -- error not caught.
        SELECT * FROM NonexistentTable;
    END TRY;
    BEGIN CATCH
        SELECT
            ERROR_NUMBER() AS ErrorNumber
            ,ERROR_MESSAGE() AS ErrorMessage;
    END CATCH;
     
    Results in

    Msg 102, Level 15, State 1, Line 6
    Incorrect syntax near ';'.
    Msg 102, Level 15, State 1, Line 11
    Incorrect syntax near 'CATCH'.

    But

    USE AdventureWorks2008R2;
    GO

    BEGIN TRY;
        -- Table does not exist; object name resolution
        -- error not caught.
        SELECT * FROM NonexistentTable;
    END TRY
    BEGIN CATCH;
        SELECT
            ERROR_NUMBER() AS ErrorNumber
            ,ERROR_MESSAGE() AS ErrorMessage;
    END CATCH;

    works OK!

    If you read this posting here this gives some explanation. It is partly a BUG, but also partly by design.
      
        http://connect.microsoft.com/SQLServer/feedback/details/498085/msft-mso-end-try-throws-an-error-if-terminated-with-a-semicolon

        But I just want consistency!


        Another one is that CTE insist that the previous line is correctly terminated, but (AFAIK) NO other
        function, command does!
       
        And before someone says it been rolled out incrementally via newer functionality, the even newer MERGE
        statement does NOT insist that the Termination character is present at the end of the previous statement!

    I maybe returning to this issue again in the future 

     

  • SQL Server Error 18452, "Login failed. The login is from an untrusted domain" Windows 7

    The full error message is SQL Server Error 18452, "Login failed. The login is from an untrusted domain and cannot be used with Windows Authentication" 

    I have just got this error message again on a Windows 7 Workstation, that I am using SQL Server Management Studio on.

    It reminded me that I wanted to blog about the resolution to this and also add how to solve this for Named Instances as well.

    Firstly some background to the issue:

    I have a Windows 7 Workstation that I use SQL Server Management Studio to connect to SQL Servers in both a Production (PROD) and TEST Domains. However, I log on to the Workstation using a Local Windows Account. I have to do this as the PROD and Test Domains are predominantly XP, and the Group Policy that was developed for XP is different to that used in Vista and Windows 7. Currently, we are not able to operate in a mixed mode domain as a result the current roaming profiles we have DO NOT WORK for Window 7. This is a major problem because it means whenever you log off you have to re-assign Outlook Exchange connections and you lose all your internet favourites etc.

    So I have 3 Accounts all with the same Username

    LOCAL\MyUser

    PROD\MyUser

    TEST\MyUser

    I log onto my Windows WorkStation using LOCAL\MyUser and then want to connect seamlessly to both PROD and TEST SQL Servers using SQL Management Studio using Windows Authentication (Not SQL Authentication). I can either do one of three things:

    1) I can attempt to use the Shift Key and then Right Click on SQL Management Studio and 'Run as Different User' and use either PROD or TEST Accounts

    However, I cannot then intermix in SSMS Local SQL Server and different Domain SQL Servers - I am stuck with just those SQL Servers in the Domain I logged in on.....

     2) I can attempt to  use a Line Command as detailed by Jason Follas here:

    http://www.jasonfollas.com/wiki/(X(1)S(zucey4jfr2sr4maocepgrn55))/Default.aspx?Page=SQL%20Management%20Studio%20on%20Vista&AspxAutoDetectCookieSupport=1

    However, again I am limited to using just those SQL Servers in the one Domain, and secondly also I am prompted for the password each time as /savecred does not work with /netonly at the same time.

    3) Finally I can attempt to use Windows Credential Manager to store the various Domain Credentials for me. However, when I did this (the first time) it did not work and I still had the SQL Server Error 18452 Message. I could not find out the answer until Google popped up the following link and Arsine's answer (thanks again Arsine!)

    http://social.msdn.microsoft.com/Forums/hu-HU/sqlsecurity/thread/c05a90e4-cb16-46f6-9072-37083c65696d

    I have to create a Windows Credential and add the SQL Server Port (1443) to the end of the Internet or Network Address in the form :<SQL_Port_Number>

    So to seamlessly connect to SQL Servers in Any Domain from Windows 7 in SQL Management Studio you need to
     
    •         Use Control Panel, User Accounts, Manage Your Credentials, Add a Windows Credential
     
    •         Add all the SQL Servers you want by Fully Qualified Name but you must include the PORT Number as say  :1433
     
    •         Example:
     
    •         ANYSERVER.ANYCOM.TEST:1433
     
    •         Username: TEST\AnyUser 
    •         Password: TEST Password
     
    •         Press Add Button
     
    Note: Each SQL Server will have to be added separately
     
    Note: If there are any Password changes then all references in Windows Credentials will need to be updated

    This way you can control all SQL Servers in ALL Domains seamlessly from one instance of SQL Server Management Studio
     
    I would like to add two additional points to Arsine's excellent posting

    A) I find that it is best to Fully Qualified Name the ServerName in Windows Credentials and SQL Management Studio rather than just the Servername . So use ANYSERVER.ANYCOM.TEST rather than just ANYSERVER

    B) If you have a NAMED INSTANCE of SQL Server you must create multiple entries in Windows Credentials, one using the default SQL Server Port 1433 and one for each Port allocated for the Named Instance such as 2335 so use

    ANYSERVER.ANYCOM.TEST:1433

    as well as  

    ANYSERVER.ANYCOM.TEST:2335

    Note: Of course this will have to be defined as ANYSERVER.ANYCOM.TEST\NamedInstanceName in SQL Management Studio

    To find out the Port that SQL is running on (there are many methods) I suggest you read the link below by Colin Stasiuk which contains by far the most comprehensive solutions I can find. I didn't like hunting throug Registry Entries and looking at the SQL Server Logs was by far the quickest solution (noting the 'possible gotcha' detailed )

    http://sqlserverpedia.com/blog/sql-server-bloggers/what-tcp-port-is-sql-server-running-under/

    Finally I know I could have used SQL Server Authentication without a problem to achive the same thing but I did not want to go through this less secure method.

    PS. To avoid having to use the Shift Key when Right-Clicking for 'Run As Different User" in Windows 7:

    • Just go to registry run regedit.exe, search for “runasuser”, then under it you’ll find empty value REG_SZ named “Extended”.
    •  Delete every “Extended” (there should be around 5 of them
    • Result is that there is no need to press shift key anymore, the option will ALWAYS appear when you Right-Click

     

     

  • A way to find the Real Total Lines of T-SQL Code in a SQL Database

    Version 2

     In my previous post 

    http://consultingblogs.emc.com/paulmcmillan/archive/2012/01/03/find-the-total-lines-of-t-sql-code-in-a-sql-database.aspx

    I said that the code was not perfect because it just counts the number of Line Feeds and you could have a number of
    lines which do not have any text and as such these will be added to the count line total. Therefore you need to
    eliminate any lines with just "whitespace" to get the "real" line count value from the actual line count. I said that perhaps I would have a go at doing this in the future...

    Well I realised that this was quiet easy to do, by extending the logic

    The logic of the code is to get the total length of text and also the total length of text with all the Line Feeds removed
    NCHAR(10). The difference minus 1 is the Number of Lines in the Text.

    To eliminate all Whitespace lines (those with no code in) which could consist of any or all of the following

     Spaces Characters NCHAR (32)
     Tab Characters NCHAR (9)
     Carriage Returns NCHAR (13)
     Corrupted Carriage Return and Line Feed Lines NCHAR (13) + NCHAR (10) + NCHAR (10) - I have seen instances of this
     
    I would need to remove these instances of 'Noise' from the text first and then find the total length after removing the Line Feeds from what remained after the noise was removed.

    I have also added in the Schema Name to the Object Name as the initial version did not have this information and I noticed
    it was possible to skew the results if Objects had the same name but in different Schemas
     

    The Version 2 of this code is below....

     

    CREATE PROCEDURE [dbo].[esp_GetObjectNumberOfRows]

    (

    @DatabaseName SYSNAME = '',

    @GroupingType NCHAR (1) = 'N', -- N = Object Name, T = Object Type, D = Database

    @ExecuteSQL BIT = 1, -- Default Execute SQL On

    @PrintSQL BIT = 0 -- Default Debug SQL Off

    )

    /*

    Paul McMillan

    EMC Consulting 2012

    Version 1

    My Project Manager asked me to find out 'How many lines of code have we in the System'. I could not

    believe that no one has ever had to do this before so I googled 'sql server stored procedure number of lines' and found

    the following posting from Gordon Bell (many thanks!) on Stack Overflow in November 2008. See link below

    http://stackoverflow.com/questions/291574/query-to-list-sql-server-stored-procedures-along-with-lines-of-code-for-each-pro

    However, the solution Gordon provided did not allow you to pass in a DatabaseName as a Parameter (and I had to find this

    out for all the databases in a system) - I could have run it in each database individually but I thought I would take

    the base code and allow a DatabaseName to be passed in as a parameter.

    Furthermore, I thought I would update the code to use the latest SQL sys.objects and sys.sql_modules system tables

    rather than older sysobjects ans syscomments system tables.

    Additionally, I thought I would add a Grouping Parameter, so you can return either the

    1) Total Lines of Code for each Stored Procedure and each Function for any Database. GroupingType = 'N' - Default

    2) Total Lines of Code for all Stored Procedures and all Functions for any Database. GroupingType = 'T' - Default

    3) Total Lines of Code for all Stored Procedures and Functions for any Database . GroupingType = 'D'

    Finally as this is written using Dynamic SQL I have a Switches for Print SQL (Debug) and Execute SQL

    Version 2

    I said that the code was not perfect because it just counts the number if newlines and you could have a number of

    lines which do not have any text and as such these will be added to the count line total. Therefore you need to

    eliminate any lines with just "whitespace" to get the "real" line count value. I said that prehaps I would

    have a go at doing this in the future...

    Well I realised that this was quiet easy to do, by extending the logic

    The logic of the code is to get the total length of text and also the total length of text with all the Line Feeds removed

    NCHAR(10). The difference minus 1 is the Number of Lines in the Text.

    To eliminate all Whitespace lines (those with no code in) which could consist of any or all of the following

    Spaces Characters NCHAR (32)

    Tab Characters NCHAR (9)

    Carriage Returns NCHAR (13)

    Corrupted Carriage Return and Line Feed Lines NCHAR (13) + NCHAR (10) + NCHAR (10) - I have seen instances of this

     

    I would need to remove these instances of 'Noise' from the text first and then find the total length after removing the Line Feeds from what remained after the noise was removed.

    I have also added in the Schema Name to the Object Name as the initial version did not have this information and I noticed

    it was possible to skew the results if Objects had the same name but in different Schemas

    */

    AS

    BEGIN;

    IF @DatabaseName = ''

    SET @DatabaseName = DB_NAME(); -- Default is current database

    DECLARE @SQLMessage NVARCHAR (MAX);

    -- Testing Code

    --DECLARE @DatabaseName SYSNAME = 'master';

    --DECLARE @GroupingType NCHAR (1) = 'T'; -- N = Object Name, T = Object Type, D = Database

    SET @SQLMessage = '

    SELECT

    ' + NCHAR (39) + @DatabaseName + NCHAR (39) + ' AS DatabaseName,

    ' +

    CASE WHEN @GroupingType = 'N' THEN 'T.ObjectName, ' ELSE '' END +

    CASE WHEN @GroupingType IN ('N', 'T') THEN '

    CASE WHEN T.ObjectDescription = ''Stored Procedure''

    THEN SUM(T.ProcedureCount)

    ELSE SUM(T.FunctionCount)

    END AS ObjectCount,' ELSE 'SUM(T.ProcedureCount) + SUM(T.FunctionCount) AS ObjectCount, ' END +

    CASE WHEN @GroupingType IN ('N', 'T') THEN 'T.ObjectDescription, ' ELSE '' END + '

    SUM(T.ActualLinesOfCode) - 1 AS TotalActualLinesOfCode,

    SUM(T.RealLinesOfCode) - 1 AS TotalRealLinesOfCode

    FROM

    (

    SELECT

    CASE WHEN o.type = ''P'' THEN 1 ELSE 0 END AS ProcedureCount,

    CASE WHEN o.type IN (''FN'', ''IF'', ''TF'') THEN 1 ELSE 0 END AS FunctionCount,

    s.name + ''.'' + o.name AS ObjectName,

    (LEN(C.definition) - LEN(REPLACE(C.definition, NCHAR(10), ''''))) AS ActualLinesOfCode,

    (LEN(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(C.definition, '' '', ''''), NCHAR(9), ''''), NCHAR(13), ''''), NCHAR(10) + NCHAR(10), NCHAR(10)), NCHAR(10) + NCHAR(10), NCHAR(10))) - LEN(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(C.definition, '' '', ''''), NCHAR(9), ''''), NCHAR(13), ''''), NCHAR(10) + NCHAR(10), NCHAR(10)), NCHAR(10) + NCHAR(10), NCHAR(10)), NCHAR(10), ''''))) AS RealLinesOfCode,

    CASE WHEN o.type = ''P'' THEN ''Stored Procedure'' -- STORED_PROCEDURE

    WHEN o.type IN (''FN'', ''IF'', ''TF'') THEN ''Function'' -- SQL_SCALAR_FUNCTION, SQL_INLINE_TABLE_VALUED_FUNCTION, TABLE_FUNCTION

    END AS ObjectDescription

    FROM ' + @DatabaseName + '.sys.objects o

    INNER JOIN ' + @DatabaseName + '.sys.sql_modules c

    ON c.object_id = o.object_id

    INNER JOIN ' + @DatabaseName + '.sys.schemas s

    ON o.schema_id = s.schema_id

    WHERE o.type in (''P'', ''FN'', ''IF'', ''TF'')

    ) AS T

    ' +

    CASE WHEN @GroupingType IN ('N', 'T') THEN 'GROUP BY ' ELSE '' END +

    CASE WHEN @GroupingType = 'N' THEN 'T.ObjectName, ' ELSE '' END +

    CASE WHEN @GroupingType IN ('N', 'T') THEN 'T.ObjectDescription' ELSE '' END + '

    ORDER BY 1' + CASE WHEN @GroupingType IN ('N', 'T') THEN ', T.ObjectDescription DESC' ELSE '' END

    + CASE WHEN @GroupingType IN ('N') THEN ', T.ObjectName ASC' ELSE '' END + ';';

    IF @PrintSQL = 1

    PRINT @SQLMessage;

    IF @ExecuteSQL = 1

    EXEC (@SQLMessage);

    END;

  • Find the Total Lines of T-SQL Code in a SQL Database

    My Project Manager asked me to find out 'How many lines of code have we in the System'. I could not
    believe that no one has ever had to do this before so I googled 'sql server stored procedure number of lines' and found
    the following posting from Gordon Bell (many thanks!) on Stack Overflow in November 2008. See link below

    http://stackoverflow.com/questions/291574/query-to-list-sql-server-stored-procedures-along-with-lines-of-code-for-each-pro

    However, the solution Gordon provided did not allow you to pass in a DatabaseName as a Parameter (and I had to find this
    out for all the databases in a system) - I could have run it in each database individually but I thought I would take
    the base code and allow a DatabaseName to be passed in as a parameter.

    Furthermore, I thought I would update the code to use the latest SQL sys.objects and sys.sql_modules system tables
    rather than older sysobjects ans syscomments system tables.

    Additionally, I thought I would add a Grouping Parameter, so you can return either the

    1) Total Lines of Code for each Stored Procedure and each Function for any Database. GroupingType = 'N' - Default
    2) Total Lines of Code for all Stored Procedures and all Functions for any Database. GroupingType = 'T' - Default
    3) Total Lines of Code for all Stored Procedures and Functions for any Database . GroupingType = 'D'

    Finally as this is written using Dynamic SQL I have a Switches for Print SQL (Debug) and Execute SQL

    Note: The code is not perfect because it counts the number if newlines and you could have a number of these with no text on them and these will be added to the count line total. Therefore you need to eliminate any lines with just whitespace from them to get the "real" line count value. Perhaps I will have a go at doing this in the future.

    The code to run this is below 

    and it is called via 

    EXEC dbo.esp_GetObjectNumberOfRows

    @DatabaseName = 'master',

    @GroupingType = 'N',

    @ExecuteSQL = 1,

    @PrintSQL = 0;

     

    CREATE PROCEDURE dbo.esp_GetObjectNumberOfRows

    (

    @DatabaseName SYSNAME = '',

    @GroupingType NCHAR (1) = 'N', -- N = Object Name, T = Object Type, D = Database

    @ExecuteSQL BIT = 1, -- Default Execute SQL On

    @PrintSQL BIT = 0 -- Default Debug SQL Off

    )

    /*

    Paul McMillan

    EMC Consulting 2012

    My Project Manager asked me to find out 'How many lines of code have we in the System'. I could not

    believe that no one has ever had to do this before so I googled 'sql server stored procedure number of lines' and found

    the following posting from Gordon Bell (many thanks!) on Stack Overflow in November 2008. See link below

    http://stackoverflow.com/questions/291574/query-to-list-sql-server-stored-procedures-along-with-lines-of-code-for-each-pro

    However, the solution Gordon provided did not allow you to pass in a DatabaseName as a Parameter (and I had to find this

    out for all the databases in a system) - I could have run it in each database individually but I thought I would take

    the base code and allow a DatabaseName to be passed in as a parameter.

    Furthermore, I thought I would update the code to use the latest SQL sys.objects and sys.sql_modules system tables

    rather than older sysobjects ans syscomments system tables.

    Additionally, I thought I would add a Grouping Parameter, so you can return either the

    1) Total Lines of Code for each Stored Procedure and each Function for any Database. GroupingType = 'N' - Default

    2) Total Lines of Code for all Stored Procedures and all Functions for any Database. GroupingType = 'T' - Default

    3) Total Lines of Code for all Stored Procedures and Functions for any Database . GroupingType = 'D'

    Finally as this is written using Dynamic SQL I have a Switches for Print SQL (Debug) and Execute SQL

    */

    AS

    BEGIN;

    IF @DatabaseName = ''

    SET @DatabaseName = DB_NAME(); -- Default is current database

    DECLARE @SQLMessage NVARCHAR (MAX);

    -- Testing Code

    --DECLARE @DatabaseName SYSNAME = 'master';

    --DECLARE @GroupingType NCHAR (1) = 'T'; -- N = Object Name, T = Object Type, D = Database

    SET @SQLMessage = '

    SELECT

    ' + NCHAR (39) + @DatabaseName + NCHAR (39) + ' AS DatabaseName,

    ' +

    CASE WHEN @GroupingType = 'N' THEN 'T.ObjectName, ' ELSE '' END +

    CASE WHEN @GroupingType IN ('N', 'T') THEN '

    CASE WHEN T.ObjectDescription = ''Stored Procedure''

    THEN SUM(T.ProcedureCount)

    ELSE SUM(T.FunctionCount)

    END AS ObjectCount,' ELSE 'SUM(T.ProcedureCount) + SUM(T.FunctionCount) AS ObjectCount, ' END +

    CASE WHEN @GroupingType IN ('N', 'T') THEN 'T.ObjectDescription, ' ELSE '' END + '

    SUM(T.LinesOfCode) - 1 AS TotalLinesOfCode

    FROM

    (

    SELECT

    CASE WHEN o.type = ''P'' THEN 1 ELSE 0 END AS ProcedureCount,

    CASE WHEN o.type IN (''FN'', ''IF'', ''TF'') THEN 1 ELSE 0 END AS FunctionCount,

    o.name AS ObjectName,

    (LEN(C.definition) - LEN(REPLACE(C.definition, NCHAR(10), ''''))) AS LinesOfCode,

    CASE WHEN o.type = ''P'' THEN ''Stored Procedure'' -- STORED_PROCEDURE

    WHEN o.type IN (''FN'', ''IF'', ''TF'') THEN ''Function'' -- SQL_SCALAR_FUNCTION, SQL_INLINE_TABLE_VALUED_FUNCTION, TABLE_FUNCTION

    END AS ObjectDescription

    FROM ' + @DatabaseName + '.sys.objects o

    INNER JOIN ' + @DatabaseName + '.sys.sql_modules c

    ON c.object_id = o.object_id

    WHERE o.type in (''P'', ''FN'', ''IF'', ''TF'')

    ) AS T

    ' +

    CASE WHEN @GroupingType IN ('N', 'T') THEN 'GROUP BY ' ELSE '' END +

    CASE WHEN @GroupingType = 'N' THEN 'T.ObjectName, ' ELSE '' END +

    CASE WHEN @GroupingType IN ('N', 'T') THEN 'T.ObjectDescription' ELSE '' END + '

    ORDER BY 1' + CASE WHEN @GroupingType IN ('N', 'T') THEN ', T.ObjectDescription DESC' ELSE '' END

    + CASE WHEN @GroupingType IN ('N') THEN ', T.ObjectName ASC' ELSE '' END + ';';

    IF @PrintSQL = 1

    PRINT @SQLMessage;

    IF @ExecuteSQL = 1

    EXEC (@SQLMessage);

    END;

  • SQL Improvements - Use your accrued knowledge

    Previously, I have written a blog post in which a colleague alerted me to a technique for replacing a OR on a INNER JOIN with a UNION ALL - Which solved a performance issue we had with a SQL Server stored procedure

    http://consultingblogs.emc.com/paulmcmillan/archive/2011/08/01/sql-performance-improvements-talk-to-your-colleagues.aspx

    Gareth also alerted me to the possibilities of using a SQL Server Filtered Index to further improve the code performance , as we could put an Filtered Index on the TABLE_B_FK_1_ID and TABLE_B_FK_2_ID columns to enable the query engine to evaluate the condition from the Index rather than the base table.

    http://technet.microsoft.com/en-us/library/cc280372.aspx


    CREATE NONCLUSTERED INDEX TABLE_B_IX_1
        ON dbo.TABLE_B (TABLE_B_FK_1_ID)
    WHERE TABLE_B_FK_1_ID IS NOT NULL;


    CREATE NONCLUSTERED INDEX TABLE_B_IX_2
        ON dbo.TABLE_B (TABLE_B_FK_2_ID)
    WHERE TABLE_B_FK_2_ID IS NOT NULL;

    In this case I did not have time to try this concept out but I will store this technique in my head for future use... Especially as I note you can have INCLUDE columns in the Filtered Index.. and this in a neat way of avoiding Key Look-ups

    However, the fact that I do now know about them, does illustrate the point about using 'accrued knowledge' to improve performance.. As soon as Gareth had mentioned that a OR could be replaced with a UNION ALL  and we proved this could work... my thoughts immediately translated to another issue I was having.

     Originally, the client wanted to roll up a complex business rule in real time. So that as soon as query was made (in this case on Persons record') the result of a complex data query would be returned to the user. However, this data query was so complex that it could not be resolved in satisfactory time to be undertaken in real time.

    In the end, the design evolved to wrap the Business rule in Summary State table that was maintained via a Job that ran every minute, which meant that because of the time lag and the fact that the data was not closely coupled with underlying transactions... that there was a risk that it was not 100% accurate all the time.

    I looked at the code and a number of technique, single code path execution, materialised views, one case of functional reduction identified by an expert user (to eliminate a un-necessary join) and using OUTER APPLY - I managed to get the complex rule down to time where it could be moved from a Job into a Trigger. Thus the Summary State table became more closely coupled to the underlying transaction data. The only downside was that for a small number of users, on  a small number of updates the time to commit data would be increased, but at least the data was 100% accurate for everyone..

    However, the trouble with the Complex Business rule was that it involved a LEFT SELF JOIN to a table and that was on a OR condition join and whatever we tried it still spent 70 to 80% of the code duration in this part of the querty..

    By the way you may know that Materialised Views do not allow LEFT JOINS  but you may not also realise that they do not allow SELF JOINS either. Anyway, as soon as Gareth mentioned his get around for replacing an OR with a  UNION ALL - I immediately realised we could use the same technique on this code block..

    We tried it and it worked we now have the duration of complex business rule down to the point where it could be considered an option to calculate it back in Real time and remove the Summary State table completely..

    What has happened is I used knowledge accrued on X and realised that it could be applied it Y..... 

  • SQL Performance Improvements - Talk to your Colleagues

    Last week I was looking at a performance issue with a piece of code that had never been identified as having an issue before. In these cases you say 'whats changed' well the answer in this case was nothing with the code.. But a lot with the data we were running through it.. Basically because we were running a one-off data clean up exercise though the code we were processing thousands of rows per run rather than tens or rows all this in SQL Server 2008

    Anyway, because we are highly trained IT professionals we tried the data fix in our test lab first, and that's where we found that the code while finishing OK was hanging at one step.. The actual step was found using the code that Merrick Chaffer has made available here:

     http://merrickchaffer.blogspot.com/2011/07/currently-executing-sql-view.html

     Anyway, it was a basically an Insert into a temporary table which had an OR in the join logic for one table.. I will not use the actual code but the issue can be illustrated with the example below

    INSERT INTO #Temp_Update
       (
       TABLE_B_ID,
       TABLE_B_Column
       )
    SELECT  B.TABLE_B_ID, 
       B.TABLE_B_Column
    FROM  dbo.TABLE_A   A
    INNER JOIN dbo.TABLE_C   B
    ON   A.TABLE_A_ID        = B.TABLE_A_FK_ID
    INNER JOIN dbo.TABLE_C   C   
    ON  (
      B.TABLE_B_ID        = C.TABLE_B_FK_1_ID
    OR  B.TABLE_B_ID        = C.TABLE_B_FK_2_ID
      )
    WHERE A.TABLE_A_Column       = 1 
    AND  (
      C.TABLE_B_FK_1_ID       IS NOT NULL
    OR  C.TABLE_B_FK_2_ID       IS NOT NULL
      );

    On its own with no much data throughput on  driver TABLE_A when the join was very quick.. When we had thousands on the driver TABLE A the querty took an age to resolve even with ZERO rows to return. Bascially because the Query Parser could not resolve the OR statement well enough it was generating a complex execution plan with very poor execution time.. 

    I was chewing the cud with this with a colleague Gareth Keen, and he said.. I have seen an article somewhere where an complex OR join was replaced with a UNION ALL statement... "but I don't think you can use it in this case'.. I said I maybe, because all I am actually doing at a lower level is testing that the data 'exists and I can replace the last join with an EXISTS LEFT SEMI JOIN (which is can be faster anyway). To see what this would look like I have converted the above..

     INSERT INTO #Temp_Update
       (
       TABLE_B_ID,
       TABLE_B_Column
       )
    SELECT  B.TABLE_B_ID,
       B.TABLE_B_Column
    FROM  dbo.TABLE_A   A
    INNER JOIN dbo.TABLE_B B
    ON   A.TABLE_A_ID        = B.TABLE_A_FK_ID
    WHERE EXISTS
     (
     SELECT 1
     FROM dbo.TABLE_C   C 
     WHERE  B.TABLE_B_ID        = C.TABLE_B_FK_1_ID
     AND  C.TABLE_B_FK_1_ID       IS NOT NULL
      
     UNION ALL
     
     SELECT 1
     FROM dbo.TABLE_C   C 
     WHERE   B.TABLE_B_ID        = C.TABLE_B_FK_2_ID
     AND  C.TABLE_B_FK_2_ID       IS NOT NULL
     ) AS C

    The result was that the code stormed through this part and we solved the bottleneck.. Thanks Gareth, for dragging this out of your memory and I do note both your comments on this technique

    1) "We really  need to test it first for a number of conditions because I am sure the article said it may not work in all cases and we may be making things much worse'. - We did test it for different conditions

    2)  Always put the condition that is more likely to success the top part of your UNION ALL statement, because as we are using a EXISTS SEMI-JOIN the database execution will stop as soon as we find a match.. We want this to happen as quickly as possible so make sure that code block where the strongest chance of a match is placed first.

    To see a blog entry on how to replace a  UNION ALL with an OR see the link below - I have no idea if this was the article Gareth remembered, but it shows you how to do it..

     http://www.databasechannel.com/AccessArticles/Article_ORAND_UseUnionInsteadofOR.html

     So in this case talking to my Colleague has shown me a new way of doing something.. A collective knowledge base is always greater than a single persons...

    BTW I will be returning to this technique shortly

  • SQL Performance Improvements - Talk to the Real Users...

    Once upon a time, I was sent to publishing company who published many titles, who were having some performance issues with an application that they used to maintain subscription information.

    This was a form in 'in house' Customer Relationship Management (CRM) System that had a Front End application that talked to a back end SQL Server database.. Anyway, (apparently) there was some 'unspecified' performance issue and they got in an external consultancy to have a look and write a report...

    Anyway, I got the gig and I rocked up to the office where the client was based. I met the senior management team who were keen for me to get stuck in with a code review etc.. I however, wanted to find out what the main pain point was... They said well, it is when the Customer rings up to talk to us about their subscriptions etc.. There was basically a call centre in the office and there was a small number of staff who would take the customers call and try and help them... But... it was talking ages for them to search and find the actual Customer records.. Oh I said.... have you a copy of the data model (or can I see the database) and can I talk to the end users.. "Why do you want to talk to the users they said'.. well I just want to see how they use the system...

    Anyway, they said OK and sent me to see a lady on the next floor.., I asked her what happens when a customer calls.. She says.." oh we ask them for their unique client number" and "how many of them know that" I said not many, in fact very rarely do they know that - Ermmnnn.. I said fully knowing I may know by Bank Account number or even my credit card number of pat.. But I would never know my client reference number for a magazine.. Unless I had the magazine covering wrapper handy... Oh I said, what do you do when they don't know their number... "we usually ask them for their name and maybe their address and try and find them from that" she replied....  (of course they did everyone knows their own name and address)... thanks for your time I said.. I had one more look at the database/data model and went to see the management team..

    I think I have solved you main problem I said... Your database has been structured to assume that the key search is you Client Reference Number, where actually the main search is by Name.. On your Customer table you have a Clustered Index on the  Client Reference Number, and a non clustered index on the Surname. But you almost always search on the latter rather than the former... Can I suggest we change the Clustered Index on the Customer table (explaining that the data would now be sorted on this column now so all the Browns and Smiths were together rather than spread out across the table) and evaluate the impact.. We tried it later in the morning on a test system and it seemed to have worked OK with no degradation elsewhere..

    The senior management then changed the live system in the afternoon and the main performance issue went away completely they were finding people in sub-seconds rather than the seconds it was taking before and the client was very happy..

    So the moral of this story is always talk to the real users.. whatever you think may be happening, however you believe as system is working, however it is designed to work one way.. in real life it may be something completely different..

  • Sometimes SQL Performance Improvements are not just code or infrastructure changes. No 1 - Know your Business

    This is a post (well a series of posts) I have been thinking about for some time. Basically, you can add Indexes, Covering Indexes. Materialised Views, filtered Indexes... Partitioning etc.. to your heart’s content... But sometimes it just helps if you know your Business logic....

    I once worked on the Data Warehouse system for one of the major uk supermarkets. In this case while the client sold both non-food and food products the food data was dealt with by its own Data Warehouse system. We had a batch process to aggregate data at the Product Store Day level. In other words how much of each Product was sold at each Store on each Day. We also had to work out margin based on a Cost and Selling Prices which changed according to type of Branch (differential pricing).

    We also had to deal with the concept of VAT (Value Added Tax) a European Wide form of Purchase Tax. However, in the UK (unlike many countries has the concept of Zero rating most food). I say most most because some foodstuffs have VAT applied (most famously Chocolate Covered Biscuits 'aka the famous Jaffa Cake case), some have Zero Rated VAT (as I say most foods, but surprising Cakes) - and some Foods have VAT applied to some of the Product but not all... The example I was given was that if you but a packet of tea in a box, there is not VAT to pay.. However, if you but the same amount of Tea in a Tin, you have to pay VAT on the cost of the Tin (i.e. a proportion). I have actually now read the VAT Rules from the Inland Revenue as part of my research for this post. Available here:

    http://customs.hmrc.gov.uk/channelsPortalWebApp/channelsPortalWebApp.portal?_nfpb=true&_pageLabel=pageLibrary_ShowContent&id=HMCE_CL_000118&propertyType=document

    See Section 6. Mixed supplies: mixtures, promotional linked items and packaging

    By the way, having just re-read it I actually think the Tea Tin example does not work - Because the apportion bit has to more than the more than a certain percentage or cost (say £1) to kick in... But it will still work for the business concept I am attempting to describe.

    BTW to see why Cakes (and Jaffa Cakes in particular) do not have VAT applied, but Chocolate Covered Biscuits do see here:

    http://en.wikipedia.org/wiki/Jaffa_Cakes

    Anyway, back to my story...

    So to process VAT we have effectively to store 2 pieces of information. A flag (Yes/No) to say if VAT is applicable to the Product and a numeric field to hold the VAT Percentage applicable for those VATable items (100%for Fully VAT, say 10% for our Tea in Tin example or NULL or Zero (Not applicable or Zero).

    Note: I accept we could technically hold all this information in 1 numeric field (100 = YES, 0 = NO and a Percentage in between), but this is counter intuitive.

    Anyway, each night we had the VAT Process running (via SQL Server Stored Procedures), which took all our sales for the day (we are talking a couple of million records a day here) and calculated the VAT figures... The VAT process was taking a significant portion of our overnight window, and I was asked to look at it to see if I could speed up. This area had been written before I had been on the project, so they probably considered a fresh pair of eyes could help. The first concern I had with the code is that I noticed it was updating the data. Everyone knows that in SQL Server (as with other RDBMS) that an Update statement is costly. As it involves both a physical Delete and an Insert, so immediately you have twice the log file workload. However, while I realised that while I could have probably re-written an update as an Insert into a new table… what the real problem was that it was doing to many updates..  The VAT Flag and Percentage Apportion Field for the ProductStoreDay had been initialised to assume that VAT was applicable (1 and 100%) but we checked our reference product data we then toggled the field to be either (Off – Set *** to be Zero, or set the Apportion field).

    But as I said at the top of his article in the UK – MOST Food is Zero rated for VAT. So what the batch process was doing was updating around 1.8 Million records setting a Flag to be 0 (from 1) and a field to be NULL (from 100%) and leaving 200,000 records unchanged.

    I realised that if I actually initiated the data to so that the Flag to be 0 and the field to be NULL, and only updating those records which had to have VAT applied, I would cut 80% from the VAT Batch Run… and that is what I did.. It is quicker to update 200,000 records rather than 1.8 million…

    I used my Business Knowledge (the fact that I knew most Food is Zero rated for VAT) to change the logic of code and cause a 5 fold increase in performance… and not an Index change in sight….

  • So you may have many different ways of Inserting Data - But which is the fastest?

    After getting 100,000 rows of Data using the 4 main flavours of Insert Statement I attempted to determine which method was the fastest.. To ensure that the results were not skewed by database re-sizing actions or cached data. I ran each query multiple times and used the following command to clear the CACHE

    CHECKPOINT

    DBCC FREEPROCCACHE -- to clear the procedure cache

    DBCC DROPCLEANBUFFERS -- to test queries with a cold buffer cache

    I said 'attempted' because I found out that with Method 3 Insert with Select Values Clause and using 100,000 rows on my laptop I could not get the query to complete - It just 'hung' (I gave up after 40 minutes) I assume because it needed to materialise the 100,000 rows in memory before it could attempt to Insert them..

    I decided that to make it a test whereby at least I got a result - I therefore limited the number of rows to 1000, this also mean't that it was within the limits of the Insert Statement with Row Constructor method. I put in a Statistics statements and executed the code a number of times.

    The average results (10 runs) were as follows:

    Method 1 Classic Insert:

     -- 274 ms

    Method 2 Insert with row constructor:

    -- 1413 ms

    Method 3 Insert with Select Values Clause:

    -- 1540 ms

    Method 4 Merge Statement:

    -- 1574 ms

    As can be seen the fastest method (reduction of at least  80%) is the original Classic Insert Statement this was really surprisng to me, I wondered why - One obvious change is the locking involved. The Classic Insert Statement was taking Exclusive (X) on Index Key Locks and Intent Exclusive (IX) Locks on Pages and Tables while the other types seem to be taking Schema Stability  (Sch-S) Locks on Table and Metadata and Shared (S) Database Locks. Probably because they ALL have to materialise the Insert data in some form before they marshall it to the Insert statement... 

    Conclusions

    1) Many ways to Insert Data in SQL Server

    2) Easiest to write probably Insert with Row Constructor - Though this is limited to 1000 rows per batch

    3) Best for maintainability and control is the MERGE statement

    4) Fastest method is Classic Insert - and Insert with Select Values Clause will probably never finish with a very large number of rows 

     

    See also

     http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/how-many-different-ways-can-you-write-a-sql-insert-statement-1-2-3-6.aspx

    http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/so-now-you-can-write-an-insert-statement-6-different-ways-why-do-you-need-to-know-this-and-why-is-merge-the-best.aspx

    http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/a-gotcha-with-the-insert-statement-row-constructor.aspx

     


  • A Gotcha with the INSERT Statement Row Constructor and using a TALLY Table to Generate Test Data

    When I worked out there were at least 6 methods of Inserting data into a Table, I wanted to find out which was the fastest method... I will reveal the surprising fact in another posting. But I thought I would test the performance of each method with a large amount of data.. The first problem I had was how do I create a large number of Insert statements in many different forms? - Well writing Dynamic SQL seemed to be the best option, but to do this I still needed to create (I decided 100000) different rows from which I could build up my SQL Statement.

    I decided that I needed to create a TALLY table (sometimes called a Numbers table) with 100000 different values in it.. I knew the fastest method do this was via a CROSS JOIN in a CTE - (I think it was in a Blog post by Jamie Thomson - But I can't find it now) so I googled CTE and Tally and came up with this example here..

     http://beyondrelational.com/blogs/ozrenkrznaric/archive/2011/06/21/tally-table.aspx

    I must admit I thought ot a bit hard to read, even though I knew what it was supposed to do, so I rewrote it slightly, using a 'units' based aliasing system to be understood a bit easier. The code for this is below

    IF OBJECT_ID('dbo.Tally') IS NOT NULL

     

    DROP TABLE dbo.Tally;

     

     

    CREATE TABLE dbo.Tally

    (

    ID INT NOT NULL,

    PRIMARY KEY CLUSTERED

    (

    ID

    )

    );

    ;WITH Ones (N)

    AS

    (

    SELECT 1 AS N

    UNION ALL SELECT 1

    UNION ALL SELECT 1

    UNION ALL SELECT 1

    UNION ALL SELECT 1

    UNION ALL SELECT 1

    UNION ALL SELECT 1

    UNION ALL SELECT 1

    UNION ALL SELECT 1

    UNION ALL SELECT 1

    )

    ,Tens (N)

    AS

    (

    SELECT 1

    FROM Ones A

    CROSS JOIN Ones B

    )

    ,Hundreds (N)

    AS

    (

    SELECT 1

    FROM Tens A

    CROSS JOIN Tens B

    )

    ,Thousands (N)

    AS

    (

    SELECT 1

    FROM Hundreds A

    CROSS JOIN Tens B

    )

    INSERT INTO dbo.Tally (ID)

    SELECT TOP 100000

    N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM Thousands;

    You can extend this concept to the millions and billions of rows if you so wish.. As an example of the speed it took just over 2 seconds on my laptop to generate the data..

    I could then use the TALLY table to create my various different  Insert Statements

    e.g.

    SELECT 'INSERT INTO dbo.InsertTest1 (ID, TextString) VALUES (' + CAST(ID AS NVARCHAR) + ' , ' + CHAR(39) + 'Value: ' + + RIGHT(CAST((1000000 + ID) AS NVARCHAR (7)), 6) + + CHAR(39) + ');'

    FROM dbo.Tally;

     

    SELECT 'INSERT INTO dbo.InsertTest2 (ID, TextString) VALUES'

    UNION ALL

    SELECT '(' + CAST(ID AS NVARCHAR) + ' , ' + CHAR(39) + 'Value: ' + + RIGHT(CAST((1000000 + ID) AS NVARCHAR (7)), 6) + CHAR(39) + ')' + CASE WHEN ID <> 100000 THEN ',' ELSE '' END

    FROM dbo.Tally

    UNION ALL

    SELECT ';';

     

    SELECT 'INSERT INTO dbo.InsertTest3 (ID, TextString)

    SELECT T.ID, T.TextString

    FROM

    (

    VALUES'

    UNION ALL

    SELECT '(' + CAST(ID AS NVARCHAR) + ' , ' + CHAR(39) + 'Value: ' + + RIGHT(CAST((1000000 + ID) AS NVARCHAR (7)), 6) + CHAR(39) + ')' + CASE WHEN ID <> 100000 THEN ',' ELSE '' END

    FROM dbo.Tally

    UNION ALL

    SELECT ') AS T (ID, TextString);';

     OK Then what is the GOTCHA ?

     Well I on the INSERT Statement Row Constructor I got the following error message

     

    Msg 10738, Level 15, State 1, Line 1021

    The number of row value expressions in the INSERT statement exceeds the maximum allowed number of 1000 row values.

    I had not appreciated that there is a LIMIT (of 1000) to the number of lines you can add to a single INSERT Statement when using the row (also known as table) constructor method...

    So to insert 100,000 rows I would need 100 batches of 1000 rows using this method....

     

    See also

     

    http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/how-many-different-ways-can-you-write-a-sql-insert-statement-1-2-3-6.aspx

    http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/so-now-you-can-write-an-insert-statement-6-different-ways-why-do-you-need-to-know-this-and-why-is-merge-the-best.aspx

    http://consultingblogs.emc.com/paulmcmillan/archive/2011/07/06/so-you-may-have-many-different-ways-of-inserting-data-but-which-is-the-fastest.aspx

Powered by Community Server (Personal Edition), by Telligent Systems