Welcome to EMC Consulting Blogs Sign in | Join | Help

SSIS Junkie

Katmai: T-SQL Grouping sets

SQL Server 2005 includes the WITH CUBE and WITH ROLLUP syntaxes that are used to display summary information for combinations of columns in the GROUP BY clause. That's useful functionality but it didn't provide particularly fine-grained control over what summaries were returned. Essentially it was all-or-nothing.

In katmai, the next version of SQL Server, that all changes with the introduction of GROUPING SETS. Grouping sets enable us to define which summaries we wish to return. This is probably better explained with an example.Here is a traditional GROUP BY query (executes against AdventureWorksDW):

select    ProductKey

,       OrderDateKey

,       CustomerKey

,       PromotionKey

,       sum(UnitPrice)SumUnitPrice

,       SUM(OrderQuantity)SumOrderQuantity

from    dbo.FactInternetSales

group    by

        ProductKey

,       OrderDateKey

,       CustomerKey

,       PromotionKey;

and here is another logically identical query that uses grouping sets:

select    ProductKey

,       OrderDateKey

,       CustomerKey

,       PromotionKey

,       sum(UnitPrice)SumUnitPrice

,       SUM(OrderQuantity)SumOrderQuantity

from    dbo.FactInternetSales

group    by

        grouping sets

        (

            (   ProductKey

            ,   OrderDateKey

            ,   CustomerKey

            ,   PromotionKey

            )

        );

This second query has one grouping set that contains the same list of columns as in the first query - hence it returns the same data. In this scenario both queries do exactly the same thing so it isn't particularly useful of course but I've included it to show the differing syntax.

Grouping sets become really useful when we use them to specify different aggregations. Here's an example that does that:

select    ProductKey

,        OrderDateKey

,        CustomerKey

,        PromotionKey

,        sum(UnitPrice)SumUnitPrice

,        SUM(OrderQuantity)SumOrderQuantity

from    dbo.FactInternetSales

group    by

        grouping sets

        (

            --Aggregate by all columns in the select clause

            (    ProductKey

            ,    OrderDateKey

            ,    CustomerKey

            ,    PromotionKey

            ),

            --Aggregate by a subset of the columns in the select clause

            (    ProductKey

            ,    OrderDateKey

            ,    CustomerKey

            ),

            ()    --ALL aggregation

        );

This query has 3 grouping sets:

  • The first will aggregate UnitPrice & OrderQuantity over all columns in the select clause,
  • The second aggregates UnitPrice & OrderQuantity over a subset of columns
  • The third returns the aggregation of UnitPrice & OrderQuantity over all of the returned columns hence this grouping set only returns one row.

Where a grouping set does not include a column that is in the SELECT clause then a NULL will be returned. The resultset from the query is a UNION of the data returned by each grouping set.

Here's a screenshot that illustrates this a bit better:

First check out the last row. That's the row that is returned by the third grouping set (the ALL aggregation) hence it contains NULL for each of the non-aggregated columns. I also happen to know that none of the non-aggregated columns in the select clause contains NULLs therefore the other rows containing NULL values that you can see here are returned by the grouping set that is a subset of the columns in the select clause.

 

So that describes what grouping sets are but how would they be used? Well off the top of my head I know that this would be useful for Microstrategy which (as far as I know) allows users to define their own aggregations rather than letting the OLAP engine pick them. Can you think of any uses?

-Jamie

 

Posted to Microsoft (RSS feed), Katmai (RSS feed), SQL Server (RSS feed), T-SQL (RSS feed), & Grouping Sets (RSS feed)

 

Technorati tags: , ,
Published Monday, June 04, 2007 2:03 AM by jamie.thomson

Comments

 

David.Portas said:

Guess the results of the following two GROUP BY queries against an empty table. Are SQL Server's answers correct? :)

CREATE TABLE r (b INT);

SELECT 1 FROM r HAVING 1=1;

SELECT 1 FROM r GROUP BY ();

June 5, 2007 3:08 PM
New Comments to this post are disabled

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems