Arctus Blog: Advanced SQL: Removing sub-queries with WITH (SQL Server 2005) tag:blog.arctus.co.uk,2005:Typo Typo 2007-04-19T08:01:47-04:00 Adrian O'Connor adrian@arctus.co.uk urn:uuid:58d8d72f-eec3-4a27-9296-705fc881b7dc 2007-04-19T07:34:00-04:00 2007-04-19T08:01:47-04:00 Advanced SQL: Removing sub-queries with WITH (SQL Server 2005) <p>A recent addition to SQL Server&#8217;s support for ANSI SQL is the WITH keyword. This little gem lets you remove sub-queries from the body of your SELECT statements and turn them in to named tables.</p> <p>Sub-queries are bad for several reasons. Most of all, they make queries harder to read. There is no obvious way to indent a sub-query, so it kind of sits in the middle of what would otherwise be a well formated statement. This can obscure the true behaviour and intentions of the query. Furthermore, they can severely hurt performance. SQL Server might need to execute a complex sub-query seperately for every single row that is in an outer results set. If we&#8217;re interested in performance and scalability, we need SQL Server to be able to execute our entire query with just one plan.</p> <p>Some queries are often just uneccessary and ill-thought out. I have seen sub-queries where joins should have been used on many an occassion. This kind of subquery is always hurting perofrmance. As a general rule of thumb, if your sub-query is not performing any aggregate calculations (COUNT, AVG, MAX with GROUP BY) then it can probably be re-written as a join.</p> <p>Here, however, we are not intersted in that kind of mis-use. I am going to show you how to turn a query containing a genuine sub-query in to a far more readable block of code.</p> <p>The following query, complete with a typical subquery, returns the most expensive product for each category in a sample database (we are, of course, talking about NorthWind):</p> <pre><code>SELECT c.CategoryName, p.ProductName, p.UnitPrice FROM Categories c INNER JOIN (SELECT CategoryId, MAX(UnitPrice) AS MaxPrice FROM Products GROUP BY CategoryId) maxprice ON maxprice.CategoryId = c.CategoryId INNER JOIN Products p ON p.CategoryId = c.CategoryId AND p.UnitPrice = maxprice.MaxPrice ORDER BY MaxPrice DESC</code></pre> <p>What we will do, using WITH, is remove the subquery from the main body of our query, and reference it using an alias:</p> <pre><code>WITH MostExpensiveProducts (CategoryId, MaxUnitPrice) AS ( SELECT CategoryId, MAX(UnitPrice) FROM Products GROUP BY CategoryId ) SELECT c.CategoryName, p.ProductName, p.UnitPrice FROM Categories c INNER JOIN MostExpensiveProducts mep ON mep.CategoryId = c.CategoryId INNER JOIN Products p ON p.CategoryId = mep.CategoryId AND p.UnitPrice = mep.MaxUnitPrice ORDER BY mep.MaxUnitPrice DESC;</code></pre> <p>In our first query where we had a subquery, we now have the table name &#8220;MostExpensiveProducts&#8221;. It is much easier to see, at a glance, what this table is doing.</p> <p>Although it is only a hunch, I suspect that SQL Server is also better able to create a plan for the aliased query that will not become invalid. That&#8217;s just a hunch.</p>