Arctus Blog: Advanced SQL: Removing sub-queries with WITH (SQL Server 2005)tag:blog.arctus.co.uk,2005:TypoTypo2007-04-19T08:01:47-04:00Adrian O'Connoradrian@arctus.co.ukurn:uuid:58d8d72f-eec3-4a27-9296-705fc881b7dc2007-04-19T07:34:00-04:002007-04-19T08:01:47-04:00Advanced SQL: Removing sub-queries with WITH (SQL Server 2005)<p>A recent addition to SQL Server’s support for ANSI SQL is the WITH keyword. This little gem lets you remove sub-queries from the body of your SELECT statements and turn them in to named tables.</p>
<p>Sub-queries are bad for several reasons. Most of all, they make queries harder to read. There is no obvious way to indent a sub-query, so it kind of sits in the middle of what would otherwise be a well formated statement. This can obscure the true behaviour and intentions of the query. Furthermore, they can severely hurt performance. SQL Server might need to execute a complex sub-query seperately for every single row that is in an outer results set. If we’re interested in performance and scalability, we need SQL Server to be able to execute our entire query with just one plan.</p>
<p>Some queries are often just uneccessary and ill-thought out. I have seen sub-queries where joins should have been used on many an occassion. This kind of subquery is always hurting perofrmance. As a general rule of thumb, if your sub-query is not performing any aggregate calculations (COUNT, AVG, MAX with GROUP BY) then it can probably be re-written as a join.</p>
<p>Here, however, we are not intersted in that kind of mis-use. I am going to show you how to turn a query containing a genuine sub-query in to a far more readable block of code.</p>
<p>The following query, complete with a typical subquery, returns the most expensive product for each category in a sample database (we are, of course, talking about NorthWind):</p>
<pre><code>SELECT c.CategoryName, p.ProductName, p.UnitPrice
FROM Categories c
INNER JOIN (SELECT CategoryId, MAX(UnitPrice) AS MaxPrice
FROM Products GROUP BY CategoryId) maxprice
ON maxprice.CategoryId = c.CategoryId
INNER JOIN Products p
ON p.CategoryId = c.CategoryId
AND p.UnitPrice = maxprice.MaxPrice
ORDER BY MaxPrice DESC</code></pre>
<p>What we will do, using WITH, is remove the subquery from the main body of our query, and reference it using an alias:</p>
<pre><code>WITH MostExpensiveProducts (CategoryId, MaxUnitPrice) AS
(
SELECT CategoryId, MAX(UnitPrice)
FROM Products
GROUP BY CategoryId
)
SELECT c.CategoryName, p.ProductName, p.UnitPrice
FROM Categories c
INNER JOIN MostExpensiveProducts mep
ON mep.CategoryId = c.CategoryId
INNER JOIN Products p
ON p.CategoryId = mep.CategoryId
AND p.UnitPrice = mep.MaxUnitPrice
ORDER BY mep.MaxUnitPrice DESC;</code></pre>
<p>In our first query where we had a subquery, we now have the table name “MostExpensiveProducts”. It is much easier to see, at a glance, what this table is doing.</p>
<p>Although it is only a hunch, I suspect that SQL Server is also better able to create a plan for the aliased query that will not become invalid. That’s just a hunch.</p>