Arctus Blog: Learning SQL: Avoiding the use of IN and NOT IN http://blog.arctus.co.uk/articles/2007/04/20/learning-sql-avoiding-the-use-of-in-and-not-in en-us 40 Learning SQL: Avoiding the use of IN and NOT IN <p>IN and <span class="caps">NOT IN</span> are a feature of <span class="caps">SQL</span> that should be avoided at all cost. This is important to remember, especially if you use Microsoft <span class="caps">SQL</span> Server. The use of IN can cause a massive bottleneck in your application.</p> An example of IN: <pre>SELECT * FROM TableOne WHERE SomeId IN (SELECT SomeReferece FROM TableTwo)</pre> An example of <span class="caps">NOT IN</span>: <pre>SELECT * FROM TableOne WHERE SomeId NOT IN (SELECT SomeReferece FROM TableTwo)</pre> <p>Apart from being messy, the problem is this: <span class="caps">SQL</span> Server does not know ahead of time what values are going to be returned by the IN query. This means that it has to run the same query over and over again for each row in TableOne (in our example). That causes a massive overhead.</p> <p><span class="caps">NOT IN</span> is even worse. With IN, as soon as the sub-query returns a matching value it knows that it can stop searching and move on to the next row. <span class="caps">NOT IN</span> requires a check of every single row of the sub-query, because without checking every single row we can&#8217;t know if our value is not in that list.</p> <p>Imaging that TableOne returns 200 rows, and that TableTwo has 500 records. For both examples, <span class="caps">SQL</span> Server will run 201 queries. For the first example, we will need to scan between 220 and 100,000 (approx) rows to complete the query, based on the data. For the second example we will always need to scan 100,000 rows. That&#8217;s a lot.</p> <p>Indexes can cover-up the problem somewhat, but you can&#8217;t get away from the problem of number of queries &#8211; especially if this stops <span class="caps">SQL</span> Server from creating a &#8216;plan&#8217;. There are better, more elegant solutions. The two most common are shown here.</p> <p>To replace IN, we use an <span class="caps">INNER JOIN</span>. They are effectively the same thing. This code does exactly the same as example 1:</p> <pre>SELECT t1.* FROM TableOne t1 INNER JOIN TableTwo t2 ON t1.SomeId = t2.SomeReference</pre> <p>We will only ever get rows for t1 where there is a corresponding entry in t2.</p> <p><span class="caps">NOT IN</span> is effectly saying &quot;where t2 doesn&#8217;t have the corresponding row&quot;. That requires a left outer join. I shall explain <span class="caps">LEFT OUTER JOIN</span> more fully in another article, but what we need to know is that it will return data if it can, other wise it will return an empty result. We can check for the empty result by seeing if the joined column <span class="caps">IS NULL</span>. It&#8217;s much easier than it sounds.</p> <p>The following code achieves exactly the same results as example 2 above:</p> <pre>SELECT t1.* FROM TableOne t1 LEFT OUTER JOIN TableTwo t2 ON t1.SomeId = t2.SomeReference WHERE t2.SomeReference IS NULL</pre> <p>Before I finish, I shall say that there is one place where IN might just be OK (although even that is debatable). That one case is when you are supplying a hard-coded list of numbers. E.g.:</p> <p><span class="caps">WHERE</span> TypeId IN (1, 2, 3, 5, 9)</p> <p>That is the only time.</p> Fri, 20 Apr 2007 07:28:00 -0400 urn:uuid:c69bb417-4305-4754-bab5-2c5cd89f01c6 adrian@arctus.co.uk (Adrian O'Connor) http://blog.arctus.co.uk/articles/2007/04/20/learning-sql-avoiding-the-use-of-in-and-not-in SQL Server