Learning SQL: Avoiding the use of IN and NOT IN
IN and NOT IN are a feature of SQL that should be avoided at all cost. This is important to remember, especially if you use Microsoft SQL Server. The use of IN can cause a massive bottleneck in your application.
An example of IN:SELECT *
FROM TableOne
WHERE SomeId IN (SELECT SomeReferece FROM TableTwo)
An example of NOT IN:
SELECT *
FROM TableOne
WHERE SomeId NOT IN (SELECT SomeReferece FROM TableTwo)
Apart from being messy, the problem is this: SQL Server does not know ahead of time what values are going to be returned by the IN query. This means that it has to run the same query over and over again for each row in TableOne (in our example). That causes a massive overhead.
NOT IN is even worse. With IN, as soon as the sub-query returns a matching value it knows that it can stop searching and move on to the next row. NOT IN requires a check of every single row of the sub-query, because without checking every single row we can’t know if our value is not in that list.
Imaging that TableOne returns 200 rows, and that TableTwo has 500 records. For both examples, SQL Server will run 201 queries. For the first example, we will need to scan between 220 and 100,000 (approx) rows to complete the query, based on the data. For the second example we will always need to scan 100,000 rows. That’s a lot.
Indexes can cover-up the problem somewhat, but you can’t get away from the problem of number of queries – especially if this stops SQL Server from creating a ‘plan’. There are better, more elegant solutions. The two most common are shown here.
To replace IN, we use an INNER JOIN. They are effectively the same thing. This code does exactly the same as example 1:
SELECT t1.*
FROM TableOne t1
INNER JOIN TableTwo t2
ON t1.SomeId = t2.SomeReference
We will only ever get rows for t1 where there is a corresponding entry in t2.
NOT IN is effectly saying "where t2 doesn’t have the corresponding row". That requires a left outer join. I shall explain LEFT OUTER JOIN more fully in another article, but what we need to know is that it will return data if it can, other wise it will return an empty result. We can check for the empty result by seeing if the joined column IS NULL. It’s much easier than it sounds.
The following code achieves exactly the same results as example 2 above:
SELECT t1.*
FROM TableOne t1
LEFT OUTER JOIN TableTwo t2
ON t1.SomeId = t2.SomeReference
WHERE t2.SomeReference IS NULL
Before I finish, I shall say that there is one place where IN might just be OK (although even that is debatable). That one case is when you are supplying a hard-coded list of numbers. E.g.:
WHERE TypeId IN (1, 2, 3, 5, 9)
That is the only time.
