Correctness
If any row of a subquery using NOT IN returns NULL, the entire NOT IN operator will evaluate to either FALSE or UNKNOWN and no records will be returned. NOT IN will not match any rows if the subquery just returns a single row with NULL.
Let’s look at an example
We have an orders table
and an order_details table with one NULL for order_id
You create a query that returns orders in the orders table without a record in the order_details table
SELECT order_id
FROM orders
WHERE order_id NOT IN(SELECT order_id
FROM order_details);
You might expect the following result
but you get back the following result
The query returns no rows at all. NOT IN will not match any rows if the subquery just returns a single row with NULL.
You can rewrite your query using a defensive programming style to check for NULL in the subquery
SELECT *
FROM orders
WHERE order_id NOT IN(SELECT order_id
FROM order_details
WHERE order_id IS NOT NULL);
This query will give you the correct answer to your question.
Alternatively, you could use NOT EXISTS instead of NOT IN.
SELECT *
FROM orders o
WHERE NOT EXISTS(SELECT NULL
FROM order_details od
WHERE o.order_id=od.order_id);
Exists cannot return NULL. It’s checking solely for the presence or absence of a row in the subquery and, hence, it can only return true or false. Since it cannot return NULL, there’s no possibility of a single NULL resulting in the entire expression evaluating to UNKNOWN.
While NOT and EXISTS are equivalent, NOT IN and NOT EXISTS have different behaviour when it comes to NULL.
If you know that the column in question has a NOT NULL constraint you can leave out the extra check for NULL values in your NOT IN subquery. However, in the spirit of defensive programming you should still add it. The constraint may be dropped in the future