AP 12 - Beware of implicit self-joins in a correlated subquery
Severity
CautionType
Performance
Readability
Problem
Using an implicit self-join in a correlated subquery in the WHERE clause can cause two problems.
- Most SQL developers find correlated subqueries difficult to read or debug.
- Using a correlated subquery may cause a performance issue.
An example. The employee table is used both in the outer and inner query
SELECT lastname
,firstname
,salary
FROM employee e1
WHERE e1.salary>(SELECT avg(salary)
FROM employee e2
WHERE e2.dept_id=e1.dept_id)
Solution
Using a window function instead of an implicit self-join will improve the query performance for most cases and also make it easier to understand.
SELECT lastname
,firstname
,salary
,AVG(salary) OVER(PARTITION BY dept_id) AS avg_sal
FROM employee
QUALIFY salary>avg_sal
If your database does not support the QUALIFY operator use a subquery
SELECT lastname
,firstname
,salary
FROM( SELECT lastname
,firstname
,salary
,AVG(salary) OVER(PARTITION BY dept_id) AS avg_sal
FROM employee)
WHERE salary>avg_sal
Legitimate use of the anti pattern
In some edge situations the implicit self-join can be more performant. The window function is generally more performant under the following conditions
- You are dealing with a non-equi join, e.g. >, <, >=, <=. For Non-equi joins the CBO can not use a hash join, which may impact performance.
- The columns involved in the join condition do not contain indexes or cluster keys.
- In a self join, there are no table partitions on the columns involved in the join.
- The self join table is large and has not been reduced on one side of the join by filtering or aggregation. In the example above the inline view has been reduced by aggregation. The resulting self-join is less expensive and may be legitimate.
It is a good idea to run a performance test to compare performance of the query with the self join to the performance of the window function.