AP 12 - Beware of implicit self-joins in a correlated subquery

Severity

Warning Caution

Type

Performance

Readability

Problem

Using an implicit self-join in a correlated subquery in the WHERE clause can cause two problems.

  • Most SQL developers find correlated subqueries difficult to read or debug.
  • Using a correlated subquery may cause a performance issue.

An example. The employee table is used both in the outer and inner query

SELECT lastname
      ,firstname
      ,salary
  FROM employee e1
 WHERE e1.salary>(SELECT avg(salary)
                     FROM employee e2
                    WHERE e2.dept_id=e1.dept_id)

Solution

Using a window function instead of an implicit self-join will improve the query performance for most cases and also make it easier to understand.

SELECT lastname
      ,firstname
      ,salary
      ,AVG(salary) OVER(PARTITION BY dept_id) AS avg_sal
  FROM employee
QUALIFY salary>avg_sal

If your database does not support the QUALIFY operator use a subquery

SELECT lastname
      ,firstname
      ,salary
  FROM( SELECT lastname
              ,firstname
              ,salary
              ,AVG(salary) OVER(PARTITION BY dept_id) AS avg_sal
          FROM employee)
 WHERE salary>avg_sal

Legitimate use of the anti pattern

In some edge situations the implicit self-join can be more performant. The window function is generally more performant under the following conditions

  • You are dealing with a non-equi join, e.g. >, <, >=, <=. For Non-equi joins the CBO can not use a hash join, which may impact performance.
  • The columns involved in the join condition do not contain indexes or cluster keys.
  • In a self join, there are no table partitions on the columns involved in the join.
  • The self join table is large and has not been reduced on one side of the join by filtering or aggregation. In the example above the inline view has been reduced by aggregation. The resulting self-join is less expensive and may be legitimate.

It is a good idea to run a performance test to compare performance of the query with the self join to the performance of the window function.