AP 21 - Use window functions instead of self joins

Severity

Warning Caution

Type

Performance

Readability

Problem

Self-joins are frequently used to perform inter row calculations to look forward or backward across row boundaries in a table.

Self joins increase the number of records fetched and the number of table scans

Self-joins are hard to read and understand for most SQL developers in particular if they contain non-equi joins

Self-joins can be hard to debug

Here is an example of a self-join using a non-equi join condition (>).

SELECT wv1.vehicleregistration
      ,wv1.chillersensornumber
      ,wv1.recordedwhen
      ,MAX(wv2.recordedwhen) AS previous_recordedwhen
      ,DATEDIFF('second'
          ,MAX(wv2.recordedwhen)
          ,wv1.recordedwhen) AS TIME_ELAPSED
  FROM warehouse_vehicletemperatures wv1
  LEFT
  JOIN warehouse_vehicletemperatures wv2
    ON wv1.vehicleregistration=wv2.vehicleregistration
   AND wv1.chillersensornumber=wv2.chillersensornumber
   AND wv1.recordedwhen>wv2.recordedwhen
 GROUP BY wv1.vehicleregistration
      ,wv1.chillersensornumber
      ,wv1.recordedwhen;

Solution

Most self joins can be replaced by using window functions.

Window functions generally do not suffer from the issues of self-joins

SELECT vehicleregistration
      ,chillersensornumber
      ,recordedwhen
      ,LAG(recordedwhen) OVER(PARTITION BY vehicleregistration
          ,chillersensornumber 
     ORDER BY recordedwhen) AS previous_recordedwhen
      ,DATEDIFF('second'
          ,LAG(recordedwhen) OVER(PARTITION BY vehicleregistration
              ,chillersensornumber 
         ORDER BY recordedwhen)
          ,recordedwhen) AS TIME_ELAPSED
  FROM warehouse_vehicletemperatures;

Legitimate use of the anti pattern

In rare cases a self-join can be more performant than a window function. The more of the following conditions your query meets, the more efficiently the window function will operate compared to the self join.

  • Your self join uses a non-equi join, e.g. >, <, >=, <=. For Non-equi joins the CBO can not use a hash join, which is a very expensive operation.
  • The self-join can not make use of indexes or cluster keys on the columns involved in the join condition
  • There are no table partitions on the columns involved in the self join
  • The self join table is large and has not been reduced on one side of the join by filtering or aggregation.

Most of the time, it’s a good idea to run a performance test to compare how well the query with the self-join works to how well the window function works.