AP 21 - Use window functions instead of self joins
Severity
CautionType
Performance
Readability
Problem
Self-joins are frequently used to perform inter row calculations to look forward or backward across row boundaries in a table.
Self joins increase the number of records fetched and the number of table scans
Self-joins are hard to read and understand for most SQL developers in particular if they contain non-equi joins
Self-joins can be hard to debug
Here is an example of a self-join using a non-equi join condition (>).
SELECT wv1.vehicleregistration
,wv1.chillersensornumber
,wv1.recordedwhen
,MAX(wv2.recordedwhen) AS previous_recordedwhen
,DATEDIFF('second'
,MAX(wv2.recordedwhen)
,wv1.recordedwhen) AS TIME_ELAPSED
FROM warehouse_vehicletemperatures wv1
LEFT
JOIN warehouse_vehicletemperatures wv2
ON wv1.vehicleregistration=wv2.vehicleregistration
AND wv1.chillersensornumber=wv2.chillersensornumber
AND wv1.recordedwhen>wv2.recordedwhen
GROUP BY wv1.vehicleregistration
,wv1.chillersensornumber
,wv1.recordedwhen;
Solution
Most self joins can be replaced by using window functions.
Window functions generally do not suffer from the issues of self-joins
SELECT vehicleregistration
,chillersensornumber
,recordedwhen
,LAG(recordedwhen) OVER(PARTITION BY vehicleregistration
,chillersensornumber
ORDER BY recordedwhen) AS previous_recordedwhen
,DATEDIFF('second'
,LAG(recordedwhen) OVER(PARTITION BY vehicleregistration
,chillersensornumber
ORDER BY recordedwhen)
,recordedwhen) AS TIME_ELAPSED
FROM warehouse_vehicletemperatures;
Legitimate use of the anti pattern
In rare cases a self-join can be more performant than a window function. The more of the following conditions your query meets, the more efficiently the window function will operate compared to the self join.
- Your self join uses a non-equi join, e.g. >, <, >=, <=. For Non-equi joins the CBO can not use a hash join, which is a very expensive operation.
- The self-join can not make use of indexes or cluster keys on the columns involved in the join condition
- There are no table partitions on the columns involved in the self join
- The self join table is large and has not been reduced on one side of the join by filtering or aggregation.
Most of the time, it’s a good idea to run a performance test to compare how well the query with the self-join works to how well the window function works.