AP 28 - Avoid using functions in the Join clause

Severity

Warning Caution

Type

Performance

Problem

There are two reasons to avoid using a function on a column inside a Join clause.

  • It is CPU intensive because the function gets called for each record in your result set.
  • It prevents the database from applying an index, sort key or cluster key. As a result, the database needs to read more data from the storage layer. This can have a significant impact on query performance.

In the example below we need to apply a function to convert the order_date column with data type string to a number. The function makes any index on the order_date column unusable.

SELECT b.sales_month
      ,sum(a.sales_amt) sales_amt
  FROM orders a
  JOIN date_dim b 
    ON TO_NUMBER(a.order_date)
              =b.date_id

Solution

The proper solution depends on the exact nature of your scenario. Properly design your data model and make foreign keys the same data type as the parent keys, e.g. in the example above make both columns a date or number. Certain scenarios may require you to precompute the function as part of ETL or a batch job. Some databases, such as Oracle offer index based functions. Last but not least you can rewrite the query so that the function is used on the side of the join with the smaller table. This will allow the index to be used on the side of the bigger table.

SELECT b.sales_month
      ,sum(a.sales_amt) sales_amt
  FROM orders a
  JOIN date_dim b 
    ON a.order_date_id
              =TO_CHAR(b.date_id)

Legitimate use of the anti pattern

  • In some situations, you may not be able to avoid using the function in the WHERE clause.
  • Your data volumes are small.
  • You are dealing with an ad-hoc query
  • The column with the function inside the join condition does not have an index or cluster key