AP 36 - JOIN condition linking the compound key to multiple, different tables.

Severity

Caution Caution

Type

Performance

Correctness

Problem

An ANSI join for a table has been specified, but the conditions specified link it to multiple, different tables simultaneously.

Example

SELECT RT.TransactionID
      ,U.UserName
      ,V.VehicleType
      ,U.PreferredLocation
  FROM RentalRegion R
  JOIN RentalDepot D
    ON D.RegionID = R.RegionID
  JOIN RentalTransactions RT
    ON RT.DepotID = D.DepotID
   AND RT.RegionID = R.RegionID
;

The RentalTransactions (RT alias) table is being joined on a compound key but those keys link to more than one prior table:

 ON RT.DepotID = D.DepotID --this joins to Depot
AND RT.RegionID = R.RegionID --this joins to Region
;

Frequently, it’s an easily missed error that can lead to significant performance problems because the database can’t execute a join across several distinct tables simultaneously. While the optimizer might intervene to internally rewrite the query plan as needed, it’s typically recommended to rectify the query where feasible. It’s also prudent to join multi-key columns to the same, single primary table for both readability and consistency reasons.

Solution

Specify a join condition that consistently links the multi-keys to the same, driving table. Among the options, it’s typically the one positioned lower in the hierarchy, both in terms of the data model and the JOIN sequence. Assuming a hierarchical structure like the Snowflake Dimensional Model is in place, it would simply take swapping the key being compared from the top-level table to its equivalent at a finer granularity. To illustrate, consider how the RentalRegion is joined to the RentalDepot using the RegionID columns. In subsequent joins, the RegionID from RentalRegion can be substituted with its equivalent column from RentalDepot therefore making RT being consistently joined to the latter table only.

SELECT RT.TransactionID
      ,U.UserName
      ,V.VehicleType
      ,U.PreferredLocation
  FROM RentalRegion R
  JOIN RentalDepot D
    ON D.RegionID = R.RegionID
  JOIN RentalTransactions RT
    ON RT.DepotID = D.DepotID
   AND RT.RegionID = D.RegionID
;

Legitimate use of the anti pattern

There might be some rather rare cases where a tables could be joined using different, independent keys all together, effectively creating a cycle (eg. triangle when looking at the data model). Here the transactional table can be naturally joined to its dimensions Vehicles and Users but the last two have some extra link (preferences) that potentially closes the cycle:

SELECT RT.TransactionID
      ,U.UserName
      ,V.VehicleType
      ,U.PreferredLocation
  FROM Users U
  JOIN Vehicles V
    ON V.Location = U.PreferredLocation
   AND V.VehicleType = U.PreferredVehicleType;
  JOIN RentalTransactions RT
    ON RT.VehicleID = V.VehicleID
   AND RT.UserID = U.UserID
;

In the given scenario, it would be arguably justified enough to keep linked to multiple tables as shown. However, for improved readability, it is still advisable to break the loop and shift that less commonly used, additional JOIN condition to the WHERE clause as an additional filter:

SELECT RT.TransactionID
      ,U.UserName
      ,V.VehicleType
      ,U.PreferredLocation
  FROM RentalTransactions RT
  JOIN Vehicles V
    ON V.VehicleID = RT.VehicleID
  JOIN Users U
    ON U.UserID = RT.UserID
 WHERE V.Location = U.PreferredLocation
   AND V.VehicleType = U.PreferredVehicleType;
;