Performance
Correctness
An ANSI join for a table has been specified, but the conditions specified link it to multiple, different tables simultaneously.
Example
SELECT RT.TransactionID
,U.UserName
,V.VehicleType
,U.PreferredLocation
FROM RentalRegion R
JOIN RentalDepot D
ON D.RegionID = R.RegionID
JOIN RentalTransactions RT
ON RT.DepotID = D.DepotID
AND RT.RegionID = R.RegionID
;
The RentalTransactions (RT alias) table is being joined on a compound key but those keys link to more than one prior table:
ON RT.DepotID = D.DepotID --this joins to Depot
AND RT.RegionID = R.RegionID --this joins to Region
;
Frequently, it’s an easily missed error that can lead to significant performance problems because the database can’t execute a join across several distinct tables simultaneously. While the optimizer might intervene to internally rewrite the query plan as needed, it’s typically recommended to rectify the query where feasible. It’s also prudent to join multi-key columns to the same, single primary table for both readability and consistency reasons.
Specify a join condition that consistently links the multi-keys to the same, driving table. Among the options, it’s typically the one positioned lower in the hierarchy, both in terms of the data model and the JOIN sequence. Assuming a hierarchical structure like the Snowflake Dimensional Model is in place, it would simply take swapping the key being compared from the top-level table to its equivalent at a finer granularity. To illustrate, consider how the RentalRegion is joined to the RentalDepot using the RegionID columns. In subsequent joins, the RegionID from RentalRegion can be substituted with its equivalent column from RentalDepot therefore making RT being consistently joined to the latter table only.
SELECT RT.TransactionID
,U.UserName
,V.VehicleType
,U.PreferredLocation
FROM RentalRegion R
JOIN RentalDepot D
ON D.RegionID = R.RegionID
JOIN RentalTransactions RT
ON RT.DepotID = D.DepotID
AND RT.RegionID = D.RegionID
;
There might be some rather rare cases where a tables could be joined using different, independent keys all together, effectively creating a cycle (eg. triangle when looking at the data model). Here the transactional table can be naturally joined to its dimensions Vehicles and Users but the last two have some extra link (preferences) that potentially closes the cycle:
SELECT RT.TransactionID
,U.UserName
,V.VehicleType
,U.PreferredLocation
FROM Users U
JOIN Vehicles V
ON V.Location = U.PreferredLocation
AND V.VehicleType = U.PreferredVehicleType;
JOIN RentalTransactions RT
ON RT.VehicleID = V.VehicleID
AND RT.UserID = U.UserID
;
In the given scenario, it would be arguably justified enough to keep linked to multiple tables as shown. However, for improved readability, it is still advisable to break the loop and shift that less commonly used, additional JOIN condition to the WHERE clause as an additional filter:
SELECT RT.TransactionID
,U.UserName
,V.VehicleType
,U.PreferredLocation
FROM RentalTransactions RT
JOIN Vehicles V
ON V.VehicleID = RT.VehicleID
JOIN Users U
ON U.UserID = RT.UserID
WHERE V.Location = U.PreferredLocation
AND V.VehicleType = U.PreferredVehicleType;
;