AP 19 - Split multi join queries into smaller chunks

Severity

Warning Caution

Type

Performance

Readability

Problem

The human brain can only store a small set of items in short term memory. That’s why we break longer pieces of information into smaller parts. This is commonly referred to as chunking, e.g. when we break up a long phone number into smaller pieces of information to remember it. The same problem exists with SQL when dealing with a large number of table joins. It gets hard for people to read, understand, and debug the code. As a rule of thumb we should not join more than five tables at a time. If possible, even reduce the number of tables you join to 3-4.

You may also get performance issues when joining a large number of tables. Each time you add a new table join, the number of possible explain plans that the CBO needs to look at grows exponentially. The optimizer might get overwhelmed as it does not have enough time to explore all of the possible plans. This may lead to poor explain plans and poor performance.

Solution

As a rule of thumb we should not join more than 5 tables together at a time. For queries with a large number of tables you should break down the query into smaller chunks that are easier to read and debug.

You can do this by using Common Table Expressions (CTE) or inline views. Let’s go through an example query with overloaded joins.

SELECT sb.buyinggroupname
      ,SUM(sordl.quantity*sordl.unitprice) total_sales
      ,SUM(sinvl.quantity*sinvl.unitprice) total_invoice
  FROM SALES_CUSTOMERS sc
  LEFT
  JOIN SALES_BUYINGGROUPS sb
    ON sc.buyinggroupid=sb.buyinggroupid
  LEFT
  JOIN APPLICATION_DELIVERYMETHODS ad
    ON sc.DELIVERYMETHODID=ad.DELIVERYMETHODID
  LEFT
  JOIN SALES_ORDERS sord
    ON sc.customerid=sord.customerid
  LEFT
  JOIN SALES_ORDERLINES sordl
    ON sord.orderId=sordl.orderid
  LEFT
  JOIN SALES_INVOICES sinv
    ON sord.orderid=sinv.orderid
  LEFT
  JOIN SALES_INVOICELINES sinvl
    ON sinv.invoiceid=sinvl.invoiceid
 GROUP BY sb.buyinggroupname

In this query we join seven tables. The query becomes hard to understand and debug. Let’s break it up into smaller chunks.

WITH customer AS(
    SELECT sc.customerid
          ,sc.customername
          ,sb.buyinggroupname
          ,ad.deliverymethodname
      FROM SALES_CUSTOMERS sc
      LEFT
      JOIN SALES_BUYINGGROUPS sb
        ON sc.buyinggroupid=sb.buyinggroupid
      LEFT
      JOIN APPLICATION_DELIVERYMETHODS ad
        on sc.DELIVERYMETHODID=ad.DELIVERYMETHODID)
 ,sales_order AS(
    SELECT sord.orderid
          ,sord.customerid
          ,sordl.quantity
          ,sordl.unitprice
          ,sordl.taxrate
          ,sordl.stockitemid
          ,sordl.orderlineid
      FROM SALES_ORDERS sord
      JOIN SALES_ORDERLINES sordl
        ON sord.orderId=sordl.orderid)
,invoice AS(
    SELECT sinv.invoiceid
          ,sinv.customerid
          ,sinv.orderid
          ,sinvl.invoicelineid
          ,sinvl.quantity
          ,sinvl.unitprice
      FROM SALES_INVOICES sinv
      JOIN SALES_INVOICELINES sinvl
        ON sinv.invoiceid=sinvl.invoiceid)
 SELECT customer.buyinggroupname
      ,SUM(sales_order.quantity*sales_order.unitprice) total_sales
      ,SUM(sales_order.quantity*sinvl.unitprice) total_invoice
  FROM customer
  LEFT
  JOIN sales_order
    ON customer.customerid=sales_order.customerid
  LEFT
  JOIN invoice
    ON sales_order.orderid=invoice.orderid
 GROUP BY customer.buyinggroupname

We have broken the query up into three logical parts

  • Information related to customer
  • Information to sales
  • Information to invoice

Using a CTE we can query each of the components separately for debugging purposes. Let’s look at visual representation of the query

Visual representation of the query

Let’s zoom in on customer CTE

Customer CTE

We can collapse the CTEs

Collapse CTEs

Legitimate use of the anti pattern

None