Un-correlating Correlated Subqueries

A correlated subquery is a subquery that references a field from a table that is not included within the subquery. In other words, there are references to the "outer" query in the "inner" query. Correlated subqueries are not only computationally inefficient (they require re-running the subquery for each record in the dataset), but they also are not supported in some flavors of SQL. Here we will examine a correlated subquery and go over how to successfully "uncorrelate" it.

Let's say we have a hypothetical gaming company. Assume we have a table called monthly_spend that contains the following fields:

  

Next, we have a purchases table that contains the revenue generated from each user on our gaming platform. Below is a view of the first 10 rows from the table.

 

We want to see the months where total spend exceeds total revenue. This is accomplished by the following correlated subquery.

select
  monthly_spend.created_month
  , total_spend
from
  monthly_spend
where
  total_spend > (
    select
      sum(price)
    from
      purchases
    where
      [purchases.created_at:month] = monthly_spend.created_month
  )

Notice how the where clause of the subquery references monthly_spend, which is called in the outer query. 

To "uncorrelate" this subquery:

  1. Create a CTE / with clause that contains the monthly revenue.
  2. Join the monthly spend to the monthly revenue on the conditions that the dates match and the spend exceeds the revenue
with monthly_revenue as (
    select
      [purchases.created_at:month] as created_month
      , sum(price) as total_revenue
    from
      purchases
    group by
      1
  )
select
  monthly_spend.created_month
  , total_spend
from
  monthly_spend
  join monthly_revenue on
    monthly_spend.created_month = monthly_revenue.created_month
    and monthly_spend.total_spend > monthly_revenue.total_revenue

Not only is this computationally faster, it is also easier to read.

If you're on MySQL (and cannot write CTEs), then you can keep the subquery format but still uncorrelate it as follows:

select
  monthly_spend.created_month
  , total_spend
from
  monthly_spend
  join(
    select
      [purchases.created_at:month] as created_month
      , sum(price) as total_revenue
    from
      purchases
    group by
      1
  )
  monthly_revenue on
    monthly_spend.created_month = monthly_revenue.created_month
    and monthly_spend.total_spend > monthly_revenue.total_revenue
Reply Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
reply to topic
Like3 Follow
  • 3 Likes
  • 4 wk agoLast active
  • 57Views
  • 1 Following