Un-correlating Correlated Subqueries
A correlated subquery is a subquery that references a field from a table that is not included within the subquery. In other words, there are references to the "outer" query in the "inner" query. Correlated subqueries are not only computationally inefficient (they require re-running the subquery for each record in the dataset), but they also are not supported in some flavors of SQL. Here we will examine a correlated subquery and go over how to successfully "uncorrelate" it.
Let's say we have a hypothetical gaming company. Assume we have a table called monthly_spend that contains the following fields:
Next, we have a purchases table that contains the revenue generated from each user on our gaming platform. Below is a view of the first 10 rows from the table.
We want to see the months where total spend exceeds total revenue. This is accomplished by the following correlated subquery.
select monthly_spend.created_month , total_spend from monthly_spend where total_spend > ( select sum(price) from purchases where [purchases.created_at:month] = monthly_spend.created_month )
Notice how the where clause of the subquery references monthly_spend, which is called in the outer query.
To "uncorrelate" this subquery:
- Create a CTE / with clause that contains the monthly revenue.
- Join the monthly spend to the monthly revenue on the conditions that the dates match and the spend exceeds the revenue
with monthly_revenue as ( select [purchases.created_at:month] as created_month , sum(price) as total_revenue from purchases group by 1 ) select monthly_spend.created_month , total_spend from monthly_spend join monthly_revenue on monthly_spend.created_month = monthly_revenue.created_month and monthly_spend.total_spend > monthly_revenue.total_revenue
Not only is this computationally faster, it is also easier to read.
If you're on MySQL (and cannot write CTEs), then you can keep the subquery format but still uncorrelate it as follows:
select monthly_spend.created_month , total_spend from monthly_spend join( select [purchases.created_at:month] as created_month , sum(price) as total_revenue from purchases group by 1 ) monthly_revenue on monthly_spend.created_month = monthly_revenue.created_month and monthly_spend.total_spend > monthly_revenue.total_revenue