Optimizing the Count of Active Accounts/Users Over Time
I recently wanted to create a dashboard in Periscope showing the count of active users in any given report period. The intuitive logic for this is to take the following steps:
- Create a record for every account/active period combination (some users may be suspended and reinstated), and for each of those records have a 'begin_period' column and an 'end_period' column. For users that are currently active their 'end_period' column is GETDATE().
- Join that table to a table that has a 'begin_report_period' and 'end_report_period' (using [daterange_start] and [daterange_end]), ON begin_period <= end_report_period AND end_period >= daterange_start
- Then do a DISTINCT count on the aggregated date.
The problem with this is approach is that join is a multiplying of #of report periods * number of active users in that report period, so if you have, say, 3-4 million active users and want to look at weekly aggregates over say, 3 years, that's over 500M records.... which of course fails (and for good reason -- no user wants to wait for that to run.) It works fine for annual and quarterly aggregates, but that's not always enough.
Has anyone else run into this challenge? What are you doing for it?
PS.... do I get a free Periscope shirt? :D
Storing distinct activity periods * users permutations into a table can sometimes be an untenable solution. That said, I still do it in some instances, because it's just that valuable. Would I do it for activity metrics like pageviews on a blog tangential to my business? Probably not. For revenue generating users engaging deep within my product on a weekly or monthly basis? Absolutely worth the execution time to run it overnight and cache the resultant table, or an aggregate of it.Reply
I totally agree with Ryan Iyengar here, the importance determines whether storing it on a daily basis is worth the investment from a data storage (100+M or even 1B+ records). Internally, in our Periscope site of Periscope, we have a daily roll-up View of all the essential metrics we'd like to see across all our customers (number of users per site, number of charts made on a given day, time on site). This allows us to quickly track the metrics over time across all or a subset and makes query structure for the core metrics dramatically easier.Reply