Custom filtering on strings containing any of the selected words in the filter

Recently, I needed to create a query that could use a Periscope Data user entered custom filter to search a dataset of comments for keywords. Since the search was based on custom filters, there was no way of knowing how many keywords will be used at any given point in time. It could be one. It could be none. It could be n*whatever (math stuff). As a result, a bunch of OR arguments in my WHERE clause weren't going to work.

I'll use the dataset on Donald Trump's Tweets to help flesh out the example because tweets are a lot like comments with the one exception that tweets can't be null by nature and my comment dataset had plenty of nulls to handle for in the query. We'll call our custom filter Tweet_Contains and it is a user entered field. For the example the Tweet_Contains values are fake and collusion

In this case Tweet_Contains renders out as 'fake, collusion'.

So this doesn't work:

select   *
from     trump_tweets
where    tweet_text ilike [Tweet_Contains]

And this doesn't work:

select   *
from     trump_tweets
where    [tweet_text=Tweet_Contains]

After banging my head against the wall for a while trying to adapt a (very insightful) regex replace solution using ilike and failing, I dug a little deeper into how custom filters render. That's when it really hit me how hard the 'fake, collusion' is to work with when you want to find all the tweets that contain either or both of those words, but regex felt like right path. So, I reformatted the custom field like this:

'(' || REGEXP_REPLACE (  lower('[Tweet_Contains]'), ', ' , ')|('  ) || ')'

Now I had a way to make the custom field values look like this: (fake)|(collusion). Yay! A regular expression. You can see it at work with a regex tester.

Then it became a case of laying in a regexp_substr and the field being filtered.

This works:

select  *
from    trump_tweets
where   REGEXP_SUBSTR(lower(tweet_text), '(' || REGEXP_REPLACE (  lower('[Tweet_Contains]'), ', ' , ')|('  ) || ')' ) != ''

If you want to handle for null value in the custom filter so that your results don't come up blank when it isn't in use, you can do a little case statement:

select  *
from    trump_tweets
where   case
          when lower('cromulent') in (lower('[Tweet_Contains|cromulent]')) then (tweet_text is null or tweet_text is not null)
          else REGEXP_SUBSTR(lower(tweet_text), '(' || REGEXP_REPLACE ( lower('[Tweet_Contains]'), ', ' , ')|(' ) || ')' ) != ''

With this in place, I was able to get all of my comments listed out when the custom filter was blank and only get the comments containing the selected keywords when any words where entered. I also slapped the lower() on there to avoid any case issues. For my purposes, this was also nice because it would get other words based on parts of words like 'customer' would also find 'customers'. I also needed a default value in the customer filter that will not be entered as a search term. For fun, I used 'cromulent' in this example because it is not a word but really sounds like one.

Hope this is helpful for anyone looking for a way get "contains" functionality out of custom filters.

Reply Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
Like1 Follow
  • 2 wk agoLast active
  • 25Views
  • 1 Following