Custom filtering on strings containing any of the selected words in the filter
Recently, I needed to create a query that could use a Periscope Data user entered custom filter to search a dataset of comments for keywords. Since the search was based on custom filters, there was no way of knowing how many keywords will be used at any given point in time. It could be one. It could be none. It could be n*whatever (math stuff). As a result, a bunch of OR arguments in my WHERE clause weren't going to work.
I'll use the dataset on Donald Trump's Tweets to help flesh out the example because tweets are a lot like comments with the one exception that tweets can't be null by nature and my comment dataset had plenty of nulls to handle for in the query. We'll call our custom filter Tweet_Contains and it is a user entered field. For the example the Tweet_Contains values are fake and collusion.
In this case Tweet_Contains renders out as 'fake, collusion'.
So this doesn't work:
select * from trump_tweets where tweet_text ilike [Tweet_Contains]
And this doesn't work:
select * from trump_tweets where [tweet_text=Tweet_Contains]
After banging my head against the wall for a while trying to adapt a (very insightful) regex replace solution using ilike and failing, I dug a little deeper into how custom filters render. That's when it really hit me how hard the 'fake, collusion' is to work with when you want to find all the tweets that contain either or both of those words, but regex felt like right path. So, I reformatted the custom field like this:
'(' || REGEXP_REPLACE ( lower('[Tweet_Contains]'), ', ' , ')|(' ) || ')'
Now I had a way to make the custom field values look like this: (fake)|(collusion). Yay! A regular expression. You can see it at work with a regex tester.
Then it became a case of laying in a regexp_substr and the field being filtered.
select * from trump_tweets where REGEXP_SUBSTR(lower(tweet_text), '(' || REGEXP_REPLACE ( lower('[Tweet_Contains]'), ', ' , ')|(' ) || ')' ) != ''
If you want to handle for null value in the custom filter so that your results don't come up blank when it isn't in use, you can do a little case statement:
select * from trump_tweets where case when lower('cromulent') in (lower('[Tweet_Contains|cromulent]')) then (tweet_text is null or tweet_text is not null) else REGEXP_SUBSTR(lower(tweet_text), '(' || REGEXP_REPLACE ( lower('[Tweet_Contains]'), ', ' , ')|(' ) || ')' ) != '' end
With this in place, I was able to get all of my comments listed out when the custom filter was blank and only get the comments containing the selected keywords when any words where entered. I also slapped the lower() on there to avoid any case issues. For my purposes, this was also nice because it would get other words based on parts of words like 'customer' would also find 'customers'. I also needed a default value in the customer filter that will not be entered as a search term. For fun, I used 'cromulent' in this example because it is not a word but really sounds like one.
Hope this is helpful for anyone looking for a way get "contains" functionality out of custom filters.