Creating a LIKE or wildcard search filter in Periscope, and extending search functionality with regexp_replace()!

Ever needed to perform wildcard filtering on a string column? You can achieve this using our Direct Replacement Filters plus the trick of concatenating a '%' onto the filter in the WHERE clause. 

Let's say you had a dataset on Donald Trump's Tweets loaded into your database, and you wanted to search for any tweets where Trump mentioned 'sad!' somewhere in the tweet (this will indeed return quite a number of tweets).

You'd first create a new filter, and select 'allow users to type in names and values'. Let's call it 'Wildcard'. 

All you need to do now is add the filter to your WHERE clause: 

select
*
from trump_tweets
where
tweet_text ILIKE '%' || '[Wildcard|%]' || '%'

This query will now return any row where the tweet_text contained 'sad', case-insensitive (due to the 'ilike' comparator).

 

If you're not on Redshift or Postgres, and you want a case-insensitve match, you'll likely do something like : 

select * from trump_tweets
where
lower(tweet_text) LIKE '%' || lower('[Wildcard|%]') || '%'

If you do care about case-sensitivity - say you wanted to find where Trump tweeted about 'CNN' - you'd change your WHERE to a LIKE:

where tweet_text LIKE '%' || '[Wildcard|%]' || '%'

Finally, you'll find regexp_replace() helpful if you want to match with punctuation removed. This is handy when you're, for example, querying a list of healthcare providers to find which ones are doctors and you want to match 'MD', 'M.D', 'M.D.', 'md', 'm.d.', etc. 

select *
from medical_professionals
where
REGEXP_REPLACE(full_name,'[^a-zA-Z0-9]','') ILIKE '%'||'[Wildcard|%]' || '%'

The above function removes all characters which are NOT (^) lowercase a-z, uppercase A-Z, or a number 0-9 (note: if matching names, you probably wouldn't need the 0-9) . Now  you can type in 'md' to return doctors, 'PhD' to find your scientists, or 'RN' to get a list of nurses. 

Note: Wildcard matching (especially with a leading wildcard) is inherently slow as it usually results in a full table scan. Additionally, screen your results after you query! Be aware that wildcard matching also inherently can return results you don't want. For example, if you want to find the word 'stat' in the wildcard filter, it'll also return 'statistician', 'statist', 'state',  and so on. Such is the nature of the wildcard search!

Reply Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
Like2 Follow
  • 2 Likes
  • 1 yr agoLast active
  • 1034Views
  • 1 Following