Automatically Transpose Any Table with SQL Only
Congratulations, if you've reached this page, you're about to gain a very valuable tool. For those of you who don't have Python or R available on your site and can't transpose a table in one simple line of code, I'm happy to present a custom solution that will let you easily and automatically transpose tables using SQL alone.
Below: The end result; use filters & copy/paste to transpose a table
To do this, you'll need the following:
- A Periscope Site with Cache
- A cached table or a materialized view
- 20 Minutes to follow this walkthrough
As some of you may know, transposing a table manually using SQL alone is a tedious process involving unions and case/when statements. For example, a guide on Stack Overflow shows that even a 6x2 table requires at least 18 lines of code to transpose, and the code grows with every additional row and column.
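To see why this gets tedious, here is a minimal sketch of the manual approach, run against SQLite from Python purely for illustration (the metrics table and its columns are invented for the example; the Redshift syntax for this pattern is essentially the same). Each source column needs its own union all branch, and each index value needs its own max(case when ...) expression:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table metrics (kpi text, q1 int, q2 int);
    insert into metrics values ('revenue', 100, 120), ('users', 10, 14);
""")

# Manual transpose: one max(case when ...) per output column,
# fed by one union-all branch per source column.
rows = conn.execute("""
    select value_description,
           max(case when kpi = 'revenue' then value end) as revenue,
           max(case when kpi = 'users'   then value end) as users
    from (
        select kpi, q1 as value, 'q1' as value_description from metrics
        union all
        select kpi, q2 as value, 'q2' as value_description from metrics
    )
    group by value_description
    order by value_description
""").fetchall()

print(rows)  # -> [('q1', 100, 10), ('q2', 120, 14)]
```

Two columns already require two union branches and two case expressions; a wide table multiplies both, which is exactly the boilerplate this guide automates.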
In this guide, we will build a dedicated dashboard that will generate code to transpose any table. Assuming you don't feed it gigantic tables, this should work for most use cases!
The finished product will look like the image below. Simply select your schema and table along with the column that should be the index for the new table. Copy the text from the updated chart and paste it into a new chart to get the transposed version.
- Create a new dashboard
- Create a materialized view
- Create a grandparent-parent-child trio of dashboard filters
- Create the transposer chart
Step 1: Create a New Dashboard
This is the easiest step. We can create a new dashboard from the menu at the left, following these instructions. It's a good idea to name it something like Table Transposer so that others know how to use it. You might even add an extra chart that includes instructions for your users. Go ahead and remove any default filters from this dashboard.
Step 2: Create a Materialized View
Normally, system tables don't like to join to non-system tables. However, we need a way to enumerate the tables and columns of the schemas on the cache. Luckily, Redshift's system catalog can provide this information, and a materialized copy of it can be cached for future use and cross-database joins!
Let's go ahead and create a new view, named redshift_tables (the filters and chart below reference it as [redshift_tables]), with the following SQL:
select svv_columns.table_schema
     , svv_columns.table_name
     , svv_columns.column_name
     , svv_columns.data_type
     , svv_columns.ordinal_position
from svv_columns
where svv_columns.table_schema not ilike 'pg_%'
  and svv_columns.table_schema != 'information_schema'
union all
select view_schema::varchar as table_schema
     , view_name::varchar as table_name
     , col_name::varchar as column_name
     , col_type::varchar as data_type
     , col_num::int as ordinal_position
from pg_get_late_binding_view_cols() cols(
      view_schema name
    , view_name name
    , col_name name
    , col_type varchar
    , col_num int)
Right now, we don't need to worry about exactly how this query works. In short, it lists every column of every non-system table and view from svv_columns, then appends the columns of late-binding views via pg_get_late_binding_view_cols(). I encourage you to give it a look over to find out for yourself!
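svv_columns and pg_get_late_binding_view_cols() are Redshift-specific, but most engines expose an equivalent catalog. As a rough illustration of the shape of data this view produces — one row per column, with its name, type, and ordinal position — here is a SQLite analogue using pragma_table_info (the orders table is invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id int, customer text, total real)")

# One row per column, like the redshift_tables view gives you on Redshift
# (schema handling and type names differ by engine).
cols = conn.execute(
    "select name, type, cid + 1 as ordinal_position from pragma_table_info('orders')"
).fetchall()
print(cols)
```

The dashboard filters in the next step are just distinct selections over exactly this kind of (schema, table, column) listing.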
Step 3: Create Filters for the Dashboard
This part is a bit more complex. We'll go through the three filters one by one, starting with the grandparent filter that references the view we just materialized.
Filter 1: Schema
First, we need something that lets users choose a schema and excludes system tables. Use the following code to generate a filter called schema:
select distinct table_schema, table_schema from [redshift_tables]
And make sure to enable radio button selection under More Options:
Filter 2: Table
The second filter will be a child of the first filter. This will let users select a table from the previously selected schema. Use the following code to generate the filter values:
select distinct table_name, table_name, table_schema from [redshift_tables]
This filter needs to be set as a child of the previous filter as well as set to radio style buttons:
Filter 3: Column
The third and final filter selects the index column about which the table will be transposed. In other words, each distinct value of the selected column becomes a column header in the transposed table, and each remaining column of the source table becomes a row. It will make more sense when you give it a try yourself, trust me!
select distinct column_name, column_name, table_name from [redshift_tables]
This will also need to be set up as a child filter:
Step 4: Create the Main Chart
Okay, now that we have our infrastructure set up, we have to create a very clever chart that outputs SQL. That's right: this chart doesn't display data, but SQL code that can be copied and pasted into a new chart. We'll change a couple of chart settings too. For now, start with the following query:
-- --yes_cache
select col_1, col_2, col_3, col_4, col_5, col_6, col_7, col_8, col_9
from (
  select 'select value_description,' col_1
       , null col_2
       , null col_3
       , null col_4
       , null col_5
       , null col_6
       , null col_7
       , null col_8
       , null col_9
       , 1 as sort_order
  union all
  select 'max(case when'
       , '[column]'
       , '::text = trim(translate('''
       , replace([column], '''', '''''')
       , null
       , ''', chr(9),''''))then value::text else null end)'
       , case when regexp_count(SUBSTRING([column]::text, 1, 1), '[a-zA-Z]') = 1
              then REGEXP_REPLACE([column]::text, '[^a-zA-Z0-9]+', '_')
              else 'col_' || REGEXP_REPLACE([column]::text, '[^a-zA-Z0-9]+', '_')
         end
       , null
       , case when ROW_NUMBER() OVER (ORDER BY [column]) != (select count(*) from [schema].[table])
              then ','
              else null
         end
       , 2 as sort_order
  from [schema].[table]
  union all
  select 'from ('
       , null, null, null, null, null, null, null, null
       , 3 as sort_order
  union all
  select 'select ' || '[column],'
       , column_name
       , '::text value, trim('''
       , null
       , column_name
       , ''') value_description from'
       , null
       , '[schema].[table]'
       , case when ROW_NUMBER() OVER (ORDER BY column_name) !=
                   (select count(*)
                    from [redshift_tables]
                    where table_schema = '[schema]'
                      and table_name = '[table]'
                      and column_name != '[column]')
              then 'union all'
              else null
         end
       , 4 as sort_order
  from [redshift_tables]
  where table_schema = '[schema]'
    and table_name = '[table]'
    and column_name != '[column]'
  union all
  select ') group by value_description'
       , null, null, null, null, null, null, null, null
       , 5 as sort_order
  order by sort_order
) tbl
This is a very complex query that employs a lot of different tricks including window functions, unions, regular expressions, direct replacement filters, string casting, trimming, translating, and subqueries!
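The overall pattern — one query whose output is itself SQL text, which you then paste and run — can be sketched end to end outside Periscope. This toy Python/SQLite version (the sales table, the index_col variable, and all helper names are invented for the example, and it skips the quoting/identifier-cleanup tricks the real chart does) generates the transpose statement from the catalog and then executes it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales (region text, jan int, feb int);
    insert into sales values ('east', 5, 7), ('west', 3, 9);
""")

index_col = "region"  # plays the role of the [column] filter
value_cols = [c[1] for c in conn.execute("pragma table_info(sales)")
              if c[1] != index_col]
index_vals = [r[0] for r in
              conn.execute(f"select distinct {index_col} from sales order by 1")]

# Stage 1: generate the transpose SQL, mirroring what the chart emits:
# one pivot column per index value, one union branch per remaining column.
pivots = ",\n  ".join(
    f"max(case when {index_col} = '{v}' then value end) as {v}"
    for v in index_vals
)
unions = "\n  union all\n  ".join(
    f"select {index_col}, {c} as value, '{c}' as value_description from sales"
    for c in value_cols
)
generated_sql = (
    f"select value_description,\n  {pivots}\nfrom (\n  {unions}\n)"
    "\ngroup by value_description order by value_description"
)

# Stage 2: run the generated SQL (in Periscope, this is the copy/paste step).
result = conn.execute(generated_sql).fetchall()
print(result)  # -> [('feb', 7, 9), ('jan', 5, 3)]
```

In the dashboard, stage 1 is the chart above and stage 2 is you pasting its output into a new chart; the trick is the same either way.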
Feel free to dive into the code, but don't feel like you need to understand it to benefit from it. Let's look at the chart settings we need to set to optimize this:
Hiding table line numbers and aligning the table to fill the space will make copying and pasting the code easier. Save the chart and you should be good to go!
In the end, you should have an experience similar to the following gif: