1. What is Tableau Prep?
"Tableau Prep is the visual, direct and smart way to clean and shape your data for analysis." - Tableau Prep Official Site.
Tableau Prep refers to Tableau Prep Builder and Tableau Prep Conductor. Builder is a desktop tool to develop data cleaning and shaping flows. Conductor is part of the Tableau Data Management Add-on. It enables flow publish from Builder to Tableau Server or Tableau Online and manages the published flows.
In my first impression, the Prep flow looks like an ETL process. It takes various inputs at the start of the flow, uses steps to shape data and finally generates outputs for further visualization. Below is the sample flow of Superstore that comes with Tableau Prep.
So Prep cleanses and shapes data before it is fed into Tableau for visualization. It helps Tableau extend its capability to partially take care of jobs usually done by SQL or ETL. Thanks to Prep, it enables Tableau to deal with more complicated business and system scenarios. In the meantime, a clean and highly usable data source can reduce Tableau's data processing workload and simplify the visualization as well.
In addition, Tableau recently released a new data model in its latest version. It introduces logical layer on the top of physical layer, strengthening its data modeling capability. Along with Prep, Tableau takes a big step forward in data processing.
2. What can Tableau Prep do?
You can use Tableau Prep Builder to develop various flows to transform your data. A Tableau Prep flow is composed of a series of steps. I would simply classify them into 3 groups.
- Input Step
- Shape Step
- Output Step
The input step plays as the inbound interfaces fetching data from a wide range of data sources, for example, csv files, Excel sheets, SQL/NoSQL databases, ERP databases, Cloud services and etc.
The shape step refers to Clean Step, Aggregate, Pivot, Join, Union, Script. Originally I'd prefer to take Join and Union out and put them into a separate group because join and union operations don't change the data contents. However, I changed my mind since you actually can add filters and calculations to Join and Union steps in Prep. Simply put, I place all of them into one group.
The output step defines where to save your data with what format. In Prep, you can choose to save output as a text file, an extract file or a published data source.
Below are the sample icons for these steps.
Input
Clean
Aggregate
Pivot
Join
Union
Script
Output
2.1 Input Step
For text data, you can specify a number of settings including header line, field separator, character set and so forth, as shown in the below picture.
You can also choose to input data from multiple files. Wildcard union function enables you to combine a bunch of files into one table, even some of them are located in subfolders.
In this step, you can add filters by clicking Filter Values button. A dialog pops up where you can write your filter calculations.
2.2 Clean Step
You can choose 3 views to show your data.
- Show profile pane
- Show data grid
- Show list view
Show profile pane helps you see the data variations for each column.
In this step, you can create calculated fields and define filters.
For a specific field, you can use More options to define more conversion rules as shown below.
Be noted that only a part of Tableau's functions are available for Prep. The functions, for example, table calculation functions, spatial functions, RAWSQL functions, third-party platform specific functions and so on, are NOT supported.
I am kind of excited to find out that Prep supports FIXED LOD expression. But EXCLUDE and INCLUDE are out of scope.
In terms of rank and row number calculations, please follow the below syntax.
{PARTITION [field]: {ORDERBY [field]: RANK() }
{PARTITION [field]: {ORDERBY[field]:ROW_NUMBER()}}
2.3 Join Step
You can specify 6 types of joins, which are inner join, left join, leftOnly join, right join, rightOnly join and full join. LeftOnly join equals the result of left join - inner join. The same rule also applies to rightOnly join.
You can create calculated fields and specify filters as well in this step.
2.4 Union Step
This step is similar to Join step but the relationship between data sets is union.
2.5 Aggregate Step
In this step, you can choose which fields to be grouped and which fields to be aggregated according to your analytical needs.
2.6 Pivot Step
You can choose either Columns to Rows or Rows to Columns pivot type. Here I would like to use a simple sample to demonstrate this step.
The input data has 3 measures that take 3 columns respectively, as shown by the below image.
Data in Prep before pivot processing is shown as below.
Drag and drop [Measure 1], [Measure 2] and [Measure 3] fields into Pivoted Fields pane. Pivot results can be viewed on the Pivot Results pane.
2.7 Script Step
This step allows you to integrate Prep with Rserve or Tableau Python Server. I have an idea to come up with a blog focusing on Tableau and Rserve after I gain more insights into Rserve later. Here please allow me to skip the details at this point.
2.8 Output Step
You can choose your output data to be saved to a file or to be published as a data source. With regard to file type, there are 3 types available including .hyper and .tde extract types.
2.9 Run Flows Automatically
The flows can be published to Tableau Server or Tableau Online. Prep Conductor can help you schedule, manage and monitor the flows. So that you can run the flows automatically and make the data fresh.
3. How to Build Your Flow?
On Prep Builder, click Add connections button (+) to add a connection to your flow.
Drag and drop tables to the flow building canvas.
Click (+) button next to the step to add a subsequent step. And add filters and calculated fields to the new step accordingly. Repeat this process to complete the Prep flow. At last use Output step to output the cleansed and shaped data in the format you preferred.
This is a quick view. There are more functionalities coming with Tableau Prep. Give it a try and enjoy the exploration.