In Tableau’s new data model, a logical
layer is introduced over the physical layer. Tables on the logical layer are
called logical tables, which are related to each other through the fields
specified on the canvas. The physical layer is what the model used to be, where
we can join and union tables. A very critical concept we would like to place
emphasis on is that relationships don’t join all logical tables together as one
merged table, indeed, they are like contracts describing how tables are related
to each other. The timing for joining tables is upon analysis, namely, creating
a viz. At that time, the join type will be determined based on the dimensions
and measures used in the viz, and only the relevant data will be queried from
the relevant tables. When click on View Data button on Data panel, we can see
data records from each independent table but can’t see the combined records
across all tables.
As Tableau Help states, “Relationships are a dynamic,
flexible way to combine data from multiple tables for analysis.”
As to relationships, the references listed
on the bottom of this blog give a complete view. So, we won’t try to cover all
the details here, on the contrary, we will look into several typical scenarios
and see how relationships work, especially from the perspective of level of
detail.
The model stated below will be used for the
demonstration. TblSubA and TblSubB, joining on physical layer, form TblMain
that acts as the main logical table. TblMain is related to TblHigher and
TblSame and TblLower that have a higher, same and lower granularity against
TblMain, respectively.
Data
Model
Physical
Layer
Relationship
between TblMain and TblHigher
Relationship
between TblMain and TblLower
Relationship
between TblMain and TblSame
Sample records from each table are shown below.
Each table contains a measure.
Each table contains at least one record
unmatched with the related table.
TblMain has a granularity of Region –
Branch – Employee.
Data
Sample in TblMain
TblHigher’s granularity is at Region level. One note, Constant field is not used in this post.
Data
Sample in TblHigher
TblSame take the same granularity as
TblMain.
Data
Sample in TblSame
TblLower goes down to Product Category
level compared with TblMain. It has a granularity of Region – Branch – Employee
– Product Category.
Data
Sample in TblLower
Scenario : TblMain – TblHigher
First, we take a look at a classical case,
calculating the ratio of (Sum of Sales / Sum of Goal) per region, which is
previously solved by LOD expression or data blending. The LoD of the view is
defined by Region which we get from TblHigher. Goal measure comes from the same
table as Region dimension. Sales measure is taken from more granular table
TblMain (Region – Branch – Employee).
Sum of Sales per region is correctly
computed. So is Sum of Goal, as the following viz indicates.
Then we add Sum of Sales / Sum of Goal to
Marks, as a result, we get the exact ratio as expected. It is quite
straightforward. And more, this data source can be published as one to Tableau server.
Here is the summary data used in the view.
Let’s do one more test, adding Branch from
TblMain to the view. View’s LoD goes down to Region – Branch. This causes that
Goal is duplicated as many as number of branches in a region, as illustrated in
the screenshot below.
Scenario : TblMain – TblSame
When the tables have the same granularity,
any LoD of view won’t produce duplicate data. For example, we would like to
discover whether there is any correlation between Sales and Discount.
We bring Employee ID into the view. So, the
view shares the same LoD with the data source which is made of TblMain and
TblSame in this case.
Let’s try to roll the LoD up to Region. It
works perfectly without any surprise.
Since TblMain contains a record not matched
in TblSame, and vice versa. It is a good example for us to explore more about
how this is handled.
We drag and drop a dimension and a measure from
TblMain to the view. This simply queries data from only TblMain, as we can see,
there are no null values in the view.
Then, we add a measure from TblSame into
the view, let’s see what will change.
Drag and drop Discount on Detail box in
Marks. Obviously, a null value is included in the view. The underlying data
also says so.
If we add a dimension instead of a measure from
TblSame to the view, will this make a difference? Yes, we won’t see a null
value popping up on the view.
XnoneA is not findable in TblSame, so its
Name(TblSame) is null; however, this won’t affect how the view presents.
For null values, we can use measure filter
to filter them out.
Scenario : TblMain – TblLower
The pair works as similarly as TblMain – TblHigher
does. When the LoD of view is kept not lower than TblMain’s, everything works well.
Of course, null values show up in the view. But this is expected since we have
measures from both tables in the view.
LoD of View: Region
LoD
of View: Region- Branch - Employee
What if we bring Product Category dimension
to the view? LoD of view is less than that of TblMain. This leads to duplicated
Sales (from TblMain), as the screenshot shows.
LoD
of View: Region – Branch – Employee – Product Category
Scenario : TblMain – TblHigher – TblLower
This is multiple tables related scenario.
In effect, we can imagine how they work based on the results of the above
scenarios. So, we would like to test some cross-table calculations including LoD
expressions here.
First, we look at the numbers at regional
level, and we use text table for illustration this time. The measures are listed up as follows.
Goal from TblHigher (LoD: Region)
Sales from TblMain (LoD: Region – Branch –
Employee)
Sales (TblLower) from TblLower (LoD: Region
– Branch – Employee – Product Category)
And, we also include a cross-table calculation into the view.
CompleteRatio = SUM([Sales]) / SUM([Goal])
The text table gives us the result exactly
as expected.
Subsequently, we start to add more
dimensions to the view, in other words, lowering LoD. Let’s see what is going
to happen.
We add Branch (TblLower) to Rows shelf, as a subsequence, Goal is
duplicated as Totals in chart tell because Goal comes from TblHigher.
Now, we create a new LoD calculation named
by RegionalGoal and add it to the text table.
RegionalGoal = {FIXED [Region]:SUM([Goal])}
/ {FIXED [Region]:COUNT([Branch])}
Total of column shows it removes the duplicates.
What if we add the following calculation to
the view?
RegionalGoal (TblLower) =
{FIXED [Region (TblLower)]:SUM([Goal])} / {FIXED
[Region (TblLower)]:COUNT([Branch (TblLower)])}
It gives a different result from RegionalGoal.
What does cause the difference? Let’s
examine the underlying data.
Divisors, {FIXED [Region]:SUM([Goal])} and {FIXED
[Region (TblLower)]:SUM([Goal])}, produce the same result.
Dividends, {FIXED [Region] : COUNT([Branch])}
and {FIXED [Region (TblLower)] : COUNT([Branch (TblLower)])}, produce different
counts, as we can see from the below screenshots.
As we go down to TblLower, the number of
branches associated with region Chubu is exactly same as that on the screenshot.
Noted LoD expressions behave differently
against common calculations. Moreover, we can smell Order of Execution here. But
we are not going to touch more in this blog.
For reference, if we modify the formula a
little, using COUNTD in place of COUNT, it will give the same result as
RegionalGoal.
RegionalGoal (TblLower) =
{FIXED [Region (TblLower)]:SUM([Goal])} / {FIXED
[Region (TblLower)]:COUNTD([Branch (TblLower)])}
Let’s move back to the track and see how
the view looks if we add Name (Employee) and Product Category from TblLower to
the view. So the view has the same LoD as TblLower.
After we checked out the text table, measures
from TblLower or having TblLower’s LoD are computed correctly, as Totals indicate.
Regarding cross-table calculations, here is
an excerpt from Tableau Help, for reference.
“Be aware that cross-table calculations use
inner join query semantics by default. The calculation lives on top of the join
between the logical tables that it references. Also, it is now possible to
write row-level calculations across tables that result in outer joins.”
Scenario – Constant
Let’s create a calculation called
ConstantOne that is equal to 1 and add it to the view. The view simply shows 1
on the chart.
What about we drag and drop Branch from
TblMain on Detail box in Marks? Sum of ConstantOne equals to the number of branches.
Then, add Product Category from TblLower to
the view. Sum of ConstantOne increases accordingly.
So we can draw a conclusion, ConstantOne takes
the LoD of dimensions in the view.
Relationships are used when combining data
from different LoD, and support full outer joins As well. The related fields
are fixed, not like Data Blending, we can assign key fields for each view. We
can also adjust Cardinality and Referential Integrity settings on Data Source tab
to optimize queries. After related necessary logical tables, we can publish
them as one data source. But we can’t relate published data sources, and we can’t
use calculated fields or geographic fields for defining relationships.
References:
Relate Your Data – Tableau Help
Relationships, part 1: Introducing new data modeling in Tableau
Relationships, part 2: Tips and Tricks
Relationships, part 3: Asking questions across multiple related tables