Wednesday, August 2, 2023

Tableau - Relationships : How LoD Works

In Tableau’s new data model, a logical layer is introduced over the physical layer. Tables on the logical layer are called logical tables, which are related to each other through the fields specified on the canvas. The physical layer is what the model used to be, where we can join and union tables. A very critical concept we would like to place emphasis on is that relationships don’t join all logical tables together as one merged table, indeed, they are like contracts describing how tables are related to each other. The timing for joining tables is upon analysis, namely, creating a viz. At that time, the join type will be determined based on the dimensions and measures used in the viz, and only the relevant data will be queried from the relevant tables. When click on View Data button on Data panel, we can see data records from each independent table but can’t see the combined records across all tables. 

As Tableau Help states, “Relationships are a dynamic, flexible way to combine data from multiple tables for analysis.”

As to relationships, the references listed on the bottom of this blog give a complete view. So, we won’t try to cover all the details here, on the contrary, we will look into several typical scenarios and see how relationships work, especially from the perspective of level of detail.

The model stated below will be used for the demonstration. TblSubA and TblSubB, joining on physical layer, form TblMain that acts as the main logical table. TblMain is related to TblHigher and TblSame and TblLower that have a higher, same and lower granularity against TblMain, respectively.

Data Model

 

Physical Layer 

Relationship between TblMain and TblHigher

 

Relationship between TblMain and TblLower 

Relationship between TblMain and TblSame

Sample records from each table are shown below.

Each table contains a measure.
Each table contains at least one record unmatched with the related table. 

TblMain has a granularity of Region – Branch – Employee.
 

Data Sample in TblMain

 TblHigher’s granularity is at Region level. One note, Constant field is not used in this post.


Data Sample in TblHigher 

TblSame take the same granularity as TblMain.

Data Sample in TblSame 

TblLower goes down to Product Category level compared with TblMain. It has a granularity of Region – Branch – Employee – Product Category. 

Data Sample in TblLower

 

Scenario : TblMain – TblHigher

First, we take a look at a classical case, calculating the ratio of (Sum of Sales / Sum of Goal) per region, which is previously solved by LOD expression or data blending. The LoD of the view is defined by Region which we get from TblHigher. Goal measure comes from the same table as Region dimension. Sales measure is taken from more granular table TblMain (Region – Branch – Employee).

Sum of Sales per region is correctly computed. So is Sum of Goal, as the following viz indicates. 

Then we add Sum of Sales / Sum of Goal to Marks, as a result, we get the exact ratio as expected. It is quite straightforward. And more, this data source can be published as one to Tableau server. 

Here is the summary data used in the view. 

Let’s do one more test, adding Branch from TblMain to the view. View’s LoD goes down to Region – Branch. This causes that Goal is duplicated as many as number of branches in a region, as illustrated in the screenshot below.

 

Scenario : TblMain – TblSame

When the tables have the same granularity, any LoD of view won’t produce duplicate data. For example, we would like to discover whether there is any correlation between Sales and Discount.

We bring Employee ID into the view. So, the view shares the same LoD with the data source which is made of TblMain and TblSame in this case.

Let’s try to roll the LoD up to Region. It works perfectly without any surprise. 

Since TblMain contains a record not matched in TblSame, and vice versa. It is a good example for us to explore more about how this is handled.

We drag and drop a dimension and a measure from TblMain to the view. This simply queries data from only TblMain, as we can see, there are no null values in the view. 

 Then, we add a measure from TblSame into the view, let’s see what will change.

Drag and drop Discount on Detail box in Marks. Obviously, a null value is included in the view. The underlying data also says so.


 If we add a dimension instead of a measure from TblSame to the view, will this make a difference? Yes, we won’t see a null value popping up on the view.

XnoneA is not findable in TblSame, so its Name(TblSame) is null; however, this won’t affect how the view presents.

For null values, we can use measure filter to filter them out.


Scenario : TblMain – TblLower

The pair works as similarly as TblMain – TblHigher does. When the LoD of view is kept not lower than TblMain’s, everything works well. Of course, null values show up in the view. But this is expected since we have measures from both tables in the view. 

LoD of View: Region


LoD of View: Region- Branch - Employee

What if we bring Product Category dimension to the view? LoD of view is less than that of TblMain. This leads to duplicated Sales (from TblMain), as the screenshot shows.

LoD of View: Region – Branch – Employee – Product Category


Scenario : TblMain – TblHigher – TblLower

This is multiple tables related scenario. In effect, we can imagine how they work based on the results of the above scenarios. So, we would like to test some cross-table calculations including LoD expressions here.

First, we look at the numbers at regional level, and we use text table for illustration this time. The measures are listed up as follows.

Goal from TblHigher (LoD: Region)
Sales from TblMain (LoD: Region – Branch – Employee)
Sales (TblLower) from TblLower (LoD: Region – Branch – Employee – Product Category) 

And, we also include a cross-table calculation into the view.

CompleteRatio = SUM([Sales]) / SUM([Goal]) 

The text table gives us the result exactly as expected. 

Subsequently, we start to add more dimensions to the view, in other words, lowering LoD. Let’s see what is going to happen. 

We add Branch (TblLower) to Rows shelf, as a subsequence, Goal is duplicated as Totals in chart tell because Goal comes from TblHigher.

Now, we create a new LoD calculation named by RegionalGoal and add it to the text table.

RegionalGoal = {FIXED [Region]:SUM([Goal])} / {FIXED [Region]:COUNT([Branch])} 

Total of column shows it removes the duplicates.

What if we add the following calculation to the view? 

RegionalGoal (TblLower) =

{FIXED [Region (TblLower)]:SUM([Goal])} / {FIXED [Region (TblLower)]:COUNT([Branch (TblLower)])}

 It gives a different result from RegionalGoal. 

What does cause the difference? Let’s examine the underlying data.

Divisors, {FIXED [Region]:SUM([Goal])} and {FIXED [Region (TblLower)]:SUM([Goal])}, produce the same result.


Dividends, {FIXED [Region] : COUNT([Branch])} and {FIXED [Region (TblLower)] : COUNT([Branch (TblLower)])}, produce different counts, as we can see from the below screenshots.

As we go down to TblLower, the number of branches associated with region Chubu is exactly same as that on the screenshot.

Noted LoD expressions behave differently against common calculations. Moreover, we can smell Order of Execution here. But we are not going to touch more in this blog.

For reference, if we modify the formula a little, using COUNTD in place of COUNT, it will give the same result as RegionalGoal. 

RegionalGoal (TblLower) =

{FIXED [Region (TblLower)]:SUM([Goal])} / {FIXED [Region (TblLower)]:COUNTD([Branch (TblLower)])}

 

Let’s move back to the track and see how the view looks if we add Name (Employee) and Product Category from TblLower to the view. So the view has the same LoD as TblLower.

After we checked out the text table, measures from TblLower or having TblLower’s LoD are computed correctly, as Totals indicate.

Regarding cross-table calculations, here is an excerpt from Tableau Help, for reference.

“Be aware that cross-table calculations use inner join query semantics by default. The calculation lives on top of the join between the logical tables that it references. Also, it is now possible to write row-level calculations across tables that result in outer joins.”


Scenario – Constant

 Let’s create a calculation called ConstantOne that is equal to 1 and add it to the view. The view simply shows 1 on the chart.

What about we drag and drop Branch from TblMain on Detail box in Marks? Sum of ConstantOne equals to the number of branches.

 Then, add Product Category from TblLower to the view. Sum of ConstantOne increases accordingly. 

So we can draw a conclusion, ConstantOne takes the LoD of dimensions in the view.


Relationships are used when combining data from different LoD, and support full outer joins As well. The related fields are fixed, not like Data Blending, we can assign key fields for each view. We can also adjust Cardinality and Referential Integrity settings on Data Source tab to optimize queries. After related necessary logical tables, we can publish them as one data source. But we can’t relate published data sources, and we can’t use calculated fields or geographic fields for defining relationships.


References:

 Relate Your Data – Tableau Help

Relationships, part 1: Introducing new data modeling in Tableau

Relationships, part 2: Tips and Tricks

Relationships, part 3: Asking questions across multiple related tables

No comments:

Post a Comment

AWS - Build A Serverless Web App

 ‘Run your application without servers’. The idea presented by the cloud service providers is fascinating. Of course, an application runs on...