Wednesday, August 16, 2023

Python - Class


Class is a very important feature in Python, which gives Python the capability of supporting OOP. A class is declared with the keyword "class". Let's take a brief tour of it in this blog.

Noted that Jupyter Lab is used for demonstration. 


A Simple Class

Let's start with a simple class, which contains only one method. After instantiated the class, we can simply call the class method.




A Class with __init__() Method

We can add an __init__() method into our class to do some initialization work while an object is created. 



Python doesn't support overloading by default. So we can declare only one __init__() method. To be more precise, we can declare over one __init__() methods, but only the last one is valid. 

How to overload methods? There are several ways to achieve this, for example, we can use multiple dispatch decorator, assign a default value to an argument, or use dynamic arguments etc. Please check out the reference The Right Way To Overload Methods and Operators In Python for more details.



Class Variable and Instance Variable

A class variable belongs to the class, it is shared cross all instances created from this class. On the contrary, an instance variable belongs to that particular instance. 



What if we define another string class variable named by "_anothermsg"? Something interesting happens. 

When object "hp" is created, "_anothermsg" is assigned with a string "Hi Python". Later in the creation of object "sy",  it is  given another string "Hello". However, this doesn't update "_anothermsg" in object "hp". Simply put, "_anothermsg" behaves like an instance variable.



This doesn't mean something wrong. Python is born with its own philosophy, having its conventions. Overall, Python has very loose constraints. 

If we simply look at this from the perspective of pass-by-reference or pass-by-value, list is pass by reference, string is pass by value, whereas slice of a list is pass by value, meaning a copy of that slice is created.



Private Variables

Python doesn't have private keyword to declare a variable. If a variable or method is named with a leading underscore "_", it should be treated as a private part in practice.

In the meanwhile, if we define a variable preceding with double underscores "__",  the variable becomes inaccessible from outside the class, as illustrated by the following example.



Of course, we can use property or getter method to access it, which are described in the next section.

In nature, a variable prefixed with "__" is replaced with "_classname__variablename" in Python, for example, "__message" is replaced with "_HiPython__message". 




Property

A property is defined by using property decorator "@property". It provides access to a class or an instance variable.




Getter and Setter

First, we can implement these by using methods. As shown in the example below, "set_message()" method converts a string into uppercase and assign it to the instance variable "_message", whereas "get_message()" method returns "_messasge".



We can also do this in a more graceful fashion, using a combination of property and setter.



One of the merits we have a setter is that we can do some transformations before the assignment.



Inheritance

Inheritance is realized by placing the super class's name inside the parentheses when we define a sub class. Multiple inheritance is also supported by Python. 

"SubPython" is derived from "HiPython" in the following example.



Can a derived class add its own __init__() method? Yes, it is doable. In the below example, the sub class declares a __init__() method that calls the super class's __init__() within it.




Polymorphism

Although it is practicable to overload operators in Python,  we shed light on class method here.

For overloading, we can find workarounds even Python doesn't implement it as a default feature.

For overriding, let's have a look at the example below. SayHello() method is defined in the super class and re-defined in the derived class. 



Inside SubPython.SayHello(), we can also call the super class's method through super(). 

super().SayHello()
or 
super(SubPython, self).SayHello()




Dataclass

A dataclass is specialized in holding and representing data. It is declared by using @dataclass decorator, as a consequence, some special methods such as __init__() and __repr__() etc are automatically added into the class.

Here is a dataclass example using default __init__() and __repr__() methods.




We can also define our own __init__() and __repr__() methods, overriding the original ones, as shown in the below screenshot. 



Reference

Python Tutorial    by Guido van Rossum and the Python development team

Getters and Setters: Manage Attributes in Python

The Right Way To Overload Methods and Operators In Python

How to Use Python Data Classes in 2023 (A Beginner’s Guide)

Wednesday, August 2, 2023

Tableau - Relationships : How LoD Works

In Tableau’s new data model, a logical layer is introduced over the physical layer. Tables on the logical layer are called logical tables, which are related to each other through the fields specified on the canvas. The physical layer is what the model used to be, where we can join and union tables. A very critical concept we would like to place emphasis on is that relationships don’t join all logical tables together as one merged table, indeed, they are like contracts describing how tables are related to each other. The timing for joining tables is upon analysis, namely, creating a viz. At that time, the join type will be determined based on the dimensions and measures used in the viz, and only the relevant data will be queried from the relevant tables. When click on View Data button on Data panel, we can see data records from each independent table but can’t see the combined records across all tables. 

As Tableau Help states, “Relationships are a dynamic, flexible way to combine data from multiple tables for analysis.”

As to relationships, the references listed on the bottom of this blog give a complete view. So, we won’t try to cover all the details here, on the contrary, we will look into several typical scenarios and see how relationships work, especially from the perspective of level of detail.

The model stated below will be used for the demonstration. TblSubA and TblSubB, joining on physical layer, form TblMain that acts as the main logical table. TblMain is related to TblHigher and TblSame and TblLower that have a higher, same and lower granularity against TblMain, respectively.

Data Model

 

Physical Layer 

Relationship between TblMain and TblHigher

 

Relationship between TblMain and TblLower 

Relationship between TblMain and TblSame

Sample records from each table are shown below.

Each table contains a measure.
Each table contains at least one record unmatched with the related table. 

TblMain has a granularity of Region – Branch – Employee.
 

Data Sample in TblMain

 TblHigher’s granularity is at Region level. One note, Constant field is not used in this post.


Data Sample in TblHigher 

TblSame take the same granularity as TblMain.

Data Sample in TblSame 

TblLower goes down to Product Category level compared with TblMain. It has a granularity of Region – Branch – Employee – Product Category. 

Data Sample in TblLower

 

Scenario : TblMain – TblHigher

First, we take a look at a classical case, calculating the ratio of (Sum of Sales / Sum of Goal) per region, which is previously solved by LOD expression or data blending. The LoD of the view is defined by Region which we get from TblHigher. Goal measure comes from the same table as Region dimension. Sales measure is taken from more granular table TblMain (Region – Branch – Employee).

Sum of Sales per region is correctly computed. So is Sum of Goal, as the following viz indicates. 

Then we add Sum of Sales / Sum of Goal to Marks, as a result, we get the exact ratio as expected. It is quite straightforward. And more, this data source can be published as one to Tableau server. 

Here is the summary data used in the view. 

Let’s do one more test, adding Branch from TblMain to the view. View’s LoD goes down to Region – Branch. This causes that Goal is duplicated as many as number of branches in a region, as illustrated in the screenshot below.

 

Scenario : TblMain – TblSame

When the tables have the same granularity, any LoD of view won’t produce duplicate data. For example, we would like to discover whether there is any correlation between Sales and Discount.

We bring Employee ID into the view. So, the view shares the same LoD with the data source which is made of TblMain and TblSame in this case.

Let’s try to roll the LoD up to Region. It works perfectly without any surprise. 

Since TblMain contains a record not matched in TblSame, and vice versa. It is a good example for us to explore more about how this is handled.

We drag and drop a dimension and a measure from TblMain to the view. This simply queries data from only TblMain, as we can see, there are no null values in the view. 

 Then, we add a measure from TblSame into the view, let’s see what will change.

Drag and drop Discount on Detail box in Marks. Obviously, a null value is included in the view. The underlying data also says so.


 If we add a dimension instead of a measure from TblSame to the view, will this make a difference? Yes, we won’t see a null value popping up on the view.

XnoneA is not findable in TblSame, so its Name(TblSame) is null; however, this won’t affect how the view presents.

For null values, we can use measure filter to filter them out.


Scenario : TblMain – TblLower

The pair works as similarly as TblMain – TblHigher does. When the LoD of view is kept not lower than TblMain’s, everything works well. Of course, null values show up in the view. But this is expected since we have measures from both tables in the view. 

LoD of View: Region


LoD of View: Region- Branch - Employee

What if we bring Product Category dimension to the view? LoD of view is less than that of TblMain. This leads to duplicated Sales (from TblMain), as the screenshot shows.

LoD of View: Region – Branch – Employee – Product Category


Scenario : TblMain – TblHigher – TblLower

This is multiple tables related scenario. In effect, we can imagine how they work based on the results of the above scenarios. So, we would like to test some cross-table calculations including LoD expressions here.

First, we look at the numbers at regional level, and we use text table for illustration this time. The measures are listed up as follows.

Goal from TblHigher (LoD: Region)
Sales from TblMain (LoD: Region – Branch – Employee)
Sales (TblLower) from TblLower (LoD: Region – Branch – Employee – Product Category) 

And, we also include a cross-table calculation into the view.

CompleteRatio = SUM([Sales]) / SUM([Goal]) 

The text table gives us the result exactly as expected. 

Subsequently, we start to add more dimensions to the view, in other words, lowering LoD. Let’s see what is going to happen. 

We add Branch (TblLower) to Rows shelf, as a subsequence, Goal is duplicated as Totals in chart tell because Goal comes from TblHigher.

Now, we create a new LoD calculation named by RegionalGoal and add it to the text table.

RegionalGoal = {FIXED [Region]:SUM([Goal])} / {FIXED [Region]:COUNT([Branch])} 

Total of column shows it removes the duplicates.

What if we add the following calculation to the view? 

RegionalGoal (TblLower) =

{FIXED [Region (TblLower)]:SUM([Goal])} / {FIXED [Region (TblLower)]:COUNT([Branch (TblLower)])}

 It gives a different result from RegionalGoal. 

What does cause the difference? Let’s examine the underlying data.

Divisors, {FIXED [Region]:SUM([Goal])} and {FIXED [Region (TblLower)]:SUM([Goal])}, produce the same result.


Dividends, {FIXED [Region] : COUNT([Branch])} and {FIXED [Region (TblLower)] : COUNT([Branch (TblLower)])}, produce different counts, as we can see from the below screenshots.

As we go down to TblLower, the number of branches associated with region Chubu is exactly same as that on the screenshot.

Noted LoD expressions behave differently against common calculations. Moreover, we can smell Order of Execution here. But we are not going to touch more in this blog.

For reference, if we modify the formula a little, using COUNTD in place of COUNT, it will give the same result as RegionalGoal. 

RegionalGoal (TblLower) =

{FIXED [Region (TblLower)]:SUM([Goal])} / {FIXED [Region (TblLower)]:COUNTD([Branch (TblLower)])}

 

Let’s move back to the track and see how the view looks if we add Name (Employee) and Product Category from TblLower to the view. So the view has the same LoD as TblLower.

After we checked out the text table, measures from TblLower or having TblLower’s LoD are computed correctly, as Totals indicate.

Regarding cross-table calculations, here is an excerpt from Tableau Help, for reference.

“Be aware that cross-table calculations use inner join query semantics by default. The calculation lives on top of the join between the logical tables that it references. Also, it is now possible to write row-level calculations across tables that result in outer joins.”


Scenario – Constant

 Let’s create a calculation called ConstantOne that is equal to 1 and add it to the view. The view simply shows 1 on the chart.

What about we drag and drop Branch from TblMain on Detail box in Marks? Sum of ConstantOne equals to the number of branches.

 Then, add Product Category from TblLower to the view. Sum of ConstantOne increases accordingly. 

So we can draw a conclusion, ConstantOne takes the LoD of dimensions in the view.


Relationships are used when combining data from different LoD, and support full outer joins As well. The related fields are fixed, not like Data Blending, we can assign key fields for each view. We can also adjust Cardinality and Referential Integrity settings on Data Source tab to optimize queries. After related necessary logical tables, we can publish them as one data source. But we can’t relate published data sources, and we can’t use calculated fields or geographic fields for defining relationships.


References:

 Relate Your Data – Tableau Help

Relationships, part 1: Introducing new data modeling in Tableau

Relationships, part 2: Tips and Tricks

Relationships, part 3: Asking questions across multiple related tables

AWS - Build A Serverless Web App

 ‘Run your application without servers’. The idea presented by the cloud service providers is fascinating. Of course, an application runs on...