Wednesday, May 27, 2020

Tableau Prep Builder - A Few Tips

Tableau Prep Builder provides a GUI for users to build a data transformation flow. The GUI is intuitive and easy-to-use. Moreover, an informative help with more than 500 pages comes with this product. All these ensure that you can smoothly unleash Prep Builder's power.
Here I give a demonstration for a few use cases.

1. Insert a Step


In case you want to insert a clean step between step monthly2019 and step Union 1, hover over the flow line until you see the (+) icon popping up.
Click (+) icon and select Clean Step on the context menu.


As a result, a new clean step is inserted.


Be noted that if you click the (+) button immediately next to step monthly2019, you will end up forking a separate sub-flow as shown by the below picture.


2. Wildcard Union


Wildcard Union is an efficient data consolidation function when you deal with a number of sheets holding similar data in multiple files. This function is available in Input step.
On Multiple Files tab, you can specify search conditions for files and sheets. Prep Builder will look up them on the specified folder and consolidate all the data into one table at one time. You don't have to create a Union step to do this.

Note: Wildcard Union also applies to tables in a database. It can combine data from multiple tables which meet the conditions that you specified. In this example, I use an Excel file for the demonstration.

Specify folder and search conditions in the input step.


Click Apply button to consolidate data. As you might notice, a new field Table Names is added into the table.


On the Data Grid pane, you can review the combined data.


3. Merge


In the light of Union/Join operations, columns with duplicated data are often produced, like Branch and Branch 1 in the below example.
You can merge them into one by using Merge function in Tableau Prep Builder. First, select the 2 columns on the List View pane and right click to show the context menu. Then click Merge menu item to merge them.
The first selected column Branch 1 is the target column. On the contrary, the latter selected Branch is the source column which is merged into the target one.


Prep Builder keeps all of the data from the target column and replaces its nulls with values from the source column. After merge, the source column is deleted.



4. Table Name Change


When you change the source table name, Prep Builder shows an alert with an exclamation mark on the input step. In the example, monthly2020 was renamed to monthly2020Updated intentionally.


To correct it, you can remove the problematic step from the flow. And drag the new table to the flow canvas. Hover over the unionSales step until you see Add, Union and Join icons showing up. Then drop the new table on Add icon.


Below is the flow with the new table monthly2020Updated added in.




5. LOD Expression


Simply click Create Calculated Fields button and type a LOD calculation on the pop-up dialog. Prep Builder only supports FIXED expression. It follows the same syntax and semantic as that in Tableau Desktop.
Additionally, Create Calculated Fields function seems not to be available in Input step, Aggregate step and Output step.


The calculation result can be viewed on the Data Grid pane.


Friday, May 22, 2020

Tableau Prep - A Quick Tour

1. What is Tableau Prep?



"Tableau Prep is the visual, direct and smart way to clean and shape your data for analysis."  - Tableau Prep Official Site.
Tableau Prep refers to Tableau Prep Builder and Tableau Prep Conductor. Builder is a desktop tool to develop data cleaning and shaping flows. Conductor is part of the Tableau Data Management Add-on. It enables flow publish from Builder to Tableau Server or Tableau Online and manages the published flows.

In my first impression, the Prep flow looks like an ETL process. It takes various inputs at the start of the flow, uses steps to shape data and finally generates outputs for further visualization. Below is the sample flow of Superstore that comes with Tableau Prep.

So Prep cleanses and shapes data before it is fed into Tableau for visualization. It helps Tableau extend its capability to partially take care of jobs usually done by SQL or ETL. Thanks to Prep, it enables Tableau to deal with more complicated business and system scenarios. In the meantime, a clean and highly usable data source can reduce Tableau's data processing workload and simplify the visualization as well.

In addition, Tableau recently released a new data model in its latest version. It introduces logical layer on the top of physical layer, strengthening its data modeling capability. Along with Prep, Tableau takes a big step forward in data processing.

 

2. What can Tableau Prep do?


You can use Tableau Prep Builder to develop various flows to transform your data. A Tableau Prep flow is composed of a series of steps. I would simply classify them into 3 groups.

  • Input Step
  • Shape Step
  • Output Step

The input step plays as the inbound interfaces fetching data from a wide range of data sources, for example, csv files, Excel sheets, SQL/NoSQL databases, ERP databases, Cloud services and etc.

The shape step refers to Clean Step, Aggregate, Pivot, Join, Union, Script. Originally I'd prefer to take Join and Union out and put them into a separate group because join and union operations don't change the data contents. However, I changed my mind since you actually can add filters and calculations to Join and Union steps in Prep. Simply put, I place all of them into one group.

The output step defines where to save your data with what format. In Prep, you can choose to save output as a text file, an extract file or a published data source.

Below are the sample icons for these steps.

        Input             
        Clean       
        Aggregate 
        Pivot          
        Join           
        Union        
        Script         
        Output      

2.1 Input Step


For text data, you can specify a number of settings including header line, field separator, character set and so forth, as shown in the below picture.
You can also choose to input data from multiple files. Wildcard union function enables you to combine a bunch of files into one table, even some of them are located in subfolders.
In this step, you can add filters by clicking Filter Values button. A dialog pops up where you can write your filter calculations.


2.2 Clean Step


You can choose 3 views to show your data.
  • Show profile pane
  • Show data grid
  • Show list view
Show profile pane helps you see the data variations for each column.
In this step, you can create calculated fields and  define filters.


For a specific field, you can use More options to define more conversion rules as shown below.


Be noted that only a part of Tableau's functions are available for Prep. The functions, for example, table calculation functions, spatial functions, RAWSQL functions, third-party platform specific functions and so on, are NOT supported.

I am kind of excited to find out that Prep supports FIXED LOD expression. But EXCLUDE and INCLUDE are out of scope.

In terms of rank and row number calculations, please follow the below syntax.

        {PARTITION [field]: {ORDERBY [field]: RANK() }

        {PARTITION [field]: {ORDERBY[field]:ROW_NUMBER()}}

2.3 Join Step


You can specify 6 types of joins, which are inner join, left join, leftOnly join, right join, rightOnly join and full join. LeftOnly join equals the result of left join - inner join. The same rule also applies to rightOnly join.
You can create calculated fields and specify filters as well in this step.


2.4 Union Step


This step is similar to Join step but the relationship between data sets is union. 


2.5 Aggregate Step


In this step, you can choose which fields to be grouped and which fields to be aggregated according to your analytical needs. 


2.6 Pivot Step


You can choose either Columns to Rows or Rows to Columns pivot type. Here I would like to use a simple sample to demonstrate this step. 

The input data has 3 measures that take 3 columns respectively, as shown by the below image.


Data in Prep before pivot processing is shown as below.


Drag and drop [Measure 1], [Measure 2] and [Measure 3] fields into Pivoted Fields pane. Pivot results can be viewed on the Pivot Results pane.


2.7 Script Step


This step allows you to integrate Prep with Rserve or Tableau Python Server. I have an idea to come up with a blog focusing on Tableau and Rserve after I gain more insights into Rserve later. Here please allow me to skip the details at this point.


2.8 Output Step


You can choose your output data to be saved to a file or to be published as a data source. With regard to file type, there are 3 types available including .hyper and .tde extract types. 


2.9 Run Flows Automatically


The flows can be published to Tableau Server or Tableau Online. Prep Conductor can help you schedule, manage and monitor the flows. So that you can run the flows automatically and make the data fresh.

3. How to Build Your Flow?


On Prep Builder, click Add connections button (+) to add a connection to your flow.


Drag and drop tables to the flow building canvas.



Click (+) button next to the step to add a subsequent step. And add filters and calculated fields to the new step accordingly. Repeat this process to complete the Prep flow. At last use Output step to output the cleansed and shaped data in the format you preferred.



This is a quick view. There are more functionalities coming with Tableau Prep. Give it a try and enjoy the exploration.

Friday, May 15, 2020

Tableau - How to Embed Tableau Viz into React?

Tableau has placed itself on a leading position in business intelligence area. React has taken the center stage as a user interface development library for single page applications. Here I will give a quick demo of how to embed a Tableau viz into React. With how to create a React project, it is not included in this blog.

1. Import Tableau js


In this demo, I put Tableau js import statement in head tag in Index.html.

    <head>
        <script type="text/javascript" src="http://public.tableau.com/javascripts/api/tableau-2.min.js">
        </script>
    </head>

2. Create Viz Object


In this example, I create a js file by the name of tableauviz.js. TableauViz class is defined in this file and exported as well. Below are the contents of the file.
   
    import React, { Component } from 'react';

    class TableauViz extends Component{

        componentDidMount(){

            //
            //-- Specify the div container for the viz.
            var containerDiv = document.getElementById("vizContainer"),

            //
            //-- Specify the URL of the viz.
            url = "http://public.tableau.com/views/AQuickGuideToReferencesForTableau/Main",

            //
            //-- Specify a set of options for the viz
            options = {
                hideTabs: false,
                hideToolbar: false,
                onFirstInteractive: function () {
                    //
                    //-- Define a callback function called when the viz object finishes instantiating.
               }
            };

            //
            //-- Create a viz object.
            this.vizContainer = new window.tableau.Viz(this.refs.vizContainer, url, options);

        }

        render(){
            return (
                <div id="vizContainer" />
            );
        }
    }

    export default TableauViz;

3. Use TableauViz in Other Component


To use TableauViz defined as above, you simply import and use it  as a normal component. Below is an example.

//-- TableauComponent.jsx --

    import TableauViz from "./js/tableauviz.js";

    class TableauComponent extends Component {
        render() {
            return (
                <div id="tableauPage">
                    <h2 className="text-center">Tableau Demo</h2>
                    <br />
                    <hr />

                    <TableauViz />

                </div>
            );
        }
    }

React - How to Consume REST Services by Using Axios?


1. Install Axios


Before use axios, we first need to install it in the development environment where we have Node installed beforehand.
The npm command to install axios is shown as below.

    npm add axios

2. Call REST APIs via Axios


To use axios, it is required to be imported in the .js file. Below is the import syntax.

    import axios from 'axios';

Here I use EndpointURL to represent the HTTP URL for REST APIs, use resource to indicate the object, and use resourceId as the unique identifier of the object.

2.1 Fetch Resource


Below is the snippet of codes for fetching a list of resources.

    fetchResources(){
        return axios.get( EndpointURL );
    }

Codes for fetching a resource by using its unique identifier.

    fetchResourceById( resourceId ){
        return axios.get( EndpointURL + '/' + resourceId );
    }

2.2 Add Resource


Codes for adding a new resource.

    addResource( resource ){
        return axios.post( EndpointURL, resource );
    }

2.3 Update Resource


Codes for updating a specified resource.

    updateResource( resource ){
        return axios.put( EndpointURL + '/' + resource.id, resource );
    }

2.4 Delete Resource


Codes for deleting a given resource.

    deleteResource( resourceId ){
        return axios.delete( EndpointURL + '/' + resourceId );
    }


The above calls are stored in a component called RestServices which is exported by using the below statement.

    export default new RestServices();

3. Use the Services in React Component


Below is the import statement for importing the services component.

    import RestServices from "./RestServices";

Now you can use the REST APIs for operations by simply calling them.

3.1 Load Resources


Load the list of the resources and store them in a local object called resources.

    loadResourceList() {
        RestServices.fetchResources()
            .then(response => {
                this.setState({resources: response.data, message: response.statusText})
            });
    }

A component in React has a 3-phase lifecycle, The 3 phases are mounting, updating and unmounting. During mounting phase, all elements are put into the DOM.

componentDidMount is a method in mounting phase called right after the component is rendered. You can place the fetch method within componentDidMount, so that you can view the list when you open the page.

    componentDidMount() {
        this.loadResourceList();
    }

Load a specified resource and save it in the state object.

    loadResource() {
        RestServices.fetchResourceById( resourceId )
            .then((response) => {
                let resource = response.data;
                this.setState({
                    id: resource.id,
                    property1: resource.property1,
                    property2: resource.property2
                })
            });
    }

3.2 Add/Update Resource


First, get the data from the state object.
To add a new resource, call RestServices.addResource( resource ).
To update an existing resource, call RestServices.updateResource( resource ).

Below is an example for new source addition.

    newResource = (e) => {
        e.preventDefault();
        let resource = { property1: this.state.property1, property2: this.state.property2 };
        RestServices.addResource( resource )
            .then(res => {
                this.setState({message : 'Resource added successfully.'});
            });
    }

3.3 Remove a Resource


To remove a resource, simply call the API to delete it.

    RestServices.deleteResource( resource.id )

Wednesday, May 13, 2020

Oracle - How to Retrieve Execution Plan?

SQL Optimizer in Oracle generates multiple plans for a SQL statement and chooses the one with the lowest cost.

1. A Simple Way


If you have the SQL statement at hand, you can simply execute the following commands to retrieve the execution plan.
Here I use "SELECT * FROM EMP" script as an example.


EXPLAIN PLAN FOR ( SELECT * FROM EMP );

SELECT PLAN_TABLE_OUTPUT FROM TABLE( DBMS_XPLAN.DISPLAY );


2. Retrieve Execution Plan Using SQL_ID


2.1 Get SQL_ID


Basically using the SQL script shown as below can fetch the SQL_ID of the intended SQL statement. MODULE column will help you identify who issued the statement.

SELECT MODULE, SQL_ID, CHILD_NUMBER, LAST_LOAD_TIME, SQL_TEXT
FROM V$SQL
WHERE SQL_FULLTEXT LIKE '%FROM EMP%'
ORDER BY MODULE, LAST_LOAD_TIME DESC ;

There are some points that you may need to pay attention to.

  • If your have a lengthy SQL statement, it is recommended to use SQL_FULLTEXT instead of SQL_TEXT in WHERE clause.
  • If you are using an Oracle Real Application Cluster, you may want to use GV$SQL rather than V$SQL, which retrieves information from all eligible instances. 
  • LAST_LOAD_TIME help you identify the latest time when the query plan was loaded into the library cache.


2.2 Retrieve Execution Plan


After you get the proper SQL_ID, you can retrieve the Execution Plan via the following script. The 2nd parameter is CHILD_NUMBER, please make sure you have the right number as well.

SELECT PLAN_TABLE_OUTPUT 
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('4ttqgu8uu8fus', 0, 'TYPICAL'));

 
As you might have noticed, TYPICAL is specified as the 3rd parameter in this example. It indicates what level information to be retrieved, as listed below.

  • BASIC
  • TYPICAL
  • SERIAL
  • ALL

To use it properly, you may need to set a corresponding statistical level, which can be done by the following commands.
Check statistics level: SHOW PARAMETER STATISTICS_LEVEL
Change statistics level: ALTER SESSION SET STATISTICS_LEVEL




Monday, May 11, 2020

Pentaho - Java Filter

Pentaho Kettle's support for Java and JavaScript greatly increases its flexibility to deal with various transformation needs. Here we take a look at the Java Filter step that you can write a Java expression to filter data records.

The overview transformation diagram is illustrated as below.


The data comes from EMP table which resides in an Orace XE database, as shown in the following image.



We want to pick out salesman's records. To make the condition a little bit more complicated, we want SAL to be not less than 1500 as well.



The matching data flows to "toFile" step, and the non-matching data goes to "toDummy" step.
Be noticed that non-matching destination step must be specified; otherwise you will get "failed to initialize" error when you try to execute the transformation.

Save the transformation and execute it. You will see the filter results shown by the below tables.

"toFile" destination step:


"toDummy" destination step:


Additionally, conditional operator (? :) can be used for sure.

    E.g. (DEPTNO == 10) ?  Boolean.TRUE : Boolean.FALSE

For dates comparison, use compareTo() method, for example, Date1.compareTo(Date2).

Tableau - How to color top N bars sorted by measure by sort order?

1. What to Achieve?


Here we will build a bar graph with multiple measures in the view such as Measure 1, Measure 2 and Measure 3, as shown in the below diagram. In this view, you can choose which measure to be sorted by, and you can specify the sort order as well, either ascending or descending in this example. Moreover, only the top N bars of the selected measure are highlighted with a different color.

2. How to Build?


2.1 Create Parameters


In this exercise, we create 3 parameters, [Sort by], [Order by] and [Top N]. 
[Sort by] is used to select a measure that the bars are sorted by. 
[Order by] is used for sort order selection.
[Top N] is used to determine how many bars on the top will be highlighted.
The definitions of the 3 parameters are listed up as below.

[Sort by]
    Data Type: String
    Value:
        Measure 1
        Measure 2
        Measure 3

[Order by]
    Data Type: String
    Value:
        Ascending
        Descending

[Top N]
    Data Type:  Integer
    Minimum:  1
    Maximum: 10
    Step Size:   1

2.2 Create Calculated Fields


The 2 calculated fields are created as following.
[selectedMeasure] is used to get the selected measure in response to parameter [Sort by].
[fieldForSort] is the field used for ordering. When you choose "Ascending", use [selectedMeasure] as the order field; on the contrary use -[selectedMeasure] when you select "Descending".

[selectedMeasure]
    CASE [Sort by]
        WHEN "Measure 1" THEN [Measure 1]
        WHEN "Measure 2" THEN [Measure 2]
        WHEN "Measure 3" THEN [Measure 3]
        ELSE NULL
    END

[fieldForSort]
    IIF([Order by]="Ascending", [selectedMeasure], -[selectedMeasure])

[Rank]
    INDEX()

2.3 Create Set


[setTopN] set is created on field [Product], with [fieldForSort] as the sort field. Parameter [Top N] is used to decide how many bars on the top of the graph will be included in this set.


2.4 Create Color Fields


Considering only the selected measure will be highlighted, here we create a separate color field for each measure, respectively. They will be placed on the color mark for every measure in the view. 

colorTopNm1
    IIF([setTopN] AND [Sort by]="Measure 1", 1, 0)

colorTopNm2
    IIF([setTopN] AND [Sort by]="Measure 2", 1, 0)

colorTopNm3
    IIF([setTopN] AND [Sort by]="Measure 3", 1, 0)

2.5 Drag and Drop Fields onto the View


Now it is time to put all the fields on the right place.
Columns: [Measure 1], [Measure 2], [Measure 3]
Rows:      [Rank], [Product]
Color Mark for [Measure 1]: [colorTopNm1]
Color Mark for [Measure 2]: [colorTopNm2]
Color Mark for [Measure 3]: [colorTopNm3]


For [Product] on the Rows card, use the Sort setting as shown by the below diagram.


3. One Note


Although Tableau offers a bunch of guidelines for performance tuning including how to write calculations, performance is not of concern in this example.

AWS - Build A Serverless Web App

 ‘Run your application without servers’. The idea presented by the cloud service providers is fascinating. Of course, an application runs on...