Showing posts with label Oracle. Show all posts
Showing posts with label Oracle. Show all posts

Monday, September 5, 2022

Oracle for Analytics - SQL Modeling

SQL modeling in Oracle provides a way to do calculations like a spreadsheet does. It leverages the data processing power of Oracle database engine and immediate access to the large data sets. With the calculation rules specified, SQL modeling is another sophisticated technique for data analysis and reporting.

Let's work on a few samples together to get an overview of this technique. I created a view by the name of SALES_VIEW, which has 5 columns, YEAR, SEGMENT, CATEGORY, SALES and PROFIT. All the values for each column are displayed in the table below.


All models in this exercise will be built on this view. 


The Basics


We start with a model that sums up Sales in the last 2 year by category and assign the values to 2020 respectively. 
Here comes the SQL script.

SELECT
    SEGMENT
   ,CATEGORY
   ,YEAR
   ,S
FROM SALES_VIEW_TBL
MODEL
    RETURN ALL ROWS
    PARTITION BY (SEGMENT)
    DIMENSION BY (CATEGORY, YEAR)
    MEASURES (SALES S)
    UNIQUE DIMENSION
    RULES UPSERT SEQUENTIAL ORDER
    (
        S['Appliance', 2020] = S[CV(CATEGORY), 2019] + S[CV(CATEGORY), 2018],
        S['Furniture', 2020]  = S[CV(CATEGORY), 2019] + S[CV(CATEGORY), 2018]
    )
    ORDER BY
    SEGMENT
   ,CATEGORY
   ,YEAR
;

Below is the result we got. The records with gray background are newly generated by the model.


First of all, all the columns appearing in the model are categorized into 3 groups, partition, dimension and measure columns. 

  • Partition columns. They divide the data set into logical blocks, in other words, define the boundaries for calculations. The concept is similar with partitions in the analytic functions.

  • Dimension columns. They are attributes describe the measures. A full combination of them can  identify one cell in a partition. 

  • Measure columns. These columns have numeric values and similar to facts in a multi-dimensional fact table.
In a model, you'll need to explicitly specify the 3 groups of columns.

Second, RULES block gives the instructions for how to compute the measures in the model. You can use SQL functions including analytic ones, not to mention arithmetic operators. 

Third, a measure is reference by the dimensions. You can reference one measure as well as multiple measures. For example, S['Appliance', 2020] indicates one cell. CV function returns the current value of a dimension used in the left side of a computation formula. 

Additionally, you can also specify some general calculation rules. Here are some examples.

RETURN ALL ROWS: This returns all selected including updated rows by the rules. RETURN UPDATED ROWS will return only those updated rows. 

UNIQUE DIMENSION:  This says the combination of partition and dimension columns uniquely identify one cell.

SEQUENTIAL ORDER: This means the rules will be executed in the written order in the clause.

 

Rules for SQL Modeling


As you might have noticed in the previous section, you can apply rules in the Model clause to cell calculations, row handling, execution order and etc.

For cell reference, here are some basic rules. Please be noted that  the dimensions are the key to position a cell.

Single Cell Reference - Constants are used for dimensions, for example, S['Appliance', 2020].

Multi-Cell Reference - Here are the two examples for the left side and right side respectively, S['Appliance', YEAR >= 2018], MAX(S)['Appliance', Year BETWEEN 1998 AND 2002].

CV Function - It provides a means to use the dimension values on the right side which are referenced on the left side as illustrated in this formula, S['Appliance', 2020] = S[CV(CATEGORY), 2019] + S[CV(CATEGORY), 2018].


For determining cell behavior, you can use UPDATE, UPSERT, and UPSERT ALL. Here are some highlighted points.

UPDATE - If the referenced cell in the left side exists, its measure is updated with the calculation result. No new cells will not be inserted.

UPSERT -  If the cell exists, its measure will be updated. If it doesn't exist, a new cell will be inserted.

UPSERT ALL -  Simply put, it enables the model rules on the left side to have UPSERT capability. For example, S[ANY, 2020] = S[CV(), 2019] + S[CV(), 2018].


There are also some other general rules, for example,

  • Measures are the only updatable columns
  • In terms of execution order, all other clauses, except for Select Distinct and Order By, are evaluated before the Model clause

  • You can't use an aggregate or analytic function in either Select list or Order By list when you have a Model clause in your query

  • You can't use subqueries in the rules except in FOR loop
  • And etc.



Using Aggregate/Analytic Functions in SQL Modeling


To use aggregate or analytic functions in a model clause is straightforward. Basically you just follow the syntax of those functions, no magic here. Let's take a look at a couple of examples.

An Example of Using an Aggregate Function


SELECT
    SEGMENT
   ,CATEGORY
   ,YEAR
   ,S
FROM SALES_VIEW
WHERE YEAR = 2019
MODEL
    DIMENSION BY (SEGMENT, CATEGORY, YEAR)
    MEASURES (SALES S)
    RULES UPSERT
    (
        S['Retail', 'Sub-total', 2019]          = SUM(S)[CV(SEGMENT), ANY, 2019],
        S['Wholesale', 'Sub-total', 2019]  = SUM(S)[CV(SEGMENT), ANY, 2019],
        S['Total', '', 2019]                          = SUM(S)[ANY, 'Sub-total', 2019]
    )
    ORDER BY
    SEGMENT
   ,YEAR
   ,CATEGORY
;


An Example of Using an Analytic Function

  
SELECT
     YEAR
    ,SEGMENT
   ,CATEGORY
   ,S
   ,CSUM
FROM SALES_VIEW
MODEL
    PARTITION BY ( YEAR )
    DIMENSION BY ( SEGMENT, CATEGORY )
    MEASURES (SALES S, 0 CSUM )
    RULES UPSERT
    (
        CSUM[ANY, ANY] = SUM(S) OVER ( PARTITION BY YEAR
                                                                         ORDER BY         YEAR, SEGMENT, CATEGORY 
                                                                         ROWS UNBOUNDED PRECEDING )
    )
    ORDER BY
     YEAR
    ,SEGMENT
   ,CATEGORY
;



Parallel Execution


When you have a number of processors in your database server, parallel computation is expected to be an efficient way to improve performance. Model clause computation can leverage it if you have PARTITION BY keys in the model. Accordingly data will be partitioned among the parallel slaves.


Oracle for Analytics - Pivot and Unpivot

 Pivoting is a very useful technique in data analytics, which transforms rows into columns. On the contrary, unpivoting operation brings columns to rows. If you are familiar with pivot table in Excel, you can simply see this as a implementation at database level. Tableau also incorporates this technique into its data source layer.

Pivoting works as shown by the diagram below.


I am going to demonstrate them in the following sections by using the below data example, a view named by SALES_VIEW. 


Oracle for Analytics - Materialized View


1 Introduction


In data warehousing environment, performance is an eye-catching theme as always. Materialized view provides an efficient and effective means to improve performance. It enables a mechanism to precompute summaries or expensive joins and store the results. Materialized view can be used for 
improving query performance as well as providing replicated data.

Generally, there are 3 types of materialized view:
  • Materialized View with Aggregates
  • Materialized View Containing Only Joins
  • Nested Materialized Views
We'll create them one by one in the below sections.


2 How to Create a Materialized View?


In this demonstration, I use table OFA_ORDERS to keep measures, along with the other 3 dimensional tables such as OFA_CUSTOMERS, OFA_RETURNS and OFA_PRODUCTS. The model is shown as the diagram below.


ER Diagram

A materialized view is created by using CREATE MATERIALIZED VIEW statementLet's create 3 types of materialized views one by one.

2-1 Create Materialized View Logs


Materialized view logs are not mandatory. However, if you want to use fast refresh, you will need to create materialized view logs. In case the detail table supports partition change tracking (PCT), the log on that table is not required. 
The materialized view log creation statements used in this exercise are as follows.

CREATE MATERIALIZED VIEW LOG ON OFA_ORDERS WITH ROWID (ORDER_ID, ORDER_DATE, SHIP_DATE, SHIP_MODE, CUSTOMER_ID, PRODUCT_ID, SALES, QUANTITY, DISCOUNT, PROFIT) INCLUDING NEW VALUES;

 CREATE MATERIALIZED VIEW LOG ON OFA_PRODUCTS WITH ROWID (PRODUCT_ID, CATEGORY, SUB_CATEGORY, PRODUCT_NAME) INCLUDING NEW VALUES;

 CREATE MATERIALIZED VIEW LOG ON OFA_CUSTOMERS WITH ROWID (CUSTOMER_ID, CUSTOMER_NAME, SEGMENT, COUNTRY_REGION, CITY, STATE, REGION, POSTAL_CODE) INCLUDING NEW VALUES ;

CREATE MATERIALIZED VIEW LOG ON OFA_RETURNS WITH ROWID (ORDER_ID, RETURNED) INCLUDING NEW VALUES ;

 

2-2 Materialized View with Aggregates


The first example is to create a summary materialized view with a column alias list added to the CREATE statement.

CREATE MATERIALIZED VIEW MV_SALES_PRODUCT
(ORDER_DATE, PRODUCT_ID, CATEGORY, SUB_CATEGORY, PRODUCT_NAME, SUM_SALES, SUM_QUANTITY, SUM_PROFIT)
PARALLEL
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
ENABLE QUERY REWRITE
AS
SELECT
    ORDER_DATE
   ,ORD.PRODUCT_ID
   ,CATEGORY
   ,SUB_CATEGORY
   ,PRODUCT_NAME
   ,SUM(SALES)
   ,SUM(QUANTITY)
   ,SUM(PROFIT)
FROM
OFA_ORDERS ORD
LEFT JOIN 
OFA_PRODUCTS PRD
    ON ORD.PRODUCT_ID = PRD.PRODUCT_ID
GROUP BY
    ORDER_DATE
   ,ORD.PRODUCT_ID
   ,CATEGORY
   ,SUB_CATEGORY
   ,PRODUCT_NAME
;

2-3 Materialized View Containing Only Joins


When JOINs become very expensive, you can create a materialized view precalculating the joins as a solution, as shown by the following example. Then build queries and aggregate calculations upon the materialized view.
 
CREATE MATERIALIZED VIEW MV_SALES_JOIN
PARALLEL
BUILD IMMEDIATE
REFRESH FORCE
ENABLE QUERY REWRITE
AS
SELECT
    ORD.ORDER_ID
   ,ORDER_DATE
   ,SHIP_DATE
   ,SHIP_MODE
   ,ORD.CUSTOMER_ID
   ,ORD.PRODUCT_ID
   ,SALES
   ,QUANTITY
   ,DISCOUNT
   ,PROFIT
   ,CATEGORY
   ,SUB_CATEGORY
   ,PRODUCT_NAME
   ,CUSTOMER_NAME
   ,SEGMENT
   ,COUNTRY_REGION
   ,CITY
   ,STATE
   ,CTM.REGION
   ,POSTAL_CODE
   ,RETURNED
FROM
OFA_ORDERS ORD
LEFT JOIN 
OFA_PRODUCTS PRD
    ON ORD.PRODUCT_ID = PRD.PRODUCT_ID
LEFT JOIN 
OFA_CUSTOMERS CTM
    ON ORD.CUSTOMER_ID = CTM.CUSTOMER_ID
LEFT JOIN 
OFA_RETURNS RET
    ON ORD.ORDER_ID = RET.ORDER_ID
;

2-4 Nested Materialized View


The below view, MV_SALES_ANALYTICS, is built on the join materialized view created in the above section, with a column alias assigned to each field.

CREATE MATERIALIZED VIEW MV_SALES_ANALYTICS
PARALLEL
BUILD IMMEDIATE
REFRESH COMPLETE
ENABLE QUERY REWRITE
AS
SELECT
    TO_CHAR(ORDER_DATE, 'YYYYMM')  ORDER_YM
   ,CATEGORY                                                  CATEGORY
   ,SUB_CATEGORY                                        SUB_CATEGORY
   ,SEGMENT                                                    SEGMENT
   ,COUNTRY_REGION                                  COUNTRY_REGION
   ,CITY                                                             CITY
   ,STATE                                                           STATE 
   ,REGION                                                       REGION
   ,SUM(SALES)                                               SUM_SALES
   ,SUM(QUANTITY)                                      SUM_QUANTITY
   ,SUM(PROFIT)                                             SUM_PROFIT
FROM
MV_SALES_JOIN
GROUP BY
    TO_CHAR(ORDER_DATE, 'YYYYMM')
   ,CATEGORY
   ,SUB_CATEGORY
   ,SEGMENT
   ,COUNTRY_REGION
   ,CITY
   ,STATE
   ,REGION
;

Using the above materialized view, we can carry out further analytic tasks. For example, we may want to compare a city's monthly profit with the top in that month.

SELECT
   ORDER_YM
  ,STATE
  ,CITY
  ,SUM(SUM_SALES) TOTAL_SALES
  ,SUM(SUM_PROFIT) TOTAL_PROFIT
  ,SUM(SUM_PROFIT) / SUM(SUM_SALES) PROFIT_RATIO
  ,NTH_VALUE(SUM(SUM_PROFIT), 1) OVER 
  (PARTITION BY ORDER_YM ORDER BY SUM(SUM_PROFIT) DESC 
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) TOPPROFIT_YM
FROM
MV_SALES_ANALYTICS
GROUP BY
   ORDER_YM
  ,STATE
  ,CITY
ORDER BY
   ORDER_YM
  ,STATE
  ,CITY
;


3 Partitioned Materialized View


In data warehousing context, partitioning is a significant technique. Partitioning fact tables can provide you with many benefits in terms of scalability, administration, performance and etc. You can partition a materialized view as well.

CREATE MATERIALIZED VIEW MV_SALES_PRODUCT_PARTITIONED
(ORDER_DATE, PRODUCT_ID, CATEGORY, SUB_CATEGORY, PRODUCT_NAME, SUM_SALES, SUM_QUANTITY, SUM_PROFIT)
PARALLEL
PARTITION BY RANGE (ORDER_DATE) INTERVAL (NUMTOYMINTERVAL(1, 'YEAR'))
(PARTITION "P2014"  VALUES LESS THAN (TO_DATE('2015-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')),
PARTITION "P2015"  VALUES LESS THAN (TO_DATE('2016-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')))
BUILD IMMEDIATE
REFRESH FAST
ENABLE QUERY REWRITE
AS
SELECT
    ORDER_DATE
   ,ORD.PRODUCT_ID
   ,CATEGORY
   ,SUB_CATEGORY
   ,PRODUCT_NAME
   ,SUM(SALES)
   ,SUM(QUANTITY)
   ,SUM(PROFIT)
FROM
OFA_ORDERS ORD
LEFT JOIN 
OFA_PRODUCTS PRD
    ON ORD.PRODUCT_ID = PRD.PRODUCT_ID
GROUP BY
    ORDER_DATE
   ,ORD.PRODUCT_ID
   ,CATEGORY
   ,SUB_CATEGORY
   ,PRODUCT_NAME
;


4 Refresh Materialized View


There are 3 packages available for refreshing materialized view.
  • DBMS_MVIEW.FEFRESH - Refresh one or more materialized view.
  • DBMS_MVIEW.FEFRESH_ALL_MVIEWS - Refresh all materialized views.
  • DBMS_MVIEW.FEFRESH_DEPENDENT - Refresh all materialized views relying on the specified tables or materialized views.
For example, we can use this statement to refresh MV_SALES_JOIN materialized view created in Section 2-3.
EXEC DBMS_MVIEW.REFRESH('MV_SALES_JOIN', METHOD => 'C');

Additionally, atomic refresh option ensures the views refreshed as a whole in a single transaction. For example, the all views will be refreshed in a single transaction if atomic refresh is enabled in DBMS_MVIEW.FEFRESH_ALL_MVIEWS.


5 A Note about Compression


If a materialized view is not build on a prebuilt table, it consumes storage space for sure. The table below suggests how much storage space our materialized views take up. You can consider compression technique as an option to save space especially when your data is redundant.






Wednesday, May 13, 2020

Oracle - How to Retrieve Execution Plan?

SQL Optimizer in Oracle generates multiple plans for a SQL statement and chooses the one with the lowest cost.

1. A Simple Way


If you have the SQL statement at hand, you can simply execute the following commands to retrieve the execution plan.
Here I use "SELECT * FROM EMP" script as an example.


EXPLAIN PLAN FOR ( SELECT * FROM EMP );

SELECT PLAN_TABLE_OUTPUT FROM TABLE( DBMS_XPLAN.DISPLAY );


2. Retrieve Execution Plan Using SQL_ID


2.1 Get SQL_ID


Basically using the SQL script shown as below can fetch the SQL_ID of the intended SQL statement. MODULE column will help you identify who issued the statement.

SELECT MODULE, SQL_ID, CHILD_NUMBER, LAST_LOAD_TIME, SQL_TEXT
FROM V$SQL
WHERE SQL_FULLTEXT LIKE '%FROM EMP%'
ORDER BY MODULE, LAST_LOAD_TIME DESC ;

There are some points that you may need to pay attention to.

  • If your have a lengthy SQL statement, it is recommended to use SQL_FULLTEXT instead of SQL_TEXT in WHERE clause.
  • If you are using an Oracle Real Application Cluster, you may want to use GV$SQL rather than V$SQL, which retrieves information from all eligible instances. 
  • LAST_LOAD_TIME help you identify the latest time when the query plan was loaded into the library cache.


2.2 Retrieve Execution Plan


After you get the proper SQL_ID, you can retrieve the Execution Plan via the following script. The 2nd parameter is CHILD_NUMBER, please make sure you have the right number as well.

SELECT PLAN_TABLE_OUTPUT 
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('4ttqgu8uu8fus', 0, 'TYPICAL'));

 
As you might have noticed, TYPICAL is specified as the 3rd parameter in this example. It indicates what level information to be retrieved, as listed below.

  • BASIC
  • TYPICAL
  • SERIAL
  • ALL

To use it properly, you may need to set a corresponding statistical level, which can be done by the following commands.
Check statistics level: SHOW PARAMETER STATISTICS_LEVEL
Change statistics level: ALTER SESSION SET STATISTICS_LEVEL




React - Makeover in React: W3Schools How To

Last updated on: When it comes to building a React app, I get used to searching for a UI library to build the front pages, a straightforwar...