+tech Blog: Oracle

Showing posts with label Oracle. Show all posts

Thursday, August 3, 2023

Oracle Tips - UPDATE in Case of Referencing Another Table

Scenario

Oracle Tips - MERGE

Scenario

Oracle Tips - Multiple Tables INSERT

Scenario

Oracle for Analytics - SQL Modeling

SQL modeling in Oracle provides a way to do calculations like a spreadsheet does. It leverages the data processing power of Oracle database engine and immediate access to the large data sets. With the calculation rules specified, SQL modeling is another sophisticated technique for data analysis and reporting.

Let's work on a few samples together to get an overview of this technique. I created a view by the name of SALES_VIEW, which has 5 columns, YEAR, SEGMENT, CATEGORY, SALES and PROFIT. All the values for each column are displayed in the table below.

All models in this exercise will be built on this view.

The Basics

We start with a model that sums up Sales in the last 2 year by category and assign the values to 2020 respectively.

Here comes the SQL script.

SELECT
SEGMENT
,CATEGORY
,YEAR
,S
FROM SALES_VIEW_TBL
MODEL
RETURN ALL ROWS
PARTITION BY (SEGMENT)
DIMENSION BY (CATEGORY, YEAR)
MEASURES (SALES S)
UNIQUE DIMENSION
RULES UPSERT SEQUENTIAL ORDER
(
S['Appliance', 2020] = S[CV(CATEGORY), 2019] + S[CV(CATEGORY), 2018],
S['Furniture', 2020] = S[CV(CATEGORY), 2019] + S[CV(CATEGORY), 2018]
)
ORDER BY
SEGMENT
,CATEGORY
,YEAR
;

Below is the result we got. The records with gray background are newly generated by the model.

First of all, all the columns appearing in the model are categorized into 3 groups, partition, dimension and measure columns.

Partition columns. They divide the data set into logical blocks, in other words, define the boundaries for calculations. The concept is similar with partitions in the analytic functions.

Dimension columns. They are attributes describe the measures. A full combination of them can identify one cell in a partition.

Measure columns. These columns have numeric values and similar to facts in a multi-dimensional fact table.

In a model, you'll need to explicitly specify the 3 groups of columns.

Second, RULES block gives the instructions for how to compute the measures in the model. You can use SQL functions including analytic ones, not to mention arithmetic operators.

Third, a measure is reference by the dimensions. You can reference one measure as well as multiple measures. For example, S['Appliance', 2020] indicates one cell. CV function returns the current value of a dimension used in the left side of a computation formula.

Additionally, you can also specify some general calculation rules. Here are some examples.

RETURN ALL ROWS: This returns all selected including updated rows by the rules. RETURN UPDATED ROWS will return only those updated rows.

UNIQUE DIMENSION: This says the combination of partition and dimension columns uniquely identify one cell.

SEQUENTIAL ORDER: This means the rules will be executed in the written order in the clause.

Rules for SQL Modeling

As you might have noticed in the previous section, you can apply rules in the Model clause to cell calculations, row handling, execution order and etc.

For cell reference, here are some basic rules. Please be noted that the dimensions are the key to position a cell.

Single Cell Reference - Constants are used for dimensions, for example, S['Appliance', 2020].
Multi-Cell Reference - Here are the two examples for the left side and right side respectively, S['Appliance', YEAR >= 2018], MAX(S)['Appliance', Year BETWEEN 1998 AND 2002].
CV Function - It provides a means to use the dimension values on the right side which are referenced on the left side as illustrated in this formula, S['Appliance', 2020] = S[CV(CATEGORY), 2019] + S[CV(CATEGORY), 2018].

For determining cell behavior, you can use UPDATE, UPSERT, and UPSERT ALL. Here are some highlighted points.

UPDATE - If the referenced cell in the left side exists, its measure is updated with the calculation result. No new cells will not be inserted.

UPSERT - If the cell exists, its measure will be updated. If it doesn't exist, a new cell will be inserted.

UPSERT ALL - Simply put, it enables the model rules on the left side to have UPSERT capability. For example, S[ANY, 2020] = S[CV(), 2019] + S[CV(), 2018].

There are also some other general rules, for example,

Measures are the only updatable columns

In terms of execution order, all other clauses, except for Select Distinct and Order By, are evaluated before the Model clause

You can't use an aggregate or analytic function in either Select list or Order By list when you have a Model clause in your query

You can't use subqueries in the rules except in FOR loop

And etc.

Using Aggregate/Analytic Functions in SQL Modeling

To use aggregate or analytic functions in a model clause is straightforward. Basically you just follow the syntax of those functions, no magic here. Let's take a look at a couple of examples.

An Example of Using an Aggregate Function

SELECT
SEGMENT
,CATEGORY
,YEAR
,S
FROM SALES_VIEW
WHERE YEAR = 2019
MODEL
DIMENSION BY (SEGMENT, CATEGORY, YEAR)
MEASURES (SALES S)
RULES UPSERT
(
S['Retail', 'Sub-total', 2019] = SUM(S)[CV(SEGMENT), ANY, 2019],
S['Wholesale', 'Sub-total', 2019] = SUM(S)[CV(SEGMENT), ANY, 2019],
S['Total', '', 2019] = SUM(S)[ANY, 'Sub-total', 2019]
)
ORDER BY
SEGMENT
,YEAR
,CATEGORY
;

An Example of Using an Analytic Function

SELECT
YEAR
,SEGMENT
,CATEGORY
,S
,CSUM
FROM SALES_VIEW
MODEL
PARTITION BY ( YEAR )
DIMENSION BY ( SEGMENT, CATEGORY )
MEASURES (SALES S, 0 CSUM )
RULES UPSERT
(
CSUM[ANY, ANY] = SUM(S) OVER ( PARTITION BY YEAR
ORDER BY YEAR, SEGMENT, CATEGORY
ROWS UNBOUNDED PRECEDING )
)
ORDER BY
YEAR
,SEGMENT
,CATEGORY
;

Parallel Execution

When you have a number of processors in your database server, parallel computation is expected to be an efficient way to improve performance. Model clause computation can leverage it if you have PARTITION BY keys in the model. Accordingly data will be partitioned among the parallel slaves.

Oracle for Analytics - Pivot and Unpivot

Pivoting is a very useful technique in data analytics, which transforms rows into columns. On the contrary, unpivoting operation brings columns to rows. If you are familiar with pivot table in Excel, you can simply see this as a implementation at database level. Tableau also incorporates this technique into its data source layer.

Pivoting works as shown by the diagram below.

I am going to demonstrate them in the following sections by using the below data example, a view named by SALES_VIEW.

1 Introduction

In data warehousing environment, performance is an eye-catching theme as always. Materialized view provides an efficient and effective means to improve performance. It enables a mechanism to precompute summaries or expensive joins and store the results. Materialized view can be used for

improving query performance as well as providing replicated data.

Generally, there are 3 types of materialized view:

Materialized View with Aggregates
Materialized View Containing Only Joins
Nested Materialized Views

We'll create them one by one in the below sections.

2 How to Create a Materialized View?

In this demonstration, I use table OFA_ORDERS to keep measures, along with the other 3 dimensional tables such as OFA_CUSTOMERS, OFA_RETURNS and OFA_PRODUCTS. The model is shown as the diagram below.

ER Diagram

A materialized view is created by using CREATE MATERIALIZED VIEW statement. Let's create 3 types of materialized views one by one.

2-1 Create Materialized View Logs

Materialized view logs are not mandatory. However, if you want to use fast refresh, you will need to create materialized view logs. In case the detail table supports partition change tracking (PCT), the log on that table is not required.

The materialized view log creation statements used in this exercise are as follows.

CREATE MATERIALIZED VIEW LOG ON OFA_ORDERS WITH ROWID (ORDER_ID, ORDER_DATE, SHIP_DATE, SHIP_MODE, CUSTOMER_ID, PRODUCT_ID, SALES, QUANTITY, DISCOUNT, PROFIT) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON OFA_PRODUCTS WITH ROWID (PRODUCT_ID, CATEGORY, SUB_CATEGORY, PRODUCT_NAME) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON OFA_CUSTOMERS WITH ROWID (CUSTOMER_ID, CUSTOMER_NAME, SEGMENT, COUNTRY_REGION, CITY, STATE, REGION, POSTAL_CODE) INCLUDING NEW VALUES ;

CREATE MATERIALIZED VIEW LOG ON OFA_RETURNS WITH ROWID (ORDER_ID, RETURNED) INCLUDING NEW VALUES ;

2-2 Materialized View with Aggregates

The first example is to create a summary materialized view with a column alias list added to the CREATE statement.

CREATE MATERIALIZED VIEW MV_SALES_PRODUCT
(ORDER_DATE, PRODUCT_ID, CATEGORY, SUB_CATEGORY, PRODUCT_NAME, SUM_SALES, SUM_QUANTITY, SUM_PROFIT)
PARALLEL
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
ENABLE QUERY REWRITE
AS
SELECT
ORDER_DATE
,ORD.PRODUCT_ID
,CATEGORY
,SUB_CATEGORY
,PRODUCT_NAME
,SUM(SALES)
,SUM(QUANTITY)
,SUM(PROFIT)
FROM
OFA_ORDERS ORD
LEFT JOIN
OFA_PRODUCTS PRD
ON ORD.PRODUCT_ID = PRD.PRODUCT_ID
GROUP BY
ORDER_DATE
,ORD.PRODUCT_ID
,CATEGORY
,SUB_CATEGORY
,PRODUCT_NAME
;

2-3 Materialized View Containing Only Joins

When JOINs become very expensive, you can create a materialized view precalculating the joins as a solution, as shown by the following example. Then build queries and aggregate calculations upon the materialized view.

CREATE MATERIALIZED VIEW MV_SALES_JOIN
PARALLEL
BUILD IMMEDIATE
REFRESH FORCE
ENABLE QUERY REWRITE
AS
SELECT
ORD.ORDER_ID
,ORDER_DATE
,SHIP_DATE
,SHIP_MODE
,ORD.CUSTOMER_ID
,ORD.PRODUCT_ID
,SALES
,QUANTITY
,DISCOUNT
,PROFIT
,CATEGORY
,SUB_CATEGORY
,PRODUCT_NAME
,CUSTOMER_NAME
,SEGMENT
,COUNTRY_REGION
,CITY
,STATE
,CTM.REGION
,POSTAL_CODE
,RETURNED
FROM
OFA_ORDERS ORD
LEFT JOIN
OFA_PRODUCTS PRD
ON ORD.PRODUCT_ID = PRD.PRODUCT_ID
LEFT JOIN
OFA_CUSTOMERS CTM
ON ORD.CUSTOMER_ID = CTM.CUSTOMER_ID
LEFT JOIN
OFA_RETURNS RET
ON ORD.ORDER_ID = RET.ORDER_ID
;

2-4 Nested Materialized View

The below view, MV_SALES_ANALYTICS, is built on the join materialized view created in the above section, with a column alias assigned to each field.

CREATE MATERIALIZED VIEW MV_SALES_ANALYTICS
PARALLEL
BUILD IMMEDIATE
REFRESH COMPLETE
ENABLE QUERY REWRITE
AS
SELECT
TO_CHAR(ORDER_DATE, 'YYYYMM') ORDER_YM
,CATEGORY   CATEGORY
,SUB_CATEGORY                SUB_CATEGORY
,SEGMENT                     SEGMENT
,COUNTRY_REGION             COUNTRY_REGION
,CITY                         CITY
,STATE                         STATE
,REGION                      REGION
,SUM(SALES) SUM_SALES
,SUM(QUANTITY) SUM_QUANTITY
,SUM(PROFIT) SUM_PROFIT
FROM
MV_SALES_JOIN
GROUP BY
TO_CHAR(ORDER_DATE, 'YYYYMM')
,CATEGORY
,SUB_CATEGORY
,SEGMENT
,COUNTRY_REGION
,CITY
,STATE
,REGION
;

Using the above materialized view, we can carry out further analytic tasks. For example, we may want to compare a city's monthly profit with the top in that month.

SELECT
ORDER_YM
,STATE
,CITY
,SUM(SUM_SALES) TOTAL_SALES
,SUM(SUM_PROFIT) TOTAL_PROFIT
,SUM(SUM_PROFIT) / SUM(SUM_SALES) PROFIT_RATIO
,NTH_VALUE(SUM(SUM_PROFIT), 1) OVER
(PARTITION BY ORDER_YM ORDER BY SUM(SUM_PROFIT) DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) TOPPROFIT_YM
FROM
MV_SALES_ANALYTICS
GROUP BY
ORDER_YM
,STATE
,CITY
ORDER BY
ORDER_YM
,STATE
,CITY
;

3 Partitioned Materialized View

In data warehousing context, partitioning is a significant technique. Partitioning fact tables can provide you with many benefits in terms of scalability, administration, performance and etc. You can partition a materialized view as well.

CREATE MATERIALIZED VIEW MV_SALES_PRODUCT_PARTITIONED
(ORDER_DATE, PRODUCT_ID, CATEGORY, SUB_CATEGORY, PRODUCT_NAME, SUM_SALES, SUM_QUANTITY, SUM_PROFIT)
PARALLEL
PARTITION BY RANGE (ORDER_DATE) INTERVAL (NUMTOYMINTERVAL(1, 'YEAR'))
(PARTITION "P2014" VALUES LESS THAN (TO_DATE('2015-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')),
PARTITION "P2015" VALUES LESS THAN (TO_DATE('2016-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')))
BUILD IMMEDIATE
REFRESH FAST
ENABLE QUERY REWRITE
AS
SELECT
ORDER_DATE
,ORD.PRODUCT_ID
,CATEGORY
,SUB_CATEGORY
,PRODUCT_NAME
,SUM(SALES)
,SUM(QUANTITY)
,SUM(PROFIT)
FROM
OFA_ORDERS ORD
LEFT JOIN
OFA_PRODUCTS PRD
ON ORD.PRODUCT_ID = PRD.PRODUCT_ID
GROUP BY
ORDER_DATE
,ORD.PRODUCT_ID
,CATEGORY
,SUB_CATEGORY
,PRODUCT_NAME
;

4 Refresh Materialized View

There are 3 packages available for refreshing materialized view.

DBMS_MVIEW.FEFRESH - Refresh one or more materialized view.
DBMS_MVIEW.FEFRESH_ALL_MVIEWS - Refresh all materialized views.
DBMS_MVIEW.FEFRESH_DEPENDENT - Refresh all materialized views relying on the specified tables or materialized views.

For example, we can use this statement to refresh MV_SALES_JOIN materialized view created in Section 2-3.

EXEC DBMS_MVIEW.REFRESH('MV_SALES_JOIN', METHOD => 'C');

Additionally, atomic refresh option ensures the views refreshed as a whole in a single transaction. For example, the all views will be refreshed in a single transaction if atomic refresh is enabled in DBMS_MVIEW.FEFRESH_ALL_MVIEWS.

5 A Note about Compression

If a materialized view is not build on a prebuilt table, it consumes storage space for sure. The table below suggests how much storage space our materialized views take up. You can consider compression technique as an option to save space especially when your data is redundant.

Wednesday, May 13, 2020

Oracle - How to Retrieve Execution Plan?

SQL Optimizer in Oracle generates multiple plans for a SQL statement and chooses the one with the lowest cost.

1. A Simple Way

If you have the SQL statement at hand, you can simply execute the following commands to retrieve the execution plan.
Here I use "SELECT * FROM EMP" script as an example.

EXPLAIN PLAN FOR ( SELECT * FROM EMP );

SELECT PLAN_TABLE_OUTPUT FROM TABLE( DBMS_XPLAN.DISPLAY );

2. Retrieve Execution Plan Using SQL_ID

2.1 Get SQL_ID

Basically using the SQL script shown as below can fetch the SQL_ID of the intended SQL statement. MODULE column will help you identify who issued the statement.

SELECT MODULE, SQL_ID, CHILD_NUMBER, LAST_LOAD_TIME, SQL_TEXT
FROM V$SQL
WHERE SQL_FULLTEXT LIKE '%FROM EMP%'
ORDER BY MODULE, LAST_LOAD_TIME DESC ;

There are some points that you may need to pay attention to.

If your have a lengthy SQL statement, it is recommended to use SQL_FULLTEXT instead of SQL_TEXT in WHERE clause.
If you are using an Oracle Real Application Cluster, you may want to use GV$SQL rather than V$SQL, which retrieves information from all eligible instances.
LAST_LOAD_TIME help you identify the latest time when the query plan was loaded into the library cache.

2.2 Retrieve Execution Plan

After you get the proper SQL_ID, you can retrieve the Execution Plan via the following script. The 2nd parameter is CHILD_NUMBER, please make sure you have the right number as well.

SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('4ttqgu8uu8fus', 0, 'TYPICAL'));

As you might have noticed, TYPICAL is specified as the 3rd parameter in this example. It indicates what level information to be retrieved, as listed below.

BASIC
TYPICAL
SERIAL
ALL

To use it properly, you may need to set a corresponding statistical level, which can be done by the following commands.
Check statistics level: SHOW PARAMETER STATISTICS_LEVEL
Change statistics level: ALTER SESSION SET STATISTICS_LEVEL

+tech Blog

Thursday, August 3, 2023

Oracle Tips - UPDATE in Case of Referencing Another Table

Scenario

Friday, June 16, 2023

Oracle Tips - MERGE

Scenario

Wednesday, June 14, 2023

Oracle Tips - Multiple Tables INSERT

Scenario

Monday, September 5, 2022

Oracle for Analytics - SQL Modeling

The Basics

Rules for SQL Modeling

Using Aggregate/Analytic Functions in SQL Modeling

An Example of Using an Aggregate Function

An Example of Using an Analytic Function

Parallel Execution

Oracle for Analytics - Pivot and Unpivot

Oracle for Analytics - Materialized View

1 Introduction

2 How to Create a Materialized View?

2-1 Create Materialized View Logs

2-2 Materialized View with Aggregates

2-3 Materialized View Containing Only Joins

2-4 Nested Materialized View

3 Partitioned Materialized View

4 Refresh Materialized View

5 A Note about Compression

Wednesday, May 13, 2020

Oracle - How to Retrieve Execution Plan?

1. A Simple Way

2. Retrieve Execution Plan Using SQL_ID

2.1 Get SQL_ID

2.2 Retrieve Execution Plan

React - Makeover in React: W3Schools How To

Labels

Wikipedia