+tech Blog: Oracle for Analytics

SQL modeling in Oracle provides a way to do calculations like a spreadsheet does. It leverages the data processing power of Oracle database engine and immediate access to the large data sets. With the calculation rules specified, SQL modeling is another sophisticated technique for data analysis and reporting.

Let's work on a few samples together to get an overview of this technique. I created a view by the name of SALES_VIEW, which has 5 columns, YEAR, SEGMENT, CATEGORY, SALES and PROFIT. All the values for each column are displayed in the table below.

All models in this exercise will be built on this view.

The Basics

We start with a model that sums up Sales in the last 2 year by category and assign the values to 2020 respectively.

Here comes the SQL script.

SELECT
SEGMENT
,CATEGORY
,YEAR
,S
FROM SALES_VIEW_TBL
MODEL
RETURN ALL ROWS
PARTITION BY (SEGMENT)
DIMENSION BY (CATEGORY, YEAR)
MEASURES (SALES S)
UNIQUE DIMENSION
RULES UPSERT SEQUENTIAL ORDER
(
S['Appliance', 2020] = S[CV(CATEGORY), 2019] + S[CV(CATEGORY), 2018],
S['Furniture', 2020] = S[CV(CATEGORY), 2019] + S[CV(CATEGORY), 2018]
)
ORDER BY
SEGMENT
,CATEGORY
,YEAR
;

Below is the result we got. The records with gray background are newly generated by the model.

First of all, all the columns appearing in the model are categorized into 3 groups, partition, dimension and measure columns.

Partition columns. They divide the data set into logical blocks, in other words, define the boundaries for calculations. The concept is similar with partitions in the analytic functions.

Dimension columns. They are attributes describe the measures. A full combination of them can identify one cell in a partition.

Measure columns. These columns have numeric values and similar to facts in a multi-dimensional fact table.

In a model, you'll need to explicitly specify the 3 groups of columns.

Second, RULES block gives the instructions for how to compute the measures in the model. You can use SQL functions including analytic ones, not to mention arithmetic operators.

Third, a measure is reference by the dimensions. You can reference one measure as well as multiple measures. For example, S['Appliance', 2020] indicates one cell. CV function returns the current value of a dimension used in the left side of a computation formula.

Additionally, you can also specify some general calculation rules. Here are some examples.

RETURN ALL ROWS: This returns all selected including updated rows by the rules. RETURN UPDATED ROWS will return only those updated rows.

UNIQUE DIMENSION: This says the combination of partition and dimension columns uniquely identify one cell.

SEQUENTIAL ORDER: This means the rules will be executed in the written order in the clause.

Rules for SQL Modeling

As you might have noticed in the previous section, you can apply rules in the Model clause to cell calculations, row handling, execution order and etc.

For cell reference, here are some basic rules. Please be noted that the dimensions are the key to position a cell.

Single Cell Reference - Constants are used for dimensions, for example, S['Appliance', 2020].
Multi-Cell Reference - Here are the two examples for the left side and right side respectively, S['Appliance', YEAR >= 2018], MAX(S)['Appliance', Year BETWEEN 1998 AND 2002].
CV Function - It provides a means to use the dimension values on the right side which are referenced on the left side as illustrated in this formula, S['Appliance', 2020] = S[CV(CATEGORY), 2019] + S[CV(CATEGORY), 2018].

For determining cell behavior, you can use UPDATE, UPSERT, and UPSERT ALL. Here are some highlighted points.

UPDATE - If the referenced cell in the left side exists, its measure is updated with the calculation result. No new cells will not be inserted.

UPSERT - If the cell exists, its measure will be updated. If it doesn't exist, a new cell will be inserted.

UPSERT ALL - Simply put, it enables the model rules on the left side to have UPSERT capability. For example, S[ANY, 2020] = S[CV(), 2019] + S[CV(), 2018].

There are also some other general rules, for example,

Measures are the only updatable columns

In terms of execution order, all other clauses, except for Select Distinct and Order By, are evaluated before the Model clause

You can't use an aggregate or analytic function in either Select list or Order By list when you have a Model clause in your query

You can't use subqueries in the rules except in FOR loop

And etc.

Using Aggregate/Analytic Functions in SQL Modeling

To use aggregate or analytic functions in a model clause is straightforward. Basically you just follow the syntax of those functions, no magic here. Let's take a look at a couple of examples.

An Example of Using an Aggregate Function

SELECT
SEGMENT
,CATEGORY
,YEAR
,S
FROM SALES_VIEW
WHERE YEAR = 2019
MODEL
DIMENSION BY (SEGMENT, CATEGORY, YEAR)
MEASURES (SALES S)
RULES UPSERT
(
S['Retail', 'Sub-total', 2019] = SUM(S)[CV(SEGMENT), ANY, 2019],
S['Wholesale', 'Sub-total', 2019] = SUM(S)[CV(SEGMENT), ANY, 2019],
S['Total', '', 2019] = SUM(S)[ANY, 'Sub-total', 2019]
)
ORDER BY
SEGMENT
,YEAR
,CATEGORY
;

An Example of Using an Analytic Function

SELECT
YEAR
,SEGMENT
,CATEGORY
,S
,CSUM
FROM SALES_VIEW
MODEL
PARTITION BY ( YEAR )
DIMENSION BY ( SEGMENT, CATEGORY )
MEASURES (SALES S, 0 CSUM )
RULES UPSERT
(
CSUM[ANY, ANY] = SUM(S) OVER ( PARTITION BY YEAR
ORDER BY YEAR, SEGMENT, CATEGORY
ROWS UNBOUNDED PRECEDING )
)
ORDER BY
YEAR
,SEGMENT
,CATEGORY
;

Parallel Execution

When you have a number of processors in your database server, parallel computation is expected to be an efficient way to improve performance. Model clause computation can leverage it if you have PARTITION BY keys in the model. Accordingly data will be partitioned among the parallel slaves.

+tech Blog

Monday, September 5, 2022

Oracle for Analytics - SQL Modeling

The Basics

Rules for SQL Modeling

Using Aggregate/Analytic Functions in SQL Modeling

An Example of Using an Aggregate Function

An Example of Using an Analytic Function

Parallel Execution

No comments:

Post a Comment

Oracle - JOINs

Labels

Wikipedia