Last updated on:
Join is a fundamental operation in Oracle, which combines two datasets and returns the result set, as illustrated in the diagram below. The join condition describes how the tables are related to each other.
Join Tree
When we design a data model, we break down into entities, in other words, we try to normalize the tables and make sure business scenarios can be supported at the same time. Like the sample schema, HR, we have the employees and departments tables which store the data of employees and departments, respectively. When it comes to a case that we need to retrieve a given group of employees with the corresponding departments’ information, we join the tables and fetch the dataset that meets our requirements.
In this post, we are going to examine plain examples to see what joins we can have; then we’ll take a deeper dive into how Oracle operates joins behind the scene, for example, how it retrieves data, what method it chooses to join the two row sources, what optimizations we can carry out and etc.
The SQL statements used in this blog are available on GitHub.
Tables and Data for Demonstration
The tables, Employees and Departments, are used for demonstration, as shown in the diagram. They consist of minimum columns for higher readability.
With the same consideration, we load only several records into either of the tables. So, we can identify with less effort which records are included in the result dataset and which are not.
You may want to keep an eye on the last record in the Employees table which doesn’t have a Department_Id, as well as the last row in the Departments table whose Department_Id, 99, is not findable in Employees. As intentionally designed records, they will be included into or excluded from the joined dataset as the join type changes.
If any other database objects are created during the demonstration, we will explicitly mention them on the spot.
What Joins Can We Have?
i. Inner Join
Inner join perhaps is the mostly used join type in practice. It returns only the rows meeting the join conditions. The last row of Employees is not a part of the result set, nor is the last row of Departments.
We may be more familiar with this form of statement: omit the JOIN keyword; move the join condition to WHERE clause.
ii. Left Join
A left join returns not only the rows satisfying the join condition, but also the rows in the left table which don’t have a match in the right table.
The statement can be re-written as follows by using the Oracle join operator. However, the preceding is recommended.
iii. Right Join
Right join is similar with left join; but it fetches all the rows in the right table. As the example shows, the last row in the Departments table is returned.
For a right join, the join operator appears in the left side of the equality operator.
iv. Full Join
A full join is the union of a left join and a right join. All rows in the left table are fetched, and so are all rows in the right table.
The last row in the Employees table, the last row in the Departments table, both are included into the final dataset.
v. Semijoin
As for a semijoin, only the rows in the first dataset are returned which have a match in the subquery dataset.
Either of the following SQL statements yields the result dataset as expected.
vi. Antijoin
It could be a bit more difficult to understand antijoin from the letters. What it does is the contrary to semijoin, to be more specific, the rows are returned that do NOT have a matching record in the subquery dataset.
I’d like to call for your attention regarding null handling. Look at the example below. You may remember that the last record in the Employees table has its Deparement_Id set as null; but that record is not selected.
In a nutshell, null is not comparable in Oracle. We did a few tests below. As you can see, we must use “IS null” to decide whether the variable is null or not.
vii. Selfjoin
As its name tells, a selfjoin is to join a table with itself. The same table plays both roles: it is not only the left table, but also the right table. For example, if we want to know who reports to whom, a selfjoin of Employees can get us there.
viii. Cartesian Join
Here, we can do a simple mathematical calculation. We have 5 rows in Employees, 3 rows in Departments, so, a Cartesian join gets back 5x3, 15 rows in total.
How to do that? We just don’t specify a join condition between the tables.
It can be useful when we want to generate a large dataset during a performance test.
What If We Look from the Perspective of Join Conditions?
Let’s move away from the FROM clause, shed the spotlight on the join condition now. You might have noticed, in the inner join example, we used an equality operator in the join condition. This is called equijoin.
What if we have a non-equality operator like BETWEEN in the condition? Right, it is call nonequijoin. It is an inner join as well. The point here is: the rows satisfying the join condition are selected; the condition doesn’t have to contain an equality operator.
The band join example below is also a nonequijoin.
Let’s add one more condition to eliminate the rows comparing the same person, “AND emp1.Employee_Id != emp2.Employee_Id”.
The result comes out like this.
Of course, in a business case, we expect by far more complicated join conditions such as comparison conditions, logical conditions, pattern-matching conditions, null conditions and etc. As far as they can serve the business purpose, and meet the system and performance requirements, all is fine. No need to say, easiness to read is also important. That is why we always say: the simpler, the better.
A Quick Look at Execution Plan
When we submit a SQL statement, how does Oracle carry out the query? Execution plan, the answer is. An execution plan details every step Oracle executes to produce the result dataset.
Here is the plan of the inner join example. How to retrieve the execution plan? Please refer to this post: [Tips] Oracle – How to Retrieve Execution Plan.
[Id 2]: get all rows from Departments using full scan; this forms the outer row source.
[Id 3]: get all rows from Employees using full scan, filtering the dataset with “Department_Id IS NOT NULL”; this becomes the inner row source.
[Id 1]: join the preceding row sources using hash join, and return the result.
What happened in the hash join? Oracle uses Deparement_Id in the outer row source to generate a hash table; then scan the inner row source and use the same hash function to calculate the hash values of Department_Id; probes the hash table to check if a matching row exists.
These can be described by an execution tree, which shows the flow of joining the row sources step by step.
Execution Tree
Touch Optimizer
After we issued a SQL query via SQL Plus or SQL Developer, sooner or later, we will get the result dataset displayed on the window. But what happened on the server side? Once the server process received the statement, it performs a series of steps to come up with the result, as illustrated by the diagram below.
Stages of SQL Processing
Among these steps, Optimization is a core process, because it chooses the most optimal means of executing the SQL statement. The task is undertaken by a built-in software, called Optimizer. The optimizer generates multiple execution plans based on the statistics, assigns a numeric cost to each plan which is computed based on CPU usage, I/O load, interprocess communication load and etc., and picks the plan with the lowest cost.
Access path - how to retrieve rows from each source such as tables, indexes and table clusters;
Join method - what join methods are used to join each pair of row sources, nested loops join, hash join or sort merge join;
Join order - the outer row source and the inner row source for a join (order of joins in case of multiple tables being joined);
Other operations like filtering, sorting, grouping and etc.
The execution plan is passed down to Row Source Generation, which accordingly produces a binary program called iterative plan.
The database engine executes the program and returns the result set.
As for parsing, generally it checks syntax, semantics of the SQL statement, and transforms the statement to an internal data structure.
Let’s run a few tests to see how the join order changes when join type varies.
Join
Type |
Join
Method |
Outer
Row Source |
Inner
Row Source |
Inner Join |
Hash Join |
Departments |
Employees |
Left Join |
Hash Join Outer |
Employees |
Departments |
Right Join |
Hash Join Outer |
Departments |
Employees |
Inner Join
Left Join
Right Join
Here Come Join Methods
A join type along with join conditions are more like a set of rules, describing what we want to do; whereas, a join method shows us how the database, internally, physically, combines the row sources.
Hash Join
Hash join is useful to join larger datasets. Given the nature of hash join, the join conditions need to contain at least one equality condition.
In a hash join illustrated by the concept diagram, the database
- Computes the hash values of Deparement_Id in the Departments source to build a hash table;
- Scans the Employees source and use the same hash function to calculate Deparement_Id’s hash value for the first row;
- Probes the hash table to see if there are rows existent;
- If so, compares the join column, Department_Id, and join them if they are matched;
- Reads the next row from Employees and repeats [3] and [4] until reaches the end of this iteration.
Hash Join Concept
For the execution plan example, refer to “A Quick Look at Execution Plan” section.
Nested Loops Join
Nested loops join is suitable for small datasets. A nested loops joins is like nesting two for loops; but if there is an index built on a join key for the inner dataset, it becomes more efficient.
In this example, Departments is the outer row source and Employees is the inner row source.
The basic flow is briefly stated as below:
- Oracle fetches the first row from Departments and gets its Department_Id;
- Performs an index scan over employees_departments_id_idx to obtain rowids associating with the precding Department_Id;
- Joins the outer row and the inner row if there are rowids found;
- Reads the next row from Departments and repeats the above steps until all rows are processed.
Nested Loops Join Example
Note that the below index was created for this example.
Sort Merge Join
Sort merge join is a variation of nested loops join. Sort join operation sorts the dataset, whereas merge join operation joins the outer row source and the inner row source.
Sort Merge Join Example
Can We Specify a Join Method?
In some cases, we may want to try out other join methods to see if we could get a more efficient result, rather than the one recommended by the Optimizer. Yes, it is doable. Oracle provides a pathway to influence the optimizer, that is to use hints.
For example, by default, the inner join example uses hash join as the join method; but we can use USE_NL hint to instruct the optimizer to employ nested loops join instead.
USE_NL Hint
Besides USE_NL, we can use USE_HASH and USE_MERGE hints to specify hash join and sort merge join, respectively.
Not only join method, but also access path can be specified.