Understanding JOINS in Execution Plans of PostgreSQL

Posted By Sanjay Saini | 29-Sep-2018

Hi Guys ,

This blog is next part of Understanding execution plans in PostgreSQL series. In this blog, I explain to you how JOINS work in the execution plan. When there are many tables joined in the SQL Query, the optimizer requires to choose the best join algorithm. The optimizer also decides on the direction in which tables in the join are obtained.


Let's take a look at the methods performed in PostgreSQL.


Nested Loop

There are two main variants of the Nested Loop method:

Nested Loop With Inner Sequential Scan. For every part from the first table, it checks every row of the secondary table using the Sequential Scan method. If the join condition is satisfied, the row is delivered. The method can be very costly and is most frequently used for small tables.

Take a look at the example of a Nested Loop With Inner Sequential Scan:

EXPLAIN SELECT table1.name FROM table1 JOIN table2 ON (table1.id = table2.id) WHERE table1.id = 135;
                        QUERY PLAN
Nested Loop (cost=0.00..178.31 rows=320 width=28)
    -> Seq Scan on table1 (cost=0.00..152.15 rows=70 width=4)
        Filter: (id = 135::oid)
    -> Materialize (cost=0.00..38.02 rows=2 width=30)
        -> Seq Scan on table2 (cost=0.00..35.08 rows=2 width=30)
            Filter: (id = 135::oid)


The fixed limitation (table1.id = 135) has created the optimizer to pick the Nested Loop join method. For each row in table1, the optimizer performs a Sequential Scan of table2 to obtain the matching rows.

The next version, Nested Loop With Inner Index Scan, uses an index for the second table rather of the Sequential Scan. If we have an index on the id column in the table1 table and another index on the id column in the table2 table, the execution plan may look like this:

EXPLAIN SELECT table2.name FROM table1 JOIN table2 ON (table1.id = table2.id) WHERE table1.id = 135;

                        QUERY PLAN
Nested Loop (cost=0.00..18.65 rows=1 width=278)
-> Index Scan using itable1_id on table1 (cost=0.00..9.32 rows=1 width=4)
Index Cond: (id = 135::oid)
-> Index Scan using itable2_id on table2 (cost=0.00..9.32 rows=1 width=286)
Index Cond: (table2.id = 135::oid)

Again, the strong limitation has made the optimizer determine the Nested Loop join method. Because proper indices were present, the optimizer chooses to apply the Inner Index Scan version.


Hash Join

The Hash Join algorithm effects by planning a hash table of the smaller table on the join key. Every row is saved in the hash table at the position specified by a deterministic hash function. Next, the open table is looked, examining the hash table to locate the rows which satisfy the join condition. See the example:

EXPLAIN SELECT table2.name FROM table1 JOIN table2 ON (table1.id = table2.id) WHERE table2.id > 128;
                        QUERY PLAN
Hash Join (cost=42.78..812.82 rows=16532 width=32)
    Hash Cond: (table1.id = table2.id)
        -> Seq Scan on table1 (cost=0.00..114.23 rows=7234 width=38) -> Hash (cost=23.71..23.71 rows=380 width=4)
            -> Seq Scan on table2 (cost=0.00..23.71 rows=380 width=4)
                Filter: (id > 128)

Here, the Sequential Scan on table2 works as the input for the Hash Node, which builds the hash table. The table is then returned to Hash Join and the rows from the external table are read to look for races on the Hash Condition. This is a very useful algorithm, but it needs sufficient main memory to keep the whole hash table. 


Merge Join

The MergeJoin is alike to the MergeSort algorithm. Before the tables are linked, they are both ordered by the joint property. The tables are then scanned in parallel to find pairing values. Each row is scanned once given that there are no duplicates in the left table. This method is favored for large tables.

Consider the following:

EXPLAIN SELECT table2.name FROM table1 JOIN table2 ON (table1.id = table2.id);
                        QUERY PLAN
Merge Join (cost=723.68..1403.01 rows=58321 width=32)
    Merge Cond: (table2.id = table1.id)
    -> Sort (cost=72.31..78.74 rows=1087 width=6)
        Sort Key: table2.id
        -> Seq Scan on table2 (cost=0.00..21.27 rows= 1087 width=6)
    -> Sort (cost=801.13..826.54 rows=8634 width=36)
        Sort Key: table1.id
        -> Seq Scan on table1 (cost=0.00..154.68 rows= 8634 width=38)

The input data must be ordered on the join keys, from the Sort method applied for both tables. In this example, Sequential Scan has been managed to do so. The sorted tables are then moved to the Merge Join algorithm to look for parallel rows that meet on the Merge Condition.


Sanjay Saini

Request for Proposal

Recaptcha is required.

Sending message..