A SQL MERGE assertion performs actions primarily based on a RIGHT JOIN

RIGHT JOIN is an esoteric characteristic within the SQL language, and rarely seen in the true world, as a result of virtually each RIGHT JOIN can simply be expressed as an equal LEFT JOIN. The next two statements are equal:

-- Common
SELECT c.first_name, c.last_name, p.quantity
FROM buyer AS c
LEFT JOIN fee AS p ON c.customer_id = p.customer_id

-- Esoteric
SELECT c.first_name, c.last_name, p.quantity
FROM fee AS p
RIGHT JOIN buyer AS c ON c.customer_id = p.customer_id

It’s not unreasonable to count on these two statements to provide the identical execution plan on most RDBMS, provided that they’re logically equal. Since we’ve grown used to studying issues from left to proper and prime to backside, I don’t assume RIGHT JOIN will change into extra standard any time quickly.

There’s, nonetheless, one place within the SQL language the place RIGHT JOIN is surprisingly ubiquitous!

The MERGE assertion

Why is it a shock? As a result of that place doesn’t use the identical syntax for becoming a member of two tables. However that’s precisely what occurs within the MERGE assertion. Let’s have a look at the next MERGE assertion that takes:

A staging desk (SOURCE desk) to load knowledge from
An atypical desk (TARGET desk) to retailer knowledge into

With a schema like this:

CREATE TABLE book_to_book_store (
  book_id BIGINT NOT NULL REFERENCES e book,
  title TEXT NOT NULL REFERENCES book_store,
  inventory INT NOT NULL,

  PRIMARY KEY (book_id, title)
);

CREATE TABLE book_to_book_store_staging AS 
SELECT * FROM book_to_book_store
WITH NO DATA;

A question that could possibly be typical of an ETL job:

-- The goal desk
MERGE INTO book_to_book_store AS t

-- The supply desk
USING book_to_book_store_staging AS s

-- The RIGHT JOIN predicate
ON t.book_id = s.book_id AND t.title = s.title

-- The actions for every row, primarily based on RIGHT JOIN matching
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED THEN INSERT (book_id, title, inventory) 
VALUES (s.book_id, s.title, s.inventory);

That is merely taking all of the rows from the BOOK_TO_BOOK_STORE_STAGING desk, and merges them into BOOK_TO_BOOK_STORE:

If the row already exists (there’s a MATCH), then the STOCK is up to date
If the row doesn’t exist already (there’s no MATCH), then the row is inserted

However we don’t use this supply -> goal syntactic order, we first specify the goal desk BOOK_TO_BOOK_STORE, after which we RIGHT JOIN the BOOK_TO_BOOK_STORE_STAGING desk to it. Give it some thought this fashion:

SELECT *
FROM book_to_book_store AS t
RIGHT JOIN book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.title = s.title

And, if we consider a RIGHT JOIN not as a Venn diagram, however as a cartesian product as follows, then it may be seen simply what is completed per MATCH or non-MATCH:

|t.title      |t.book_id|t.inventory|s.title      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|            |         |       |Faraway Land|1        |9      | <-- NOT MATCHED
|Faraway Land|2        |10     |Faraway Land|2        |12     | <--     MATCHED
|Faraway Land|3        |10     |Faraway Land|3        |5      | <--     MATCHED
|            |         |       |Paper Path |1        |1      | <-- NOT MATCHED
|Paper Path |3        |2      |Paper Path |3        |0      | <--     MATCHED

As at all times with a RIGHT JOIN, each row from proper facet of the be a part of is matched with an identical row from the left facet of the be a part of, or an empty row of NULL values, if there’s no such match. After this MERGE, we wish the ensuing knowledge to be up to date as follows:

|t.title      |t.book_id|t.inventory|s.title      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1        |9      |Faraway Land|1        |9      | <-- NOT MATCHED
|Faraway Land|2        |12     |Faraway Land|2        |12     | <--     MATCHED
|Faraway Land|3        |5      |Faraway Land|3        |5      | <--     MATCHED
|Faraway Land|1        |1      |Paper Path |1        |1      | <-- NOT MATCHED
|Paper Path |3        |0      |Paper Path |3        |0      | <--     MATCHED

That is how the MERGE assertion works.

Word, I stated earlier than that the JOIN is producing a cartesian product. Not like with SELECT statements, nonetheless, there’s a limitation to MERGE the place the cartesian product should not produce any duplicate matches per TARGET row, because the order of actions wouldn’t be outlined if there have been a number of SOURCE rows per TARGET row.

Deleting rows

MERGE is extra highly effective than simply performing INSERT and UPDATE. It may well additionally DELETE rows. Let’s assume that we wish a staging desk’s STOCK = 0 to imply that the row must be deleted, as an alternative of the STOCK being set to 0. Then we will write:

MERGE INTO book_to_book_store AS t
USING book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.title = s.title
WHEN MATCHED AND s.inventory = 0 THEN DELETE
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED THEN INSERT (book_id, title, inventory) 
VALUES (s.book_id, s.title, s.inventory);

Now, with the above staging knowledge, we’ll take away the final row as an alternative of updating it:

|t.title      |t.book_id|t.inventory|s.title      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1        |9      |Faraway Land|1        |9      | <-- NOT MATCHED : INSERT
|Faraway Land|2        |10     |Faraway Land|2        |12     | <--     MATCHED : UPDATE
|Faraway Land|3        |10     |Faraway Land|3        |5      | <--     MATCHED : UPDATE
|Paper Path |1        |1      |Paper Path |1        |1      | <-- NOT MATCHED : INSERT
|            |         |       |Paper Path |3        |0      | <--     MATCHED : DELETE

The RIGHT JOIN semantics remains to be the identical, simply the motion is totally different now, relying on the extra AND clause of the WHEN MATCHED clause.

Matching by supply

Some RDBMS assist an much more highly effective vendor particular variant of MERGE, which must be added to the IEC/ISO 9075 normal, in my view. The BY TARGET / BY SOURCE clause. Let’s take a look on the following assertion:

MERGE INTO book_to_book_store AS t
USING book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.title = s.title
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED BY TARGET THEN INSERT (book_id, title, inventory) 
VALUES (s.book_id, s.title, s.inventory)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

Including a WHEN NOT MATCHED BY SOURCE clause has the straightforward impact of turning the RIGHT JOIN operation right into a FULL JOIN operation. Consider it this fashion:

SELECT *
FROM book_to_book_store AS t
FULL JOIN book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.title = s.title

Now, the end result would possibly look one thing like this:

|t.title      |t.book_id|t.inventory|s.title      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|            |         |       |Faraway Land|1        |9      | <-- NOT MATCHED BY TARGET
|Faraway Land|2        |10     |Faraway Land|2        |12     | <--     MATCHED
|Faraway Land|3        |10     |Faraway Land|3        |5      | <--     MATCHED
|            |         |       |Paper Path |1        |1      | <-- NOT MATCHED BY TARGET
|Paper Path |3        |2      |            |         |       | <-- NOT MATCHED BY SOURCE

The phrases NOT MATCHED BY TARGET and NOT MATCHED BY SOURCE are fairly self-explanatory when visualised as above, and possibly much less complicated to rookies than LEFT JOIN and RIGHT JOIN. I wouldn’t thoughts SQL syntax to be enhanced in a means that it might be attainable to determine whether or not a NULL worth originating from an OUTER JOIN is because of:

The supply knowledge containing the NULL worth
The row being NOT MATCHED by the “different facet” of the OUTER JOIN

Think about a hypothetical syntax like this:

SELECT c.first_name, c.last_name, p.quantity
FROM buyer AS c
LEFT JOIN fee AS p ON c.customer_id = p.customer_id
WHERE p IS NOT MATCHED BY JOIN -- Successfully an ANTI JOIN

Anyway…

When deleting rows, this method is rather more handy than having to depend on an interpretation of the semantics of information, equivalent to STOCK = 0 that means a deletion. We now have absent rows within the SOURCE desk (the staging) desk, which merely imply the row have to be deleted, if that’s how we wish to mannequin issues. So, after working the above MERGE assertion, we’ll get this consequence once more:

|t.title      |t.book_id|t.inventory|s.title      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1        |9      |Faraway Land|1        |9      | <-- NOT MATCHED BY TARGET : INSERT
|Faraway Land|2        |12     |Faraway Land|2        |12     | <--     MATCHED           : UPDATE
|Faraway Land|3        |5      |Faraway Land|3        |5      | <--     MATCHED           : UPDATE
|Faraway Land|1        |1      |Paper Path |1        |1      | <-- NOT MATCHED BY TARGET : INSERT
|            |         |       |            |         |       | <-- NOT MATCHED BY SOURCE : DELETE

At the least the next RDBMS assist the BY SOURCE and BY TARGET clauses:

Databricks
Firebird 5
PostgreSQL 17
SQL Server

Given how helpful that is, I’ll count on extra RDBMS to undertake this T-SQL syntax, quickly. jOOQ 3.20 has added assist for it, and a future model of jOOQ could emulate it by transferring the FULL JOIN into the USING clause.