Often at work, we will pull data from an upstream system into a delta table, delete the matching rows in the main table and insert the updated rows. This works great and is fast but has a major assumption : That the rows from the upstream system will NEVER disappear.
-- ============================================================================
-- How to Delete Rows That do not Exist in Another Table
--
-- Often at work, we will pull data from an upstream system into a delta table,
-- delete the matching rows in the main table and insert the updated rows. This
-- works great and is fast but has a major assumption : That the rows from the
-- upstream system will NEVER disappear.
--
-- As part of a process I am working on, I needed to be able to the delta table
-- run but then remove any rows that disappeared from the upstream system. Below
-- are a few ways you can do this. I suggest you use the first as it is the most
-- efficient of the three assuming you have a index on your key columns.
-- ============================================================================
--
-- Using LEFT JOIN/IS NULL:
DELETE FROM BLOB b
LEFT JOIN FILES f ON f.id = b.fileid
WHERE f.id IS NULL;
--
-- Using NOT EXISTS:
DELETE FROM BLOB
WHERE NOT EXISTS(
SELECT NULL FROM FILES f
WHERE f.id = fileid
);
--
-- Using NOT IN:
DELETE FROM BLOB
WHERE fileid NOT IN (
SELECT f.id FROM FILES f);