Skip to main content

Often at work, we will pull data from an upstream system into a delta table, delete the matching rows in the main table and insert the updated rows. This works great and is fast but has a major assumption : That the rows from the upstream system will NEVER disappear.

-- ============================================================================
-- How to Delete Rows That do not Exist in Another Table
--
-- Often at work, we will pull data from an upstream system into a delta table,
-- delete the matching rows in the main table and insert the updated rows. This
-- works great and is fast but has a major assumption : That the rows from the
-- upstream system will NEVER disappear.
--
-- As part of a process I am working on, I needed to be able to the delta table
-- run but then remove any rows that disappeared from the upstream system. Below
-- are a few ways you can do this. I suggest you use the first as it is the most
-- efficient of the three assuming you have a index on your key columns.
-- ============================================================================

--
-- Using LEFT JOIN/IS NULL:
DELETE FROM BLOB b
    LEFT JOIN FILES f ON f.id = b.fileid
    WHERE f.id IS NULL;

--
-- Using NOT EXISTS:
DELETE FROM BLOB
    WHERE NOT EXISTS(
        SELECT NULL FROM FILES f
        WHERE f.id = fileid
    );
--
-- Using NOT IN:
DELETE FROM BLOB
    WHERE fileid NOT IN (
        SELECT f.id FROM FILES f);