Deleting duplicate data with PostgreSQL
Here is an update to a query I posted a while back for detecting duplicate data. It allows you to select all but one of the resulting duplicates, for easy deletion. It only works on PostgreSQL, but is pretty neat. It uses a window function!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
DELETE FROM users USING ( SELECT id, first_value(id) OVER ( PARTITION BY name ORDER BY created_at DESC ) first_id FROM users WHERE name IN ( SELECT name FROM users GROUP BY name HAVING count(name) > 1 ) ) dups WHERE dups.id != dups.first_id AND users.id = dups.id; |
The order by is optional, but handy if you need to select a particular row rather than just an arbitrary one. You need an extra sub-query because you can’t have window functions in a where clause.
For more tasty PostgreSQL tricks, check out my Meet PostgreSQL screencast, a steal at only $12 plug plug plug.