Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

Deleting duplicate data with PostgreSQL

Here is an update to a query I posted a while back for detecting duplicate data. It allows you to select all but one of the resulting duplicates, for easy deletion. It only works on PostgreSQL, but is pretty neat. It uses a window function!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
DELETE FROM users 
USING (
  SELECT id, first_value(id) OVER (
    PARTITION BY name ORDER BY created_at DESC
  ) first_id
  FROM users
  WHERE name IN (
    SELECT name 
    FROM users 
    GROUP BY name 
    HAVING count(name) > 1
  )
) dups
WHERE dups.id != dups.first_id AND users.id = dups.id;

The order by is optional, but handy if you need to select a particular row rather than just an arbitrary one. You need an extra sub-query because you can’t have window functions in a where clause.

For more tasty PostgreSQL tricks, check out my Meet PostgreSQL screencast, a steal at only $12 plug plug plug.

  1. Bogdan Gusiev says:

    If I get you right

    delete from users where id in (select u2.id from users u1, users u2 where u1.name = u2.name and u1.id < u2.id)

Post a comment


(lesstile enabled - surround code blocks with ---)

A pretty flower Another pretty flower