Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

Duplicate Data

UPDATE: If you are on PostgreSQL, check this updated query, it’s more useful.

Forgotten to back validates_uniqueness_of with a unique constraint in your database? Oh no! Here is some SQL that will pull out all the duplicate records for you.

1
2
3
4
5
6
7
8
9
User.find_by_sql <<-EOS
  SELECT * 
  FROM users 
  WHERE name IN (
    SELECT name 
    FROM users 
    GROUP BY name 
    HAVING count(name) > 1);
EOS

You will need your own strategy for resolving the duplicates, since it is totally dependent on your data. Some ideas:

  • Arbitrarily deleting one of the records. Perhaps based on latest update time? Don’t forget about child records! If you have forgotten a uniqueness constraint it is likely you have also forgotten a foreign key, so you will have to delete child records manually.
  • Merge the records, including child records.
  • Manually resolving the conflicts on a case by case basis. Possible if there are not too many duplicates.
A pretty flower Another pretty flower