Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

Five Tips For Adding Foreign Keys To Existing Apps

You’re convinced foreign keys are a good idea, but how should you retroactively add them to your production application? Here are some tips to help you out.

Identify and fix orphan records. If orphan records exist, creating a foreign key will fail. Use the following SQL to identify children that reference a parent that doesn’t exist:

1
SELECT * FROM children LEFT JOIN parents ON parent_id = parents.id WHERE parents.id IS NULL

Begin with new or unimportant relationships. With any new change, it’s best to walk before you run. Targeting the most important relationships in your application head on can quickly turn into a black hole. Adding foreign keys to new or low value relationships first means you have a smaller code base that is affected, and allows you to test your test suite and plugins for compatibility over a smaller area. Get this running in production early, so any issues will crop up early on low value code where they’ll be easier to fix. Be agile in your approach and iterate.

Move away from fixtures and mocking in your tests. Rails fixture code is not designed to work well with foreign keys. (Fixtures are generally not a good idea regardless.) Also, the intense stubbing of models that was in vogue back when rspec first came on the scene doesn’t play nice either. The current best practice is to use object factories (such as Machinist) to create your test data, and this works well with foreign keys.

Use restrict rather than cascade for ON DELETE. You still want to keep on_destroy logic in your models, so even if conceptually a cascading delete makes sense, implement it using the :dependent => :destroy option to has_many, with a restrict option at the database level to ensure all cascading deletes run through your callbacks.

Be pragmatic. Ideally every relationship will have a foreign key, but for that model filled with weird hacks and supported by a massive old school test suite, it may be just too much effort to get everything working smoothly with database constraints. In this case, set up a test suite that runs over your production data regularly to quickly identify any data problems that arise (see the SQL above).

Foreign keys give you confidence and piece of mind about your data and your application. Rails may be afraid of them, but that doesn’t mean you have to be.

July through September I am running full day training sessions in the US and UK on how to make use of your database and write solid Rails code, increasing your quality without compromising your velocity. Chances are I’m coming to your city, so check it out at http://www.dbisyourfriend.com

  1. Nilesh says:

    "...implement it using the :dependent => :destroy option to has_many, with a restrict option at the database level to ensure all cascading deletes run through your callbacks."

    How would a Rails cascade destroy work with a MySQL RESTRICT? Wouldn't this cause a deadlock? You have an example to show this?

  2. Xavier Shay says:

    Rails cascades the destroy by loading each child object into memory then calling destroy. When there are no children left, the parent object is destroyed. Since there are no children left anymore, the restrict does not trigger an the object is able to be deleted.

    Whether or not this can deadlock totally depends on what else you're doing at the same time, but it's probably not likely.

  3. http://google.com/profiles/myron.marston says:

    "You still want to keep on_destroy logic in your models, so even if conceptually a cascading delete makes sense, implement it using the :dependent => :destroy option to has_many, with a restrict option at the database level to ensure all cascading deletes run through your callbacks."

    I don't think that it would cause any problems to both ON DELETE CASCADE in the DB and :dependent => :destroy--after all, Rails will load all the child records into memory and individually destroy each one, so by the time the parent record is deleted, there will no longer be any child records in the DB to cascade to. And I think there's a benefit to having it: I've found that there are (rare) occasions when you really just need to issue a mass delete SQL statement, and trust that it will cascade correctly. Obviously this shouldn't be the norm, but when you are doing some kind of mass schema migration it comes in handy, as issuing direct SQL that affects multiple records is much faster than doing everything through your AR models. And you have to be aware of the fact that you're bypassing your model logic. But there are definitely cases this comes in handy--so why not put the cascade in both your DB and the model?

  4. http://pjfitzgibbons.myopenid.com says:

    RE: ON DELETE CASCADE
    I actually like the restrict-only cascade and here's why : I've worked in apps where the bread-and-butter operation of the app lives down in the depths of the object hierarchy where a 7-way join to get your data is very common. In this scenario, writing queries into my favorite query-analyzer for maintenance, data-repair, or development-experiments is all too common.
    When in data-repair on production, it has been comforting to follow the restriction path, where one delete fails, leading to another delete... fails, leading to another delete... and on down to that one independent table that deletes ok. Then I have a clear picture of what is happening. In my case, there's too much data/objects/rules, etc to keep all that in my head.

    That's my 2c

  5. grandpa's child says:

    I'm looking for some guide on how to maintain custom (db-specific) constraints in a rails app; since schema.rb does not include these things. I'm aware that you can dump the schema in sql, but operations such as db:reset still uses the schema.rb without the constraints; what's the best way to go about adding and maintaining these little buggers this?

  6. grandpa's child says:

    going once...

  7. grandpa's child says:

    going twice...

  8. Xavier Shay says:

    Way too late, but you can change over the active_record config to dump using sql, then it bypasses schema.rb. Something like

    1
    
    config.active_record.schema_format = :sql
    

Post a comment


(lesstile enabled - surround code blocks with ---)

A pretty flower Another pretty flower