PostgreSQL 9 and ruby full text search tricks
I have just released an introduction to PostgreSQL screencast, published through PeepCode. It is over an hour long and covers a large number of juicy topics:
- Setup full text search
- Optimize search with triggers and indexes
- Use Postgres with Ruby on Rails 3
- Optimize indexes by including only the rows that you need
- Use database standards for more reliable queries
- Write powerful reports in only a few lines of code
- Convert an existing MySQL application to use Postgres
It’s a steal at only $12. You can buy it over at PeepCode.
In it, I introduce full text search in postgres, and use a trigger to keep a search vector up to date. I’m not going to cover that here, but the point I get to is:
1 2 3 4 |
CREATE TRIGGER posts_search_vector_refresh BEFORE INSERT OR UPDATE ON posts FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(search_vector, 'pg_catalog.english', body, title); |
That is good for simple models, but what if you want to index child models as well? For instance, we want to include comment authors in the search index. I rolled up my sleeves an came up with this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
CREATE OR REPLACE FUNCTION search_trigger() RETURNS trigger AS $$ DECLARE search TEXT; child_search TEXT; begin SELECT string_agg(author_name, ' ') INTO child_search FROM comments WHERE post_id = new.id; search := ''; search := search || ' ' || coalesce(new.title); search := search || ' ' || coalesce(new.body); search := search || ' ' child_search; new.search_index := to_tsvector(search); return new; end $$ LANGUAGE plpgsql; CREATE TRIGGER posts_search_vector_refresh BEFORE INSERT OR UPDATE ON posts FOR EACH ROW EXECUTE PROCEDURE search_trigger(); |
Getting a bit ugly eh. It might be nice to move that logic back into ruby land, but we have the problem that we need to call a database function to convert our search document into the correct data-type. In this case, a quick work around is to store a search_document
in a text field on the model, then use a trigger to only index that field into our search_vector
field. The search_document
field can then easily be set from your ORM.
Of course, any self-respecting rubyist should hide all this complexity behind a neat interface. I have come up with one using DataMapper that automatically adds the required triggers and indexes via auto-migrations. You use it thusly:
1 2 3 4 5 6 7 8 9 10 |
class Post include DataMapper::Resource include Searchable property :id, Serial property :title, String property :body, Text searchable :title, :body # Provides Post.search('keyword') end |
You can find the Searchable module code over on github. In it you can also find a fugly proof-of-concept for a DSL that generates the above SQL for indexing child models using DataMapper’s rich property model. It worked, but I’m not using it in any production code so I can hardly recommend it. Maybe you want to have a play though.