PostgreSQL 9 and ruby full text search tricks
I have just released an introduction to PostgreSQL screencast, published through PeepCode. It is over an hour long and covers a large number of juicy topics:
- Setup full text search
 - Optimize search with triggers and indexes
 - Use Postgres with Ruby on Rails 3
 - Optimize indexes by including only the rows that you need
 - Use database standards for more reliable queries
 - Write powerful reports in only a few lines of code
 - Convert an existing MySQL application to use Postgres
 
It’s a steal at only $12. You can buy it over at PeepCode.
In it, I introduce full text search in postgres, and use a trigger to keep a search vector up to date. I’m not going to cover that here, but the point I get to is:
1 2 3 4  | 
  CREATE TRIGGER posts_search_vector_refresh BEFORE INSERT OR UPDATE ON posts FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(search_vector, 'pg_catalog.english', body, title);  | 
That is good for simple models, but what if you want to index child models as well? For instance, we want to include comment authors in the search index. I rolled up my sleeves an came up with this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23  | 
  CREATE OR REPLACE FUNCTION search_trigger() RETURNS trigger AS $$ DECLARE search TEXT; child_search TEXT; begin SELECT string_agg(author_name, ' ') INTO child_search FROM comments WHERE post_id = new.id; search := ''; search := search || ' ' || coalesce(new.title); search := search || ' ' || coalesce(new.body); search := search || ' ' child_search; new.search_index := to_tsvector(search); return new; end $$ LANGUAGE plpgsql; CREATE TRIGGER posts_search_vector_refresh BEFORE INSERT OR UPDATE ON posts FOR EACH ROW EXECUTE PROCEDURE search_trigger();  | 
Getting a bit ugly eh. It might be nice to move that logic back into ruby land, but we have the problem that we need to call a database function to convert our search document into the correct data-type. In this case, a quick work around is to store a search_document in a text field on the model, then use a trigger to only index that field into our search_vector field. The search_document field can then easily be set from your ORM.
Of course, any self-respecting rubyist should hide all this complexity behind a neat interface. I have come up with one using DataMapper that automatically adds the required triggers and indexes via auto-migrations. You use it thusly:
1 2 3 4 5 6 7 8 9 10  | 
  class Post include DataMapper::Resource include Searchable property :id, Serial property :title, String property :body, Text searchable :title, :body # Provides Post.search('keyword') end  | 
You can find the Searchable module code over on github. In it you can also find a fugly proof-of-concept for a DSL that generates the above SQL for indexing child models using DataMapper’s rich property model. It worked, but I’m not using it in any production code so I can hardly recommend it. Maybe you want to have a play though.