Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

Setting isolation level in MySQL 5.1, 5.5, Postgres

From the I-want-my-evening-back department, differences in behaviour when
setting isolation levels between MySQL 5.1, 5.5, and postgres. Documenting here
for my poor future self.

In postgres and MySQL 5.1, the following is the correct ordering:

1
2
3
4
5
6
7
ActiveRecord::Base.transaction do
  ActiveRecord::Base.connection.execute(
    "SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE"
  )

  # ...
end

On MySQL 5.5 with mysql2 gem, no error will be raised, but the isolation
level will not be set correctly. If you run the same commands in a mysql shell,
you see an error informing that the isolation level cannot be set after the
transaction has started.

Ok well, let’s move it outside then:

1
2
3
4
5
6
ActiveRecord::Base.connection.execute(
  "SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE"
)
ActiveRecord::Base.transaction do
  # ...
end

That works on 5.5, but fails on postgres.

Deleting duplicate data with PostgreSQL

Here is an update to a query I posted a while back for detecting duplicate data. It allows you to select all but one of the resulting duplicates, for easy deletion. It only works on PostgreSQL, but is pretty neat. It uses a window function!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
DELETE FROM users 
USING (
  SELECT id, first_value(id) OVER (
    PARTITION BY name ORDER BY created_at DESC
  ) first_id
  FROM users
  WHERE name IN (
    SELECT name 
    FROM users 
    GROUP BY name 
    HAVING count(name) > 1
  )
) dups
WHERE dups.id != dups.first_id AND users.id = dups.id;

The order by is optional, but handy if you need to select a particular row rather than just an arbitrary one. You need an extra sub-query because you can’t have window functions in a where clause.

For more tasty PostgreSQL tricks, check out my Meet PostgreSQL screencast, a steal at only $12 plug plug plug.

PostgreSQL 9 and ruby full text search tricks

I have just released an introduction to PostgreSQL screencast, published through PeepCode. It is over an hour long and covers a large number of juicy topics:

  • Setup full text search
  • Optimize search with triggers and indexes
  • Use Postgres with Ruby on Rails 3
  • Optimize indexes by including only the rows that you need
  • Use database standards for more reliable queries
  • Write powerful reports in only a few lines of code
  • Convert an existing MySQL application to use Postgres

It’s a steal at only $12. You can buy it over at PeepCode.

In it, I introduce full text search in postgres, and use a trigger to keep a search vector up to date. I’m not going to cover that here, but the point I get to is:

1
2
3
4
CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts 
FOR EACH ROW EXECUTE PROCEDURE
  tsvector_update_trigger(search_vector, 'pg_catalog.english',  body, title);

That is good for simple models, but what if you want to index child models as well? For instance, we want to include comment authors in the search index. I rolled up my sleeves an came up with this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
CREATE OR REPLACE FUNCTION search_trigger() RETURNS trigger AS $$
DECLARE
  search TEXT;
  child_search TEXT;
begin
  SELECT string_agg(author_name, ' ') INTO child_search
  FROM comments
  WHERE post_id = new.id;

  search := '';
  search := search || ' ' || coalesce(new.title);
  search := search || ' ' || coalesce(new.body);
  search := search || ' ' child_search;

  new.search_index := to_tsvector(search); 
  return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE PROCEDURE
  search_trigger();

Getting a bit ugly eh. It might be nice to move that logic back into ruby land, but we have the problem that we need to call a database function to convert our search document into the correct data-type. In this case, a quick work around is to store a search_document in a text field on the model, then use a trigger to only index that field into our search_vector field. The search_document field can then easily be set from your ORM.

Of course, any self-respecting rubyist should hide all this complexity behind a neat interface. I have come up with one using DataMapper that automatically adds the required triggers and indexes via auto-migrations. You use it thusly:

1
2
3
4
5
6
7
8
9
10
class Post
  include DataMapper::Resource
  include Searchable

  property :id, Serial
  property :title, String
  property :body, Text

  searchable :title, :body # Provides Post.search('keyword')
end

You can find the Searchable module code over on github. In it you can also find a fugly proof-of-concept for a DSL that generates the above SQL for indexing child models using DataMapper’s rich property model. It worked, but I’m not using it in any production code so I can hardly recommend it. Maybe you want to have a play though.

A pretty flower Another pretty flower