Robot Has No Heart

Xavier Shay blogs here


DataMapper Retrospective

I introduced DataMapper on my last two major projects. As those projects matured after I left, both migrated to a different ORM. That deserves a retrospective, I think. Since I’m no longer on either project I don’t have the insider level of detail on the decision to abandon DataMapper, but developers from both projects kindly provided background for this blog post.

Project A

Web application and a batch processing component built on top of a legacy Oracle database.

Good

  • Field mappings: nice ruby names, and the ability to ignore fields we didn’t care about.

Bad

  • Had to roll our own locking and time zone integration.
  • Not great for batch processing (trying to write SQL through the DM abstraction).

It turned out this project required a lot more batch processing than we anticipated, which is not where DataMapper shines. It was migrated to Sequel, which provides a far better abstraction for working closer to SQL.
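For contrast, here is the sort of thing Sequel makes easy. This is an illustrative sketch, not code from the project; the table and connection details are made up:

require 'sequel'

DB = Sequel.connect('postgres://localhost/project_a')

# Datasets compose down to a single UPDATE; no per-row objects are created.
DB[:invoices].
  where { created_at < Date.today - 30 }.
  update(:status => 'archived')

# Raw SQL is a first-class citizen when a batch job calls for it.
DB.run("VACUUM ANALYZE invoices")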

Project B

A fairly typical Rails 3 application. A couple of tens of thousands of lines of code.

Good

  • No migrations (pre-release).
  • Foreign keys, composite primary keys.
  • Auto-validations.

Bad

  • Auto-validations with nested attributes were uncharted territory (needed bug fixes).
  • Performance on large object graphs was unusable for page rendering (close to two seconds for our home page, which admittedly had a stupid amount of stuff on it).
  • Performance was suboptimal (though passable) on smaller pages.
  • Tracing through what is happening across multiple gems (particularly around transactions) was tricky.
  • The maintenance/interactions of all the various gems was problematic (e.g. gems X,Y work with 1.9.3 but Z doesn’t yet).
  • Inability to easily “break the abstraction” when SQL was required.

The performance issues were clear in our code base, but eluded my efforts to reduce them to smaller, reproducible problems. The best quick win I found was ~15% from disabling assertions, but I suspect that given the large scope of the problem DataMapper is trying to solve, there may not be an approachable way of tackling the issue (I would love to be proven wrong!).

We ran into obvious integration bugs (apologies for not having kept a concrete list), a symptom of a library not widely used. As a committer on the project this wasn’t an issue for me, since they were easily fixed and moved past (the DataMapper code base is really nice to work on), but having a committer on your team isn’t a tenable strategy.

DataMapper takes an all-ruby-all-the-time approach, which means things get tricky when the abstraction leaks. Much of the SQL generation is hidden in private methods. Compare some code to create a composable full text search query:

def self.search(keywords, options = {})
  options = {
    conditions: ["true"]
  }.merge(options)

  current_query = query.merge(options)

  a           = repository.adapter
  columns_sql = a.send(:columns_statement,    current_query.fields,     false)
  conditions  = a.send(:conditions_statement, current_query.conditions, false)
  order_sql   = a.send(:order_statement,      current_query.order,      false)
  limit_sql   = current_query.limit || 50
  conditions_sql, conditions_values = *conditions

  bind_values = [keywords] + conditions_values

  find_by_sql([<<-SQL, *bind_values])
    SELECT #{columns_sql}, ts_rank_cd(search_vector, query) AS rank
    FROM things
    CROSS JOIN plainto_tsquery(?) query
    WHERE #{conditions_sql} AND (query @@ search_vector)
    ORDER BY rank DESC, #{order_sql}
    LIMIT #{limit_sql}
  SQL
end

To the ActiveRecord equivalent (Sequel is similar):

def self.search(keywords)
  select("things.*, ts_rank_cd(search_vector, query) AS rank")
    .joins(sanitize_sql_array(["CROSS JOIN plainto_tsquery(?) query", keywords]))
    .where("query @@ search_vector")
    .order("rank DESC")
end
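The win here is composability: the ActiveRecord version returns a relation, so callers can keep chaining scopes onto the search. A hypothetical usage (the published column is made up):

Thing.search("datamapper").where(:published => true).limit(10)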

Switching to ActiveRecord took a week of all hands (~4) on deck, plus another week alongside other feature work to get it stable. From beginning to in production was two weeks. The end result was a drop in response time (the deploy was blatant in our response time graph) and start-up time, plus 3,000 fewer lines of code (a lot of custom code for dropping down to SQL could be removed).

Do differently

Ultimately, DataMapper provides an abstraction that I just don’t need, and even if I did, it hasn’t had its tires kicked sufficiently that a team can use it without delving into the internals. The applications I find myself writing are about data, and the store in which that data lives is vitally important to the application. Abstracting away those details seems to be heading in the wrong direction for writing simple applications. As an intellectual achievement in its own right I really dig DataMapper, but it is too complicated a component to justify using inside other applications.

Rich Hickey’s talk Simple Made Easy has been rattling around my head a lot.

Nowadays I’m back to ActiveRecord for team conformance. It’s more work to keep on top of foreign keys and the like, but overall it does the job. It’s still too complicated, but has the non-trivial benefit of being used by lots of people. This is my responsible choice at the moment.

On my own projects I first reach for Sequel. It supports all the nice database features I want to use, while providing a thin layer over SQL. In other words, I don’t have to worry about the abstraction leaking, because the abstraction is still SQL, just expressed in ruby (a huge win for composability that you don’t get with raw SQL). While it does have “ORM” features, it feels more like the most convenient way of accessing my database than an abstraction layer. It’s actively maintained; the only bug I have found was something that Rails broke, and a patch was already available. There are no open issues in the bug tracker. My experiences have been overwhelmingly positive. I haven’t built anything big enough with it yet to have confidence using it on a team project though.
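For a flavour of what I mean by SQL expressed in ruby, datasets are lazy and compose before any query is issued. An illustrative sketch, reusing the DB handle from the earlier example (the posts table is made up):

recent  = DB[:posts].where { created_at > Date.today - 7 }
popular = recent.where { comments_count > 10 }

# Nothing has hit the database yet; we can inspect the composed SQL.
puts popular.order(Sequel.desc(:comments_count)).limit(5).sql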

I still have a soft spot in my heart for DataMapper, I just don’t see anywhere for me to use it anymore.

PostgreSQL 9 and ruby full text search tricks

I have just released an introduction to PostgreSQL screencast, published through PeepCode. It is over an hour long and covers a large number of juicy topics:

  • Set up full text search
  • Optimize search with triggers and indexes
  • Use Postgres with Ruby on Rails 3
  • Optimize indexes by including only the rows that you need
  • Use database standards for more reliable queries
  • Write powerful reports in only a few lines of code
  • Convert an existing MySQL application to use Postgres

It’s a steal at only $12. You can buy it over at PeepCode.

In it, I introduce full text search in Postgres, and use a trigger to keep a search vector up to date. I won’t cover all of that here, but the point I get to is:

CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts 
FOR EACH ROW EXECUTE PROCEDURE
  tsvector_update_trigger(search_vector, 'pg_catalog.english',  body, title);

That is good for simple models, but what if you want to index child models as well? For instance, we want to include comment authors in the search index. I rolled up my sleeves and came up with this:

CREATE OR REPLACE FUNCTION search_trigger() RETURNS trigger AS $$
DECLARE
  search TEXT;
  child_search TEXT;
BEGIN
  SELECT string_agg(author_name, ' ') INTO child_search
  FROM comments
  WHERE post_id = new.id;

  search := '';
  search := search || ' ' || coalesce(new.title, '');
  search := search || ' ' || coalesce(new.body, '');
  search := search || ' ' || coalesce(child_search, '');

  new.search_vector := to_tsvector(search);
  return new;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE PROCEDURE
  search_trigger();

Getting a bit ugly, eh? It might be nice to move that logic back into ruby land, but there is a problem: we need to call a database function to convert our search document into the correct data type. A quick workaround is to store a search_document in a text field on the model, then use a trigger that only indexes that field into our search_vector field. The search_document field can then easily be set from your ORM, as sketched below.
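A minimal sketch of that workaround (the model and field names are illustrative), with the simple tsvector_update_trigger from earlier pointed at search_document instead of title and body:

class Post
  include DataMapper::Resource

  property :id,              Serial
  property :title,           String
  property :body,            Text
  property :search_document, Text

  has n, :comments

  # Concatenate everything we want indexed; the database trigger converts
  # this text into the search_vector tsvector on the way into the database.
  before :save do
    self.search_document = [
      title,
      body,
      comments.map(&:author_name)
    ].flatten.compact.join(' ')
  end
end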

Of course, any self-respecting rubyist should hide all this complexity behind a neat interface. I have come up with one using DataMapper that automatically adds the required triggers and indexes via auto-migrations. You use it thusly:

class Post
  include DataMapper::Resource
  include Searchable

  property :id, Serial
  property :title, String
  property :body, Text

  searchable :title, :body # Provides Post.search('keyword')
end

You can find the Searchable module code over on github. In it you can also find a fugly proof-of-concept for a DSL that generates the above SQL for indexing child models using DataMapper’s rich property model. It worked, but I’m not using it in any production code so I can hardly recommend it. Maybe you want to have a play though.

Ordering by a field in a join model with DataMapper

The public interface of DataMapper 1.0.3 does not support ordering a query by a column in a joined model. The core of DataMapper does support it though, so we can use some hacks to make it work, as the following code demonstrates.

require 'rubygems'
require 'dm-core'
require 'dm-migrations'

DataMapper::Logger.new($stdout, :debug)
DataMapper.setup(:default, 'postgres://localhost/test') # createdb test

class User
  include DataMapper::Resource

  property :id, Serial

  has 1, :user_profile

  def self.ranked
    order = DataMapper::Query::Direction.new(user_profile.ranking, :desc) 
    query = all.query # Access a blank query object for us to manipulate
    query.instance_variable_set("@order", [order])

    # Force the user_profile model to be joined into the query
    query.instance_variable_set("@links", [relationships['user_profile'].inverse])

    all(query) # Create a new collection with the modified query
  end
end

class UserProfile
  include DataMapper::Resource

  property :user_id, Integer, :key => true
  property :ranking, Integer, :default => 0

  belongs_to :user
end

DataMapper.finalize
DataMapper.auto_migrate!

User.create(:user_profile => UserProfile.new(:ranking => 2))
User.create(:user_profile => UserProfile.new(:ranking => 5))
User.create(:user_profile => UserProfile.new(:ranking => 3))

puts User.ranked.map {|x| x.user_profile.ranking }.inspect
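# Prints [5, 3, 2]: users ordered by their profile's ranking, descending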

Transactional before all with RSpec and DataMapper

By default, before(:all) in RSpec executes outside of any transaction, meaning that you can’t really use it for creating objects. Normally such setup should go in a before(:each), but for a spec with simple creation and a large number of assertions this is terribly inefficient.

Let’s fix it!

This code assumes you are using DataMapper, and that your database supports some form of nested transactions (at the very least faking them with savepoints – see nested transactions in postgres with datamapper). It wraps each before/after :all and :each in its own transaction.

RSpec.configure do |config|
  [:all, :each].each do |x|
    config.before(x) do
      repository(:default) do |repository|
        transaction = DataMapper::Transaction.new(repository)
        transaction.begin
        repository.adapter.push_transaction(transaction)
      end
    end

    config.after(x) do
      repository(:default).adapter.pop_transaction.rollback
    end
  end

  config.include(RSpecExtensions::Set)
end

See that RSpecExtensions::Set include? That’s a version of the lovely let helpers that works with before(:all) setup. Props to pcreux for this:

module RSpecExtensions
  module Set

    module ClassMethods
      # Generates a method whose return value is memoized
      # in before(:all). Great for DB setup when combined with
      # transactional before alls.
      def set(name, &block)
        define_method(name) do
          __memoized[name] ||= instance_eval(&block)
        end
        before(:all) { __send__(name) }
        before(:each) do
          __send__(name).tap do |obj|
            obj.reload if obj.respond_to?(:reload)
          end
        end
      end
    end

    module InstanceMethods
      def __memoized # :nodoc:
        @__memoized ||= {}
      end
    end

    def self.included(mod) # :nodoc:
      mod.extend ClassMethods
      mod.__send__ :include, InstanceMethods
    end

  end
end
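A quick sketch of how this looks in a spec (the model and expectations are illustrative): set behaves like let, except the record is created once per example group, inside the before(:all) transaction, and reloaded before each example.

describe User do
  set(:user) { User.create(:login => "xavier") }

  it "saves the login" do
    user.login.should == "xavier"
  end

  it "reuses the same record across examples" do
    user.should_not be_new
  end
end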

Fast specs make me a happy man.

Nested Transactions in Postgres with DataMapper

Hacks to get nested transaction support for Postgres in DataMapper. Not extensively tested; more a proof of concept. It re-opens the existing Transaction class to add a check for whether we need a nested transaction or not, and adds a new NestedTransaction transaction primitive that issues savepoint commands rather than begin/commit.

I put this code in a Rails initializer.

# Hacks to get nested transactions in Postgres
# Not extensively tested, more a proof of concept
#
# It re-opens the existing Transaction class to add a check for whether
# we need a nested transaction or not, and adds a new NestedTransaction
# transaction primitive that issues savepoint commands rather than begin/commit.

module DataMapper
  module Resource
    def transaction(&block)
      self.class.transaction(&block)
    end
  end

  class Transaction
    # Overridden to allow nested transactions
    def connect_adapter(adapter)
      if @transaction_primitives.key?(adapter)
        raise "Already a primitive for adapter #{adapter}"
      end

      primitive = if adapter.current_transaction
        adapter.nested_transaction_primitive
      else
        adapter.transaction_primitive
      end

      @transaction_primitives[adapter] = validate_primitive(primitive)
    end
  end

  module NestedTransactions
    def nested_transaction_primitive
      DataObjects::NestedTransaction.create_for_uri(normalized_uri, current_connection)
    end
  end

  class NestedTransactionConfig < Rails::Railtie
    config.after_initialize do
      repository.adapter.extend(DataMapper::NestedTransactions)
    end
  end
end

module DataObjects
  class NestedTransaction < Transaction

    # The host name. Note, this relies on the host name being configured
    # and resolvable using DNS
    HOST = "#{Socket::gethostbyname(Socket::gethostname)[0]}" rescue "localhost"
    @@counter = 0

    # The connection object for this transaction - must have already had
    # a transaction begun on it
    attr_reader :connection
    # A unique ID for this transaction
    attr_reader :id

    def self.create_for_uri(uri, connection)
      uri = uri.is_a?(String) ? URI::parse(uri) : uri
      DataObjects::NestedTransaction.new(uri, connection)
    end

    #
    # Creates a NestedTransaction bound to an existing connection
    #
    def initialize(uri, connection)
      @connection = connection
      @id = Digest::SHA256.hexdigest(
        "#{HOST}:#{$$}:#{Time.now.to_f}:nested:#{@@counter += 1}")
    end

    def close
    end

    def begin
      run %{SAVEPOINT "#{@id}"}
    end

    def commit
      run %{RELEASE SAVEPOINT "#{@id}"}
    end

    def rollback
      run %{ROLLBACK TO SAVEPOINT "#{@id}"}
    end

    private
    def run(cmd)
      connection.create_command(cmd).execute_non_query
    end
  end
end
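With the initializer loaded, nesting transaction blocks should transparently use savepoints. A hypothetical sketch of the behaviour (the model and error handling are illustrative, not tested):

User.transaction do                # issues BEGIN
  outer = User.create

  begin
    User.transaction do            # issues SAVEPOINT "<id>" instead of BEGIN
      User.create
      raise "boom"                 # rolls back to the savepoint only
    end
  rescue RuntimeError
    # the inner create is gone, but outer is still pending
  end
end                                # issues COMMIT; outer survives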

I wrote code similar to this with hassox while at NZX, big ups to those guys. I’m working on a proper patch, but haven’t quite figured out the internals enough. If you know how DataMapper works, please check out and comment on this sample patch for three dm gems.

Unique data in dm-sweatshop

dm-sweatshop is how you set up test data for your DataMapper apps. Standard practice is to generate random data that follows a pattern:

User.fix {{
  :login  => /\w+/.gen
}}

new_user = User.gen

Let’s not debate now whether or not random data in tests is a good idea. What’s more important is that the above code should make you uneasy if login is supposed to be unique. There was a hack in sweatshop that would try recreating the data if you had a uniqueness constraint on login and the generated value was invalid, but it was exactly that: a hack. As of a few days ago (what will be 0.9.7), you need to be more explicit if you want unique data. It’s pretty easy:

include DataMapper::Sweatshop::Unique

User.fix {{
  :login  => unique { /\w+/.gen }
}}

Tada! You can also easily get non-random unique data by providing a block with one parameter:
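If I recall the dm-sweatshop API correctly (treat this sketch as an assumption), the block receives a counter you can use to build deterministic values:

User.fix {{
  :login => unique {|n| "user_#{n}" }   # user_0, user_1, user_2, ...
}}

Check the README for this and other cool things you can do.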
