Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

How I Test Rails Applications

The Rails conventions for testing provide three categories for your tests:

  • Unit. What you write to test your models.
  • Integration. Used to test the interaction among any number of controllers.
  • Functional. Testing the various actions of a single controller.

This tells you where to put your tests, but the type of testing you perform on each part of the system is the same: load fixtures into the database to get the app into the required state, run some part of the system either directly (models) or using provided harnesses (controllers), then verify the expected output.

This technique is simple, but it is only one of a number of ways of testing. As your application grows, you will need to add other approaches to your toolbelt so that your test suite continues to provide valuable feedback, not just on the correctness of your code, but on its design as well.

I use a different set of categories for my tests (taken from the GOOS book):

  • Unit. Do our objects do the right thing, and are they convenient to work with?
  • Integration. Does our code work against code we can’t change?
  • Acceptance. Does the whole system work?

Note that these definitions of unit and integration are radically different from how Rails defines them. That is unfortunate, but these definitions are more commonly accepted across other languages and frameworks, and I prefer them because they facilitate an exchange of ideas across communities. All of the typical Rails tests fall under the “integration” label, leaving two new levels of testing to talk about: unit and acceptance.

Unit Tests

“A test is not a unit test if it talks to the database, communicates across a network, or touches the file system.” – Working Effectively with Legacy Code, p. 14

This type of test is typically referred to in the Rails community as a “fast unit test”, which is unfortunate since speed is far from the primary benefit. The primary benefit of unit testing is the feedback it provides on the dependencies in your design. “Design unit tests” would be a better label.

This feedback is absolutely critical in any non-trivial application. Unchecked dependency is crippling, and Rails encourages you not to think about it (most obviously by implicitly autoloading everything).

By unit testing a class you are forced to think about how it interacts with other classes, which leads to simpler dependency trees and simpler programs.

Unit tests tend to (though don’t always have to) make use of mocking to verify interactions between classes. Using rspec-fire is absolutely critical when doing this. It verifies your mocks represent actual objects with no extra effort required in your tests, bridging the gap to statically-typed mocks in languages like Java.
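
rspec-fire does this verification automatically, but the guarantee it provides is easy to illustrate in plain Ruby. The sketch below is a hand-rolled stand-in (VerifyingDouble and Event are invented for this example): a double that refuses to stub any method the real class does not publicly implement, so a typo or a renamed method fails fast instead of silently passing.

```ruby
class VerifyingDouble
  def initialize(klass)
    @klass = klass
    @stubs = {}
  end

  # Refuse to stub anything the real class does not publicly implement.
  def stub(name, value)
    unless @klass.public_instance_methods.include?(name)
      raise ArgumentError, "#{@klass} does not implement ##{name}"
    end
    @stubs[name] = value
    self # chainable
  end

  def method_missing(name, *args)
    @stubs.fetch(name) { super }
  end

  def respond_to_missing?(name, include_private = false)
    @stubs.key?(name) || super
  end
end

class Event
  def name
    "RailsCamp"
  end
end

double = VerifyingDouble.new(Event).stub(:name, "Fake Event")
double.name # => "Fake Event"

begin
  VerifyingDouble.new(Event).stub(:nmae, "typo") # misspelled method caught immediately
rescue ArgumentError => e
  e.message # => "Event does not implement #nmae"
end
```

rspec-fire layers this same check onto real RSpec doubles, and only applies it when the doubled class is actually loaded.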

As a guideline, a single unit test shouldn’t take more than 1ms to run.

Acceptance Tests

A Rails integration test doesn’t exercise the entire system, since it uses a harness and doesn’t use the system from the perspective of a user. As one example, you have to post form parameters directly rather than actually filling out the form. This makes the test both brittle (if you change your HTML form, the test will still pass) and incomplete (it never loads the page in a browser, so it can’t verify that JavaScript and CSS are not interfering with the submission of the form).

Full system testing was popularized by the cucumber library, but cucumber adds a level of indirection that isn’t useful for most applications. Unless you are actually collaborating with non-technical stakeholders, the extra complexity just gets in your way. RSpec can easily be written in a BDD style without extra libraries.

Theoretically you should only interact with the system as a black box, which means no creating fixture data or otherwise messing with the internals of the system to set it up correctly. In practice this tends to be unwieldy, so I maintain a strict abstraction: tests read like black-box tests, with any internal modification hidden behind an interface that could be implemented by black-box interactions but is “optimized” to use internal knowledge. I’ve had success with the builder pattern, also presented in the GOOS book, but that’s another blog post (i.e. build_registration.with_hosting_request.create).
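
As a taste of that builder abstraction, here is a hypothetical sketch (RegistrationBuilder and its attributes are invented for illustration): the calling test reads like black-box setup, while the implementation is free to cheat internally.

```ruby
class RegistrationBuilder
  def initialize
    # Sensible defaults, so tests only specify what they care about.
    @attributes = { name: "Don", hosting_request: false }
  end

  def with_hosting_request
    @attributes[:hosting_request] = true
    self # return self so calls chain fluently
  end

  def create
    # A real implementation might insert rows directly (fast, uses internal
    # knowledge) or drive the UI (a true black box); callers cannot tell.
    @attributes.dup
  end
end

def build_registration
  RegistrationBuilder.new
end

registration = build_registration.with_hosting_request.create
registration[:hosting_request] # => true
```

The test only ever sees `build_registration.with_hosting_request.create`, so the builder internals can later be swapped for genuine black-box interactions without touching any specs.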

A common anti-pattern is to try to use transactional fixtures in acceptance tests. Don’t do this. It doesn’t exercise the full system (so it can’t test transaction-level functionality) and is prone to flakiness.
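
One common alternative (an assumption on my part — the post doesn’t name a library) is to let acceptance specs commit for real and wipe state between examples, for instance with database_cleaner’s truncation strategy:

```ruby
# spec/acceptance_helper.rb (sketch) — disable transactional fixtures for
# acceptance specs and truncate tables between examples instead.
RSpec.configure do |config|
  config.before(:each, type: :acceptance) do
    DatabaseCleaner.strategy = :truncation
    DatabaseCleaner.start
  end

  config.after(:each, type: :acceptance) do
    DatabaseCleaner.clean
  end
end
```

Truncation is slower than a transaction rollback, but the system under test sees real commits, so transaction-level behaviour is actually exercised.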

An acceptance test will typically take seconds to run, and should only be used for happy-path verification of behaviour. It makes sure that all the pieces hang together correctly. Edge case testing should be done at the unit or integration level. Ideally each new feature should have only one or two acceptance tests.

File Organisation

I use spec/{unit,integration,acceptance} folders as the parent of all specs. Each type of spec has its own helper require, so unit specs require unit_helper rather than spec_helper. Each of those helpers then requires other helpers as appropriate; for instance, my rails_helper looks like this (note the hack required to support this layout):

ENV["RAILS_ENV"] ||= 'test'
require File.expand_path("../../config/environment", __FILE__)

# By default, rspec/rails tags all specs in spec/integration as request specs,
# which is not what we want. There does not appear to be a way to disable this
# behaviour, so below is a copy of rspec/rails.rb with this default behaviour
# commented out.
require 'rspec/core'

RSpec.configure do |c|
  c.backtrace_clean_patterns << /vendor\//
  c.backtrace_clean_patterns << /lib\/rspec\/rails/
end

require 'rspec/rails/extensions'
require 'rspec/rails/view_rendering'
require 'rspec/rails/adapters'
require 'rspec/rails/matchers'
require 'rspec/rails/fixture_support'
require 'rspec/rails/mocks'
require 'rspec/rails/module_inclusion'
# require 'rspec/rails/example' # Commented this out
require 'rspec/rails/vendor/capybara'
require 'rspec/rails/vendor/webrat'

# Added the below, we still want access to some of the example groups
require 'rspec/rails/example/rails_example_group'
require 'rspec/rails/example/controller_example_group'
require 'rspec/rails/example/helper_example_group'
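
By contrast, a unit_helper under this layout can stay tiny, since it never loads Rails. A hypothetical sketch (adjust the load path to suit your app):

```ruby
# spec/unit_helper.rb (sketch) — no Rails, just the lib load path and
# rspec-fire, keeping unit specs fast.
$LOAD_PATH.unshift File.expand_path("../../app", __FILE__)

require 'rspec/fire'

RSpec.configure do |config|
  config.include RSpec::Fire
end
```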

Controller specs go in spec/integration/controllers, though I’m trending towards using poniard, which allows me to test controllers in isolation (spec/unit/controllers).

Helpers are either unit or integration tested depending on the type of work they do. Domain-level logic can be unit tested (though I tend to use presenters for this, which are also unit tested), but helpers that layer on top of Rails-provided helpers (like link_to or content_tag) should be integration tested, to verify they are using the library in the correct way.

I have used this approach on a number of Rails applications over the last 1-2 years and found it leads to better and more enjoyable code.

Form Objects in Rails

For a while now I have been using form objects instead of nested attributes for complex forms, and the experience has been pleasant. A form object is an object designed explicitly to back a given form. It handles validation, defaults, casting, and translation of attributes to the persistence layer. A basic example:

class Form::NewRegistration
  include ActiveModel::Validations

  def self.scalar_attributes
    [:name, :age]
  end

  attr_accessor *scalar_attributes
  attr_reader :event

  validates_presence_of :name

  def initialize(event, params = {})
    @event = event

    self.class.scalar_attributes.each do |attr|
      self.send("%s=" % attr, params[attr]) if params.has_key?(attr)
    end
  end

  def create
    return unless valid?

    registration = Registration.create!(
      event: event,
      data_json: {
        name: name,
        age:  age.to_i,
      }.to_json
    )

    registration
  end

  # ActiveModel support
  def self.name; "Registration"; end
  def persisted?; false; end
  def to_key; nil; end
end

Note how this allows an easy mapping from form fields to a serialized JSON blob.

I have found this more explicit and flexible than tying forms directly to nested attributes. It allows more fine-tuned control of the form behaviour, is easier to reason about and test, and enables you to refactor your data model with minimal other changes. (In fact, if you are planning on refactoring your data model, adding in a form object as a “shim” to protect other parts of the system from change before you refactor is usually desirable.) It even works well with nested attributes, using the form object to build up the required nested hash in the #create method.
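
A runnable sketch of that nested-attributes variant, with a Struct standing in for the ActiveRecord model (FakeEvent, NewEventForm, and the attribute names are all invented for illustration):

```ruby
# Stand-in for an ActiveRecord model that accepts nested attributes.
FakeEvent = Struct.new(:attrs) do
  def self.create!(attrs)
    new(attrs)
  end
end

class NewEventForm
  attr_accessor :event_name, :attendee_name, :attendee_age

  # The form keeps flat accessors; #create assembles the nested hash that
  # accepts_nested_attributes_for would expect on a real model.
  def create
    FakeEvent.create!(
      name: event_name,
      registrations_attributes: [
        { name: attendee_name, age: attendee_age.to_i }
      ]
    )
  end
end

form = NewEventForm.new
form.event_name    = "RailsCamp"
form.attendee_name = "Don"
form.attendee_age  = "25"
form.create.attrs[:registrations_attributes]
# => [{:name=>"Don", :age=>25}]
```

The view binds to simple flat fields, and only the form object knows about the nested shape the persistence layer wants.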

Relationships

A benefit of this approach, albeit still a little clunky, is having accessors map one-to-one with form fields even for one-to-many associations. My approach takes advantage of Ruby’s flexible object model to define accessors on the fly. For example, if a registration has multiple custom answer fields, as defined on the event, I would call the following method on initialisation:

def add_answer_accessors!
  event.questions.each do |q|
    attr = :"answer_#{q.id}"
    instance_eval <<-RUBY
      def #{attr};     answers[#{q.id}]; end
      def #{attr}=(x); answers[#{q.id}] = x; end
    RUBY
  end
end

With the exception of the above code (which isn’t too bad), this greatly simplifies the typical code for handling one-to-many relationships: it avoids fields_for and index, and makes it easier to set up sane defaults.
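
A runnable sketch of those dynamic accessors in isolation (Question, StubEvent, and AnswerForm are stand-ins invented for this example):

```ruby
Question  = Struct.new(:id)
StubEvent = Struct.new(:questions)

class AnswerForm
  attr_reader :event, :answers

  def initialize(event)
    @event = event
    @answers = {}
    add_answer_accessors!
  end

  def add_answer_accessors!
    event.questions.each do |q|
      # Defines singleton methods like #answer_3 and #answer_3= on this
      # instance, backed by the answers hash keyed by question id.
      instance_eval <<-RUBY
        def answer_#{q.id};     answers[#{q.id}]; end
        def answer_#{q.id}=(x); answers[#{q.id}] = x; end
      RUBY
    end
  end
end

form = AnswerForm.new(StubEvent.new([Question.new(3)]))
form.answer_3 = "vegetarian"
form.answers # => {3=>"vegetarian"}
```

Each form field (`answer_3`, `answer_7`, …) now binds like any scalar attribute, while the data collects into a single hash ready for serialization.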

Casting

I use a small supporting module to handle casting of attributes to certain types.

module TypedWriter
  def typed_writer(type, attribute)
    class_eval <<-EOS
      def #{attribute}=(x)
        @#{attribute} = type_cast(x, :#{type})
      end
    EOS
  end

  def type_cast(x, type)
    case type
    when :integer
      x.to_s.length > 0 ? x.to_i : nil
    when :boolean
      x.to_s.length > 0 ? x == true || x == "true" : nil
    when :boolean_with_nil
      if x.to_s == 'on' || x.nil?
        nil
      else
        x.to_s.length > 0 ? x == true || x == "true" : nil
      end
    when :int_array
      [*x].map(&:to_i).select {|i| i > 0 }
    else
      raise "Unknown type #{type}"
    end
  end

  def self.included(klass)
    # Make methods available both as class and instance methods.
    klass.extend(self)
  end
end

It is used like so:

class Form::NewRegistration
  # ...

  include TypedWriter

  typed_writer :integer, :age
end

Testing

I don’t load Rails for my form tests, so an explicit require of ActiveModel is necessary. I do this in my form code, since I like explicitly requiring third-party dependencies everywhere they are used.

require 'unit_helper'

require 'form/new_registration'

describe Form::NewRegistration do
  include RSpec::Fire

  let(:event) { fire_double('Event') }

  subject { described_class.new(event) }

  def valid_attributes
    {
      name: 'don',
      age:  25
    }
  end

  def form(extra = {})
    described_class.new(event, valid_attributes.merge(extra))
  end

  describe 'validations' do
    it 'is valid for default attributes' do
      form.should be_valid
    end

    it { form(name: '').should have_error_on(:name) }
  end

  describe 'type-casting' do
    let(:f) { form } # Memoize the form

    # This pattern is overkill in this example, but useful when you have many
    # typed attributes.
    let(:typecasts) {{
      int: {
        nil  => nil,
        ""   => nil,
        23   => 23,
        "23" => 23,
      }
    }}

    it 'casts age to an int' do
      typecasts[:int].each do |value, expected|
        f.age = value
        f.age.should == expected
      end
    end
  end

  describe '#create' do
    it 'returns nil when not valid' do
      subject.create.should_not be
    end

    it 'creates a new registration' do
      f = form
      new_rego = double("new registration")
      dao = fire_replaced_class_double("Registration")
      dao.should_receive(:create!).with {|x|
        x[:event].should == event

        data = JSON.parse(x[:data_json])

        data['name'].should == valid_attributes[:name]
        data['age'].should == valid_attributes[:age]
      }.and_return(new_rego)
      f.create.should == new_rego
    end
  end

  it { should_not be_persisted }
end

Code Sharing

I tend to have a parent object Form::Registration, with subclasses for Form::{New,Update,View}Registration. A common mixin would also work. For testing, I use a shared spec that is run by the specs for each of the three subclasses.
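
A minimal plain-Ruby sketch of that shape (method bodies elided; names beyond those mentioned above are invented):

```ruby
module Form
  class Registration
    def self.scalar_attributes
      [:name, :age]
    end

    # Shared accessors live on the parent; subclasses add intent-specific
    # behaviour (create vs save vs read-only).
    attr_accessor(*scalar_attributes)
  end

  class NewRegistration < Registration
    def create
      # build a new record from the attributes
    end
  end

  class UpdateRegistration < Registration
    def save
      # apply the attributes to an existing record
    end
  end

  class ViewRegistration < Registration
    # read-only presentation of an existing record
  end
end

form = Form::UpdateRegistration.new
form.name = "Don"
form.name # => "Don"
```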

Conclusion

There are other solutions to this problem (such as separating validations completely) which I haven’t tried yet, and I haven’t used this approach on a team yet. It has worked well for my solo projects though, and I’m just about confident enough to recommend it for production use.

Poniard: a Dependency Injector for Rails

I just open sourced poniard, a dependency injector for Rails. It’s a newer version of code I posted a few weeks back that allows you to write controllers using plain Ruby objects:

module Controller
  class Registration
    def update(response, now_flash, update_form)
      form = update_form

      if form.save
        response.respond_with SuccessfulUpdateResponse, form
      else
        now_flash[:message] = "Could not save registration."
        response.render action: 'edit', ivars: {registration: form}
      end
    end

    SuccessfulUpdateResponse = Struct.new(:form) do
      def html(response, flash, current_event)
        flash[:message] = "Updated details for %s" % form.name
        response.redirect_to :registrations, current_event
      end

      def js(response)
        response.render json: form
      end
    end
  end
end

This makes it possible to test them in isolation, leading to a better appreciation of your dependencies and nicer code.

Check it out!

Dependency Injection for Rails Controllers

What if controllers looked like this:

module Controller
  class Registration
    def update(response, now_flash, update_form)
      form = update_form

      if form.save
        response.respond_with SuccessfulUpdateResponse, form
      else
        now_flash[:message] = "Could not save registration."
        response.render action: 'edit', ivars: {registration: form}
      end
    end

    SuccessfulUpdateResponse = Struct.new(:form) do
      def html(response, flash, current_event)
        flash[:message] = "Updated details for %s" % form.name
        response.redirect_to :registrations, current_event
      end

      def js(response)
        response.render json: form
      end
    end
  end
end

It is a plain Ruby object that receives all needed dependencies via method arguments. (This requires Some Magic, explained below.) This style of dependency injection is inspired by Raptor, Dropwizard, and Guice. It allows you to cleanly separate authorization, object fetching, control flow, and other typical controller responsibilities, and as a result is much easier to organise and test than the traditional style.

require 'unit_helper'

require 'injector'
require 'controller/registration'

describe Controller::Registration do
  success_response = Controller::Registration::SuccessfulUpdateResponse

  let(:form)      { fire_double("Form::UpdateRegistration") }
  let(:response)  { fire_double("ControllerSource::Response") }
  let(:event)     { fire_double("Event") }
  let(:flash)     { {} }
  let(:now_flash) { {} }
  let(:injector)  { Injector.new([OpenStruct.new(
    response:      response.as_null_object,
    current_event: event.as_null_object,
    update_form:   form.as_null_object,
    flash:         flash,
    now_flash:     now_flash
  )]) }

  describe '#update' do
    it 'saves form and responds with successful update' do
      form.should_receive(:save).and_return(true)
      response
        .should_receive(:respond_with)
        .with(success_response, form)

      injector.dispatch described_class.new.method(:update)
    end

    it 'renders the edit page when save fails' do
      form.should_receive(:save).and_return(false)
      response
        .should_receive(:render)
        .with(action: 'edit', ivars: {registration: form})

      injector.dispatch described_class.new.method(:update)

      now_flash[:message].length.should > 0
    end
  end

  describe success_response do
    describe '#html' do
      it 'redirects to registration' do
        response.should_receive(:redirect_to).with(:registrations, event)

        injector.dispatch success_response.new(form).method(:html)
      end

      it 'includes name in flash message' do
        form.stub(:name).and_return("Don")

        injector.dispatch success_response.new(form).method(:html)

        flash[:message].should include(form.name)
      end
    end
  end
end

Before filters and authorization can be extracted into a separate source, and are applied whenever they are named as a method argument. For instance, if you specify current_event as a method argument in Controller::Registration#update, you will receive the result of Controller::RegistrationSource#current_event. Authorization is interesting: requesting authorized_organiser when not authorized raises an UnauthorizedException, which you can handle in your base ApplicationController (note: the above example omits authorization).
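
A hedged sketch of what that base-controller handling might look like (the redirect target and flash message are assumptions):

```ruby
class ApplicationController < ActionController::Base
  # Any action whose dependencies include authorized_organiser will raise
  # UnauthorizedException when the check fails; handle it once, here.
  rescue_from UnauthorizedException do
    redirect_to login_path, alert: "You are not authorized to do that."
  end
end
```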

module Controller
  class RegistrationSource
    def current_event(params)
      Event.find(params[:event_id])
    end

    def current_registration(params, current_event)
      current_event.registrations.find(params[:id])
    end

    def current_organiser(session)
      Organiser.find_by_id(session[:organiser_id])
    end

    def authorized_organiser(current_event, current_organiser)
      unless current_organiser && current_organiser.can_edit?(current_event)
        raise UnauthorizedException
      end
    end

    def update_form(params, current_registration)
      Form::UpdateRegistration.build(
        current_registration,
        params[:registration]
      )
    end
  end
end

Magic wiring

An Injector is responsible for introspecting method arguments and finding an appropriate object from its sources to inject. In the controller case, two sources are required: one for standard controller dependencies (params, flash, etc.), and one for application-specific logic (the RegistrationSource seen above).

class RegistrationsController < ApplicationController
  def update
    injector = Injector.new([
      ControllerSource.new(self),
      Controller::RegistrationSource.new
    ])
    injector.dispatch Controller::Registration.new.method(:update)
  end
end

The injector itself is fairly straightforward. The tricky part is the recursive dispatch, which enables sources themselves to request dependency injection, allowing the type of decomposition seen in RegistrationSource above, where authorized_organiser depends on the definition of current_organiser in the same class.

UnknownInjectable is a cute trick for testing: you don’t need to specify every dependency requested by the method, only the ones that are being used by the code path being executed. In non-test code it probably makes sense to raise an exception earlier.

class Injector
  attr_reader :sources

  def initialize(sources)
    @sources = sources + [self]
  end

  def dispatch(method, overrides = {})
    args = method.parameters.map {|_, name|
      source = sources.detect {|source| source.respond_to?(name) }
      if source
        dispatch(source.method(name), overrides)
      else
        UnknownInjectable.new(name)
      end
    }
    method.call(*args)
  end

  def injector
    self
  end

  class UnknownInjectable < BasicObject
    def initialize(name)
      @name = name
    end

    def method_missing(*args)
      ::Kernel.raise "Tried to call method on an uninjected param: #{@name}"
    end
  end
end
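
The recursive, name-based lookup is easy to see in a standalone script. The Injector below is copied verbatim from above so the sketch is self-contained; GreetingSource is a made-up source for illustration.

```ruby
class Injector
  attr_reader :sources

  def initialize(sources)
    @sources = sources + [self]
  end

  def dispatch(method, overrides = {})
    args = method.parameters.map {|_, name|
      source = sources.detect {|source| source.respond_to?(name) }
      if source
        dispatch(source.method(name), overrides)
      else
        UnknownInjectable.new(name)
      end
    }
    method.call(*args)
  end

  def injector
    self
  end

  class UnknownInjectable < BasicObject
    def initialize(name)
      @name = name
    end

    def method_missing(*args)
      ::Kernel.raise "Tried to call method on an uninjected param: #{@name}"
    end
  end
end

class GreetingSource
  def greeting; "hello"; end

  # Sources can themselves be injected: audience depends on greeting.
  def audience(greeting)
    greeting == "hello" ? "world" : "anyone"
  end
end

say = lambda {|greeting, audience| "#{greeting} #{audience}" }
Injector.new([GreetingSource.new]).dispatch(say)
# => "hello world"
```

Note the recursion at work: resolving `audience` triggers a nested dispatch that first resolves `greeting` from the same source.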

Finally for completeness, an implementation of ControllerSource:

class ControllerSource
  Response = Struct.new(:controller, :injector) do
    def redirect_to(path, *args)
      controller.redirect_to(controller.send("#{path}_path", *args))
    end

    def render(*args)
      ivars = {}
      if args.last.is_a?(Hash) && args.last.has_key?(:ivars)
        ivars = args.last.delete(:ivars)
      end

      ivars.each do |name, val|
        controller.instance_variable_set("@#{name}", val)
      end

      controller.render *args
    end

    def respond_with(klass, *args)
      obj = klass.new(*args)
      format = controller.request.format.symbol
      if obj.respond_to?(format)
        injector.dispatch obj.method(format)
      end
    end
  end

  def initialize(controller)
    @controller = controller
  end

  def params;    @controller.params; end
  def session;   @controller.session; end
  def flash;     @controller.flash; end
  def now_flash; @controller.flash.now; end

  def response(injector)
    Response.new(@controller, injector)
  end
end

Initial impressions are that it does feel like more magic until you get into the groove, after which it is no more magical than normal Rails. I remember my epiphany when writing Guice code—“oh, you just name a thing and you get it!”—after which the ride became a lot smoother. I really like the better testability of controllers, since that has always been a pain point of mine. I’m going to experiment some more on larger chunks of code, and try to nail down the naming conventions some more.

Disclaimer: I haven’t used this idea in any substantial form, beyond one controller action from a project I have lying around. It remains to be seen whether it is a good idea or not.

All code as a gist.

Screencast: moving to Heroku

A treat from the archives! I found a screen recording with commentary of me moving this crusty old blog from a VPS on to Heroku from about a year ago. It’s still pretty relevant, not just technology-wise but also in how I work (except I wasn’t using tmux then).

This is one take with no rehearsal, preparation or editing, so you get my development and thought process raw. All two and a half hours of it. That has positives and negatives. I don’t know how interesting this is to others, but I’m putting it out there just in case. Make sure you watch them in a viewer that can speed up the video.

An interesting observation I noted was that I tend to have two tasks going in parallel most of the time to context switch between when I’m blocked on one waiting for a gem install or the like.

I have divided it into four parts, each around 40 minutes long and 350MB in size.

  • Part 1 gets the specs running, green, fixes deprecations, moves from 1.8 to 1.9.
  • Part 2 moves from MySQL to Postgres, replaces sphinx with full text search.
  • Part 3 continues the sphinx to postgres transition, implementing related posts.
  • Part 4 deploys the finished product to heroku, copies data across, and gets exception notification working.

Rough indexes are provided below.

Part 1

0:00 Introduction
0:50 rake, bundle
1:42 Search for MySQL to PG conversion, maybe taps gem?
3:22 bundle finishes
3:42 couldn’t parse YAML file, switch to 1.8.7 for now
4:10 Add .rvmrc
4:39 bundle again for 1.8.7
4:50 Search for Heroku cedar stack docs (back when it was new), reading
6:30 Gherkin fails to build
8:50 Can’t find solution, update gherkin to latest
9:10 Find YAML fix while waiting for gherkin to update
10:08 Cancel gherkin update, switch to 1.9.2 and apply YAML fix
10:20 AWS S3 gem not 1.9 compatible, but not needed anymore so delete
11:10 Remove db2s3 gem also
11:20 nil.[] error, non-obvious
11:50 Missing test db config
12:20 Tests are running, failures
12:50 Debug missing partial error, start local server to click around and it works here
14:15 Back to fixing specs
14:25 Removed functionality but not specs, clearly haven’t been running specs regularly. Poor form.
15:45 Target specs passing
16:13 Fix a deprecation warning along the way
16:40 Commit fixes for 1.9.2
17:50 While waiting for specs, check for sphinx code
18:05 author_ip can’t be null, why is that still there?
18:50 make it nullable, don’t want to delete old data right now
19:40 Search for MySQL syntax
21:06 Oh actually author_ip does get set, specs actually are broken
22:07 Add blank values to spec, fixes spec.
22:39 Add blank values in again, would be nice to extract duplicate code
23:35 Start fixing tagging
24:30 Why no backtraces? Argh color scheme hiding them, must have reset recently
25:50 This changed recently? Look at git log
26:46 Looks like a dodgy merge, fixed. That’ll learn me for not running specs
28:15 Tackle view specs, long time since I’ve used these.
29:06 Be easier if I had factories, look for them.
29:23 Find them under cucumber
30:11 Extract valid_comment_attributes to spec_helper.rb
32:15 Fix broken undo logic
33:00 Extracting common factory logic
33:08 hmm, can you super from a method defined inside a spec?
33:30 yeah, apparently
35:28 working, check in
36:00 Fixing view specs
36:30 Remove approved_comments_count, don’t do spam checking anymore
37:15 Actually it is still there. Need to fix mocks.
39:15 Fix deprecations while waiting for specs.
39:30 Missing template
40:15 Need to use render :template
40:40 Check in, fixed view specs.
41:05 Running specs, looking all green. Fix RAILS_ENV to Rails.env
41:45 All green!

Part 2

0:30 Removing sphinx
2:20 Add pg gem
4:00 Create databases
4:45 Ah it’s postgres, not pg in database.yml
5:15 derp, postgresql
6:00 What are defensio migrations still doing hanging around?
6:45 Move database migrations around to not collide
7:45 taps
8:40 run tests against PG in background
9:30 don’t have open id columns in prod, it was removed in latest enki
11:25 ffffuuuuuu migrations and schema.rb
12:40 taps install failed on rhnh.net, why installing sqlite?
14:00 Argh can’t parse yaml
14:45 Abort taps remotely, bring mysqldump locally
16:00 Try taps locally
17:20 404 :(
17:50 it’s away!
18:10 Invalid encoding UTF-8, dammit.
18:30 New plan, there’s a different gem that does this.
19:00 What is it? I did it in a screencast, I should know this.
19:40 Found it! mysql2psql
20:20 taps, you’re cut
21:00 Setup mysql2psql.yml config
22:20 Works. That was much easier.
23:20 delayed_job, why is that here? Try removing it.
23:50 Used to use it for spam checking, but not anymore.
24:10 Time to replace search, how to do this?
25:00 Index tag list?
26:00 Hmm need full text search as well.
26:15 Step one: normal search, on title and body
27:00 Spec it, extract faux-factory for posts
29:00 Failing spec, implement
30:00 Search for PG full text search syntax
31:30 Passing, add in title search also
32:40 Passing with title as well
33:10 Adding tag cache to posts for easy searching
36:10 Argh migrations are screwed.
36:40 Move migrations back to where they were
39:09 Amend migration move like it never happened
38:45 Add data migration to tag_cache migration
39:30 WTF already have a tag cache. Where did it come from?
39:40 Delete everything I just did.
41:40 Check in web interface, works.

Part 3

00:20 related posts using full text search
02:55 sort by rank, reading docs
03:50 difference between ts_rank and ts_rank_cd?
4:30 Too hard, just pick one and see what happens
5:15 Syntax error in ts_query
5:45 plainto_tsquery
6:40 working, need to use or rather than and
10:30 Ah, using plainto, fix that.
11:04 Order by rank
12:20 syntax error, need to interpolate keywords
13:45 Search for how to escape SQL string in Activerecord
14:15 Find interpolate_sql, looks promising
14:50 Actually no, find sanitize_sql_array
15:20 Just try it, works. Click around to verify.
16:45 Add spec
21:20 Passing specs, commit
21:45 Why isn’t tagging working?
23:30 Ah, probably case insensitive. Need to use ILIKE.
24:00 Write a test for it
26:00 Have a failing test
26:30 Argh it’s inside acts_as_taggable_on_steroids plugin
27:20 Override the method directly in model, just for now
28:30 Commit that
29:00 Remove searchable_tags
32:00 Fix tags with spaces
34:00 Exclude popular tags from search (fix the wrong thing)
35:40 Back to fixing tags with spaces
37:20 Looking at rankings, good enough for now
38:00 Move sphinx namespace into rhnh

Part 4

00:30 Checking docs for new Cedar stack
1:30 Search for how to import data
2:20 pg_dump of data
2:50 Move dump to public Dropbox so heroku can access it
3:40 Push code to heroku
4:50 Taking a while, hmm repo is big
5:50 Clone a copy to tmp, check if it’s still big.
6:00 Yeah, eh, not a big deal; it’s been around a number of years.
7:00 heroku push done, run heroku ps. Crashed :(
7:30 AWS? I deleted you >:[
8:00 Argh I pushed master, not my branch
9:30 heroku ps, crashed again
10:30 Unclear, probably exception notifier, remove it
11:30 add thin gem while waiting
12:30 Running, expect not to work because database not set up
13:05 Create procfile
13:35 Import pg backup
15:20 Working, click around, make sure it’s working
16:20 Check whether atom feed is working
17:30 Check exception notifications
19:00 Either new comments, or something is wrong.
19:20 Yep new comments, need to reimport data. Do that later.
20:00 Back to exception notification. Used to be an add-on.
21:20 Don’t want Hoptoad or Get Exceptional, maybe SendGrid with exception notifier?
22:00 Searching for examples.
22:20 Found stack overflow answer, looks promising.
24:20 Bring back exception notifier with SendGrid.
26:00 logs show sent mail, arrives in email
26:15 Next steps, DNS settings, extra database dump.

Interface Mocking

UPDATE: This is a gem now: rspec-fire The code in the gem is better than that presented here.

Here is a screencast I put together in response to a recent Destroy All Software screencast on test isolation and refactoring, showing off an idea I’ve been tinkering around with for automatic validation of your implicit interfaces that you stub in tests.

Interface Mocking screencast.

Here is the code for InterfaceMocking:

module InterfaceMocking

  # Returns a new interface double. This is equivalent to an RSpec double,
  # stub, or mock, except that if the class passed as the first parameter
  # is loaded, it will raise if you try to set an expectation or stub on
  # a method that the class has not implemented.
  def interface_double(stubbed_class, methods = {})
    InterfaceDouble.new(stubbed_class, methods)
  end

  module InterfaceDoubleMethods

    include RSpec::Matchers

    def should_receive(method_name)
      ensure_implemented(method_name)
      super
    end

    def should_not_receive(method_name)
      ensure_implemented(method_name)
      super
    end

    def stub!(method_name)
      ensure_implemented(method_name)
      super
    end

    def ensure_implemented(*method_names)
      if recursive_const_defined?(Object, @__stubbed_class__)
        recursive_const_get(Object, @__stubbed_class__).
          should implement(method_names, @__checked_methods__)
      end
    end

    def recursive_const_get object, name
      name.split('::').inject(object) {|klass, const_name| klass.const_get const_name }
    end

    def recursive_const_defined? object, name
      !!name.split('::').inject(object) {|klass, const_name|
        if klass && klass.const_defined?(const_name)
          klass.const_get const_name
        end
      }
    end

  end

  class InterfaceDouble < RSpec::Mocks::Mock

    include InterfaceDoubleMethods

    def initialize(stubbed_class, *args)
      args << {} unless Hash === args.last

      @__stubbed_class__ = stubbed_class
      @__checked_methods__ = :public_instance_methods
      ensure_implemented *args.last.keys

      # __declared_as copied from rspec/mocks definition of `double`
      args.last[:__declared_as] = 'InterfaceDouble'
      super(stubbed_class, *args)
    end

  end
end

RSpec::Matchers.define :implement do |expected_methods, checked_methods|
  match do |stubbed_class|
    unimplemented_methods(
      stubbed_class,
      expected_methods,
      checked_methods
    ).empty?
  end

  def unimplemented_methods(stubbed_class, expected_methods, checked_methods)
    implemented_methods = stubbed_class.send(checked_methods)
    unimplemented_methods = expected_methods - implemented_methods
  end

  failure_message_for_should do |stubbed_class|
    "%s does not publicly implement:\n%s" % [
      stubbed_class,
      unimplemented_methods(
        stubbed_class,
        expected_methods,
        checked_methods
      ).sort.map {|x|
        "  #{x}"
      }.join("\n")
    ]
  end
end

RSpec.configure do |config|

  config.include InterfaceMocking

end

Static Asset Caching on Heroku Cedar Stack

UPDATE: This is now documented at Heroku (thanks Nick)

I recently moved this blog over to Heroku, and in the process added proper HTTP caching headers. The dynamic pages use the built-in fresh_when and stale? Rails helpers, combined with Rack::Cache and the free memcached plugin available on Heroku. That was all pretty straightforward; what was more difficult was configuring Heroku to serve all static assets (such as images and stylesheets) with a far-future max-age header so that they will be cached for eternity. What I’ve documented here is somewhat of a hack, and hopefully Heroku will provide a better way of doing this in the future.

By default Heroku serves everything in public directly via nginx. This is a problem for us since we don’t get a chance to configure the caching headers. Instead, use the Rack::StaticCache middleware (provided in the rack-contrib gem) to serve static files, which by default adds far-future max-age cache control headers. These files need to live in a different directory to public since there is no way to disable the nginx serving. I renamed my public folder to public_cached.

# config/application.rb
config.middleware.use Rack::StaticCache, 
  urls: %w(
    /stylesheets
    /images
    /javascripts
    /robots.txt
    /favicon.ico
  ),
  root: "public_cached"

I also disabled the built in Rails serving of static assets in development mode, so that it didn’t interfere:

# config/environments/development.rb
config.serve_static_assets = false

In the production config, I configured the x_sendfile_header option to be “X-Accel-Redirect”. It defaulted to “X-Sendfile”, which is an Apache directive, and was causing nginx to hang (Heroku would never actually serve the assets to the browser).

# config/environments/production.rb
config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect'

A downside of this approach is that if you have a lot of static assets, they all have to hit the Rails stack in order to be served. If you only have one dyno (the free plan) then the initial load can be slower than it otherwise would be if nginx was serving them directly. As I mentioned in the introduction, hopefully Heroku will provide a nicer way to do this in the future.

Speeding up Rails startup time

In which I provide easy instructions to try a new patch that drastically improves the start up time of Ruby applications, in the hope that with wide support it will be merged into the upcoming 1.9.3 release. Skip to the bottom for instructions, or keep reading for the narrative.

UPDATE: If you have trouble installing, grab a recent copy of rvm: rvm get head.

Background

Recent releases of MRI Ruby have introduced some fairly major performance regressions when requiring files:

For reference, our medium-sized Rails application requires around 2200 files, off the right-hand side of this graph. This is problematic. On 1.9.2 it takes 20s to start up; on 1.9.3 it takes 46s. Both are far too long.

There are a few reasons for this, but the core of the problem is the basic algorithm which looks something like this:

def require(file)
  $loaded.each do |x|
    return false if x == file
  end
  load(file)
  $loaded << file
end

That loop is no good, and gets worse the more files you have required. I have written a patch for 1.9.3 which changes this algorithm to:

def require(file)
  return false if $loaded[file] 
  load(file)
  $loaded[file] = true
end

That gives you a performance curve that looks like this:

Much nicer.

That’s just a synthetic benchmark, but it works in the real world too. My main Rails application now loads in a mite over 10s, down from the 20s it was taking on 1.9.2. A blank Rails app loads in 1.1s, which is even faster than 1.8.7.
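
The win comes entirely from replacing a linear scan with a hash lookup. Here is a toy benchmark of just that membership check (a sketch with made-up file names, not the real require internals):

```ruby
require 'benchmark'

# Simulate 2200 required files, stored both ways: the old Array that must be
# scanned linearly, and the patched Hash with O(1) lookup.
loaded_array = (1..2200).map { |i| "file_#{i}.rb" }
loaded_hash  = loaded_array.each_with_object({}) { |f, h| h[f] = true }

Benchmark.bm(12) do |x|
  # One membership check per already-loaded file, as a require storm would do.
  x.report('array scan')  { loaded_array.each { |f| loaded_array.include?(f) } }
  x.report('hash lookup') { loaded_array.each { |f| loaded_hash[f] } }
end
```

The array scan degrades as the number of loaded files grows; the hash lookup stays flat.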

Getting the fix

Here is how you can try out my patch right now in just ten minutes using RVM.

# First get a baseline measurement
cd /your/rails/app
time script/rails runner "puts 1"

# Install a patched ruby
curl https://gist.github.com/raw/996418/e2b346fbadeed458506fc69ca213ad96d1d08c3e/require-performance-fix-r31758.patch > /tmp/require-performance-fix.patch
rvm install ruby-head --patch /tmp/require-performance-fix.patch -n patched
# ... get a cup of tea, this took about 8 minutes on my MBP

# Get a new measurement
cd /your/rails/app
rvm use ruby-head-patched
gem install bundler --no-rdoc --no-ri
bundle
time script/rails runner "puts 1"

How you can help

I need a lot more eyeballs on this patch before it can be considered for merging into trunk. I would really appreciate any of the following:

Next steps

I imagine there will be a bit more work to get this into Ruby 1.9.3, but after that this is just the first step of many to try and speed up the time Rails takes to start up. Bundler and RubyGems still spend a lot of time doing … something, which I want to investigate. I also want to port these changes over to JRuby, which has similar issues (Rubinius isn’t quite as fast out of the gate, but does not degrade quadratically, so would not benefit from this patch).

Thank you for your time.

PostgreSQL 9 and ruby full text search tricks

I have just released an introduction to PostgreSQL screencast, published through PeepCode. It is over an hour long and covers a large number of juicy topics:

  • Setup full text search
  • Optimize search with triggers and indexes
  • Use Postgres with Ruby on Rails 3
  • Optimize indexes by including only the rows that you need
  • Use database standards for more reliable queries
  • Write powerful reports in only a few lines of code
  • Convert an existing MySQL application to use Postgres

It’s a steal at only $12. You can buy it over at PeepCode.

In it, I introduce full text search in postgres, and use a trigger to keep a search vector up to date. I’m not going to cover that here, but the point I get to is:

CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts 
FOR EACH ROW EXECUTE PROCEDURE
  tsvector_update_trigger(search_vector, 'pg_catalog.english',  body, title);

That is good for simple models, but what if you want to index child models as well? For instance, we want to include comment authors in the search index. I rolled up my sleeves and came up with this:

CREATE OR REPLACE FUNCTION search_trigger() RETURNS trigger AS $$
DECLARE
  search TEXT;
  child_search TEXT;
begin
  SELECT string_agg(author_name, ' ') INTO child_search
  FROM comments
  WHERE post_id = new.id;

  search := '';
  search := search || ' ' || coalesce(new.title, '');
  search := search || ' ' || coalesce(new.body, '');
  search := search || ' ' || coalesce(child_search, '');

  new.search_vector := to_tsvector(search);
  return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE PROCEDURE
  search_trigger();

Getting a bit ugly, eh? It might be nice to move that logic back into Ruby land, but we have the problem that we need to call a database function to convert our search document into the correct data type. In this case, a quick workaround is to store a search_document in a text field on the model, then use a trigger to index only that field into our search_vector field. The search_document field can then easily be set from your ORM.
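
Building that search_document string in Ruby is then trivial. A minimal sketch of the idea, using plain stand-ins rather than real ORM models (Post and Comment here are illustrative, not the actual classes):

```ruby
# Stand-in for a child model whose author names we want indexed.
Comment = Struct.new(:author_name)

class Post
  attr_accessor :title, :body, :comments

  def initialize(title, body = nil, comments = [])
    @title    = title
    @body     = body
    @comments = comments
  end

  # Concatenate everything we want indexed. compact drops nil fields, so the
  # document is never NULL; the trigger only has to run to_tsvector over it.
  def search_document
    [title, body, *comments.map(&:author_name)].compact.join(' ')
  end
end
```

In a real app you would assign this in a before-save hook, leaving the trigger to do nothing but the tsvector conversion.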

Of course, any self-respecting rubyist should hide all this complexity behind a neat interface. I have come up with one using DataMapper that automatically adds the required triggers and indexes via auto-migrations. You use it thusly:

class Post
  include DataMapper::Resource
  include Searchable

  property :id, Serial
  property :title, String
  property :body, Text

  searchable :title, :body # Provides Post.search('keyword')
end

You can find the Searchable module code over on github. In it you can also find a fugly proof-of-concept for a DSL that generates the above SQL for indexing child models using DataMapper’s rich property model. It worked, but I’m not using it in any production code so I can hardly recommend it. Maybe you want to have a play though.

Rails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Tutorial

This took me a while to figure out, especially since I’m not so great with either Windows or SQL Server, but in the end the process isn’t so difficult.

Rails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Screencast

The steps covered in this screencast are:

  1. Create user
  2. Create database
  3. Give user permissions
  4. Create DSN
  5. Install ruby
  6. Install devkit (Needed to compile native extensions for ODBC)
  7. Create a new rails app
  8. Add activerecord-sqlserver-adapter and ruby-odbc to Gemfile
  9. Customize config/database.yml
# config/database.yml
development:
  adapter: sqlserver
  dsn: testdsn_user
  mode: odbc
  database: test
  username: xavier
  password:

Some errors you may encounter:

The specified module could not be found – odbc.so. You have likely copied odbc.so from i386-msvcrt-ruby-odbc.zip. That version is for 1.8.7 and does not work on 1.9. Remove the .so file, and install ruby-odbc as above.

The specified DSN contains an architecture mismatch between the Driver and the Application. Perhaps you have created a system DSN. Try creating a user DSN instead. I also found some suggestions that you need to use a different version of the ODBC configuration panel, but this wasn’t relevant for me.

Why I Rewrote Chronic

It seems like a pretty epic yak shave. If you want to parse natural language dates in ruby, you use Chronic. That’s just how it is. (There’s also Tickle for recurring dates, which is similar, but based on Chronic anyways.) It’s the standard, everyone uses it, so why oh why did I write my own version from scratch?

Three reasons I can see.

Chronic is unmaintained. Check the network graph for Chronic. A more avid historian could turn this into an epic teledrama, but for now here’s the summary: The main repository hasn’t had a commit since late 2008. Evaryont made a valiant attempt to take the reins, but his stamina only lasted an extra year to August 2009. Since then numerous people have forked his efforts, mostly to add 1.9 support. These efforts are fragmented though. The inertia of such a large project with no clear leadership sees every man running for himself.

Further, the new maintainers aren’t providing a rock solid base. From Evaryont’s README:
I decided on my own volition that the 40-some (as reported by Github) network should be merged together. I got it to run, but quite haphazardly. There are a lot of new features (mostly undocumented except the git logs) so be a little flexible in your language passed to Chronic. [emphasis mine]

This does not fill me with confidence.

Chronic has a large barrier to entry. Natural date parsing is a big challenge. In the original README, there are ~50 examples of formats it supports, and that is excluding all of the features added in forks in the last two years. The result is a large code base which is intimidating for a newcomer, especially with no high-level guidance as to how everything fits together. On a project of this size, “the documentation is in the specs” is insufficient. I know what it does; I need to know how it does it.

Chronic solves the wrong problem. I want an alternative to date pickers. As such, I don’t need time support, and I only need very simple day parsing. Chronic seems geared towards a calendar type application (“tomorrow at 6:45pm”), but also parses many expressions which simply are not useful in a real application either because they are obtuse - “7 hours before tomorrow at noon” - or just not how users think about dates - “3 months ago saturday at 5:00 pm”. (Note the last assertion is a totally unsubstantiated claim with no user research to support it.)

Further, it is not hard to find simple examples that Chronic doesn’t support. Omitting a year is an easy one: 14 Sep, April 9.

So what to do?

Chronic needs a leader. Chronic needs a hero. One man to reunite the forks, document the code, and deliver it to the promised land.

I am not that man.

I sketched out the formats I actually needed to support for my application, looked at it and thought “really it can’t be that hard”. Natural date parsing is hard; parsing only the dates your application requires is easy. One hour later I had a gem that not only had 100% support for all of the Chronic features I had been using, but also covered some extra formats I wanted (“14 Sep”), and could also convert a date back into a human readable description. That’s less time than I had already sunk into trying to get Chronic working.

Introducing Kronic.

Less than 100 lines of code, totally specced, totally solved my problem. Ultimately, I don’t want to deal with this problem, so I wanted the easiest solution. While patching Chronic would intuitively appear to be pragmatic, a quick spike in the other direction turned out to be worthwhile. Sometimes 80% just isn’t that hard.
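
To make the point concrete, here is a toy version of the “parse only the dates your application requires” approach. This is not Kronic’s actual code; the method name and supported formats are made up for the sketch:

```ruby
require 'date'

# Handle only a handful of formats, explicitly, rather than attempting full
# natural language parsing. Unknown input returns nil rather than raising.
def parse_simple_date(string, today = Date.today)
  case string.strip.downcase
  when 'today'     then today
  when 'yesterday' then today - 1
  when 'tomorrow'  then today + 1
  when /\A\d{1,2} [a-z]{3,}\z/ # "14 Sep" with no year: assume current year
    Date.parse("#{string} #{today.year}")
  else
    begin
      Date.parse(string)
    rescue ArgumentError
      nil
    end
  end
end
```

Each format your users actually type is one `when` clause: easy to read, easy to extend, and trivially covered by specs.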

Build time graph with buildhawk

How long your build took to run, in a graph, on a webpage. That’s pretty fantastic. You need to be storing your build time in git notes, as I wrote about a few weeks back. Then simply:

gem install buildhawk
buildhawk > report.html
open report.html

This is a simple gem I hacked together today that parses git log and stuffs the output into an ERB template that uses TufteGraph and some subtle jQuery animation to make it look nice. For extra prettiness, I use the Monofur font, but maybe you are content with your default monospace. If you want to poke around the internals (there’s not much!) have a look on Github.

Six best talks from LSRC 2010

I wrote this last fortnight, but was waiting for videos. Still missing a few, but it’s a start. Enjoy!

I am just finishing up a week in Austin, Texas. I was here for Lone Star Ruby Conference, at which I ran both my Database Is Your Friend Training, and also a full day introduction to MongoDB course. I was then free to enjoy the talks for the remaining two days. Here are my top picks.

Debugging Ruby

Aman Gupta gave a fantastic overview of the low level tools available for debugging ruby applications, including perf-tools, strace, gdb, bleak-house, and some nice ruby wrappers he has written around them. I had heard of these tools before, but was never sure when to use them or where to start if I wanted to use them. Aman’s presentation was the hook I needed to get into these tools, giving plenty of real examples of where they had been useful and how he used them.

Slides

Seven Languages in Seven Weeks

Bruce Tate gave an entertaining talk in which he compared seven languages to movie characters. It was a great narrative, and his energy and excitement about the languages were infectious. He has written a book on the same topic, which I plan on purchasing when I make some time to work through it. There are some sample chapters available at the pragprog site.

Book

Greasing Your Suite

I had seen the content of Nick’s talk “Greasing Your Suite” before in slide format, and it was just as excellent live. Nick takes the run time of a Rails test suite from 13 minutes down to 18 seconds. An incredible effort. While watching his talk I installed and set up his hydra gem, and it was dead simple to get my tests running in parallel. I only added a rake task and a tiny yml file (no other setup required) and I got a significant speed up even on trivial test suites. I was impressed at how easy it was to get going, and I’ll be using it on all my apps from now on.

Video (From Goruco, but he gave the same talk)

Deciphering Yehuda

Gregg Pollack’s talk on how some of the techniques used in the internals of rails and bundler work was excellent. While the content wasn’t new to me, I was impressed at Gregg’s ability to explain code on slides, a task difficult to do well. If you ever plan to present you should watch this to pick up some of Gregg’s techniques. I am going to be checking out his Introduction to Rails 3 screencasts for the same reason.

Video

Real Software Engineering

Glenn Vanderburg opened the conference with a fantastic talk on the history of software engineering. This answered a lot of questions that have been floating around my mind, especially to do with the misleading comparisons often made to other engineering disciplines. Give a civil engineer the ability to quickly prototype bridges for little cost, and they are going to do a lot less modelling. A mathematical model is simply a way to reduce costs. And cost is always an object. Watch the talk, it’s brilliant.

Video

Keynote

The best overall talk was Tom Preston-Werner’s keynote Friday evening. His mix of story, humour, and inspiration were perfect for a keynote, and his delivery was excellent. He pitched his content expertly and though there was no specific item I hadn’t heard before, it has had a significant impact on my thoughts the past few days. Hopefully a video is up soon.

Speeding Up Rails Rake

On a brand new Rails project (this article is Rails 3, but the same principle applies to Rails 2), rake --tasks takes about a second to run. This is just the time it takes to load all the tasks; as a result, any task you define will take at least this long to run, even if it has nothing to do with Rails. Tab completion is slow. That makes me sad.

The issue is that since rails and gems can provide rake tasks for your project, the entire rails environment has to be loaded just to figure out which tasks are available. If you are familiar with the tasks available, you can hack around things to wring some extra speed out of your rake.

WARNING: Hacks abound beyond this point. Proceed at own risk.

Below is my edited Rakefile. Narrative continues in the comments below.

# Rakefile
def load_rails_environment
  require File.expand_path('../config/application', __FILE__)
  require 'rake'
  Speedtest::Application.load_tasks
end

# By default, do not load the Rails environment. This allows for faster
# loading of all the rake files, so that getting the task list, or kicking
# off a spec run (which loads the environment by itself anyways) is much
# quicker.
if ENV['LOAD_RAILS'] == '1'
  # Bypass these hacks that prevent the Rails environment loading, so that the
  # original descriptions and tasks can be seen, or to see other rake tasks provided
  # by gems.
  load_rails_environment
else
  # Create a stub task for all Rails provided tasks that will load the Rails
  # environment, which in turn will append the real definition of the task to
  # the end of the stub task, so it will be run directly afterwards.
  #
  # Refresh this list with:
  # LOAD_RAILS=1 rake -T | ruby -ne 'puts $_.split(/\s+/)[1]' | tail -n+2 | xargs
  %w(
    about db:create db:drop db:fixtures:load db:migrate db:migrate:status 
    db:rollback db:schema:dump db:schema:load db:seed db:setup 
    db:structure:dump db:version doc:app log:clear middleware notes 
    notes:custom rails:template rails:update routes secret stats test 
    test:recent test:uncommitted time:zones:all tmp:clear tmp:create
  ).each do |task_name|
    task task_name do
      load_rails_environment
      # Explicitly invoke the rails environment task so that all configuration
      # gets loaded before the actual task (appended on to this one) runs.
      Rake::Task['environment'].invoke
    end
  end

  # Create an empty task that will show up in rake -T, instructing how to
  # get a list of all the actual tasks. This isn't necessary but is a courtesy
  # to your future self.
  desc "!!! Default rails tasks are hidden, run with LOAD_RAILS=1 to reveal."
  task :rails
end

# Load all tasks defined in lib/tasks/*.rake
Dir[File.expand_path("../lib/tasks/", __FILE__) + '/*.rake'].each do |file|
  load file
end

Now rake --tasks executes near instantaneously, and tasks will generally kick off faster (including rake spec). Much nicer!

This technique has the added benefit of hiding all the built in tasks. Depending on your experience this may not be a win, but since I already know the rails ones by heart, I’m usually only interested in the tasks specific to the project.

I don’t pretend this is a pretty or permanent solution, but I share it here because it has made my life better in recent times.

Duplicate Data

UPDATE: If you are on PostgreSQL, check this updated query, it’s more useful.

Forgotten to back validates_uniqueness_of with a unique constraint in your database? Oh no! Here is some SQL that will pull out all the duplicate records for you.

User.find_by_sql <<-EOS
  SELECT * 
  FROM users 
  WHERE name IN (
    SELECT name 
    FROM users 
    GROUP BY name 
    HAVING count(name) > 1);
EOS

You will need your own strategy for resolving the duplicates, since it is totally dependent on your data. Some ideas:

  • Arbitrarily deleting one of the records. Perhaps based on latest update time? Don’t forget about child records! If you have forgotten a uniqueness constraint it is likely you have also forgotten a foreign key, so you will have to delete child records manually.
  • Merging the records, including child records.
  • Manually resolving the conflicts on a case by case basis. Possible if there are not too many duplicates.
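
The “arbitrarily delete, keeping the latest” strategy can be sketched in plain Ruby. The structs here are stand-ins for real ActiveRecord models; in a real app you would call destroy on each returned record (minding child records, as above):

```ruby
# Stand-in for a User record with just the fields the strategy needs.
User = Struct.new(:name, :updated_at)

# Returns every duplicate except the most recently updated record per name.
def duplicates_to_remove(users)
  users.group_by(&:name).flat_map do |_name, group|
    # Keep the newest record in each group; the rest are removal candidates.
    group.sort_by(&:updated_at).reverse.drop(1)
  end
end
```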

STI is the global variable of data modelling

A Single Table Inheritance table is really easy to both update and query. This makes it ideal for rapid prototyping: just throw some extra columns on it and you are good to go! This is why STI is so popular, and it fits perfectly into the Rails philosophy of getting things up and running fast.

Fast coding techniques do not always transfer into solid, maintainable code however. It is really easy to hack something together with global variables, but we eschew them when writing production code. STI falls into the same category. I have written about the downsides of STI before: it clutters your data model, weakens your data integrity, and can be difficult to index. STI is a fast technique to get started with, but is not necessarily a great option for maintainable applications, especially when there are other modelling techniques such as class table inheritance available.

Updating Class Table Inheritance Tables

My last post covered querying class table inheritance tables; this one presents a method for updating them. Having set up our ActiveRecord models using composition, we can use a standard rails method accepts_nested_attributes_for to allow easy one-form updating of the relationship.

class Item < ActiveRecord::Base
  validates_numericality_of :quantity

  SUBCLASSES = [:dvd, :car]
  SUBCLASSES.each do |class_name|
    has_one class_name
  end

  accepts_nested_attributes_for *SUBCLASSES
end

@item = Item.create!(:quantity => 1)
Dvd.create!(
  :title => 'The Matix',
  :item  => @item)

@item.update_attributes(
  :quantity => 2,
  :dvd_attributes => {
    :id    => @item.dvd.id,
    :title => 'The Matrix'})

This issues the following SQL to the database:

UPDATE "items" SET "quantity" = 10 WHERE ("items"."id" = 12)
UPDATE "dvds" SET "title" = 'The Matrix' WHERE ("dvds"."id" = 12)

Note that depending on your application, you may need some extra locking to ensure this method is safe under concurrent access, for example if you allow items to change type. Be sure to read the accepts_nested_attributes_for documentation for the full API.

I talk about this sort of thing in my “Your Database Is Your Friend” training sessions. They are happening throughout the US and UK in the coming months. One is likely coming to a city near you. Head on over to www.dbisyourfriend.com for more information and free screencasts.

Class Table Inheritance and Eager Loading

Consider a typical class table inheritance table structure with items as the base class and dvds and cars as two subclasses. In addition to what is strictly required, items also has an item_type column. This denormalization is usually a good idea; I will save the justification for another post, so please take it for granted for now.

The easiest way to map this relationship with Rails and ActiveRecord is to use composition, rather than trying to hook into the class loading code. Something akin to:

class Item < ActiveRecord::Base
  SUBCLASSES = [:dvd, :car]
  SUBCLASSES.each do |class_name|
    has_one class_name
  end

  def description
    send(item_type).description
  end
end

class Dvd < ActiveRecord::Base
  belongs_to :item

  validates_presence_of :title, :running_time
  validates_numericality_of :running_time

  def description
    title
  end
end

class Car < ActiveRecord::Base
  belongs_to :item

  validates_presence_of :make, :registration

  def description
    make
  end
end

A naive way to fetch all the items might look like this:

Item.all(:include => Item::SUBCLASSES)

This will issue one initial query, then one for each subclass. (Since Rails 2.1, eager loading is done like this rather than joining.) This is inefficient, since at the point we preload the associations we already know which subclass tables we should be querying. There is no need to query all of them. A better way is to hook into the Rails eager loading ourselves to ensure that only the tables required are loaded:

Item.all(opts).tap do |items|
  preload_associations(items, items.map(&:item_type).uniq)
end

Wrapping that up in a class method on items is neat because we can then use it as a kicker at the end of named scopes or associations – person.items.preloaded, for instance.

Here are some tests demonstrating this:

require 'test/test_helper'

class ItemTest < ActiveRecord::TestCase
  setup do
    item = Item.create!(:item_type => 'dvd')
    dvd  = Dvd.create!(:item => item, :title => 'Food Inc.')
  end

  test 'naive eager load' do
    items = []
    assert_queries(3) { items = Item.all(:include => Item::SUBCLASSES) }
    assert_equal 1, items.size
    assert_queries(0) { items.map(&:description) }
  end

  test 'smart eager load' do
    items = []
    assert_queries(2) { items = Item.preloaded }
    assert_equal 1, items.size
    assert_queries(0) { items.map(&:description) }
  end
end

# Monkey patch stolen from activerecord/test/cases/helper.rb
ActiveRecord::Base.connection.class.class_eval do
  IGNORED_SQL = [/^PRAGMA/, /^SELECT currval/, /^SELECT CAST/, /^SELECT @@IDENTITY/, /^SELECT @@ROWCOUNT/, /^SAVEPOINT/, /^ROLLBACK TO SAVEPOINT/, /^RELEASE SAVEPOINT/, /SHOW FIELDS/]

  def execute_with_query_record(sql, name = nil, &block)
    $queries_executed ||= []
    $queries_executed << sql unless IGNORED_SQL.any? { |r| sql =~ r }
    execute_without_query_record(sql, name, &block)
  end

  alias_method_chain :execute, :query_record
end

I talk about this sort of thing in my “Your Database Is Your Friend” training sessions. They are happening throughout the US and UK in the coming months. One is likely coming to a city near you. Head on over to www.dbisyourfriend.com for more information and free screencasts.

Last minute training in Seattle

If you or someone you know missed out on Saturday, I’ve scheduled a last minute database training for Seattle tomorrow. Register here. Last chance before I head to Chicago for a training on Friday.

Constraints assist understanding

The hardest thing for a new developer on a project to wrap his head around is not the code. For the most part, ruby code stays the same across projects. My controllers look like your controllers, my models look like your models. What defines an application is not the code, but the domain. The business concepts, and how they are translated into code, can take weeks or months to understand cleanly. Modelling your domain in a way that it is easily understood is an important principle to speed up this learning process.

In an application I am looking at there is an email field in the user model. It is defined as a string that allows null values. This is confusing. I need to figure out in what circumstances a null value makes sense (can they choose to withhold that piece of information? Is there a case where a new column I am adding should be null?), which is extra information I need to locate and process before I can understand the code. There is a validates_presence_of declaration on the attribute, but production data has some null values. Two parts of the application are telling me two contradicting stories about the domain.

Further, when I am tracking down a bug in the application, eliminating the possibility that a column could be null is an extra step I need to take. The data model is harder to reason about because there are more possible states than strictly necessary.

Allowing a null value in a column creates another piece of information that a developer has to process. It creates an extra question that needs to be answered when reading the code: in what circumstances is a null value appropriate? Multiply this problem out to multiple columns (and factor in other sub-optimal modeling techniques not covered here), and the time to understanding quickly grows out of hand.

Adding not-null constraints in your database is a quick and cheap way to bring your data model in line with the code that sits on top of it. Just as you cut lines of code, cut extraneous information from your data model. For little cost, constraints simplify your application conceptually and allow your data to be reasoned about more efficiently.
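
A constraint like this is a one-line migration. A sketch, assuming a users table with an email column (the migration class name is made up, and you must back-fill or resolve any existing NULL values first, or it will fail):

```ruby
# Enforce NOT NULL on users.email at the database level, so the schema tells
# the same story as the validates_presence_of declaration in the model.
class AddNotNullToUsersEmail < ActiveRecord::Migration
  def self.up
    change_column_null :users, :email, false
  end

  def self.down
    change_column_null :users, :email, true
  end
end
```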

I talk about this sort of thing in my “Your Database Is Your Friend” training sessions. They are happening throughout the US and UK in the coming months. One is likely coming to a city near you. Head on over to www.dbisyourfriend.com for more information and free screencasts.

Concurrency with AASM, Isolation Levels

I’ve posted two guest articles over on the Engine Yard blog this week on database related topics: one on concurrency with AASM, the other on transaction isolation levels.

They’re in the same vein as what I’ve been posting here, so worth a read if you’ve been digging it.
The US tour kicks off this Saturday in San Francisco, and there’s still a couple of spots available. You can still register over at www.dbisyourfriend.com

“Your Database Is Your Friend” training sessions are happening throughout the US and UK in the coming months. One is likely coming to a city near you. For more information and free screencasts, head on over to www.dbisyourfriend.com

Relational Or NoSQL With Rails?

With all the excitement in the Rails world about “NoSQL” databases like MongoDB, CouchDB, Cassandra and Redis, I am often asked why I am running a course on relational databases.

The “database is your friend” ethos is not about relational databases; it’s about finding the sweet spot compromise between the tools you have available to you. Typically the database has been underused in Rails applications—to the detriment of both quality and velocity—and my goal is to provide tools and understanding to ameliorate this neglect, no matter whether you are using Oracle or Redis.

The differences between relational and NoSQL databases have been documented extensively. To quickly summarize the stereotypes: relational gives you solid transactions and joins, NoSQL is fast and scales. In addition, the document oriented NoSQL databases (NoSQL is a bit of a catch-all: there’s a big difference between key/value stores and document databases) enable you to store “rich” documents, a powerful modelling tool.

That’s a naive summary, but it gives you a general idea of the ideologies. To make a fair comparison between the two you need to understand both camps. If you don’t know what a relational database can do for you in terms of transactional support or data integrity, you will not know what you are losing when choosing NoSQL. Conversely, if you are not familiar with document modelling techniques and why denormalization isn’t so scary, you are going to underrate NoSQL technologies and handicap yourself with a relational database.

For example, representing a many-to-many relationship in a relational database might look something like:

Posts(id, title, body)
PostTags(post_id, tag_id)
Tags(id, name)

This is a standard normalization, and relational databases are tuned to deal with this scenario using joins and foreign keys. In a document database, the typical way to represent this is:

{
  title: 'My Post',
  body: 'This post has a body',
  tags: ['ruby', 'rails']
}

Notice the denormalization of tags so that there is no longer a table for it, creating a very nice conceptual model—everything to do with a post is included in the one object. The developer only superficially familiar with document modelling will quickly find criticisms, however. To choose just one, how do you get a list of all tags? This specific problem has been addressed by the document crowd, but not in a way that relational developers are used to thinking: map/reduce.

db.runCommand({
  mapreduce: 'posts',
  map: function() { 
    for (index in this.tags) {
        emit(this.tags[index], 1);
    }
  },
  reduce: function(key, values) { return Array.sum(values); }, // count of posts per tag
  out: "tags"
})

This function can be run periodically to create a tags collection from the posts collection. It’s not quite real-time, but will be close enough for most uses. (Of course if you do want real-time, there are other techniques you can use.) Yes, the query is more complicated than just selecting out of Tags, but inserting and updating an individual post (the main use case) is simpler.
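If map/reduce feels foreign, here is a rough pure-Ruby sketch of the same computation over hypothetical in-memory documents: each post emits (tag, 1), and the counts are summed per key.

```ruby
# Hypothetical denormalized post documents
posts = [
  { title: 'My Post',    tags: ['ruby', 'rails'] },
  { title: 'Other Post', tags: ['ruby'] },
]

# "map": emit (tag, 1) for every tag; "reduce": sum the 1s per key
tag_counts = Hash.new(0)
posts.each do |post|
  post[:tags].each { |tag| tag_counts[tag] += 1 }
end

tag_counts # => {"ruby"=>2, "rails"=>1}
```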

I’m not arguing one side or another here. This is just one simplistic example to illustrate my point that if you don’t know how to use document database specific features such as map/reduce, or how to model your data in such a way as to take advantage of them, you won’t be able to adequately evaluate those databases. Similarly, if you don’t know how to use pessimistic locking or referential integrity in a relational database, you will not see how much time and effort it could be saving you over trying to implement such robustness in a NoSQL database that wasn’t designed for it.

It is imperative, no matter which technology you ultimately choose for your application (or even if you mix the two!), that you understand both sides thoroughly so that you can accurately weigh up the costs and benefits of each.

The pitch

This is why I’m excited to announce a brand new training session on MongoDB. For the upcoming US tour, this session will only be offered once, exclusively at the Lone Star Ruby Conference. The original relational training is the day before the conference (at the same venue), creating a two day database training bonanza: relational on Wednesday 25th August, MongoDB on Thursday 26th.

We’ll be adding MongoDB to an existing site—Spacebook, the social network for astronauts!—to not only learn MongoDB in isolation, but practically how to integrate it into your existing infrastructure. The day starts with the basics: What it is, what it isn’t, how to use it, how to integrate with Rails, and we’ll build and investigate some of the typical MongoDB use cases like analytics tracking. As we become comfortable, we will move into some more advanced querying and data modelling techniques that MongoDB excels at to ensure we are getting the most out of the technology, and discuss when such techniques are appropriate.

Since I am offering the MongoDB training in association with the Lone Star Ruby Conference, you will have to register for the conference to attend. At only an extra $175 above the conference ticket, half the normal cost, the Lone Star Ruby Conference MongoDB session is the cheapest this training will ever be offered, not to mention all the win of the rest of the conference! Aside from the training, it has a killer two-day line up of talks which are going to be awesome. I’m especially excited about the two keynotes by Tom Preston-Werner and Blake Mizerany, and there are some good database related talks to get along to: Adam Keys is giving the low down on the new ActiveModel in Rails 3, Jesse Wolgamott is comparing different NoSQL technologies, and Bernerd Schaefer will be talking about what Mongoid (the ORM we’ll be using with Spacebook) is doing to stay at the head of the pack. I’ll certainly be hanging around.

Register for the relational training separately. There’s a $50 early bird discount for the next week (in addition to the half price Mongo training), but if you miss that and are attending both sessions get in touch and I’ll extend the offer for you. This is probably going to send me broke, but I really just want to get this information out there. Cheaper, higher quality software makes our industry better for everyone.

“Your Database Is Your Friend” training sessions are happening throughout the US and UK in the coming months. One is likely coming to a city near you. For more information and free screencasts, head on over to www.dbisyourfriend.com

Five Tips For Adding Foreign Keys To Existing Apps

You’re convinced foreign keys are a good idea, but how should you retroactively add them to your production application? Here are some tips to help you out.

Identify and fix orphan records. If orphan records exist, creating a foreign key will fail. Use the following SQL to identify children that reference a parent that doesn’t exist:

SELECT * FROM children LEFT JOIN parents ON children.parent_id = parents.id WHERE parents.id IS NULL
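Once the orphans are gone, the constraint itself is a one-liner. The constraint name below is illustrative, and on MySQL this requires InnoDB tables:

```sql
ALTER TABLE children
  ADD CONSTRAINT fk_children_parent
  FOREIGN KEY (parent_id) REFERENCES parents (id)
  ON DELETE RESTRICT;
```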

Begin with new or unimportant relationships. With any new change, it’s best to walk before you run. Targeting the most important relationships in your application head on can quickly turn into a black hole. Adding foreign keys to new or low value relationships first means you have a smaller code base that is affected, and allows you to test your test suite and plugins for compatibility over a smaller area. Get this running in production early, so any issues will crop up early on low value code where they’ll be easier to fix. Be agile in your approach and iterate.

Move away from fixtures and mocking in your tests. Rails fixture code is not designed to work well with foreign keys. (Fixtures are generally not a good idea regardless.) Also, the intense stubbing of models that was in vogue back when rspec first came on the scene doesn’t play nice either. The current best practice is to use object factories (such as Machinist) to create your test data, and this works well with foreign keys.

Use restrict rather than cascade for ON DELETE. You still want your destroy callbacks to run, so even if conceptually a cascading delete makes sense, implement it using the :dependent => :destroy option to has_many, with a restrict option at the database level to ensure all cascading deletes run through your callbacks.

Be pragmatic. Ideally every relationship will have a foreign key, but for that model filled with weird hacks and supported by a massive old school test suite, it may be just too much effort to get everything working smoothly with database constraints. In this case, set up a test suite that runs over your production data regularly to quickly identify any data problems that arise (see the SQL above).

Foreign keys give you confidence and peace of mind about your data and your application. Rails may be afraid of them, but that doesn’t mean you have to be.

July through September I am running full day training sessions in the US and UK on how to make use of your database and write solid Rails code, increasing your quality without compromising your velocity. Chances are I’m coming to your city, so check it out at http://www.dbisyourfriend.com

acts_as_state_machine is not concurrent

Here is a short 4 minute screencast in which I show you how the acts as state machine (AASM) gem fails in a concurrent environment, and also how to fix it.


(If embedding doesn’t work or the text is too small to read, you can grab a high resolution version direct from Vimeo)

It’s a pretty safe bet that you want to obtain a lock before all state transitions, so you can use a bit of method aliasing to do just that. This gives you much neater code than the quick fix I show in the screencast, just make sure you understand what it is doing!

class ActiveRecord::Base
  def self.obtain_lock_before_transitions
    AASM::StateMachine[self].events.keys.each do |t|
      define_method("#{t}_with_lock!") do
        transaction do
          lock!
          send("#{t}_without_lock!")
        end
      end
      alias_method_chain "#{t}!", :lock
    end
  end
end

class Tractor
  # ...

  aasm_event :buy do
    transitions :to => :bought, :from => [:for_sale]
  end

  obtain_lock_before_transitions
end

This is a small taste of my DB is your friend training course, that helps you build solid rails applications by finding the sweet spot between stored procedures and treating your database as a hash. July through September I am running full day sessions in the US and UK. Chances are I’m coming to your city. Check it out at http://www.dbisyourfriend.com

Three Reasons Why You Shouldn't Use Single Table Inheritance

It creates a cluttered data model. Why don’t we just have one table called objects and store everything as STI? STI tables have a tendency to grow and expand as an application develops, becoming intimidating and unwieldy as it isn’t clear which columns belong to which models.

It forces you to use nullable columns. A comic book must have an illustrator, but regular books don’t have an illustrator. Subclassing Book with Comic using STI forces you to allow illustrator to be null at the database level (for books that aren’t comics), and pushes your data integrity up into the application layer, which is not ideal.

It prevents you from efficiently indexing your data. Every index has to reference the type column, and you end up with indexes that are only relevant for a certain type.

The only time STI is the right answer is when you have models with exactly the same data, but different behaviour. You don’t compromise your data model, and everything stays neat and tidy. I have yet to see a case in the wild where this rule holds, though.

If you are using STI (or inheritance in general) to share code, you’re doing it wrong. Having many tables does not conflict with the Don’t-Repeat-Yourself principle. Ruby has modules, use them. (I once had a project where a 20 line hash drove the creation of migrations, models, data loaders and test blueprints.)
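A minimal sketch of the module approach, with hypothetical names and no database involved, just to show the shape of it:

```ruby
# Shared behaviour lives in a module, not a parent class
module Illustrated
  def credits
    "Illustrated by #{illustrator}"
  end
end

# Each model keeps its own class (and, in the database, its own table);
# only the behaviour is shared
class Comic
  include Illustrated
  attr_accessor :illustrator
end

comic = Comic.new
comic.illustrator = 'Jack Kirby'
comic.credits # => "Illustrated by Jack Kirby"
```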

What you should be doing is using Class Table Inheritance. Rails doesn’t support it natively, but that doesn’t mean much since it’s a simple pattern to implement yourself, especially if you take advantage of named scopes and delegators. Your data model will be much easier to work with, easier to understand, and more performant.

I expand on this topic and guide you through a sample implementation in my DB is your friend training course. July through September I am running full day sessions in the US and UK. Chances are I’m coming to your city. Check it out at http://www.dbisyourfriend.com

Debugging Deadlocks In Rails

Here is a 13 minute long screencast in which I show you how to go about tracking down a deadlock in a Ruby on Rails application. I make two assumptions:

  1. You are using MySQL
  2. You know the difference between shared and exclusive locks (in short: a shared lock allows other transactions to read the row, an exclusive lock blocks out everyone)
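If you want to see the difference for yourself, this is the MySQL syntax for taking each kind of lock inside a transaction (the tractors table is borrowed from the earlier examples):

```sql
-- Shared lock: other transactions can still read (and share-lock) the row
SELECT * FROM tractors WHERE id = 1 LOCK IN SHARE MODE;

-- Exclusive lock: blocks other lock requests on the row until commit
SELECT * FROM tractors WHERE id = 1 FOR UPDATE;
```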


(If embedding doesn’t work or the text is too small to read, you can grab a high resolution version direct from Vimeo)

This is only one specific example of a deadlock; in reality there are many ways one can occur. The process for tracking them down is always the same, though. If you get stuck, read through the InnoDB documentation again. Something normally jumps out. If you are not sure what Ruby code is generating what SQL, the query trace plugin is excellent. It gives you a stack trace for every single SQL statement ActiveRecord generates.

This is a small taste of the type of thing I cover in my DB is your friend training course. July through September I am running full day sessions in the US and UK. Chances are I’m coming to your city. Check it out at http://www.dbisyourfriend.com

acts_as_list will break in production

acts_as_list doesn’t work in a typical production deployment. It pretends to for a while, but every application will eventually have issues with it that result in real problems for your users. Here is a short 4 minute long screencast showing you how it breaks, and also a quick fix which will prevent your data from becoming corrupted.

(View it over at Vimeo if embedding doesn’t work for you)

Here is the “quick fix” I apply in the screencast. It’s ugly, but it will work.

def move_down
  Tractor.transaction do
    Tractor.connection.execute("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE")
    @tractor = Tractor.find(params[:id])
    @tractor.move_to_bottom
  end
  redirect_to(tractors_path)
end

Some things to note when fixing your application in a nicer way:

  1. This is not MySQL specific, all databases will exhibit this behaviour.
  2. The isolation level needs to be set as the first statement in the transaction (or globally, but you don’t want serializable globally!)
  3. For bonus points, add a unique index to the position column, though you’ll have to re-implement most of acts_as_list to make it work.
  4. It’s possible to do this under read committed, but it’s pretty complicated and optimised for concurrent access rather than individual performance.
  5. Obtaining a row lock before moving will fix this specific issue, but won’t address all the edge cases.

This is a small taste of the type of thing I cover in my DB is your friend training course. July through September I am running full day sessions in the US and UK. Chances are I’m coming to your city. Check it out at http://www.dbisyourfriend.com

Rails DB Training - US/UK Tour

You may not know, but I run an advanced rails training session titled “Your Database Is Your Friend”. Previously, I have only done this in Australia. Starting late July, I will be running this session throughout the United States and the United Kingdom. I’m still planning dates and venues, if you or someone you know is interested in hosting a session, please get in touch.

For details, see the DB is your friend rails training website.

Ruby debugging with puts, tap and Hirb

I use puts heaps when debugging. Combined with tap, it’s pretty handy. You can jump right in the middle of a method chain without having to move things around into variables.

x = long.chain.of.methods.tap {|x| puts x }.to.do.something.with

I thought hey why don’t I merge the two? And for bonus points, add in Hirb’s table display to format my models nicely. These are fairly personal customizations, and aren’t specific to a project, so I put them in my own ~/.railsrc file rather than each project.

# config/initializers/developer_specific_customizations.rb
if %w(development test).include?(Rails.env)
  railsrc = "#{ENV['HOME']}/.railsrc"
  load(railsrc) if File.exist?(railsrc)
end

# ~/.railsrc
require 'hirb'

Hirb.enable :pager => false

class Object
  def tapp(prefix = nil, &block)
    block ||= lambda {|x| x }

    tap do |x|
      value = block[x]
      value = Hirb::View.formatter.format_output(value) || value.inspect

      if prefix
        print prefix
        if value.lines.count > 1
          print ":\n"
        else
          print ": "
        end
      end
      puts value
    end
  end
end

# Usage (in your spec files, perhaps?)
"hello".tapp           # => hello
"hello".tapp('a')      # => a - "hello
"hello".tapp(&:length) # => 5
MyModel.first.tapp # =>
#  +----+-------------------------+
#  | id | created_at              |
#  +----+-------------------------+
#  | 7  | 2009-12-29 00:15:56 UTC |
#  +----+-------------------------+
#  1 row in set

BacktraceCleaner and gems in rails

UPDATE: Fixed the monkey-patch to match the latest version of the patch, and to explicitly require Rails::BacktraceCleaner before patching it to make sure it has been loaded.

If there’s one thing my mother taught me, if you’re going to clean something up you may as well do it properly. Be thorough, cover every surface.

Rails::BacktraceCleaner is a bit sloppy when it comes to gem directories. It misses all sorts of dust – hyphens, underscores, upper case letters, numbers. That’s not going to earn any pocket money. Let’s teach it a lesson.

# config/initializers/this_is_what_a_gem_looks_like.rb
require 'rails/backtrace_cleaner'

module Rails
  class BacktraceCleaner < ActiveSupport::BacktraceCleaner
    private
      GEM_REGEX = "([A-Za-z0-9_-]+)-([0-9.]+)"

      def add_gem_filters
        Gem.path.each do |path|
          # http://gist.github.com/30430
          add_filter { |line| line.sub(/(#{path})\/gems\/#{GEM_REGEX}\/(.*)/, '\2 (\3) \4')}
        end

        vendor_gems_path = Rails::GemDependency.unpacked_path.sub("#{RAILS_ROOT}/",'')
        add_filter { |line| line.sub(/(#{vendor_gems_path})\/#{GEM_REGEX}\/(.*)/, '\2 (\3) [v] \4')}
      end
  end
end

I’ve submitted a patch to rails, please review if you like.

Kudos to Matthew Todd for pairing with me on this.

Acts_as_state_machine locking

Consider the following:

class Door < ActiveRecord::Base
  acts_as_state_machine :initial => :closed

  state :closed
  state :open, :enter => :say_hello

  event :open do
    transitions :from => :closed, :to => :open
  end

  def say_hello
    puts "hello"
  end
end

door = Door.create!

fork do
  Door.transaction do
    door.open!
  end
end

door.open!

# >> hello
# >> hello

It’s broken: you can only open a door once. This is a classic double-update problem. One way to solve it is with pessimistic locking. I wrote some code that automatically locks any object when you call an event on it.

class ActiveRecord::Base
  # Forces all state transition events to obtain a DB lock
  def self.obtain_lock_before_all_state_transitions
    event_table.keys.each do |transition|
      define_method("#{transition}_with_lock!") do
        self.class.transaction do
          lock!
          send("#{transition}_without_lock!")
        end
      end
      alias_method_chain "#{transition}!", :lock
    end
  end
end

class Door < ActiveRecord::Base
  # ... as before

  obtain_lock_before_all_state_transitions
end

Beware! Your state transitions can now throw ActiveRecord::RecordNotFound errors (from lock!), since the object may have been deleted before you got a chance to play with it.

If you’re not using any locking in your web app, you’re probably doing it wrong. Just sayin’.

Selenium, webrat and the firefox beta

UPDATE: The latest version of webrat (0.5.3, maybe earlier) includes a fixed version of Selenium, so you shouldn’t need this hack.

I needed a few hacks to get selenium running with webrat.

First, make sure you are running at least 0.4.4 of webrat. Don’t make the same mistake I did and upgrade your gem version, but not the plugin installed in vendor/plugins.

gem install webrat
gem install selenium-client
gem install bmabey-database_cleaner --source=http://gems.github.com

There is a trick to getting the Firefox 3.5 beta working. The Selenium server packaged with webrat 0.4.4 only supports FF 3.0.*. Follow these instructions, patching the jar that is packaged with webrat (vendor/selenium-server.jar) so that the extensions Selenium uses will be valid for the new FF.

cd vendor/ # In webrat dir
jar xf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/readystate@openqa.org/install.rdf
jar xf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/{538F0036-F358-4f84-A764-89FB437166B4}/install.rdf
jar xf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/\{503A0CD4-EDC8-489b-853B-19E0BAA8F0A4\}/install.rdf 
jar xf selenium-server.jar \
customProfileDirCUSTFF/extensions/readystate\@openqa.org/install.rdf 
jar xf selenium-server.jar \
customProfileDirCUSTFF/extensions/\{538F0036-F358-4f84-A764-89FB437166B4\}/install.rdf

replace "3.0.*" "3.*" -- `find . | grep rdf`

jar uf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/readystate@openqa.org/install.rdf
jar uf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/{538F0036-F358-4f84-A764-89FB437166B4}/install.rdf
jar uf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/\{503A0CD4-EDC8-489b-853B-19E0BAA8F0A4\}/install.rdf 
jar uf selenium-server.jar \
customProfileDirCUSTFF/extensions/readystate\@openqa.org/install.rdf 
jar uf selenium-server.jar \
customProfileDirCUSTFF/extensions/\{538F0036-F358-4f84-A764-89FB437166B4\}/install.rdf

(hat tip to space vatican)

I haven’t been able to get Safari working yet.

I want to run selenium tests besides normal webrat tests, so I created a new environment “acceptance” that I can run tests under. Modify your test helper file:

# test/test_helper.rb
ENV["RAILS_ENV"] ||= "test"
raise "Can't run tests in #{ENV['RAILS_ENV']} environment" unless %w(test acceptance).include?(ENV["RAILS_ENV"])

require 'webrat'
require "test/env/#{ENV["RAILS_ENV"]}"

# ...
# test/env/test.rb
require 'webrat/rails'

Webrat.configure do |config|
  config.mode = :rails
  config.open_error_files = false
end
# test/env/acceptance.rb
require 'webrat/selenium/silence_stream'
require 'webrat/selenium'
require 'test/selenium_helpers'
require 'test/element_helpers'

# Required because we aren't isolating tests inside a transaction
require 'database_cleaner'
DatabaseCleaner.strategy = :truncation

Webrat.configure do |config|
  config.mode = :selenium
end

class ActionController::IntegrationTest
  self.use_transactional_fixtures = false # Necessary, otherwise selenium will never see any changes!

  setup do |session|
    session.host! "localhost:3001"
  end

  teardown do
    DatabaseCleaner.clean
  end
end

# Hack: webrat requires this, even though we're not using rspec
module Spec
  module Expectations
    class ExpectationNotMetError < Exception
    end
  end
end
# lib/tasks/test.rake
namespace :test do
  task :force_acceptance do
    ENV["RAILS_ENV"] = 'acceptance'
  end

  Rake::TestTask.new(:acceptance => :force_acceptance) do |t|
    t.test_files = FileList['test/acceptance/*_test.rb']
    t.verbose = true
  end
end

Notes

  • selenium and javascript helpers are from pivotallabs pat; they’re really handy for testing visibility of DOM elements
  • there’s some magic in webrat to conditionally require silence_stream based on something in active_support. I don’t understand it quite enough, but requiring it explicitly was necessary to get things running for me
  • webrat/selenium assumes some classes are loaded, which only happens if you’re using rspec. I’m not, so I stubbed out the ExpectationNotMetError (it is only referred to in a rescue block).
  • rake test:acceptance runs the selenium tests. Running acceptance tests directly as a ruby script runs them using normal webrat – this is actually handy when writing tests because you get a quicker turnaround.
  • to pause selenium mid test run (to see wtf is going on), just add gets at the appropriate line in your test

Faster rails testing with ruby_fork

A long running test suite isn’t the problem. Your build server can take care of that. A second or two here or there, no one notices.

The killer wait is in the red/green/refactor loop. You’re only running one or two tests, and an extra second can mean the difference between getting into flow and switching to Twitter. And you know what kills you in Rails?

$ time ruby -e '' -r config/environment.rb

real    0m3.784s
user    0m2.707s
sys     0m0.687s

Yep, the environment. That’s a lot of overhead to be waiting for every time you run a test, especially since it’s the same code every time! You fix this with a clever script called ruby_fork that’s included in the ZenTest package. It loads up your environment, then just chills out, waiting. You send a ruby file to it, and it forks itself (the process containing the environment) to execute that file. The beauty of this is that forking is really quick, and it leaves a pristine copy of the environment around for the next test run.
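The mechanism is plain Kernel#fork. This self-contained sketch shows why it helps: the expensive setup happens once in the parent, and each forked child inherits it for free (the setup string here is a stand-in for loading config/environment.rb):

```ruby
expensive_setup = 'loaded environment' # stand-in for config/environment.rb

reader, writer = IO.pipe
pid = fork do
  reader.close
  # The child inherits everything already in memory; only the new
  # test file would need loading here
  writer.write("child sees: #{expensive_setup}")
  writer.close
end
writer.close
Process.wait(pid)

message = reader.read
puts message # => child sees: loaded environment
```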

‘Environment’ doesn’t just have to be environment.rb; for bonus points you can load up test_helper.rb, which will also load your testing framework into memory. In fact, you can preload any ruby code at all – ruby_fork isn’t rails specific.

$ ruby_fork -r test/test_helper.rb &
/opt/local/bin/ruby_fork Running as PID 526 on 9084

$ time ruby_fork_client -r test/unit/your_test.rb
Started
...
Finished in 0.565636 seconds. # Aside: this time is bollocks

3 tests, 4 assertions, 0 failures, 0 errors

real    0m0.972s # This is the time you're interested in
user    0m0.225s
sys     0m0.035s

That’s fantastic, though you’ll notice in newer versions of Rails your application code is not reloaded. By default your test environment caches classes – which normally isn’t a problem, except that newer Rails versions also eager load those classes (so they’re loaded when you load environment.rb). You can fix this by clearing out the eager load paths in your test environment file:

# config/environments/test.rb
config.eager_load_paths = []

On my machine this gets individual test runs down from about 4 seconds to less than 1 second. You can sell that to your boss as a four-fold productivity increase.

Backup MySQL to S3 with Rails

UPDATE: This code is too old. Use db2fog instead. It does the same thing but better.

Here is some code I wrote over the weekend – db2s3. It’s a rails plugin that provides rake tasks for backing up your database and storing it on Amazon’s S3 cloud storage. S3 is a trivially cheap offsite backup solution – for small databases it costs about 4 cents per month, even if you’re sending full backups every hour.

There are many scripts around that do this already, but they fail to address the biggest actual problem. The aws-s3 gem provides a really nice ruby interface to S3, and dumping a backup then storing it really isn’t that hard. The real problem is that I really hate system administration. I want to spend as little time as possible and I want things to Just Work.

A script is great but there’s still too many things for me to do. Where does it go in my project? How do I set my credentials? How do I call it?

That’s why a plugin was needed. It’s as little work as possible for a rails developer to backup their database, so they can get back to making their app awesome.

db2s3. Check it out.

Singleton resource, pluralized controller fix for rails

map.resource still looks for a pluralized controller. This has always bugged me. Here’s a quick monkey patch to fix it. Tested on Rails 2.2.2.

# config/initializers/singleton_resource_fix.rb
module ActionController
  module Resources
    class SingletonResource < Resource #:nodoc:
      def initialize(entity, options)
        @singular = @plural = entity
        # options[:controller] ||= @singular.to_s.pluralize
        options[:controller] ||= @singular.to_s # This is the only line to change
        super
      end
    end
  end
end
# config/routes.rb
# before fix
map.resource :session, :controller => 'sessions'

# after fix
map.resource :session

Integration testing with Cucumber, RSpec and Thinking Sphinx

Ideally you would want to include sphinx in your integration tests. It’s really just like your database. In practice, this is problematic. Ensuring the DB is started and triggering a re-index after each model load is doable, if slow, with a small bit of hacking of thinking sphinx (hint – change the initializer for the ThinkingSphinx::Configuration to allow you to specify the environment). Here’s the rub though – if you’re using transactional fixtures the sphinx indexer won’t be able to see any of your data! Turning that off can really slow down your tests, and once you add in the re-indexing time you’re going to be making a few cups of coffee while they run.

One approach I’ve been taking is to stub out the search methods with RR. I know, I know, stubbing in your integration tests is evil. I’m being pragmatic here. For most applications your search is trivial (find me results for this keyword), and if you unit test your define_index block you’re pretty well covered. To go one step further you could unit test your controllers with an expect on the search method, or have a separate suite of non-transactional integration tests running against sphinx. I like the latter, but haven’t done it yet.

Enough talk! Here’s the magic you need to get it working with cucumber:

# features/steps/env.rb
require 'rr'
Cucumber::Rails::World.send(:include, RR::Adapters::RRMethods)

# features/steps/*_steps.rb
Given /a car with model '(\w+)' exists/ do |model|
  car = Car.create!(:model => model)
  stub(Car).search(model) { [car] }
end

Speeding up Rails Initialization

Chad Wooley just posted a tip to get Rails starting up faster. It works, except if you’re using ActiveScaffold. This is due to a load ordering problem – ActiveScaffold monkey patches the Resource class used by routes after routes have been parsed the first time, and relies on the re-parsing triggered by the inflections change.

To fix this, you can explicitly require the monkey patch just before you draw your routes (it doesn’t depend on anything else in ActiveScaffold).

# config/routes.rb
ActionController::Routing::Routes.draw do |map|
  # Explicitly require this, otherwise it won't get loaded before we parse our resources
  require 'vendor/plugins/active_scaffold/lib/extensions/resources.rb'

  # Your routes go here...
end

Yes it’s a hack on top of hack, but I get my console 30% quicker, so I’m running with it.

Tested on 2.0.2

Testing flash.now with RSpec

flash.now has always been a pain to test. The traditional Rails approach is to use assert_select and find it in your views. This clearly doesn’t work if you want to test your controller in isolation.

Other folks have found work arounds to the problem, including mocking out the flash or monkey patching it.

These solutions feel a bit like using a sledgehammer to me. If you’re going to monkey patch or mock something, you want it to be as discreet as possible, to minimize the chance of the implementation changing underneath you and to reduce the effect on other areas of your application. Also, why duplicate perfectly good code that is provided elsewhere?

The real problem with testing flash.now is that it gets cleaned up (via #sweep) at the end of the action before you get to test anything. So let’s solve that problem and that problem only: disable sweeping of flash.now:

# spec/spec_helper.rb
module DisableFlashSweeping
  def sweep
  end
end

# A spec
describe BogusController, "handling GET to #index" do
  it "sets flash.now[:message]" do
    @controller.instance_eval { flash.extend(DisableFlashSweeping) }
    get :index
    flash.now[:message].should_not be_nil
  end
end

instance_eval is used to access the flash, since it’s a protected method, and we extend with the minimum possible code to do what we want – blanking out the sweep method. This should not cause problems because sweeping is only relevant across multiple requests, which we shouldn’t be doing in our controller specs.
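The reason extending works is that extend inserts the module into the object’s singleton ancestors, ahead of the methods defined by its class. A toy flash (not the real ActionController one) shows the effect:

```ruby
module DisableFlashSweeping
  def sweep
  end
end

# Toy stand-in for the real flash hash, just enough to show the mechanism.
class ToyFlash
  def initialize
    @now = {}
  end
  attr_reader :now

  def sweep
    @now.clear
  end
end

flash = ToyFlash.new
flash.extend(DisableFlashSweeping) # module's sweep now shadows ToyFlash#sweep
flash.now[:message] = "saved!"
flash.sweep
flash.now[:message] # => "saved!" – sweep was a no-op
```

Only the one extended instance is affected; every other flash object keeps its normal sweeping behaviour, which is exactly the narrow footprint we wanted.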

Unobtrusive live comment preview with jQuery

Live preview is shiny. First get yourself a URL that renders a comment. In Rails it might look something like the following.

def new
  @comment = Comment.build_for_preview(params[:comment])

  respond_to do |format|
    format.js do
      render :partial => 'comment.html.erb'
    end
  end
end

Now you should have a form or div with an ID something like “new_comment”. Just drop in the following JS (you may need to customize the submit_url).

$(function() { // onload
  var comment_form = $('#new_comment')
  var input_elements = comment_form.find(':text, textarea')
  var submit_url = '/comments/new'  
  
  var fetch_comment_preview = function() {
    jQuery.ajax({
      data: comment_form.serialize(),
      url:  submit_url,
      timeout: 2000,
      error: function() {
        console.log("Failed to submit");
      },
      success: function(r) { 
        if ($('#comment-preview').length == 0) {
          comment_form.after('<h2>Your comment will look like this:</h2><div id="comment-preview"></div>')
        }
        $('#comment-preview').html(r)
      }
    })
  }

  input_elements.keyup(function () {
    fetch_comment_preview.only_every(1000);
  })
  if (input_elements.any(function() { return $(this).val().length > 0 }))
    fetch_comment_preview();
})

The only_every function is the key to this piece – it ensures that an AJAX request will be sent at most once a second, so you don’t overload your server or your client’s connection.
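The same rate-limiting idea can be sketched in Ruby, the language used elsewhere on this blog. This is a leading-edge throttle rather than the JS version’s delayed one-shot (plain Ruby has no event loop to schedule against), and the class name is invented for illustration:

```ruby
# Run the block at most once per interval; extra calls inside the window are dropped.
class OnlyEvery
  def initialize(interval)
    @interval = interval
    @last_run = nil
  end

  def call
    now = Time.now
    return if @last_run && (now - @last_run) < @interval
    @last_run = now
    yield
  end
end

requests = 0
throttle = OnlyEvery.new(1.0)
5.times { throttle.call { requests += 1 } } # five keyups in quick succession
requests # => 1 – only the first triggered a "request"
```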

Obviously you’ll need jQuery; less obviously, you’ll also need these support functions:

// Based on http://www.germanforblack.com/javascript-sleeping-keypress-delays-and-bashing-bad-articles
Function.prototype.only_every = function (millisecond_delay) {
  if (!window.only_every_func)
  {
    var function_object = this;
    window.only_every_func = setTimeout(function() { function_object(); window.only_every_func = null}, millisecond_delay);
   }
};

// jQuery extensions
jQuery.prototype.any = function(callback) { 
  return (this.filter(callback).length > 0)
}

Voilà, now you’re shimmering in awesomeness.
Demo up soon, but it’s similar to what you see on this blog (though this blog is done with inline Prototype).

AtomFeedHelper produces invalid feeds

Summary: atom_feed is broken until changeset 8529

# http://api.rubyonrails.org/classes/ActionView/Helpers/AtomFeedHelper.html#M000931
atom_feed do |feed|
  feed.title("My great blog!")
  feed.updated((@posts.first.created_at))

  for post in @posts
    feed.entry(post) do |entry|
      entry.title(post.title)
      entry.content(post.body, :type => 'html')

      entry.author do |author|
        author.name("DHH")
      end
    end
  end
end

Produces the following feed (rails 2.0.2)

<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <id>tag:localhost:posts</id>
  <link type="text/html" rel="alternate" href="http://localhost:3000"/>
  <title>My great blog!</title>
  <updated>2007-12-23T04:23:07+11:00</updated>
  <entry>
    <id>tag:localhost:3000:Post1</id>
    <published>2007-12-23T04:23:07+11:00</published>
    <updated>2007-12-30T15:29:55+11:00</updated>
    <link type="text/html" rel="alternate" href="http://localhost:3000/posts/1"/>
    <title>First post</title>
    <content type="html">Check out the first post</content>
    <author>
      <name>DHH</name>
    </author>
  </entry>
</feed>

Let’s run that through the feed validator

line 3, column 25: id is not a valid TAG
line 2, column 0: Missing atom:link with rel="self"
line 8, column 32: id is not a valid TAG

Oh dear. Not a happy result. Let’s fix it.

Problem the first is the feed ID tag. It doesn’t include a date, as per the Tag URI specification. This is a little bit tricky – you can’t just add Time.now.year as a default because that will change every year, and we need IDs to stay the same. We will provide an option to the user to specify the schema date, and produce a warning if they do not (as much as I’d like to just break it, the pragmatic side of me keeps backwards compatibility in).

The entry tag has the same problem, but you’ll also note it concatenates the class and the ID with no separator to create the ID. While it’s an edge case, this will break if you have a class name ending in a number, so we need to add in a separator. I vote for a slash. Also, the port in the tag URI is inconsistent with the feed URI (no port), so remove it.

For further reading, I recommend How to make a good ID in Atom.

The missing self link is just your garden variety bug – the documentation says it should be provided by default, but the code does not.

I went ahead and fixed these problems. Changeset 8529. The example above, when you change the call to atom_feed(:schema_date => 2008), looks like this.

<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <id>tag:localhost:/posts</id>
  <link type="text/html" rel="alternate" href="http://localhost:3000"/>
  <link type="application/atom+xml" rel="self" href="http://localhost:3000/posts.atom"/>
  <title>My great blog!</title>
  <updated>2007-12-23T04:23:07+11:00</updated>
  <entry>
    <id>tag:localhost:Post/1</id>
    <published>2007-12-23T04:23:07+11:00</published>
    <updated>2007-12-30T15:29:55+11:00</updated>
    <link type="text/html" rel="alternate" href="http://localhost:3000/posts/1"/>
    <title>First post</title>
    <content type="html">HOORAY. About ruby.</content>
    <author>
      <name>DHH</name>
    </author>
  </entry>
</feed>

mmm, semantic goodness

Test setup broken in Rails 2.0.2

Some changes went into rails 2.0.2 that mean the setup method in test subclasses won’t get called. Here’s how it went down:

  • 8392 broke it
  • 8430 tagged 2.0.2
  • 8442 reverted 8392
  • 8445 added a test so it doesn’t break again

You can see some code illustrating the problem in 8445. This affects two plugins that we’re using – helper_test and activemessaging.

For the helper test, the work around is to rename your helper test setup methods to setup_with_fixtures.

def setup_with_fixtures
  super
end

For activemessaging, add the following line to the setup of your functionals that are failing (from the mailing list):

ActiveMessaging.reload_activemessaging

Rails devs, reclaim your harddrive

cd code-dir
find . | egrep "(development|test)\\.log" | grep -v .svn | xargs rm

I’d forgotten to clear out my logs for a long while. This found me 9.5Gb!

  • Posted on December 17, 2007
  • Tagged code, rails

Logging SQL statistics in rails

When your sysadmin comes to you whinging with a valid concern that your app is reading 60 gazillion records from the DB, you kinda wish you had a bit more information than % time spent in the DB. So I wrote a plugin that counts both the number of selects/updates/inserts/deletes and also the number of records affected. [This plugin is no longer available, the code is below for posterity.]

That does the counting, you need to decide how to log it. I am personally quite partial to adding it to the request log line, thus getting stats per request:

# vendor/rails/actionpack/lib/action_controller/benchmarking.rb:75
log_message << " | Select Records: #{ActiveRecord::Base.connection.select_record_count}"
log_message << " | Selects: #{ActiveRecord::Base.connection.select_count}"

ActiveRecord::Base.connection.reset_counters!

Don’t forget the last line, otherwise you get cumulative numbers. That may be handy, but I doubt it. We’re only logging selects because that’s all we care about at the moment. I am sure this will change in time.

UPDATE: Moved to github, bzr repo is no longer available

UPDATE 2: Pasted code inline below, it’s way old and probably doesn’t work anymore.

module ActiveRecord::ConnectionAdapters
  class MysqlAdapter
    class << self
      def counters
        @counters ||= []
      end

      def attr_accessor_with_default(name, default)
        attr_accessor name
        define_method(name) do
          instance_variable_get(:"@#{name}") || default 
        end
      end

      def define_counter(name, record_func = lambda {|ret| ret })
        attr_accessor_with_default("#{name}_count", 0)
        attr_accessor_with_default("#{name}_record_count", 0)

        define_method("#{name}_with_counting") do |*args|
          ret = send("#{name}_without_counting", *args)
          send("#{name}_count=", send("#{name}_count") + 1)
          send("#{name}_record_count=", send("#{name}_record_count") + record_func[ret])
          ret
        end
        alias_method_chain name, :counting

        self.counters << name
      end
    end

    define_counter :select, lambda {|ret| ret.length }
    define_counter :update
    define_counter :insert
    define_counter :delete

    def reset_counters!
      self.class.counters.each do |counter|
        self.send("#{counter}_count=", 0)
        self.send("#{counter}_record_count=", 0)
      end
    end
  end
end
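The alias_method_chain dance above can be distilled: capture the original method, redefine it to bump a counter, then delegate. A self-contained sketch without Rails, using UnboundMethod#bind in place of the alias chain (FakeAdapter and its canned result set are invented for illustration):

```ruby
class FakeAdapter
  def select(sql)
    [:row1, :row2] # pretend result set from the database
  end

  # Wrap an instance method so every call increments "#{name}_count".
  def self.count_calls(name)
    original = instance_method(name) # capture before redefining
    counter = "#{name}_count"
    attr_writer counter
    define_method(counter) { instance_variable_get("@#{counter}") || 0 }

    define_method(name) do |*args|
      send("#{counter}=", send(counter) + 1)
      original.bind(self).call(*args) # delegate to the captured original
    end
  end

  count_calls :select
end

adapter = FakeAdapter.new
adapter.select("SELECT 1")
adapter.select("SELECT 2")
adapter.select_count # => 2
```

The plugin does the same thing per connection adapter, and additionally threads each method’s return value through a lambda so it can count affected records as well as calls.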

Extending Rails

Previously, I extended rails by monkey patching stuff in lib/. This was good because it kept vendor/rails clean.

I have changed my mind!

I now just patch vendor/rails directly with a comment prefixed by RBEXT explaining why. This means that when I piston update rails, I get notified of any conflicts immediately, rather than having to remember what was in lib. It’s also much easier and quicker than monkey patching. Theoretically, I could also run the rails tests to make sure everything is still kosher, but I must confess I haven’t gotten around to patching the tests as well…

And the comments are ace because I can use this sweet rake task to see what rb-rails currently looks like:

desc "Show all RB extensions in vendor/"
task :core_extensions do
  FileList["vendor/**/*.rb"].egrep(/RBEXT/)
end

How we use the Presenter pattern

FAKE EDIT: I wrote this article just after RailsConf but have just got around to publishing it. Jay has since written a follow up which is worthwhile reading.

I may have been zoning out during Jay Fields’ talk at RailsConf – not sleeping for a few days will do that to you – but I think I got the gist of his presentation: “Presenter” isn’t really a pattern because its use is too specific and there isn’t anything that can be generalized from it. Now, I’m not going to argue with Jay, but I thought it may be helpful to give an example of how we’re using this “pattern” and how it is helpful for us at redbubble.

Uploading a piece of work to redbubble requires us to create two different models – a work and a storage – and associate them with each other. Initially, this logic was simply in the create method of one of our controllers. My problem with this was that it obscured the intent of the controller. To my mind a controller is responsible for the flow of the application – the logic governing which page the user is directed to next – and kicking off any changes that need to happen at the model layer. In this case the controller was also dealing with the exact associations between the models and rollback conditions – code that, as we will see, wasn’t actually specific to the controller. In addition, passing validation errors through to the views was hard because errors could exist on one or more of the models. So we introduced a pseudo-model that handles the aggregation of the models for us. It looks something like this:

class UploadWorkPresenter < Presenter
  include Validatable

  attr_reader :storage
  attr_reader :work

  delegate_attributes :storage, :attributes => [:file]
  delegate_attributes :work,    :attributes => [:description]

  include_validations_for :storage
  include_validations_for :work

  def initialize(work_type, user, attributes = {})
    @work_type = work_type
    @work = work_type.new(:user => user, :publication_state => Work::PUBLISHED)
    @storage = work_type.storage_type.new

    initialize_from_hash(attributes)
  end

  def save
    return false if !self.valid?

    if @storage.save
      @work.storage = @storage
      if @work.save
        return true
      else
        @storage.destroy
      end
    end

    return false
  end
end

We have neatly encapsulated the logic of creating a work in a nice testable class that not only slims our controller, but can be reused. This came in handy when our UI guy thought it would be awesome if we could allow a user to signup and upload a work all on the same screen:

class SignupWithImagePresenter < UploadWorkPresenter
  attr_reader :user

  delegate_attributes :user, :attributes => [:user_name, :email_address]

  include_validations_for :user

  def initialize(attributes)
    @user = User.new
    super(ImageWork, @user, attributes)
  end

  def save
    return false if !self.valid?

    begin
      User.transaction do
        raise(Validatable::RecordInvalid.new(self)) unless @user.save && super
        return true
      end
    rescue Validatable::RecordInvalid
      return false
    end
  end
end

So why does Jay think this is such a bad idea? I think it stems from a terminology issue. Presenters on Jay’s project were cloudy with their responsibilities – handling aggregation, helper functions, and navigation. As you can see, the Presenters we use deal solely with aggregation, keeping their responsibility narrow.

For reference, here is our base Presenter class:

class Presenter
  extend Forwardable
  
  def initialize_from_hash(params)
    params.each_pair do |attribute, value| 
      self.send :"#{attribute}=", value
    end unless params.nil?
  end
  
  def self.protected_delegate_writer(delegate, attribute, options)
    define_method "#{attribute}=" do |value|
      self.send(delegate).send("#{attribute}=", value) if self.send(options[:if])
    end
  end
  
  def self.delegate_attributes(*options)
    raise ArgumentError, "Must specify both a delegate and an attribute list" if options.size != 2
    delegate = options[0]
    options = options[1]
    prefix = options[:prefix].blank? ? "" : options[:prefix] + "_"
    options[:attributes].each do |attribute|
      def_delegator delegate, attribute, "#{prefix}#{attribute}"
      def_delegator delegate, "#{attribute}=".to_sym, "#{prefix}#{attribute}=".to_sym
      def_delegator delegate, "#{attribute}?".to_sym, "#{prefix}#{attribute}?".to_sym
    end
  end
end
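delegate_attributes is a thin layer over Forwardable’s def_delegator. A minimal standalone version of the idea (class and attribute names here are invented for illustration; the real class above also handles prefixes and predicate methods):

```ruby
require 'forwardable'

class MiniPresenter
  extend Forwardable

  # Simplified delegate_attributes: generate a reader and writer per attribute,
  # each forwarding to the named delegate.
  def self.delegate_attributes(delegate, options)
    options[:attributes].each do |attribute|
      def_delegator delegate, attribute
      def_delegator delegate, "#{attribute}="
    end
  end

  attr_reader :user
  delegate_attributes :user, :attributes => [:name]

  def initialize(user)
    @user = user
  end
end

User = Struct.new(:name)
presenter = MiniPresenter.new(User.new(nil))
presenter.name = "Xavier"
presenter.user.name # => "Xavier" – the write passed straight through
```

This is what lets the presenter expose a flat attributes hash to the form while the values land on the underlying models, ready for each model’s own validations.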

Object#send_with_default

Avoid those pesky whiny nils! send_with_default won’t complain.

"hello".send_with_default(:length, 0)      # => 5
    nil.send_with_default(:length, 0)      # => 0
"hello".send_with_default(:index, -1, 'e') # => 1

So sending parameters is a little clunky, but I don’t reckon you’ll want to do that much. Here is the extension you want:

module CoreExtensions
  module Object
    def send_with_default(method, default, *args)
      !self.nil? && self.respond_to?(method) ? self.send(*args.unshift(method)) : default
    end
  end
end

Object.send(:include, CoreExtensions::Object)

I'm a rails contributor

Allow me to gloat for a moment. Please turn your attention to changeset 7692 – you’ll notice my name in the credits. It’s not much, but there’s a certain amount of geek cred there.

Counting ActiveRecord associations: count, size or length?

Short answer: size. Here’s why.

length will fall through to the underlying array, which will force a load of the association

>> user.posts.length
  Post Load (0.620579)   SELECT * FROM posts WHERE (posts.user_id = 1321) 
=> 162

This is bad. You loaded 162 objects into memory, just to count them. The DB can do this for us! That’s what count does.

>> user.posts.count
  SQL (0.060506)   SELECT count(*) AS count_all FROM posts WHERE (posts.user_id = 1321) 
=> 162

Now we’re on to something. The problem is, count will always issue a count to the DB, which is kind of redundant if you’ve already loaded the association. That’s where size comes in. It’s got smarts. Observe!

>> User.find(1321).posts.size
  User Load (0.003610)   SELECT * FROM users WHERE (users.id = 1321) 
  SQL (0.000544)   SELECT count(*) AS count_all FROM posts WHERE (posts.user_id = 1321) 
=> 162
>> User.find(1321, :include => :posts).posts.size 
  User Load Including Associations (0.124950)   SELECT ...
=> 162

Notice it uses count, but if the association is already loaded (i.e. we already know how many objects there are), it uses length, for optimum DB usage.

But that’s not all. There’s always more. If you also store the number of posts on the user object, as is common for performance reasons, size will use that too. Just make sure the column is named <association>_count (i.e. posts_count).

>> User.columns.collect(&:name).include?("posts_count")
=> true
>> User.find(1321).posts.size
  User Load (0.003869)   SELECT * FROM users WHERE (users.id = 1321) 
=> 162

The bad news

So now you’re all excited, I better tell you why this is only fantastic until you start using has_many :through.

Now, the situation is slightly different between 1.2.x (r4605) and edge (r7639), so I’ll start with stable. They may look the same, but a normal has_many association and one with the :through option are actually implemented by two entirely separate classes under the hood. And it so happens that the has_many :through version kind of, well, doesn’t have quite the same smarts. It loads up the association just as length does (then falls through to Array#size). Edge is sharp enough to use a count, but still doesn’t know about any caches you may be using. This was committed in r7237, so it’s pretty easy to patch into stable. Or you can use this extension on either branch (here is the trac ticket; the patch was added to edge in r7692):

module CoreExtensions::HasManyThroughAssociation
  def size
    return @owner.send(:read_attribute, cached_counter_attribute_name) if has_cached_counter?
    return @target.size if loaded?
    return count
  end

  def has_cached_counter?
    @owner.attribute_present?(cached_counter_attribute_name)
  end

  def cached_counter_attribute_name
    "#{@reflection.name}_count"
  end
end

ActiveRecord::Associations::HasManyThroughAssociation.send(:include, CoreExtensions::HasManyThroughAssociation)

How it doesn’t work

user.posts.find(:all, :conditions => ["reply_count > ?", 50]).size

size normally works because associations use a proxy – when I call user.posts it won’t actually load any posts until I call a method that requires them. So user.posts.size can work without ever loading the posts, because they aren’t required for the operation. The above code won’t work well because find does not use a proxy – it will immediately load the requested posts from the DB, without size getting a chance to send a COUNT instead. You may be better off moving this finder logic into an association so that size works as expected. This also has the benefit that if you decide to add a counter cache later on, you won’t have to change any code to use it.

has_many :popular_posts, :class_name => "Post", :foreign_key => "user_id", :conditions => ["reply_count > ?", 50]

So use size when counting associations, unless you have a good reason not to. Most importantly though, keep watching your development log so you’re aware of what SQL your app is generating.

UPDATE: Added link to my patch on trac

UPDATE 2: … which is now closed, see r7692

Straight Sailing with Magellan

Magellan is a Ruby on Rails plugin that provides a framework for abstracting navigation logic out of your views and controllers, allowing you to write neater, more reusable code.

Table of Contents

  1. Using Magellan
    1. Dynamic Links
    2. State
    3. Testing
  2. Extra Morsels
  3. Conclusion
  4. Footnotes
  5. Bonus Material

Why should I use Magellan?

The short answer is you probably shouldn’t. Sorry, thanks for stopping by, please visit the gift shop. To elaborate, many applications don’t actually have complex navigational requirements. They are more generally of the type “go from page A to page B, then from there to page C”, and that’s that. While of course Magellan can neatly express these relationships, it adds a layer of complexity to your application for questionable benefit.

Where Magellan excels is in expressing more complex requirements: “go from page A to page B, unless it’s a Thursday, in which case go to page C. If we got to page C from page A, then go to page B, otherwise go to page A”. Urgh. Where do you put this logic in a traditional rails app? You don’t want this kind of logic in your views, and if you put it in your controllers you’ll end up duplicating code. You need a better solution.

You need Magellan.

Using Magellan

To use Magellan you need to understand three concepts:

  1. Pages
  2. Links
  3. State

State is a more advanced topic, so we’ll go over that bit later on. You covered the first two in Web Coding 101, so I’ll go over them first. The only difference in Magellan’s usage of the terms “page” and “links” is a level of abstraction. Simply, a Magellan page represents a URL (rails or otherwise). Drop the following code into your environment.rb:

RHNH::Magellan::Navigator.draw do |map|
  map.add_page :home, {:controller => 'home', :action => 'list'}
end

Easy. To link to this page in a view, we use the nav_link_to helper in our .rhtml file instead of link_to. The first parameter is the name of the page we are currently on – in this case it is not strictly required and could be set to nil.

nav_link_to :current_page, :home

That in and of itself isn’t particularly exciting. Where things get tasty is when we start using links. Now, in basic usage a link acts the same way as a page1. We can create a next link that is different depending on which page you are on.

RHNH::Magellan::Navigator.draw do |map|
  map.add_page :home1 do |p|
    p.url = { :controller => 'home1' }
    p.add_link :next, :home2
  end
  
  map.add_page :home2 do |p|
    p.url = { :controller => 'home2' }
    p.add_link :next, :home1
  end
end

# Then in both home1.rhtml and home2.rhtml
# @current_page is either :home1 or :home2
nav_link_to @current_page, :next

As you can see we have de-coupled our navigation from the page itself. If we wanted to we could change the next link for home2 to home3 without having to change any of the code associated with home2. This makes our pages more modular and reusable, which is generally a Good Thing.

Let’s go back to our original example. I want the next link on page A to go to page B, except on Thursdays, when it should go to page C. The trick here is that in addition to accepting a symbol for the link name (a “static link”), add_link can also accept a lambda block that is evaluated at runtime. This is a little bit more convoluted: the block needs to return not a link name, but the actual page we want to go to. While initially slightly unintuitive, it allows for more flexibility and less code than having to specify extra links.

RHNH::Magellan::Navigator.draw do |map|
  map.add_page :page_a do |p|
    p.add_link :next, lambda {|pages, state|
      # Thursday is the 4th day of the week
      Time.new.wday == 4 ? pages[:page_c] : pages[:page_b]
    }
  end

  map.add_page :page_b, { :controller => 'page_b' }
  map.add_page :page_c, { :controller => 'page_c' }
end

State

State is just like session storage for your navigation logic. In fact, it actually uses a subset of session storage2. The reason we differentiate it from normal session variables is simply to keep a neat separation between our navigation logic and other modules that may require the session. In typical usage, you modify the state in your controller (using set_nav_state), and then make a decision based on that state in your navigation logic (using the state parameter). A simple example is a dynamic back link that depends on the previous page.

# Both page A and B have a link to page C
def page_a; set_nav_state :back_page => :page_a; end
def page_b; set_nav_state :back_page => :page_b; end

# Page C
nav_link_to :page_c, :back

# environment.rb
RHNH::Magellan::Navigator.draw do |map|
  map.add_page :page_a, { :controller => 'page_a' }
  map.add_page :page_b, { :controller => 'page_b' }

  map.add_page :page_c do |p|
    p.add_link :back, lambda {|pages, state|
      pages[state[:back_page]]
    }
  end
end

Testing your navigation

As with any code, it is important to test your navigation logic. There are many ways to do this, depending on the requirements and complexity of your application. I recommend at least one class of unit tests for your logic, and also to add code to your functional tests to ensure your controllers are setting the correct state. Magellan provides one helper function here – nav_state – which returns a hash of the current state.

class UnitTest < Test::Unit::TestCase
  def setup
    @nav = RHNH::Magellan::Navigator.instance
  end
  
  def test_back_link
    state = { :homepage => :home1 }
    expected = { :controller => 'example', :action => 'home1' }
      
    assert_equal expected, @nav.get_url(:page1, :back, state)
  end
end
class FunctionalTest < Test::Unit::TestCase
  # Standard functional test setup code...
  
  def test_index
    get 'index'
    
    assert_equal :home1, nav_state[:homepage]
  end
end

The tests included with the example that comes with Magellan provide a more complex example of navigation testing. I highly recommend you look over them.

Extra morsels

You can specify a default link by adding a link to the map rather than a page. For instance, to specify a default :back link:

RHNH::Magellan::Navigator.draw do |map|
  map.add_page :home, { :controller => 'home' }
  map.add_link :back, :home
end

To be extra fancy, you can return extra parameters from your navigation logic that are added to the :params hash of the url. This is done by returning an array with both the page and the parameters in it.

RHNH::Magellan::Navigator.draw do |map|
  map.add_page :home, { :controller => 'home' }
  map.add_link :back, lambda { |pages, state|
    [pages[:home], {:message => 'You just hit a default link'}]
  }
end

To conclude

Magellan is a great way of managing the complexity of larger projects. By abstracting navigation logic out of your controllers and views you make your project much more modular and reusable. It can even be introduced incrementally – all your old link_to calls will still work.

Footnotes

1 To be technically correct, a page acts like a link. Magellan creates default links to pages with the same name as the page. For instance, unless you specify otherwise, :home is actually a link to the page :home

2 Magellan uses session[:rhnh_navigator_state], so you may want to steer clear of that to avoid stepping on anyone’s toes.

Rails XHTML Validation with LibXML/HTML Tidy

I improved upon the XHTML validation technique I showed yesterday to add nicer error messages, and also support for local testing via HTML Tidy. HTML Tidy isn’t quite as good as W3C – for example, it missed a label that was pointing to an invalid ID – but it runs hell fast. For W3C testing I’m now using LibXML to parse the response, to actually list the errors rather than just tell you they exist.

And it’s all customizable by setting the MARKUP_VALIDATOR environment variable. Options are: w3c, tidy, tidy_no_warnings. Tidy is the default.

def assert_valid_markup(markup=@response.body)
  ENV['MARKUP_VALIDATOR'] ||= 'tidy'
  case ENV['MARKUP_VALIDATOR']
  when 'w3c'
    # Thanks http://scottraymond.net/articles/2005/09/20/rails-xhtml-validation
    require 'net/http'
    response = Net::HTTP.start('validator.w3.org') do |w3c|
      query = 'fragment=' + CGI.escape(markup) + '&output=xml'
      w3c.post2('/check', query)
    end
    if response['x-w3c-validator-status'] != 'Valid'
      error_str = "XHTML Validation Failed:\n"
      parser = XML::Parser.new
      parser.string = response.body
      doc = parser.parse

      doc.find("//result/messages/msg").each do |msg|
        error_str += "  Line %i: %s\n" % [msg["line"], msg]
      end

      flunk error_str
    end

  when 'tidy', 'tidy_no_warnings'
    require 'tidy'
    errors = []
    Tidy.open(:input_xml => true) do |tidy|
      tidy.clean(markup)
      errors.concat(tidy.errors)
    end
    Tidy.open(:show_warnings=> (ENV['MARKUP_VALIDATOR'] != 'tidy_no_warnings')) do |tidy|
      tidy.clean(markup)
      errors.concat(tidy.errors)
    end
    if errors.length > 0
      error_str = ''
      errors.each do |e|
        error_str += e.gsub(/\n/, "\n  ")
      end
      error_str = "XHTML Validation Failed:\n  #{error_str}"
      
      assert_block(error_str) { false }
    end    
  end
end

Getting Tidy to work was an ordeal; the Ruby documentation is rather lacking. It also behaves in weird ways – the call to errors returns a one-element array, with all the errors bundled together in one string.

LibXML was a little tricky – there’s no obvious way to parse an XML document in memory. You’d think XML::Document.new(xml) would do the trick, since there’s an XML::Document.file(filename) method, but that actually uses the entire XML document as the version string. Not so handy. Turns out you need to create an XML::Parser object instead, as I’ve done above. The docs don’t mention this (anywhere obvious, that is); I found it in a thread on the LibXML mailing list.

Testing rails

I was working on creating functional tests for some of my code today, a task made ridiculously easy by rails. To add extra value, I added an assertion (from Scott Raymond) to validate my markup against the w3c online validator:

def assert_valid_markup(markup=@response.body)
  if ENV["TEST_MARKUP"]
    require "net/http"
    response = Net::HTTP.start("validator.w3.org") do |w3c|
      query = "fragment=" + CGI.escape(markup) + "&output=xml"
      w3c.post2("/check", query)
    end
    assert_equal "Valid", response["x-w3c-validator-status"]
  end
end

The ENV test means it isn’t run by default since it slows down my tests considerably, but I don’t want to move markup checks out of the functional tests because that’s where they belong. Next step is to validate locally, which I’ve heard you can do with HTML Tidy.

Another problem is testing code that relies on DateTime.now, since this is a singleton call and not easily mockable.

def pin_time
  time = DateTime.now
  DateTime.class_eval <<-EOS
    def self.now
      DateTime.parse("#{time}")
    end
  EOS
  yield time
end

# Usage
pin_time do |test_time|
  assert_equal test_time, DateTime.now
  sleep 2
  assert_equal test_time, DateTime.now
end

I haven’t found a neat way of resetting the behaviour of now. Using load 'date.rb' works but produces warnings for redefined constants. I couldn’t get either aliasing the original method, undefining the new one, or even just calling Date.now to work.
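For what it’s worth, the alias approach does work if the stub is defined with a closure rather than an eval’d string – a sketch, assuming Ruby 1.9+ (define_singleton_method and singleton_class; the alias name is invented):

```ruby
require 'date'

# Pin DateTime.now for the duration of the block, then restore the original.
def pin_time
  time = DateTime.now
  singleton = DateTime.singleton_class
  singleton.send(:alias_method, :now_without_pinning, :now) # keep the original
  DateTime.define_singleton_method(:now) { time }           # closure, no string eval
  yield time
ensure
  singleton.send(:alias_method, :now, :now_without_pinning) # put the original back
  singleton.send(:remove_method, :now_without_pinning)
end

pin_time do |test_time|
  sleep 0.01
  DateTime.now == test_time # => true, time stands still inside the block
end
```

The ensure block guarantees the clock is un-pinned even if the test raises, with no redefinition warnings.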

UPDATE: Ah, how young I was. A better way to do this is to use a library like mocha
