Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

Integration testing with Cucumber, RSpec and Thinking Sphinx

Ideally you would want to include sphinx in your integration tests. It’s really just like your database. In practice, this is problematic. Ensuring the DB is started and triggering a re-index after each model load is doable, if slow, with a small bit of hacking of thinking sphinx (hint – change the initializer for the ThinkingSphinx::Configuration to allow you to specify the environment). Here’s the rub though – if you’re using transactional fixtures the sphinx indexer won’t be able to see any of your data! Turning that off can really slow down your tests, and once you add in the re-indexing time you’re going to be making a few cups of coffee while they run.

One approach I’ve been taking is to stub out the search methods with RR. I know, I know, stubbing in your integration tests is evil. I’m being pragmatic here. For most applications your search is trivial (find me results for this keyword), and if you unit test your define_index block you’re pretty well covered. To go one step further you could unit test your controllers with an expect on the search method, or have a separate suite of non-transactional integration tests running against sphinx. I like the latter, but haven’t done it yet.

Enough talk! Here’s the magic you need to get it working with cucumber:

1
2
3
4
5
6
7
8
9
# features/steps/env.rb
require 'rr'
Cucumber::Rails::World.send(:include, RR::Adapters::RRMethods)

# features/steps/*_steps.rb
Given /a car with model '(\w+)' exists/ do |model|
  car = Car.create!(:model => model)
  stub(Car).search(model) { [car] }
end

Finding related content with Sphinx

Previous efforts to find related posts with the classifier gem yielded no fruit, so I tried another approach using sphinx. Turned out to be a winner.

The basic theory is to index all posts by tag, then to find related posts just use the current post’s tags as a search string. Remember to exclude the current post from the search results. For this blog, I use tags for the main categories, which were corrupting the results – most everything is tagged ‘Ruby’ so it doesn’t add any value in determining likeness. So rather than indexing all tags I excluded some of the main ones.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
class Post < ActiveRecord::Base
  has_many :searchable_tags, 
           :through    => :taggings,
           :source     => :tag,
           :conditions => "tags.name NOT IN ('Ruby', 'Code', 'Life')"
  
  def related_posts(number = 3)
    Post.search(:limit => number + 1, :conditions => {
      :tag_list => tag_list.join("|")
    }).reject {|x| x == self }.first(number)
  end

  define_index do
    indexes searchable_tags(:name), :as => :tag_list
    # If you want to use this for normal search as well you'll have to 
    # add in title/body here as well
  end
end

For a more complete example, see the relevant RHNH commits: cdc0bf and d4d844

Showing links to related content is a good way to stop the bottom of your page from being a ‘dead end’. In the event that no related posts are found, I’m linking to the archives instead.

A pretty flower Another pretty flower