Robot Has No Heart

Xavier Shay blogs here

Classifier gem rubbish for recommending posts

While I was chatting with Tim today, he suggested that Classifier::LSI might be a cool way to offer ‘related posts’ suggestions for a blog.

Not really knowing anything about it, I whipped up a prototype rake task. It builds the index, then marshals it to disk, because the index takes minutes to build and it’s not much fun to play with when you have to wait each time. It then prints 3 related suggestions for each post.

require 'classifier'

namespace :lsi do
  task :test => :environment do
    if File.exist?("lsidata.dump")
      # Reuse the cached index; building it takes minutes.
      lsi = File.open("lsidata.dump", "rb") {|f| Marshal.load(f) }
    else
      lsi = Classifier::LSI.new
      Post.find(:all, :order => 'published_at DESC').each do |post|
        text = post.body
        categories = post.tags.collect(&:name)
        puts "Indexing " + post.title
        lsi.add_item(text, *categories)
      end
      # Cache the built index on disk for the next run.
      File.open("lsidata.dump", "wb") {|f| Marshal.dump(lsi, f) }
    end

    Post.find(:all).each do |post|
      puts post.title
      # find_related returns the indexed items (post bodies), so map them back to posts.
      puts lsi.find_related(post.body, 3).collect {|i| Post.find_by_body(i).title }.inspect
    end
  end
end
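
The caching step in the task above can be pulled out on its own. This is only a rough sketch using the standard library: `load_or_build` is a made-up helper name, not part of classifier or Rails, and the hash stands in for the slow index build.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of classifier): return the object cached at
# `path`, or build it with the block and cache it for next time.
def load_or_build(path)
  if File.exist?(path)
    # Marshal data is binary, so read it in binary mode.
    # Only load dump files you wrote yourself; Marshal.load is unsafe on untrusted data.
    File.open(path, 'rb') { |f| Marshal.load(f) }
  else
    object = yield
    File.open(path, 'wb') { |f| Marshal.dump(object, f) }
    object
  end
end

# Usage: the block only runs when no dump file exists yet.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'index.dump')
  index = load_or_build(path) { { 'some post' => %w[ruby rails] } }
  cached = load_or_build(path) { raise 'should not rebuild' }
  puts cached == index
end
```

Delete the dump file whenever the posts change, otherwise the stale index keeps getting loaded.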

Here’s the output for my last 5 articles. I don’t know what I was expecting, but this just doesn’t seem very helpful. I don’t have a very rich set of tags on my posts, so that probably has something to do with it. I was kind of hoping it would look at the text and just work *waves hands*.

Seagate 500Gb FreeAgent Pro external drive - first impressions
  - Building Firefox Extensions
  - The Colemak Diaries
  - Counting ActiveRecord associations: count, size or length?
Coconut Oats
  - The Colemak Diaries
  - Summertime Tagliarini
  - Mary Iron Chef - Chocolate Jaffa Boxes
Mary Iron Chef - Chocolate Jaffa Boxes
  - The Colemak Diaries
  - Building Firefox Extensions
  - Summertime Tagliarini
Paypal IPN fails date standards
  - Building Firefox Extensions
  - Straight Sailing with Magellan
  - The Colemak Diaries
I'm number 8!
  - Extending Rails
  - Practical Hpricot: SVG
  - Day of days

Next step is to try tagging my stuff better and see if that helps.

Getting classifier working

Quick side note: the pure Ruby classifier doesn’t work out of the box with Rails because it also redefines Array#sum. If you install the GSL lib and the Ruby bindings (see the classifier docs), you’ll still need this one-line patch to classifier to get it to work:

Index: lib/classifier/lsi.rb
===================================================================
--- lib/classifier/lsi.rb       (revision 31)
+++ lib/classifier/lsi.rb       (working copy)
@@ -25,6 +25,8 @@
   # please consult Wikipedia[http://en.wikipedia.org/wiki/Latent_Semantic_Indexing].
   class LSI
     
+    include GSL if $GSL
+    
     attr_reader :word_list
     attr_accessor :auto_rebuild
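
For anyone wondering how an Array#sum clash like the one mentioned above can break things, here is a hypothetical illustration. The method name `total` and both definitions are made up for the example; they are not the real Rails or classifier code, and the actual differences between the two `sum` versions are not shown here.

```ruby
# Hypothetical stand-in for a framework helper: totals the elements,
# or the block's results when a block is given.
module Enumerable
  def total(&block)
    values = block ? map(&block) : to_a
    values.inject(0) { |acc, v| acc + v }
  end
end

# Hypothetical stand-in for a library that reopens Array with the same name.
# This version ignores any block and always returns a Float.
class Array
  def total
    inject(0.0) { |acc, v| acc + v }
  end
end

range_total = (1..3).total { |n| n * 2 }   # Range still uses the Enumerable version
array_total = [1, 2, 3].total { |n| n * 2 } # Array picks up the library's version

puts range_total.inspect
puts array_total.inspect
```

Array's own definition is found before the one in Enumerable, so any code that relied on the block form silently gets a different answer for arrays.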

UPDATE: I’ve forked classifier on github, so you can just grab that version if you like.
