Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

Migrating Enki to Jekyll

I just converted this blog from a dynamic Enki site to a static Jekyll one. I wanted to get rid of the comments, add SSL, and not have to upgrade Rails so often. I also prefer composing locally.

First, I exported all of the posts to lesstile templates using a rake task.

task :export_posts => :environment do
  Post.find_each do |post|
    filename = "%s-%s.lesstile" % [
      post.published_at.strftime("%Y-%m-%d"),
      post.slug
    ]

    dir = "_posts"
    yaml_sep = "---"

    puts filename

    body = <<-EOS
#{yaml_sep}
layout: post
title:  #{post.title.inspect}
date:   #{post.published_at.strftime("%F %T %:z")}
tags:   #{post.tags.map {|x| x.name.downcase }.sort.inspect}
#{yaml_sep}
{% raw %}
#{post.body}
{% endraw %}
    EOS

    File.write(File.join(dir, filename), body)
  end
end

Lesstile is a wrapper around Textile that provides some extra functionality, so a custom converter is also needed. Put the following in _plugins/lesstile.rb (with associated additions to your Gemfile):

require 'cgi' # for CGI.unescapeHTML
require 'lesstile'
require 'coderay'
require 'redcloth'

module Jekyll
  class LesstileConverter < Converter
    safe true
    priority :low

    def matches(ext)
      ext =~ /^\.lesstile$/i
    end

    def output_ext(ext)
      ".html"
    end

    def convert(content)
      Lesstile.format_as_xhtml(
        content,
        :text_formatter => lambda {|text|
          RedCloth.new(CGI::unescapeHTML(text)).to_html
        },
        :code_formatter => Lesstile::CodeRayFormatter
      )
    end
  end
end

Set the permalink option in _config.yml to match the existing URLs, and use the jekyll-archives plugin to create the tag pages.

permalink: "/:year/:month/:day/:title/"

assets:
  digest: true

"jekyll-archives":
  enabled:
    - tags
  layout: 'tag'
  permalinks:
    tag: '/:name/'

gems: # renamed to plugins: in Jekyll 3.5 and later
  - jekyll-feed
  - jekyll-assets
  - jekyll-archives

For the archives page, use an archives.md in the root directory whose front matter selects a custom layout and whose body is empty. The layout:

{% include head.html %}
{% assign last_month = nil %}
<ul>
{% for post in site.posts %}
  {% assign current_month = post.date | date: '%B %Y' %}
  {% if current_month != last_month %}
    </ul>
    <h3>{{ current_month }}</h3>
    <ul>
  {% endif %}

  <li>
    <a href="{{ post.url }}">{{ post.title }}</a>

    {% if post.tags != empty %}
    ({% for tag in post.tags %}<a href='/{{ tag }}'>{{ tag }}</a>{% if forloop.last %}{% else %}, {% endif %}{% endfor %})
    {% endif %}
  </li>

  {% assign last_month = current_month %}
{% endfor %}
</ul>
{% include footer.html %}
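The month headings come from comparing each post's formatted date with the previous one's. As a rough Ruby equivalent of that logic (a sketch only; `Post` and `group_by_month` are hypothetical names, not part of Jekyll), `Enumerable#chunk` groups consecutive posts that share a key. This works because `site.posts` is already sorted newest first:

```ruby
require 'date'

# Hypothetical stand-in for a Jekyll post: just the fields the layout reads.
Post = Struct.new(:title, :date)

# Split an already-sorted list of posts into consecutive runs that share a
# "%B %Y" heading, like the current_month != last_month check in the layout.
def group_by_month(posts)
  posts
    .chunk { |post| post.date.strftime('%B %Y') }
    .map { |month, run| [month, run.map(&:title)] }
end

posts = [
  Post.new('Third post',  Date.new(2014, 8, 17)),
  Post.new('Second post', Date.new(2014, 8, 2)),
  Post.new('First post',  Date.new(2014, 3, 29)),
]

group_by_month(posts).each do |month, titles|
  puts month
  titles.each { |title| puts "  - #{title}" }
end
```

Because `chunk` only merges adjacent elements, input order matters in exactly the way it does for the Liquid loop.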

For a full example, including a recommended set of layouts and includes, see the new sources for this site.

Dropwizard logger for Ruby and WEBrick

Wouldn’t it be great if, instead of WEBrick logs looking like:

> ruby server.rb
[2014-08-17 15:29:10] INFO  WEBrick 1.3.1
[2014-08-17 15:29:10] INFO  ruby 2.1.1 (2014-02-24) [x86_64-darwin13.0]
[2014-08-17 15:29:10] INFO  WEBrick::HTTPServer#start: pid=17304 port=8000
D, [2014-08-17T15:29:11.452223 #17304] DEBUG -- : hello from in the request
localhost - - [17/Aug/2014:15:29:11 PDT] "GET / HTTP/1.1" 200 13
- -> /
E, [2014-08-17T15:29:12.787505 #17304] ERROR -- : fail (RuntimeError)
server.rb:57:in `block in <main>'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/prochandler.rb:38:in `call'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/prochandler.rb:38:in `do_GET'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/abstract.rb:106:in `service'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpserver.rb:138:in `service'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpserver.rb:94:in `run'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/server.rb:295:in `block in start_thread'
localhost - - [17/Aug/2014:15:29:12 PDT] "GET /fail HTTP/1.1" 500 6
- -> /fail

They looked like:

> ruby server.rb

   ,~~.,''"'`'.~~.
  : {` .- _ -. '} ;
   `:   O(_)O   ;'
    ';  ._|_,  ;`   i am starting the server
     '`-.\_/,.'`

INFO  [2014-08-17 22:28:13,186] webrick: WEBrick 1.3.1
INFO  [2014-08-17 22:28:13,186] webrick: ruby 2.1.1 (2014-02-24) [x86_64-darwin13.0]
INFO  [2014-08-17 22:28:13,187] webrick: WEBrick::HTTPServer#start: pid=17253 port=8000
DEBUG [2014-08-17 22:28:14,738] app: hello from in the request
INFO  [2014-08-17 15:28:14,736] webrick: GET / 200
ERROR [2014-08-17 22:28:15,603] app: RuntimeError: fail
! server.rb:57:in `block in <main>'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/prochandler.rb:38:in `call'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/prochandler.rb:38:in `do_GET'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/abstract.rb:106:in `service'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpserver.rb:138:in `service'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpserver.rb:94:in `run'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/server.rb:295:in `block in start_thread'
INFO  [2014-08-17 15:28:15,602] webrick: GET /fail 500

I thought so, hence:

require 'webrick'
require 'logger'

puts <<-BANNER

   ,~~.,''"'`'.~~.
  : {` .- _ -. '} ;
   `:   O(_)O   ;'
    ';  ._|_,  ;`   i am starting the server
     '`-.\\_/,.'`

BANNER

class DropwizardLogger < Logger
  def initialize(label, *args)
    super(*args)
    @label = label
  end

  def format_message(severity, timestamp, progname, msg)
    "%-5s [%s] %s: %s\n" % [
      severity,
      timestamp.getutc.strftime("%Y-%m-%d %H:%M:%S,%3N"), # getutc copies; utc would modify timestamp in place
      @label,
      msg2str(msg),
    ]
  end

  def msg2str(msg)
    case msg
    when String
      msg
    when Exception
      ("%s: %s" % [msg.class, msg.message]) +
        (msg.backtrace ? msg.backtrace.map {|x| "\n! #{x}" }.join : "")
    else
      msg.inspect
    end
  end

  def self.webrick_format(label)
    "INFO  [%{%Y-%m-%d %H:%M:%S,%3N}t] #{label}: %m %U %s"
  end
end

server = WEBrick::HTTPServer.new \
  :Port      => 8000,
  :Logger    => DropwizardLogger.new("webrick", $stdout).tap {|x|
                  x.level = Logger::INFO
                },
  :AccessLog => [[$stdout, DropwizardLogger.webrick_format("webrick")]]

$logger = DropwizardLogger.new("app", $stdout)

server.mount_proc '/fail' do |req, res|
  begin
    raise 'fail'
  rescue => e
    $logger.error(e)
  end
  res.body = "failed"
  res.status = 500
end

server.mount_proc '/' do |req, res|
  $logger.debug("hello from in the request")
  res.body = 'Hello, world!'
end

trap 'INT' do
  server.shutdown
end

server.start

Querying consul with range

Disclaimer: this has not been tried in a production environment. It is a weekend hack.

Consul is a highly available, datacenter aware, service discovery mechanism. Range is a query language for selecting information out of arbitrary, self-referential metadata. I combined the two!

Start by firing up a two-node consul cluster, per the getting started guide. On the master node, grab the consul branch of grange-server and run it with the following config:

[rangeserver]
loglevel=DEBUG
consul=true

(It could run against any consul agent, but it’s easier to demo on the master node.)

Querying range, we can already see a cluster named consul. This is a default service containing the consul servers.

> export RANGE_HOST=172.20.20.10
> erg "allclusters()"
consul
> erg "%consul"
agent-one

Add a new service to the agents, and it shows up in range!

n2> curl -v -X PUT --data '{"name": "web", "port": 80}' http://localhost:8500/v1/agent/service/register

> erg "allclusters()"
consul,web
> erg "%web"
agent-two

n1> curl -v -X PUT --data '{"name": "web", "port": 80}' http://localhost:8500/v1/agent/service/register

> erg "%web"
agent-one,agent-two

Though eventually consistent, range is a big improvement over the consul HTTP API for quick ad-hoc queries against your production layout, particularly when combined with other metadata. How many nodes are running redis? What services are running on a particular rack?

This is just a proof of concept for now, but I’m excited about the potential. To be usable it needs testing against production-sized clusters, better handling of error conditions, and some code review (in particular around handling cluster state changes).

Bash script to keep a git clone synced with a remote

Use the following under a process manager (such as runit) to keep a local git clone in sync with a remote when a push-based solution isn’t an option. Most other versions either neglect to verify that the remote is correct, or use git pull, which can fail if someone has been monkeying with the local copy.

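#!/bin/bash
set -e # Any unexpected failure exits; the process manager restarts the script.

function update_git_repo() {
  # Not named GIT_DIR: git itself reads a variable of that name.
  REPO_DIR=$1
  GIT_REMOTE=$2
  GIT_BRANCH=${3:-master}

  if [ ! -d "$REPO_DIR" ]; then
    CURRENT_SHA=""
    git clone --depth 1 -b "$GIT_BRANCH" "$GIT_REMOTE" "$REPO_DIR"
  else
    CURRENT_REMOTE=$(cd "$REPO_DIR" && git config --get remote.origin.url || true)

    if [ "$GIT_REMOTE" == "$CURRENT_REMOTE" ]; then
      # rev-parse also works once refs have been packed into .git/packed-refs.
      CURRENT_SHA=$(cd "$REPO_DIR" && git rev-parse "refs/heads/$GIT_BRANCH")
    else
      # Wrong remote: delete the clone and start again from scratch.
      rm -Rf "$REPO_DIR"
      exit 0 # Process manager should restart this script
    fi
  fi

  # fetch + reset --hard rather than pull, so local changes are discarded
  # instead of causing a failed merge.
  cd "$REPO_DIR" && \
    git fetch && \
    git reset --hard "origin/$GIT_BRANCH"

  NEW_SHA=$(git rev-parse "refs/heads/$GIT_BRANCH")
}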

update_git_repo "/tmp/myrepo" "git://example.com/my/repo.git"

sleep 60 # No need for a tight loop

Ruby progress bar, no gems

require 'csv'

def import(filename, out = $stdout)
  # Yes, there are gems that do progress bars.
  # No, I'm not about to add another dependency for something this simple.
  width     = 50
  processed = 0
  printed   = 0
  # Data rows only: CSV.foreach with headers: true does not yield the header,
  # so counting it would leave the bar one step short of full.
  # (Assumes one record per line, i.e. no quoted fields containing newlines.)
  total     = (File.foreach(filename).count - 1).to_f
  label     = File.basename(filename, '.csv')

  out.print "%11s: |" % label

  CSV.foreach(filename, headers: true) do |row|
    yield row

    processed += 1
    wanted = (processed / total * width).to_i
    out.print "-" * (wanted - printed)
    printed = wanted
  end
  out.puts "|"
end
     file_1: |--------------------------------------------------|
     file_2: |--------------------------------------------------|
  • Posted on March 29, 2014
  • Tagged code, ruby

New in RSpec 3: Verifying Doubles

One of the features I am most excited about in RSpec 3 is the verifying double support[1]. Using traditional doubles has always made me uncomfortable, since it is really easy to accidentally mock or stub a method that does not exist. This leads to the awkward situation where a refactoring can leave your code broken but with green specs. For example, consider the following:

# double_demo.rb
class User < Struct.new(:notifier)
  def suspend!
    notifier.notify("suspended as")
  end
end

describe User, '#suspend!' do
  it 'notifies the console' do
    notifier = double("ConsoleNotifier")

    expect(notifier).to receive(:notify).with("suspended as")

    user = User.new(notifier)
    user.suspend!
  end
end

ConsoleNotifier is defined as:

# console_notifier.rb
class ConsoleNotifier
  def notify!(msg)
    puts msg
  end
end

Note that the method notify! does not match the notify method we are expecting! This is broken code, but the spec still passes:

> rspec -r./console_notifier double_demo.rb
.

Finished in 0.0006 seconds
1 example, 0 failures

Verifying doubles solve this issue.

Verifying doubles to the rescue

A verifying double provides guarantees about methods that are being expected, including whether they exist, whether the number of arguments is valid for that method, and whether they have the correct visibility. If we change double('ConsoleNotifier') to instance_double('ConsoleNotifier') in the previous spec, it will now ensure that any method we expect is a valid instance method of ConsoleNotifier. So the spec will now fail:

> rspec -r./console_notifier.rb double_demo.rb
F

Failures:

  1) User#suspend! notifies the console
     Failure/Error: expect(notifier).to receive(:notify).with("suspended as")
       ConsoleNotifier does not implement:
         notify
    # ... backtrace
         
Finished in 0.00046 seconds
1 example, 1 failure         

Other types of verifying doubles include class_double and object_double. You can read more about them in the documentation.

Isolation

We now have a failing spec, but we had to load our dependencies for the privilege. This is undesirable when those dependencies take a long time to load, such as the Rails framework. Verifying doubles provide a solution to this problem: if the doubled class is not loaded, the verifying double simply behaves like a normal double! This is often confusing to people, but understanding it is key to understanding the power of verifying doubles.

Running the spec that failed above without loading console_notifier.rb, it actually passes:

> rspec double_demo.rb
.

Finished in 0.0006 seconds
1 example, 0 failures

This is the killer feature of verifying doubles. You get both confidence that your specs are correct and the speed of running them in isolation. Typically I will develop a spec and class in isolation, then load up the entire environment for a full test run and in CI.
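To make that behaviour concrete, here is a toy sketch of the idea. It is explicitly not how rspec-mocks implements it, and `ToyInstanceDouble` and its `stub` method are hypothetical. The double only verifies when the named constant is loaded, checking that the method exists and that a call's positional argument count fits the real method; otherwise it accepts anything, like a plain double:

```ruby
# Toy illustration of a verifying instance double. Not RSpec's implementation.
class ToyInstanceDouble
  def initialize(class_name)
    @class_name = class_name
    @stubs      = {}
  end

  # Allow +name+ and make it return +value+. If the doubled class is loaded,
  # it must define +name+ as a public instance method.
  def stub(name, value = nil)
    klass = doubled_class
    if klass && !klass.public_method_defined?(name)
      raise NoMethodError, "#{@class_name} does not implement: #{name}"
    end
    @stubs[name] = value
    self
  end

  def method_missing(name, *args)
    return super unless @stubs.key?(name)
    klass = doubled_class
    check_arity!(klass.instance_method(name), args.size) if klass
    @stubs[name]
  end

  def respond_to_missing?(name, include_private = false)
    @stubs.key?(name) || super
  end

  private

  # nil when the constant is not loaded, so the double acts like a plain one.
  def doubled_class
    Object.const_get(@class_name) if Object.const_defined?(@class_name)
  end

  # Checks positional arguments only; keyword arguments are ignored here.
  def check_arity!(method, given)
    types    = method.parameters.map(&:first)
    required = types.count(:req)
    optional = types.count(:opt)
    return if given >= required && (types.include?(:rest) || given <= required + optional)
    raise ArgumentError,
          "wrong number of arguments (#{given}) for #{@class_name}##{method.name}"
  end
end

# Not loaded yet: nothing to verify against, so the typo goes unnoticed.
ToyInstanceDouble.new('ConsoleNotifier').stub(:notify).notify('suspended as')

class ConsoleNotifier
  def notify!(msg)
    puts msg
  end
end

# Loaded: the same stub is now rejected.
begin
  ToyInstanceDouble.new('ConsoleNotifier').stub(:notify)
rescue NoMethodError => e
  puts e.message
end
```

The same double definition gives both outcomes; only whether `ConsoleNotifier` has been loaded changes, which is what lets one spec run fast in isolation and strictly in a full run.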

There are a number of other neat tricks you can do with verifying doubles, such as enabling them for partial doubles and replacing constants, all covered in the documentation.
There really isn’t a good reason to use normal doubles anymore. Install the RSpec 3 beta (via 2.99) to take them for a test drive!

[1] This functionality has been available for a while now in rspec-fire. RSpec 3 fully replaces that library, and even adds some more features.

Ruby Style Guide

My coding style has evolved over time, and has always been something I kept in my head. This morning I tried to document it explicitly, so I can point offending pull requests at it: my personal Ruby Style Guide.

What is it missing?

  • Posted on July 04, 2013
  • Tagged code, ruby

Writing About Code

I wrote some words about The Mathematical Syntax of Small-step Operational Semantics

It’s the latest in a sequence of experiments on techniques for presenting ideas and code, xspec being another that you may be interested in.

  • Posted on June 29, 2013
  • Tagged code, ruby

How I Test Rails Applications

The Rails conventions for testing provide three categories for your tests:

  • Unit. What you write to test your models.
  • Integration. Used to test the interaction among any number of controllers.
  • Functional. Testing the various actions of a single controller.

This tells you where to put your tests, but the type of testing you perform on each part of the system is the same: load fixtures into the database to get the app into the required state, run some part of the system either directly (models) or using provided harnesses (controllers), then verify the expected output.

This technique is simple, but is only one of a number of ways of testing. As your application grows, you will need to add other approaches to your toolbelt to enable your test suite to continue providing valuable feedback not just on the correctness of your code, but its design as well.

I use a different set of categories for my tests (taken from the GOOS book):

  • Unit. Do our objects do the right thing, and are they convenient to work with?
  • Integration. Does our code work against code we can’t change?
  • Acceptance. Does the whole system work?

Note that these definitions of unit and integration are radically different to how Rails defines them. That is unfortunate, but these definitions are more commonly accepted across other languages and frameworks and I prefer to use them since it facilitates an exchange of information across them. All of the typical Rails tests fall under the “integration” label, leaving two new levels of testing to talk about: unit and acceptance.

Unit Tests

“A test is not a unit test if it talks to the database, communicates across a network, or touches the file system.” – Working Effectively with Legacy Code, p. 14

This type of test is typically referred to in the Rails community as a “fast unit test”, which is unfortunate since speed is far from the primary benefit. The primary benefit of unit testing is the feedback it provides on the dependencies in your design. “Design unit tests” would be a better label.

This feedback is absolutely critical in any non-trivial application. Unchecked dependency is crippling, and Rails encourages you not to think about it (most obviously by implicitly autoloading everything).

By unit testing a class you are forced to think about how it interacts with other classes, which leads to simpler dependency trees and simpler programs.
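As a small illustration (hypothetical classes, plain Ruby rather than RSpec), a class that receives its collaborator instead of reaching for a global or an ActiveRecord model can be exercised with a hand-rolled fake that records interactions, with no database, network, or file system involved:

```ruby
# Hypothetical class under test: it only needs "something with #notify",
# which is passed in rather than looked up.
class Suspension
  def initialize(notifier)
    @notifier = notifier
  end

  def call(user_name)
    @notifier.notify("suspended #{user_name}")
    :suspended
  end
end

# Hand-rolled fake: records messages instead of writing anywhere.
class RecordingNotifier
  attr_reader :messages

  def initialize
    @messages = []
  end

  def notify(msg)
    @messages << msg
  end
end

# The "unit test": build the object with its fake, exercise it, check the interaction.
notifier = RecordingNotifier.new
result   = Suspension.new(notifier).call('don')
raise 'wrong result' unless result == :suspended
raise 'notifier not called' unless notifier.messages == ['suspended don']
puts 'ok'
```

If constructing `Suspension` had required half the application, that would show up immediately as test setup pain, which is the design feedback in question.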

Unit tests tend to (though don’t always have to) make use of mocking to verify interactions between classes. Using rspec-fire is absolutely critical when doing this. It verifies your mocks represent actual objects with no extra effort required in your tests, bridging the gap to statically-typed mocks in languages like Java.

As a guideline, a single unit test shouldn’t take more than 1ms to run.

Acceptance Tests

A Rails integration test doesn’t exercise the entire system, since it uses a harness and doesn’t use the system from the perspective of a user. As one example, you need to post form parameters directly rather than actually filling out the form. That makes the test both unreliable, in that if you change your HTML form the test will still pass, and incomplete, in that it doesn’t actually load the page in a browser and verify that JavaScript and CSS are not interfering with the submission of the form.

Full system testing was popularized by the cucumber library, but cucumber adds a level of indirection that isn’t useful for most applications. Unless you are actually collaborating with non-technical stakeholders, the extra complexity just gets in your way. RSpec specs can easily be written in a BDD style without extra libraries.

Theoretically you should only be interacting with the system as a black box, which means no creating fixture data or otherwise messing with the internals of the system in order to set it up correctly. In practice, this tends to be unwieldy, but I still maintain a strict abstraction so that tests read like black box tests, hiding any internal modification behind an interface that could be implemented by black box interactions but is “optimized” to use internal knowledge (e.g. build_registration.with_hosting_request.create). I’ve had success with the builder pattern, also presented in the GOOS book, but that’s another blog post.
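A minimal sketch of such a builder, assuming an in-memory store in place of the database. `RegistrationBuilder`, `InMemoryRegistrations`, and the `build_registration(store)` signature are all hypothetical; the chain in the text takes no store argument:

```ruby
# Hypothetical in-memory store standing in for the database.
class InMemoryRegistrations
  def initialize
    @rows = []
  end

  def save(attributes)
    @rows << attributes
    attributes
  end

  def all
    @rows.dup
  end
end

# Hypothetical builder. Each with_* method returns a new builder, so a
# partially configured builder can be reused without leaking state.
class RegistrationBuilder
  DEFAULTS = { name: 'Test Person', hosting_request: false }.freeze

  def initialize(store, attributes = DEFAULTS)
    @store      = store
    @attributes = attributes
  end

  def with_name(name)
    self.class.new(@store, @attributes.merge(name: name))
  end

  def with_hosting_request
    self.class.new(@store, @attributes.merge(hosting_request: true))
  end

  # The "optimized" step: this writes to the store directly, but it could be
  # reimplemented to drive the real registration form without any caller changing.
  def create
    @store.save(@attributes)
  end
end

def build_registration(store)
  RegistrationBuilder.new(store)
end

store = InMemoryRegistrations.new
build_registration(store).with_hosting_request.create
```

Tests only ever see `with_*` and `create`, so swapping the internal shortcut for genuine black box interactions stays a change local to the builder.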

A common anti-pattern is to try and use transactional fixtures in acceptance tests. Don’t do this. It isn’t executing the full system (so can’t test transaction level functionality) and is prone to flakiness.

An acceptance test will typically take seconds to run, and should only be used for happy-path verification of behaviour. It makes sure that all the pieces hang together correctly. Edge case testing should be done at the unit or integration level. Ideally each new feature should have only one or two acceptance tests.

File Organisation

I use spec/{unit,integration,acceptance} folders as the parent of all specs. Each type of spec has its own helper require, so unit specs require unit_helper rather than spec_helper. Each of those helpers will then require other helpers as appropriate; for instance, my rails_helper looks like this (note the hack required to support this layout):

ENV["RAILS_ENV"] ||= 'test'
require File.expand_path("../../config/environment", __FILE__)

# By default, rspec/rails tags all specs in spec/integration as request specs,
# which is not what we want. There does not appear to be a way to disable this
# behaviour, so below is a copy of rspec/rails.rb with this default behaviour
# commented out.
require 'rspec/core'

RSpec::configure do |c|
  c.backtrace_clean_patterns << /vendor\//
  c.backtrace_clean_patterns << /lib\/rspec\/rails/
end

require 'rspec/rails/extensions'
require 'rspec/rails/view_rendering'
require 'rspec/rails/adapters'
require 'rspec/rails/matchers'
require 'rspec/rails/fixture_support'
require 'rspec/rails/mocks'
require 'rspec/rails/module_inclusion'
# require 'rspec/rails/example' # Commented this out
require 'rspec/rails/vendor/capybara'
require 'rspec/rails/vendor/webrat'

# Added the below, we still want access to some of the example groups
require 'rspec/rails/example/rails_example_group'
require 'rspec/rails/example/controller_example_group'
require 'rspec/rails/example/helper_example_group'

Controller specs go in spec/integration/controllers, though I’m trending towards using poniard, which allows me to test controllers in isolation (spec/unit/controllers).

Helpers are either unit or integration tested depending on the type of work they are doing. If it is domain level logic it can be unit tested (though I tend to use presenters for this, which are also unit tested), but for helpers that layer on top of Rails provided helpers (like link_to or content_tag) they should be integration tested to verify they are using the library in the correct way.

I have used this approach on a number of Rails applications over the last 1-2 years and found it leads to better and more enjoyable code.

Blocking (synchronous) calls in Goliath

Posting for my future self. A generic function to run blocking code in a deferred thread and resume the fiber on completion, so as not to block the reactor loop.

def blocking(&f)
  fiber = Fiber.current
  result = nil
  EM.defer(f, ->(x){
    result = x
    fiber.resume
  })
  Fiber.yield
  result
end

Usage

class MyServer < Goliath::API
  def response(env)
    blocking { sleep 1 }
    [200, {}, 'Woken up']
  end
end
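To see why the fiber juggling works without pulling in EventMachine, here is a toy stand-in (a hypothetical `ToyReactor`, not EM's API). Its `defer` runs the work on a background thread and queues the callback; the reactor then runs callbacks on its own thread, where resuming the parked fiber is allowed. The `blocking` helper has the same shape as the one above, with `REACTOR.defer` in place of `EM.defer`:

```ruby
require 'fiber' unless Fiber.respond_to?(:current) # needed on Ruby < 3.1

# Toy single-threaded reactor standing in for EventMachine (not EM's API).
class ToyReactor
  def initialize
    @completed = Queue.new # [callback, result] pairs pushed by worker threads
    @pending   = 0         # deferred operations not yet called back
  end

  # Run +op+ on a background thread; +callback+ later runs on the reactor thread.
  # Errors raised by +op+ are not handled in this sketch.
  def defer(op, callback)
    @pending += 1
    Thread.new { @completed << [callback, op.call] }
  end

  # Start +root+ in a fiber, then run callbacks until no deferred work remains.
  def run(&root)
    Fiber.new(&root).resume
    while @pending > 0
      callback, result = @completed.pop # waits for the next finished operation
      @pending -= 1
      callback.call(result)
    end
  end
end

REACTOR = ToyReactor.new

# Park the current fiber until the deferred operation's callback resumes it.
def blocking(&f)
  fiber  = Fiber.current
  result = nil
  REACTOR.defer(f, ->(x) {
    result = x
    fiber.resume
  })
  Fiber.yield
  result
end

REACTOR.run do
  value = blocking { sleep 0.1; 'Woken up' }
  puts value
end
```

While the worker thread sleeps, the reactor thread is free to process other callbacks; the request's code still reads top to bottom because the fiber is simply suspended at `Fiber.yield`.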

Form Objects in Rails

For a while now I have been using form objects instead of nested attributes for complex forms, and the experience has been pleasant. A form object is an object designed explicitly to back a given form. It handles validation, defaults, casting, and translation of attributes to the persistence layer. A basic example:

require 'active_model'
require 'json'

class Form::NewRegistration
  include ActiveModel::Validations

  def self.scalar_attributes
    [:name, :age]
  end

  attr_accessor *scalar_attributes
  attr_reader :event

  validates_presence_of :name

  def initialize(event, params = {})
    @event = event # used by #create

    self.class.scalar_attributes.each do |attr|
      self.send("%s=" % attr, params[attr]) if params.has_key?(attr)
    end
  end

  def create
    return unless valid?

    registration = Registration.create!(
      event: event,
      data_json: {
        name: name,
        age:  age.to_i,
      }.to_json
    )

    registration
  end

  # ActiveModel support
  def self.name; "Registration"; end
  def persisted?; false; end
  def to_key; nil; end
end

Note how this allows an easy mapping from form fields to a serialized JSON blob.

I have found this more explicit and flexible than tying forms directly to nested attributes. It allows more fine-tuned control of the form behaviour, is easier to reason about and test, and enables you to refactor your data model with minimal other changes. (In fact, if you are planning on refactoring your data model, adding in a form object as a “shim” to protect other parts of the system from change before you refactor is usually desirable.) It even works well with nested attributes, using the form object to build up the required nested hash in the #create method.

Relationships

A benefit of this approach, albeit still a little clunky, is that accessors map one-to-one with form fields, even for one-to-many associations. My approach takes advantage of Ruby’s flexible object model to define accessors on the fly. For example, say a registration has multiple custom answer fields, as defined on the event; I would call the following method on initialisation:

def add_answer_accessors!
  event.questions.each do |q|
    attr = :"answer_#{q.id}"
    instance_eval <<-RUBY
      def #{attr};     answers[#{q.id}]; end
      def #{attr}=(x); answers[#{q.id}] = x; end
    RUBY
  end
end

With the exception of the above code (which isn’t too bad), this greatly simplifies the typical code for handling one-to-many relationships: it avoids fields_for and explicit indexes, and makes it easier to set up sane defaults.
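For comparison, the same per-question accessors can be defined with `define_singleton_method` and blocks instead of evaluating a string of code. This is a sketch with hypothetical `Event`/`Question` stand-ins and an `answers` hash keyed by question id, which the snippet above relies on but does not show:

```ruby
# Hypothetical stand-ins: only the attributes the accessors use.
Question = Struct.new(:id)
Event    = Struct.new(:questions)

class AnswersForm
  attr_reader :event, :answers

  def initialize(event, params = {})
    @event   = event
    @answers = {} # question id => submitted answer
    add_answer_accessors!

    # Assign only the answer_<id> fields that exist for this event.
    params.each do |key, value|
      setter = "#{key}="
      public_send(setter, value) if respond_to?(setter)
    end
  end

  # Defines answer_<id> / answer_<id>= on this object only, one pair per question.
  def add_answer_accessors!
    event.questions.each do |q|
      define_singleton_method(:"answer_#{q.id}")  { answers[q.id] }
      define_singleton_method(:"answer_#{q.id}=") { |x| answers[q.id] = x }
    end
  end
end

form = AnswersForm.new(Event.new([Question.new(3), Question.new(7)]), 'answer_7' => 'yes')
form.answer_3 = 'no'
p form.answers
```

The blocks close over `q`, so there is no string interpolation of ids into code, and the accessors still exist only on form instances whose event has those questions.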

Casting

I use a small supporting module to handle casting of attributes to certain types.

module TypedWriter
  def typed_writer(type, attribute)
    class_eval <<-EOS
      def #{attribute}=(x)
        @#{attribute} = type_cast(x, :#{type})
      end
    EOS
  end

  def type_cast(x, type)
    case type
    when :integer
      x.to_s.length > 0 ? x.to_i : nil
    when :boolean
      x.to_s.length > 0 ? x == true || x == "true" : nil
    when :boolean_with_nil
      if x.to_s == 'on' || x.nil?
        nil
      else
        x.to_s.length > 0 ? x == true || x == "true" : nil
      end
    when :int_array
      [*x].map(&:to_i).select {|x| x > 0 }
    else
      raise "Unknown type #{type}"
    end
  end

  def self.included(klass)
    # Make methods available both as class and instance methods.
    klass.extend(self)
  end
end

It is used like so:

class Form::NewRegistration
  # ...

  include TypedWriter

  typed_writer :integer, :age # signature is typed_writer(type, attribute)
end

Testing

I don’t load Rails for my form tests, so an explicit require of ActiveModel is necessary. I do this in my form code since I like explicitly requiring third-party dependencies everywhere they are used.

require 'unit_helper'

require 'form/new_registration'

describe Form::NewRegistration do
  include RSpec::Fire

  let(:event) { fire_double('Event') }

  subject { described_class.new(event) }

  def valid_attributes
    {
      name: 'don',
      age:  25
    }
  end

  def form(extra = {})
    described_class.new(event, valid_attributes.merge(extra))
  end

  describe 'validations' do
    it 'is valid for default attributes' do
      form.should be_valid
    end

    it { form(name: '').should have_error_on(:name) }
  end

  describe 'type-casting' do
    let(:f) { form } # Memoize the form

    # This pattern is overkill in this example, but useful when you have many
    # typed attributes.
    let(:typecasts) {{
      int: {
        nil  => nil,
        ""   => nil,
        23   => 23,
        "23" => 23,
      }
    }}

    it 'casts age to an int' do
      typecasts[:int].each do |value, expected|
        f.age = value
        f.age.should == expected
      end
    end
  end

  describe '#create' do
    it 'returns nil when not valid' do
      subject.create.should_not be
    end

    it 'creates a new registration' do
      f = form
      new_rego = double('new registration')
      dao = fire_replaced_class_double("Registration")
      # The form calls Registration.create!, so that is the message to expect.
      dao.should_receive(:create!).with {|x|
        x[:event].should == event

        data = JSON.parse(x[:data_json])

        data['name'].should == valid_attributes[:name]
        data['age'].should == valid_attributes[:age]
      }.and_return(new_rego)
      f.create.should == new_rego
    end
  end

  it { should_not be_persisted }
end

Code Sharing

I tend to have a parent class Form::Registration, with subclasses Form::{New,Update,View}Registration. A common mixin would also work. For testing, I use a shared example group that is run by the specs for each of the three subclasses.

Conclusion

There are other solutions to this problem (such as separating validations completely) which I haven’t tried yet, and I haven’t used this approach on a team yet. It has worked well for my solo projects though, and I’m just about confident enough to recommend it for production use.

Poniard: a Dependency Injector for Rails

I just open sourced poniard, a dependency injector for Rails. It’s a newer version of code I posted a few weeks back that allows you to write controllers using plain ruby objects:

module Controller
  class Registration
    def update(response, now_flash, update_form)
      form = update_form

      if form.save
        response.respond_with SuccessfulUpdateResponse, form
      else
        now_flash[:message] = "Could not save registration."
        response.render action: 'edit', ivars: {registration: form}
      end
    end

    SuccessfulUpdateResponse = Struct.new(:form) do
      def html(response, flash, current_event)
        flash[:message] = "Updated details for %s" % form.name
        response.redirect_to :registrations, current_event
      end

      def js(response)
        response.render json: form
      end
    end
  end
end

This makes it possible to test them in isolation, leading to a better appreciation of your dependencies and nicer code.
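The controller methods above declare what they need as parameters (response, now_flash, update_form). As a rough sketch of how name-based injection can work (not poniard's actual code or API; `ToyInjector` and `HelloController` are hypothetical), `Method#parameters` reports the parameter names, which can be looked up in a table of available objects:

```ruby
# Toy name-based injector. Not poniard's implementation.
class ToyInjector
  def initialize(sources)
    @sources = sources # parameter name (Symbol) => object to inject
  end

  # Call +method_name+ on +object+, passing one source per declared parameter.
  # Assumes all parameters are plain positional ones; raises KeyError if a
  # parameter has no matching source.
  def dispatch(object, method_name)
    method = object.method(method_name)
    args   = method.parameters.map { |_type, name| @sources.fetch(name) }
    method.call(*args)
  end
end

# Hypothetical plain-Ruby controller in the same style.
class HelloController
  def show(now_flash, greeting)
    now_flash[:message] = greeting
    :rendered
  end
end

flash  = {}
result = ToyInjector.new(now_flash: flash, greeting: 'Hello!').dispatch(HelloController.new, :show)
```

In a test, the injector is not needed at all: `HelloController.new.show({}, 'hi')` calls the method directly with fakes, which is where the isolation comes from.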

Check it out!

Guice in your JRuby

At work we have a Java application container that uses Google Guice for dependency injection. I thought it would be fun to try and embed some Ruby code into it.

Guice uses types and annotations to wire components together, neither of which Ruby has. It also uses Java meta-class information heavily (SomeClass.class). High hurdles, but we can clear them.

Warming Up

Normally JRuby is used to interpret Ruby code inside a Java environment, but it also provides functionality to compile a Ruby class to a Java one. In essence, it creates a Java wrapper class that delegates all calls to Ruby. Let’s look at a simple example.

# SayHello.rb
class SayHello
  def hello(name)
    puts "Hello #{name}"
  end
end

Compile using the jrubyc script. By default it compiles directly to a .class file, but it doesn’t work correctly at the moment. Besides, going to Java first allows us to see what is going on.

jrubyc --java SayHello.rb

The compiled Java is refreshingly easy to understand. It even has comments!

Imports are redacted from all Java examples for brevity.

// SayHello.java
public class SayHello extends RubyObject  {
    private static final Ruby __ruby__ = Ruby.getGlobalRuntime();
    private static final RubyClass __metaclass__;

    static {
        String source = new StringBuilder("class SayHello\n" +
            "  def hello(name)\n" +
            "    puts \"Hello #{name}\"\n" +
            "  end\n" +
            "end\n" +
            "").toString();
        __ruby__.executeScript(source, "SayHello.rb");
        RubyClass metaclass = __ruby__.getClass("SayHello");
        metaclass.setRubyStaticAllocator(SayHello.class);
        if (metaclass == null) throw new NoClassDefFoundError("Could not load Ruby class: SayHello");
        __metaclass__ = metaclass;
    }

    /**
     * Standard Ruby object constructor, for construction-from-Ruby purposes.
     * Generally not for user consumption.
     *
     * @param ruby The JRuby instance this object will belong to
     * @param metaclass The RubyClass representing the Ruby class of this object
     */
    private SayHello(Ruby ruby, RubyClass metaclass) {
        super(ruby, metaclass);
    }

    /**
     * A static method used by JRuby for allocating instances of this object
     * from Ruby. Generally not for user comsumption.
     *
     * @param ruby The JRuby instance this object will belong to
     * @param metaclass The RubyClass representing the Ruby class of this object
     */
    public static IRubyObject __allocate__(Ruby ruby, RubyClass metaClass) {
        return new SayHello(ruby, metaClass);
    }

    /**
     * Default constructor. Invokes this(Ruby, RubyClass) with the classloader-static
     * Ruby and RubyClass instances assocated with this class, and then invokes the
     * no-argument 'initialize' method in Ruby.
     *
     * @param ruby The JRuby instance this object will belong to
     * @param metaclass The RubyClass representing the Ruby class of this object
     */
    public SayHello() {
        this(__ruby__, __metaclass__);
        RuntimeHelpers.invoke(__ruby__.getCurrentContext(), this, "initialize");
    }

    public Object hello(Object name) {
        IRubyObject ruby_name = JavaUtil.convertJavaToRuby(__ruby__, name);
        IRubyObject ruby_result = RuntimeHelpers.invoke(__ruby__.getCurrentContext(), this, "hello", ruby_name);
        return (Object)ruby_result.toJava(Object.class);
    }
}

Simple: A Java class with concrete type and method definitions, delegating each method to Ruby. For the next step, JRuby supports metadata provided in Ruby to control the exact types and annotations that are used in the generated code.

# SayHello.rb
class SayHello
  java_signature 'void hello(String)'
  def hello(name)
    puts "Hello #{name}"
  end
end

Running jrubyc again, the generated method now uses those concrete types:

public void hello(String name) {
    IRubyObject ruby_name = JavaUtil.convertJavaToRuby(__ruby__, name);
    IRubyObject ruby_result = RuntimeHelpers.invoke(__ruby__.getCurrentContext(), this, "hello", ruby_name);
    return;
}

Perfect! Now we have all the pieces we need to start wiring our Ruby into Guice.

Guice

Let’s start by injecting an object that our Ruby class can use to do something interesting.

public class JrubyGuiceExample {
  public static void main(String[] args) {
    Injector injector = Guice.createInjector();
    SimplestApp app = injector.getInstance(SimplestApp.class);
    app.run();
  }
}
require 'java'

java_package 'net.rhnh'

java_import 'com.google.inject.Inject'

class SimplestApp
  java_annotation 'Inject'
  java_signature 'void MyApp(BareLogger logger)'
  def initialize(logger)
    @logger = logger
  end

  def run
    @logger.info("Hello from Ruby")
  end
end

Guice will see the BareLogger type, and automatically create an instance of that class to be passed to the initializer.

Guice also allows more complex dependency graphs, such as knowing which concrete class to provide for an interface. These are declared using a module, which (though probably not a good idea) we can also write in Ruby. The following example tells Guice to provide an instance of PrefixLogger whenever a SimpleLogger is asked for.

public class JrubyGuiceExample {
  public static void main(String[] args) {
    Injector injector = Guice.createInjector(new ComplexModule());
    ComplexApp app = injector.getInstance(ComplexApp.class);
    app.run();
  }
}
require 'java'

java_package 'net.rhnh'

java_import 'com.google.inject.Provides'
java_import 'com.google.inject.Binder'

class ComplexModule
  java_implements 'com.google.inject.Module'

  java_signature 'void configure(Binder binder)'
  def configure(binder)
    binder.
      bind(java::SimpleLogger.java_class).
      to(java::PrefixLogger.java_class)
  end

  protected

  def java
    Java::net.rhnh
  end
end

You can also provide more complex setup logic in dedicated methods with the Provides annotation. See the example project linked at the bottom of the post.

Maven integration

Running jrubyc all the time is a drag. Thankfully, someone has already made a Maven plugin that puts everything in the right place.

<plugin>
  <groupId>de.saumya.mojo</groupId>
  <artifactId>jruby-maven-plugin</artifactId>
  <version>0.29.1</version>
  <configuration>
    <generateJava>true</generateJava>
    <generatedJavaDirectory>target/generated-sources/jruby</generatedJavaDirectory>
  </configuration>
  <executions>
    <execution>
      <phase>process-resources</phase>
      <goals>
        <goal>compile</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Now running mvn package will compile Ruby code from src/main/ruby to Java source in target/generated-sources/jruby, which is then available for the main Java build to compile.

For more examples and runnable code, see the jruby-guice project on GitHub.

Benchmarking RSpec double versus OpenStruct

I noticed a number of my unit tests were taking upwards of 10ms, an order of magnitude slower than they should be. It turns out I was abusing rspec doubles: in particular, I was using one in place of a value object. Doubles are far slower than plain Ruby objects, especially as the number of attributes goes up. The growth looks linear, but the constant factor is bad. The following benchmark compares a double with an OpenStruct, which can often be used as a drop-in replacement. (Normally I just use the value object itself, but in this case it was an ActiveRecord subclass.)

require 'ostruct'

describe 'benchmark' do
  let(:attributes) {
    ENV['N'].to_i.times.each_with_object({}) {|x, h| h["attr_#{x}"] = 'hello' }
  }

  5.times do
    it 'measures doubles' do
      double(attributes)
    end

    it 'measures structs' do
      OpenStruct.new(attributes)
    end
  end
end

It takes only 6-8 attributes to break the 1ms barrier, and this is only for construction!

To graph it, I threw out the first result for each measurement, since it tended to be all over the shop during warm up. The following script is a hack that relies on the a priori knowledge that double is slower, since it doesn't try to match rspec's profile output back to the example labels. The measurements are so different in this case that it works.

> for N in {1..20}; do env N=$N rspec benchmark_spec.rb -p | \
  grep seconds | \
  grep benchmark_spec | \
  awk '{print $1}' | \
  xargs echo $N; done > results.dat

> gnuplot << eor
set terminal jpeg size 600,200 font "arial,9"
set key left
set output 'graph.jpg'
set datafile separator " "
set xlabel '# of attributes'
set ylabel 'construction time (s)'
plot 'results.dat' u 1:( (\$3+\$4+\$5+\$6)/4) with lines title 'Double', \
       '' u 1:( (\$8+\$9+\$10+\$11) / 4) with lines title 'Struct'
eor
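As a hedged sketch of what those gnuplot expressions compute, here is the same averaging in Ruby. It assumes, as the plot script does, that each line of results.dat holds N followed by five double timings and then five OpenStruct timings (rspec -p lists the slowest examples first, and the doubles are slower). `average_timings` is a hypothetical helper, not part of the original workflow.

```ruby
# Hypothetical helper mirroring the gnuplot averaging above.
# Expects a line like "N d1 d2 d3 d4 d5 s1 s2 s3 s4 s5" from results.dat.
def average_timings(line)
  n, *times = line.split.map(&:to_f)
  doubles, structs = times.each_slice(5).to_a

  # Discard the first timing in each group, as the plot does, and average
  # the remaining four.
  average = ->(xs) { xs.drop(1).sum / (xs.size - 1) }

  [n.to_i, average.call(doubles), average.call(structs)]
end

# Example usage (assumes results.dat exists in the current directory):
#   File.foreach("results.dat") { |l| puts average_timings(l).join(" ") }
```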

My next project: what is the best way to get the elevated guarantees provided by rspec-fire without taking the speed hit?

Testing Stripe OAuth Connect with Capybara and Selenium

Stripe only allows you to set a fixed redirect URL in your test OAuth settings. This is problematic because you need to redirect to a different host and port depending on whether you are in development or test mode. In other words, there is a global callback that needs to be routed correctly to local callbacks.

My workaround is to use a simple rack application that redirects any incoming requests to the selected host and port. The Capybara host and port are written out to a file when the specs start, and if that file isn't present the app assumes development. It is clearly a hack, but it works fairly well until Stripe provides a better way to do it.

# stripe.ru
run lambda {|env|
  req = Rack::Request.new(env)

  server_file = "/tmp/capybara_server"
  host_and_port = if File.exist?(server_file)
    File.read(server_file).strip
  else
    "localhost:3000"
  end

  # Rebuild the incoming callback URL against the chosen host and port.
  url = "http://#{host_and_port}#{req.path}"
  url << "?#{req.query_string}" unless req.query_string.empty?

  # Rack::Response.new takes a body, not the env, so start with an empty one.
  response = Rack::Response.new
  response.redirect(url)
  response.finish
}
# spec/acceptance_helper.rb
require 'fileutils'

SERVER_FILE = "/tmp/capybara_server"

# Record where Capybara's server is listening so stripe.ru can redirect there.
Capybara.server {|app, port|
  File.open(SERVER_FILE, "w") {|f| f.write("%s:%i" % ["127.0.0.1", port]) }
  Capybara.run_default_server(app, port)
}

RSpec.configure do |config|
  config.after :suite do
    # Remove the file so stripe.ru falls back to development afterwards.
    FileUtils.rm_f(SERVER_FILE)
  end
end

This requires the rack application to be running already (much like the database is expected to be running), which can be done thusly:

bundle exec rackup --port 3001 stripe.ru

Set your Stripe callback to http://localhost:3001/your/callback.
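The redirect logic in stripe.ru is small enough to pull out into a plain method, which makes it checkable without booting rack. A minimal sketch, where `callback_redirect_url` and its `server_file:` keyword are hypothetical names rather than part of the original setup:

```ruby
# Hypothetical extraction of the URL-building logic from stripe.ru.
# Falls back to the development server when no Capybara server file exists.
def callback_redirect_url(path, query_string, server_file: "/tmp/capybara_server")
  host_and_port =
    if File.exist?(server_file)
      File.read(server_file).strip
    else
      "localhost:3000"
    end

  url = "http://#{host_and_port}#{path}"
  url += "?#{query_string}" unless query_string.empty?
  url
end
```

Inside stripe.ru the lambda would then reduce to `response.redirect(callback_redirect_url(req.path, req.query_string))`.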

Upload Forerunner 410 to Strava with Garmin Communicator for Ubuntu Linux

I didn't figure this out myself; these instructions were kindly emailed to me by Andreas, the author of Linux Garmin Communicator.

1. Install Linux Garmin Communicator
2. Uncompress Forerunner410.tar.gz to ~/forerunner (this was sent to me by Andreas.)
3. Configure your ~/.config/garminplugin/garminplugin.xml thusly, substituting in your own home folder:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<GarminPlugin logfile="/tmp/garminplugin.log" level="ERROR">
    <Devices>
        <Device>
            <Name>Forerunner 410</Name>
            <StoragePath>/home/xavier/forerunner</StoragePath>
            <StorageCommand></StorageCommand>
            <FitnessDataPath></FitnessDataPath>
            <GpxDataPath></GpxDataPath>
        </Device>
    </Devices>
    <Settings>
        <ForerunnerTools enabled="false" />
    </Settings>
</GarminPlugin>

4. Install python-ant-downloader
5. Set tcx_output_dir = ~/forerunner/Garmin/History in ~/.antd/antd.cfg
6. With your watch on, run ant-downloader. It will download raw data from the device and create a TCX file in the above mentioned output directory.
7. At Strava, Upload Activity.

`ant-downloader` also has a daemon mode that automatically downloads files from your watch, but I’m not using it (I don’t like things running when not necessary).

Automatically backup Zoho Calendar, Google Calendar

Quick script I put together to automatically back up all of Jodie’s calendars for her.

Works for any online calendar that exposes an iCal link. You’ll need to replace “http://icalurl” in the script with the private iCal URL of your calendar. In Zoho, this is under Settings > My Calendars > Share > Enable private Address for this calendar.

require 'date'
require 'fileutils'

calendars = {
  'My Calendar'    => 'http://icalurl',
  'Other Calendar' => 'http://icalurl'
}

folder = Date.today.to_s

FileUtils.mkdir_p(folder)

calendars.each do |name, url|
  puts %|Backing up "#{name}"...|
  `curl -s "#{url}" > "#{folder}/#{name}.ics"`
end
puts "Done!"

Stores a folder per day. For bonus points, put it straight into Dropbox.
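The script shells out to curl, so a calendar name containing a double quote would break the command. A hedged alternative sketch uses Ruby's standard Net::HTTP instead. `backup_calendars` and its keyword arguments are hypothetical; the `fetch:` argument exists only so the download can be swapped out (for example in a test). Note that `Net::HTTP.get` does not follow redirects.

```ruby
require 'date'
require 'fileutils'
require 'net/http'
require 'uri'

# Downloads each calendar's iCal feed into a folder named after the date.
# `fetch` defaults to a plain HTTP GET; any callable taking a URL will do.
def backup_calendars(calendars, root: ".", date: Date.today,
                     fetch: ->(url) { Net::HTTP.get(URI(url)) })
  folder = File.join(root, date.to_s)
  FileUtils.mkdir_p(folder)

  calendars.map do |name, url|
    puts %|Backing up "#{name}"...|
    path = File.join(folder, "#{name}.ics")
    File.write(path, fetch.call(url))
    path
  end
end
```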

  • Posted on June 02, 2012
  • Tagged code, ruby

Setting isolation level in MySQL 5.1, 5.5, Postgres

From the I-want-my-evening-back department, differences in behaviour when
setting isolation levels between MySQL 5.1, 5.5, and Postgres. Documenting here
for my poor future self.

In Postgres and MySQL 5.1, the following is the correct ordering:

ActiveRecord::Base.transaction do
  ActiveRecord::Base.connection.execute(
    "SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE"
  )

  # ...
end

On MySQL 5.5 with the mysql2 gem, no error will be raised, but the isolation
level will not be set correctly. If you run the same commands in a mysql shell,
you see an error stating that the isolation level cannot be set after the
transaction has started.

Ok well, let’s move it outside then:

ActiveRecord::Base.connection.execute(
  "SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE"
)
ActiveRecord::Base.transaction do
  # ...
end

That works on 5.5, but fails on Postgres.
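One way to live with the difference is to make the ordering explicit. This is a hedged sketch, not something from the original post: `serializable_transaction` is a hypothetical helper, `conn` is anything responding to `execute` and `transaction` (an ActiveRecord connection does), and the caller decides which ordering applies, for example based on `conn.adapter_name`. Keep in mind that on MySQL, SET SESSION also changes the level for later transactions on the same connection.

```ruby
SET_SERIALIZABLE = "SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE"

# Hypothetical helper. `conn` must respond to #execute(sql) and
# #transaction { ... }, as an ActiveRecord connection does.
def serializable_transaction(conn, set_before_begin:)
  if set_before_begin
    # MySQL 5.5: the level must be set before the transaction starts.
    conn.execute(SET_SERIALIZABLE)
    conn.transaction { yield }
  else
    # Postgres and MySQL 5.1: set the level inside the transaction.
    conn.transaction do
      conn.execute(SET_SERIALIZABLE)
      yield
    end
  end
end

# Example (assumed) call site for a MySQL 5.5 database:
#   serializable_transaction(ActiveRecord::Base.connection, set_before_begin: true) { ... }
```

On Rails 4.0 and later, `ActiveRecord::Base.transaction(isolation: :serializable)` is worth checking first, since it issues the statement in the right place for each adapter.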

Screencast: moving to Heroku

A treat from the archives! I found a screen recording, with commentary, of me moving this crusty old blog from a VPS onto Heroku about a year ago. It's still pretty relevant, not just technology-wise but also in how I work (except I wasn't using tmux then).

This is one take with no rehearsal, preparation or editing, so you get my development and thought process raw. All two and a half hours of it. That has positives and negatives. I don't know how interesting this is to others, but I'm putting it out there just in case. Make sure you watch it in a viewer that can speed up the video.

One interesting thing I noticed is that most of the time I keep two tasks going in parallel, so I can switch to one whenever I'm blocked on the other waiting for a gem install or the like.

I have divided it into four parts, each around 40 minutes long and 350MB in size.

  • Part 1 gets the specs running and green, fixes deprecations, and moves from Ruby 1.8 to 1.9.
  • Part 2 moves from MySQL to Postgres, and replaces Sphinx with full text search.
  • Part 3 continues the Sphinx to Postgres transition, implementing related posts.
  • Part 4 deploys the finished product to Heroku, copies data across, and gets exception notification working.

Rough indexes are provided below.

Part 1

0:00 Introduction
0:50 rake, bundle
1:42 Search for MySQL to PG conversion, maybe taps gem?
3:22 bundle finishes
3:42 couldn’t parse YAML file, switch to 1.8.7 for now
4:10 Add .rvmrc
4:39 bundle again for 1.8.7
4:50 Search for Heroku cedar stack docs (back when it was new), reading
6:30 Gherkin fails to build
8:50 Can’t find solution, update gherkin to latest
9:10 Find YAML fix while waiting for gherkin to update
10:08 Cancel gherkin update, switch to 1.9.2 and apply YAML fix
10:20 AWS S3 gem not 1.9 compatible, but not needed anymore so delete
11:10 Remove db2s3 gem also
11:20 nil.[] error, non-obvious
11:50 Missing test db config
12:20 Tests are running, failures
12:50 Debug missing partial error, start local server to click around and it works here
14:15 Back to fixing specs
14:25 Removed functionality but not specs, clearly haven’t been running specs regularly. Poor form.
15:45 Target specs passing
16:13 Fix a deprecation warning along the way
16:40 Commit fixes for 1.9.2
17:50 While waiting for specs, check for sphinx code
18:05 author_ip can’t be null, why is that still there?
18:50 make it nullable, don’t want to delete old data right now
19:40 Search for MySQL syntax
21:06 Oh actually author_ip does get set, specs actually are broken
22:07 Add blank values to spec, fixes spec.
22:39 Add blank values in again, would be nice to extract duplicate code
23:35 Start fixing tagging
24:30 Why no backtraces? Argh color scheme hiding them, must have reset recently
25:50 This changed recently? Look at git log
26:46 Looks like a dodgy merge, fixed. That’ll learn me for not running specs
28:15 Tackle view specs, long time since I’ve used these.
29:06 Be easier if I had factories, look for them.
29:23 Find them under cucumber
30:11 Extract valid_comment_attributes to spec_helper.rb
32:15 Fix broken undo logic
33:00 Extracting common factory logic
33:08 hmm, can you super from a method defined inside a spec?
33:30 yeah, apparently
35:28 working, check in
36:00 Fixing view specs
36:30 Remove approved_comments_count, don’t do spam checking anymore
37:15 Actually it is still there. Need to fix mocks.
39:15 Fix deprecations while waiting for specs.
39:30 Missing template
40:15 Need to use render :template
40:40 Check in, fixed view specs.
41:05 Running specs, looking all green. Fix RAILS_ENV to Rails.env
41:45 All green!

Part 2

0:30 Removing sphinx
2:20 Add pg gem
4:00 Create databases
4:45 Ah it’s postgres, not pg in database.yml
5:15 derp, postgresql
6:00 What are defensio migrations still doing hanging around?
6:45 Move database migrations around to not collide
7:45 taps
8:40 run tests against PG in background
9:30 don’t have open id columns in prod, it was removed in latest enki
11:25 ffffuuuuuu migrations and schema.rb
12:40 taps install failed on rhnh.net, why installing sqlite?
14:00 Argh can’t parse yaml
14:45 Abort taps remotely, bring mysqldump locally
16:00 Try taps locally
17:20 404 :(
17:50 it’s away!
18:10 Invalid encoding UTF-8, dammit.
18:30 New plan, there’s a different gem that does this.
19:00 What is it? I did it in a screencast, I should know this.
19:40 Found it! mysql2psql
20:20 taps, you’re cut
21:00 Setup mysql2psql.yml config
22:20 Works. That was much easier.
23:20 delayed_job, why is that here? Try removing it.
23:50 Used to use it for spam checking, but not anymore.
24:10 Time to replace search, how to do this?
25:00 Index tag list?
26:00 Hmm need full text search as well.
26:15 Step one: normal search, on title and body
27:00 Spec it, extract faux-factory for posts
29:00 Failing spec, implement
30:00 Search for PG full text search syntax
31:30 Passing, add in title search also
32:40 Passing with title as well
33:10 Adding tag cache to posts for easy searching
36:10 Argh migrations are screwed.
36:40 Move migrations back to where they were
39:09 Amend migration move like it never happened
38:45 Add data migration to tag_cache migration
39:30 WTF already have a tag cache. Where did it come from?
39:40 Delete everything I just did.
41:40 Check in web interface, works.

Part 3

00:20 related posts using full text search
02:55 sort by rank, reading docs
03:50 difference between ts_rank and ts_rank_cd?
4:30 Too hard, just pick one and see what happens
5:15 Syntax error in ts_query
5:45 plainto_tsquery
6:40 working, need to use or rather than and
10:30 Ah, using plainto, fix that.
11:04 Order by rank
12:20 syntax error, need to interpolate keywords
13:45 Search for how to escape SQL string in Activerecord
14:15 Find interpolate_sql, looks promising
14:50 Actually no, find sanitize_sql_array
15:20 Just try it, works. Click around to verify.
16:45 Add spec
21:20 Passing specs, commit
21:45 Why isn’t tagging working?
23:30 Ah, probably case insensitive. Need to use ILIKE.
24:00 Write a test for it
26:00 Have a failing test
26:30 Argh it’s inside acts_as_taggable_on_steroids plugin
27:20 Override the method directly in model, just for now
28:30 Commit that
29:00 Remove searchable_tags
32:00 Fix tags with spaces
34:00 Exclude popular tags from search (fix the wrong thing)
35:40 Back to fixing tags with spaces
37:20 Looking at rankings, good enough for now
38:00 Move sphinx namespace into rhnh

Part 4

00:30 Checking docs for new Cedar stack
1:30 Search for how to import data
2:20 pg_dump of data
2:50 Move dump to public Dropbox so heroku can access it
3:40 Push code to heroku
4:50 Taking a while, hmm repo is big
5:50 Clone a copy to tmp, check if it’s still big.
6:00 Yeah, eh, not a big deal, it's been around for a number of years.
7:00 heroku push done, run heroku ps. Crashed :(
7:30 AWS? I deleted you >:[
8:00 Argh I pushed master, not my branch
9:30 heroku ps, crashed again
10:30 Unclear, probably exception notifier, remove it
11:30 add thin gem while waiting
12:30 Running, expect not to work because database not set up
13:05 Create procfile
13:35 Import pg backup
15:20 Working, click around, make sure it’s working
16:20 Check whether atom feed is working
17:30 Check exception notifications
19:00 Either new comments, or something is wrong.
19:20 Yep new comments, need to reimport data. Do that later.
20:00 Back to exception notification. Used to be an add-on.
21:20 Don't want hoptoad or get exceptional, maybe SendGrid with exception notifier?
22:00 Searching for examples.
22:20 Found stack overflow answer, looks promising.
24:20 Bring back exception notifier with SendGrid.
26:00 logs show sent mail, arrives in email
26:15 Next steps, DNS settings, extra database dump.

Ubuntu 12.04 dual boot on Macbook Pro

The official instructions are mostly right, but I still needed a bit of black magic to get everything working. Here are my supplementary instructions:

  1. Install rEFIt. Just works.
  2. Use Disk Utility to shrink the main disk partition to make space for Ubuntu (leave "free space" in the rest). This failed because my disk had errors.
  3. Restart into single user mode. The internet tells you to hold down option+s as your computer boots. If you have rEFIt installed, this takes you to the rEFIt shell. Instead, let rEFIt boot, select OSX, then press F2. An option to boot to single-user mode will be presented.
  4. fsck -fy, as directed by the prompt. Interesting excerpt from OSX fsck manpage: “this should be used with great caution as this is a free license to continue after essentially unlimited trouble has been encountered.” Don’t worry about it, it’s fine.
  5. Reboot back into OSX, try Disk Utility again. Fails with “The partition cannot be resized. Try reducing the amount of change in the size of the partition.” Protip: don’t do that, it won’t help. Instead follow the instructions at this Superuser answer. It may take a few runs through to fix all the problems.
  6. Reboot with the Ubuntu LiveCD (I used ubuntu-12.04-desktop-i386.iso.torrent). rEFIt will present it as a bootable option to you. Select “Try Ubuntu” (not “install”).
  7. Select the "Dash Home" icon (top left) and find the gparted tool. Create a 1GB swap partition and a 24GB ext4 partition.
  8. Select the “Non-free firmware” icon that shows up in the icons top right and follow the prompts. Without this, your wireless won’t work.
  9. Select "Install Ubuntu" from the desktop. Select custom install, change the ext4 partition to mount /, ensure the swap partition is labeled as such, and choose to install the boot loader to the ext4 partition, not the main disk. Follow the rest of the prompts.
  10. After install, don't reboot; instead keep trying Ubuntu. Shut down (not reboot), then power on again. I tried to select Linux, but it froze on the grey penguin screen and never got to Linux. Following instructions from this post: hard power off, then power on again while holding down option. Ubuntu shows up as "windows" [wtf]; boot that, which loads up the grub prompt. Boot into the GUI, then shut down again. Now Ubuntu will boot correctly from rEFIt.

To make it feel more like home, switch alt and command keys using this configuration and in the “Mouse and Trackpad” system settings enable two finger scroll and disable clicking with touchpad (otherwise you’ll accidentally click all the time while typing).

The last time I used Ubuntu was around the version 6 days, which you can't even download anymore. It's a lot slicker now. The icons and fonts are actually quite nice. Colemak is a first-class citizen: I could select it during install and use it on my login screen, which is awesome.

Automatically pushing git repositories to Bitbucket

Bitbucket gives you unlimited private repositories. It’s the perfect place to archive all my crap to. Here is a script to create remotes for all repositories in a folder and push them up. I had 38 of them.

$usr    = "xaviershay"
$remote = "bitbucket"

def main
  directories_in_cwd.each do |entry|
    existing_remotes = remotes_for(entry)

    action_performed = if existing_remotes
      if already_added?(existing_remotes)
        "EXISTING"
      else
        create_remote_repository(entry)
        push_local_repository_to_remote(entry)
        "ADD"
      end
    else
      "SKIP"
    end

    puts action_performed + " #{entry}"
  end
end

def directories_in_cwd
  Dir.entries(".").select {|entry|
    File.directory?(entry) && !%w(. ..).include?(entry)
  }
end

def remotes_for(entry)
  gitconfig = "#{entry}/.git/config"
  return unless File.exist?(gitconfig)
  existing_remotes = `cat #{gitconfig} | grep "url ="`.split("\n")
end

def already_added?(existing)
  existing.any? {|x| x.include?($remote) }
end

def create_remote_repository(entry)
  run %{curl -s -i --netrc -X POST -d "name=#{entry}" } +
          %{-d "is_private=True" -d "scm=git" } +
          %{https://api.bitbucket.org/1.0/repositories/}
end

def push_local_repository_to_remote(entry)
  Dir.chdir(entry) do
    run "git remote add #{$remote} git@bitbucket.org:#{$usr}/#{entry}.git"
    run "git push #{$remote} master"
  end
end

def run(cmd)
  `#{cmd}`
end

main

So you aren’t prompted for username and password every time, you should create a `.netrc` file.

> cat ~/.netrc
machine api.bitbucket.org login xaviershay password notmyrealpassword
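`remotes_for` shells out to `cat` and `grep` to find the remote URLs. A pure-Ruby sketch of the same lookup follows; `remote_urls` is a hypothetical helper name, and it returns just the URLs rather than whole `url = ...` lines, which `already_added?` still handles since it only checks for the substring `bitbucket`.

```ruby
# Hypothetical pure-Ruby replacement for the `cat | grep` in remotes_for.
# Returns the value of every `url = ...` line in a git config file.
def remote_urls(config_text)
  config_text.lines
             .grep(/\A\s*url\s*=/)
             .map { |line| line.split("=", 2).last.strip }
end

# Drop-in version of remotes_for: nil when the directory is not a git repo.
def remotes_for(entry)
  gitconfig = File.join(entry, ".git", "config")
  return unless File.exist?(gitconfig)
  remote_urls(File.read(gitconfig))
end
```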

Code to test ratio per commit with git

I came across a post titled "visualizing commits with bubble charts".

That seems pretty neat. I don’t have the visualization yet, but I put together a script to pull the required data from a git repository:

#!/bin/bash
# usage: gitstats HEAD~5..

revs=`git log --format="%H" $1`

for rev in $revs; do
  author=`git log --format="%an" -n 1 $rev`
  date=`git log --format="%at" -n 1 $rev`

  git show --stat $rev |
    sed '$d' |
    grep -E "(lib|spec)" |
    awk -v author="$author" -v rev="$rev" -v date="$date" '{
      split($1,a,"/"); sum[a[1]] += $3
    } END {
      if (sum["lib"]) print rev "," date "," author "," (sum["spec"] + sum["lib"]) "," (sum["spec"]/sum["lib"])
    } '
done

It would be nice not to shell out to git log three times, if anyone has any suggestions. This gives you one line per commit with the ref, timestamp, author, lines changed, and test:code ratio (spec lines divided by lib lines), for example:

e10db7972b236c9b5e3eddc13e879f120cc4a82f,1333223104,Xavier Shay,42,1.33333
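On the question of calling git log three times: one option is a single `git log --numstat` call with a custom format, parsed in Ruby. This is a hedged sketch assuming the invocation shown in the comment; `commit_stats` is a hypothetical name. `--numstat` reports added and deleted lines separately, so their sum corresponds to the change count `--stat` shows (binary files report `-` and count as zero here). Like the original, it divides spec lines by lib lines and skips commits that don't touch lib.

```ruby
# Hypothetical single-pass alternative. Feed it the output of:
#   git log --numstat --format="commit %H,%at,%an" HEAD~5..
# Returns "rev,date,author,lines changed,spec/lib ratio" per commit.
def commit_stats(log_text)
  log_text.split(/^commit /).reject(&:empty?).filter_map do |chunk|
    header, *files = chunk.lines.map(&:chomp).reject(&:empty?)
    rev, date, author = header.split(",", 3)

    # Sum added + deleted lines per top-level directory (lib or spec only).
    sums = Hash.new(0)
    files.each do |line|
      added, deleted, path = line.split("\t", 3)
      top = path.to_s.split("/").first
      sums[top] += added.to_i + deleted.to_i if %w[lib spec].include?(top)
    end
    next if sums["lib"].zero?

    ratio = sums["spec"].fdiv(sums["lib"]).round(5)
    [rev, date, author, sums["spec"] + sums["lib"], ratio].join(",")
  end
end

# Usage: puts commit_stats(`git log --numstat --format="commit %H,%at,%an" #{ARGV[0]}`)
```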
  • Posted on May 13, 2012
  • Tagged code, git

Conway's Game of Life in Haskell

Today I came across this excellent game of life implementation in Clojure, and I was also learning about monads in Haskell. So I ported the former, using the latter!

The logic translates pretty much the same. I wonder if there are more monads to be had on the newCell assignment line (the one with concatMap and friends), even at the expense of readability. This is a learning exercise, after all. I went for bonus points by writing a function to render the grid, but it didn't go as well. I would love some feedback on it. Here is a forkable version.

import Data.List
import Control.Monad

type Cell = (Int, Int)
type Grid = [Cell]

-- Game Logic

neighbours :: Cell -> Grid
neighbours (x, y) = do
  dx <- [-1..1]
  dy <- [-1..1]
  guard (dx /= 0 || dy /= 0)
  return (x + dx, y + dy)

step :: Grid -> Grid
step cells = do
  (newCell, n) <- frequencies $ concatMap neighbours cells
  guard $ (n == 3) || (n == 2 && newCell `elem` cells)
  return newCell

-- This is the only deviation from the Clojure version, since it is not a
-- built-in in Haskell.
frequencies :: Ord a => [a] -> [(a, Int)]
frequencies xs = do
  x <- group $ sort xs
  return (head x, length x)


-- UI

-- Feel like I'm missing a concept. Not so happy with this function:
-- * Can `eol` be done a better way? I tried nested maps but it was urgh.
-- * `marker` seems long for a simple ternary. Same issue as `eol` I guess.
formatGrid :: Grid -> String
formatGrid grid = do
  y <- ys
  x <- xs
  [marker x y] ++ eol x
  where
    marker x y
      | (x, y) `elem` grid = '*'
      | otherwise          = ' '
    eol x
      | x == maximum xs = ['\n']
      | otherwise       = []

    xs = gridRange fst
    ys = gridRange snd
    gridRange f = [min grid .. max grid]
      where
        min = minimum . map f
        max = maximum . map f

main = do
  mapM_ printGrid . take 3 $ iterate step beacon
  where
    beacon = [(0, 0), (1, 0), (0, 1), (3, 3), (2, 3), (3, 2)]

    printGrid :: Grid -> IO ()
    printGrid grid = do
      putStrLn $ formatGrid grid
      putStrLn ""
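The `frequencies` helper is needed in Haskell because, as the comment says, it is not a built-in. In Ruby, `Enumerable#tally` (Ruby 2.7 and later) plays that role, so a port of `step` stays close to the Clojure original. A hedged sketch of such a port, representing cells as `[x, y]` arrays (an assumption of this sketch, not part of the original code):

```ruby
require 'set'

# All eight cells surrounding the given [x, y] cell.
def neighbours(cell)
  x, y = cell
  [-1, 0, 1].product([-1, 0, 1])
            .reject { |dx, dy| dx.zero? && dy.zero? }
            .map { |dx, dy| [x + dx, y + dy] }
end

# One generation: a cell is alive next if it has 3 live neighbours,
# or 2 live neighbours and is already alive.
def step(cells)
  alive = cells.to_set
  cells.flat_map { |cell| neighbours(cell) }
       .tally
       .filter_map { |cell, n| cell if n == 3 || (n == 2 && alive.include?(cell)) }
end
```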

DataMapper Retrospective

I introduced DataMapper on my last two major projects. As those projects matured after I had left, they both migrated to a different ORM. That deserves a retrospective, I think. As I’ve left both projects, I don’t have the insider level of detail on the decision to abandon DataMapper, but developers from both projects kindly provided background for this blog post.

Project A

Web application and a batch processing component built on top of a legacy Oracle database.

Good

  • Field mappings, nice ruby names and able to ignore fields we didn’t care about.

Bad

  • Had to roll our own locking and time zone integration.
  • Not great for batch processing (trying to write SQL through the DM abstraction).

It turned out this project required a lot more batch processing than we anticipated, which DataMapper does not shine at. It was migrated to Sequel which provides a far better abstraction for working closer to SQL.

Project B

A fairly typical Rails 3 application. A couple of tens of thousands of lines of code.

Good

  • No migrations (pre-release).
  • Foreign keys, composite primary keys.
  • Auto-validations.

Bad

  • Auto-validations with nested attributes was uncharted territory (needed bug fixes).
  • Performance on large object graphs was unusable for page rendering (close to two seconds for our home page, which admittedly had a stupid amount of stuff on it).
  • Performance was suboptimal (though passable) on smaller pages.
  • Tracing through what is happening across multiple gems (particularly around transactions) was tricky.
  • The maintenance/interactions of all the various gems was problematic (e.g. gems X,Y work with 1.9.3 but Z doesn’t yet).
  • Inability to easily “break the abstraction” when SQL was required.

The performance issues were clear in our code base, but resisted much effort to reduce them to smaller reproducible problems. The best quick win I found was ~15% from disabling assertions, but I suspect that, given the large scope of the problem DataMapper is trying to solve, there may not be any approachable way of tackling the issue (would love to be proven wrong!)

We ran into obvious integration bugs (apologies for not having kept a concrete list), a symptom of a library that is not widely used. As a committer on the project this wasn't an issue for me, since they were easily fixed and moved past (the DataMapper code base is really nice to work on), but having a committer on your team isn't a tenable strategy.

DataMapper takes an all-ruby-all-the-time approach, which means things get tricky when the abstraction leaks. Much of the SQL generation is hidden in private methods. Compare some code to create a composable full text search query:

def self.search(keywords, options = {})
  options = {
    conditions: ["true"]
  }.merge(options)

  current_query = query.merge(options)

  a           = repository.adapter
  columns_sql = a.send(:columns_statement,    current_query.fields,     false)
  conditions  = a.send(:conditions_statement, current_query.conditions, false)
  order_sql   = a.send(:order_statement,      current_query.order,      false)
  limit_sql   = current_query.limit || 50
  conditions_sql, conditions_values = *conditions

  bind_values = [keywords] + conditions_values

  find_by_sql([<<-SQL, *bind_values])
    SELECT #{columns_sql}, ts_rank_cd(search_vector, query) AS rank
    FROM things
    CROSS JOIN plainto_tsquery(?) query
    WHERE #{conditions_sql} AND (query @@ search_vector)
    ORDER BY rank DESC, #{order_sql}
    LIMIT #{limit_sql}
  SQL
end

To the ActiveRecord equivalent (Sequel is similar):

def self.search(keywords)
  select("things.*, ts_rank_cd(search_vector, query) AS rank")
    .joins(sanitize_sql_array(["CROSS JOIN plainto_tsquery(?) query", keywords]))
    .where("query @@ search_vector")
    .order("rank DESC")
end

Switching to ActiveRecord took a week with all hands (~4) on deck, plus another week alongside other feature work to get it stable: two weeks in total from beginning to production. The end result was a drop in both response time (the deploy is pretty blatant in the graph below) and startup time, plus 3K fewer lines of code (a lot of custom code for dropping down to SQL could be removed).

Do differently

Ultimately, DataMapper provides an abstraction that I just don’t need, and even if I did, it hasn’t had its tires kicked enough for a team to use it without having to delve into the internals. The applications I find myself writing are about data, and the store in which that data lives is vitally important to the application. Abstracting away those details seems to be heading in the wrong direction for writing simple applications. As an intellectual achievement in its own right I really dig DataMapper, but it is too complicated a component to justify using inside other applications.

Rich Hickey’s talk Simple Made Easy has been rattling around my head a lot.

Nowadays I’m back to ActiveRecord for team conformance. It’s more work to keep on top of foreign keys and the like, but overall it does the job. It’s still too complicated, but has the non-trivial benefit of being used by lots of people. This is my responsible choice at the moment.

On my own projects I first reach for Sequel. It supports all the nice database features I want to use, while providing a thin layer over SQL. In other words, I don’t have to worry about the abstraction leaking because the abstraction is still SQL, just expressed in Ruby (which is a huge win for composability that you don’t get with raw SQL). While it does have “ORM” features, it feels more like the most convenient way of accessing my database than an abstraction layer. It’s actively maintained, and the only bug I have found was something that Rails broke, for which a patch was already available. There are no open issues in the bug tracker. My experiences have been overwhelmingly positive. I haven’t built anything big enough with it yet to have confidence using it on a team project though.

I still have a soft spot in my heart for DataMapper, I just don’t see anywhere for me to use it anymore.

Exercises in style

Let us make a stack machine! It can add numbers! This may be a winding journey. Have some time and an irb up your sleeve. Maybe it is more of a meditation than a blog post? Onwards!

def push_op(value)
  lambda {|x| [value, x + [value]] }
end

def add_op
  lambda {|x| [x[-1] + x[-2], x[0..-3]] }
end

[
  push_op(1),
  push_op(2),
  add_op
].inject([nil, []]) {|(result, state), op|
  op[state]
}

Get it? Pushes 1, pushes 2, then add_op pops them off the stack and makes 3. Not a lot of metadata in those lambdas though, and we can’t combine them in interesting ways.

class Operation < Struct.new(:block)
  def +(other)
    CompositeOperation.new(self, other)
  end

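  # Struct members are not instance variables, so use the `block` reader.
  def run(state)
    block.call(state)
  end
  # The evaluators below invoke operations with `op.call(stack)`.
  alias_method :call, :run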
end

class CompositeOperation < Operation
  def initialize(a, b)
    @a = a
    @b = b
    super(lambda {|x| @b.block[@a.block[x][1]] })
  end

  def desc
    @a.desc + "\n" + @b.desc
  end
end

class PushOperation < Operation
  def initialize(value)
    @value = value
    super(lambda {|x| [value, x + [value]] })
  end

  def desc
    "push #{@value}"
  end
end

class AddOperation < Operation
  def initialize
    super(lambda {|x| [x[-1] + x[-2], x[0..-3]] })
  end

  def desc
    "add top two digits on stack"
  end
end

A lot more setup, but now we also get a description of operations!

def tagged_push_op(value)
  PushOperation.new(value)
end

def tagged_add_op
  AddOperation.new
end

ops =
  tagged_push_op(1) +
  tagged_push_op(2) +
  tagged_add_op

puts ops.desc
puts ops.run([]).inspect # start from an empty stack; prints [3, []]

Ok you get that. What else can we do?

“every monad [...] embodies a particular computational strategy. A ‘motto of computation,’ if you will.” (Mental Guy)

hmmm. What does it mean?

class VerboseStackEvaluator < Struct.new(:stack)
  # Only `result` needs an accessor; Struct already provides `stack`.
  attr_accessor :result

  def pass(op)
    puts op.desc
    results = op.call(stack)
    self.class.new(results[1]).tap do |x|
      x.result = results[0]
    end
  end

  def self.identity
    new([])
  end
end

evaluator = VerboseStackEvaluator
e = evaluator.identity.
  pass(tagged_push_op(1)).
  pass(tagged_push_op(2)).
  pass(tagged_add_op)

p [e.result, e.stack]

Oh so now we have one structure (the pass stuff) that we can run through different evaluators. Let us make a recursive one!

class RecursiveLazyStackEvaluator < Struct.new(:stack)
  def pass(op)
    self.class.new(lambda {
      op.call(stack)
    })
  end

  def self.identity
    new(lambda { [nil, []] })
  end

  def result; evaled[0]; end
  def stack;  evaled[1]; end

  private

  def evaled
    @evaled ||= self[:stack].call # the Struct member, not the overridden #stack
  end
end

Do you see? It is now lazy. Rather than evaluating each operation when pass is called, it saves them up until a result is requested. Look out! Haskell in your Ruby! Recursion might blow out our stack though. Let us isomorphically (I just learned this word) translate it to use iteration!

class LazyStackEvaluator
  attr_accessor :steps

  def initialize(stack, steps = [])
    @stack  = stack
    @steps  = steps
  end

  def pass(op)
    self.class.new(@stack, steps + [op])
  end

  def self.identity
    new([])
  end

  def result; evaled[0]; end
  def stack;  evaled[1]; end

  protected

  def evaled
    @evaled ||= steps.inject([nil, @stack]) {|(r, s), op|
      op.call(s)
    }
  end
end

Not too shabby. Let’s try something more useful. Given we only have one operation that pops the stack (add), and it only pops two numbers, if we have more than two pushes in a row, the earlier ones become redundant, at least in programs like the one below where no later add reaches back for them. Let us optimize!

class OptimizingEvaluator < LazyStackEvaluator
  def evaled
    @evaled ||= begin
      accumulator = []
      new_steps   = []
      steps.each do |step|
        accumulator << step
        if !step.is_a?(PushOperation)
          new_steps += accumulator
          accumulator = []
        elsif accumulator.length > 2
          accumulator = accumulator[1..-1]
        end
      end
      new_steps += accumulator
      new_steps.inject([nil, @stack]) {|(r, s), op|
        op.call(s)
      }
    end
  end
end

evaluator = OptimizingEvaluator
e = evaluator.identity.
  pass(tagged_push_op(1)). # This won't get run!
  pass(tagged_push_op(1)).
  pass(tagged_push_op(2)).
  pass(tagged_add_op)

p [e.result, e.stack]

Ok one more. This one is pretty useless for this problem, but perhaps it will inspire thought. Let us multithread!

class ThreadingEvaluator < LazyStackEvaluator
  def evaled
    @evaled ||= begin
      accumulator = []
      workers     = []
      steps.each do |step|
        accumulator << step
        if step.is_a?(AddOperation)
          workers << spawn_thread(accumulator)
          accumulator = []
        end
      end
      workers << spawn_thread(accumulator) unless accumulator.empty?
      workers.each(&:join)

      workers.last[:result]
    end
  end

  def spawn_thread(accumulator)
    Thread.new do
      sleep rand / 3
      Thread.current[:result] = begin
        e = accumulator.inject(VerboseStackEvaluator.identity) {|e, s| e.pass(s) }
        [e.result, e.stack]
      end
    end
  end
end

evaluator = ThreadingEvaluator
e = evaluator.identity.
  pass(tagged_push_op(1)).
  pass(tagged_push_op(1)).
  pass(tagged_push_op(2)).
  pass(tagged_add_op).
  pass(tagged_push_op(3)).
  pass(tagged_push_op(4)).
  pass(tagged_add_op)

p [e.result, e.stack]

Ok that is all. Here is an exercise for you: how would you allow the threading and optimizing evaluators to be combined?

  • Posted on September 05, 2011
  • Tagged code, ruby

SICP Lisp interpreter in Clojure

On a lazy Sunday morning I can oft be found meandering through the classics of computer science literature. This weekend was no exception, as I put together a LISP interpreter in Clojure based on chapter 4 of Structure and Interpretation of Computer Programs.

The code is on GitHub rather than inline here since, at 90 lines plus tests, it’s getting a tad long for a snippet.

It differs from the SICP version in that the environment is immutable, so new versions have to be passed through to each function. This resulted in the “context” concept, which encapsulates both the current expression and the environment that goes with it. It causes a small amount of clunky code (see map-reducer), but also makes it easier to manage scoping for lambdas (see do-apply and env-extend). It matches the functional paradigm much better anyway. I also used some higher level primitives such as map and reduce that SICP doesn’t. SICP is demonstrating that they aren’t necessary, but that’s a point I’ve already conceded and don’t feel I need to replicate.
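To make the “context” idea concrete outside Clojure, here is a minimal Ruby sketch. It is not the code from the repository, and the names Context, evaluate and evaluate_all are hypothetical. Each step takes an immutable environment and returns a context that pairs the value with a possibly extended environment. A define therefore produces a new environment rather than mutating the old one.

```ruby
# A context pairs a value with the environment it was computed in.
# Hypothetical names; an illustration of the idea, not the repo's code.
Context = Struct.new(:value, :env)

# Evaluates a tiny language: integers, symbols (variable lookup),
# [:define, name, expr] and [:+, a, b]. Environments are frozen Hashes
# and are never mutated; define returns a new, extended one.
def evaluate(expr, env)
  case expr
  when Integer
    Context.new(expr, env)
  when Symbol
    raise KeyError, "unbound variable: #{expr}" unless env.key?(expr)
    Context.new(env[expr], env)
  when Array
    op, *args = expr
    case op
    when :define
      name, value_expr = args
      value = evaluate(value_expr, env).value
      Context.new(name, env.merge(name => value).freeze)
    when :+
      a, b = args.map { |arg| evaluate(arg, env).value }
      Context.new(a + b, env)
    else
      raise ArgumentError, "unknown form: #{op.inspect}"
    end
  else
    raise ArgumentError, "cannot evaluate: #{expr.inspect}"
  end
end

# Threads the environment from each context into the next expression,
# so later expressions can see earlier definitions.
def evaluate_all(exprs, env = {}.freeze)
  exprs.reduce(Context.new(nil, env)) { |ctx, expr| evaluate(expr, ctx.env) }
end
```

Running evaluate_all on [[:define, :x, 1], [:define, :y, [:+, :x, 2]], [:+, :x, :y]] gives a context whose value is 4 and whose environment is { x: 1, y: 3 }. Every earlier environment is left untouched.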

Critique of my style warmly encouraged, I’m still new to Clojure.

Vim and tmux on OSX

I recently switched from MacVim to vim inside tmux, using iTerm in full screen mode (Command+Enter). It’s pretty rad. I tried screen first, but even after a lot of screwing around there was still a lot of brokenness, and I don’t like how it does split panes anyway. What follows are some notes about what is required for tmux.

Get the latest vim and tmux

The latest vim is required for proper clipboard sharing. If you don’t want to install it, you can use the pbcopy plugin mentioned below.

brew install --HEAD vim
brew install tmux

Set up pretty colors

my vim/tmux setup

I use the solarized color scheme. To make this work, ensure you are not overriding the TERM variable in your .{bash|zsh}rc, then create an alias for tmux:

# .zshrc
alias tmux="TERM=screen-256color-bce tmux"

I also have a tmux config:

# .tmux.conf
set -g default-terminal "screen-256color"

Clipboard sharing

Up until I wrote this blog post, I had been using the pbcopy plugin to share the clipboard, via a cute hack that involves ssh’ing back into your machine to run pbcopy/pbpaste. While researching some more details for this post, though, I found an excellent write-up of the problem, and a far better solution, by Chris Johnsen. It enables proper sharing without ssh’ing, and therefore also the * register (use "*y to copy, "*p to paste – note this does not work with the vim that ships with OSX).

Mouse integration

The mouse is good for two things: scrolling, and selecting text from your scrollback.

For the first, add the following to your tmux config:

# ~/.tmux.conf
set -g mode-mouse on
# tmux 2.1 and later replaced mode-mouse with a single option:
# set -g mouse on

For the second, hold the option key while you select.

Workflow

Find another reference for the basic keys; these are notes on top of that. Ctrl-B sucks as an escape sequence, so rebind it to Ctrl-A to match screen. Most online references don’t mention it, but the default binding for a horizontal split is prefix " (it’s in the man page). I tend to have a main pane for editing and a smaller pane for a REPL or log. If I need to investigate the smaller pane, I press Ctrl-A Ctrl-O, which switches the two panes to give me the log in the larger one.

I use the tslime.vim plugin to send text directly from vim to the supplementary pane. This is a killer feature. As well as the built-in Ctrl-C shortcut, I also use a trick I learned from Gary Bernhardt and remap <leader>t on the fly to send whatever command I am currently testing to the other pane. Some examples:

" Load a file into a clojure repl
:map ;t :w\|:call Send_to_Tmux("\n\n\n(load-file \"./myfile.clj\")\n")<CR>
" Run rspec in zsh
:map ;t :w\|:call Send_to_Tmux("rspec spec/my_spec.rb\n")<CR>

If I need to interact with a shell I’ll usually Ctrl-Z vim, do what I need to do, then fg back again. If it’s a context switch, I’ll start a new tmux window then exit it after I’m done with the distraction.

I don’t use sessions. I prefer setting up from scratch each time since it takes no time at all, and eases my brain into the problem. Clean desk and all that.

That’s it. Nothing too fancy, but I’ve been meaning to make the switch from MacVim for a while and with this set up I can’t ever see myself going back.

OCR with Clojure and ImageMagick

Let’s write some Clojure to recognize hand-written digits. It will be fun. But first, some notes.

NOTE THE FIRST: If you want proper OCR with Clojure that is actually useful, perhaps try this blog post on using OpenCV and Tesseract. If you want to have some fun from first principles, come with me.

NOTE THE SECOND: This post was heavily inspired by Chapter 2 in Machine Learning in Action, which details the K nearest neighbour algorithm and pointed me to the dataset. If you dig this post, you should buy that book.

OK let’s go! Here’s what we’re going to do:

  • Take a snapshot of your handwriting.
  • Use ImageMagick to post-process it.
  • Convert the snapshot to a text format matching our training data.
  • Download and parse a training set of data.
  • Identify the digit written in the snapshot using the training data.

It’s going to be great.

Take a snapshot

Draw a single numeric digit on a piece of paper. Take a photo of it and get it on your computer. I used Photo Booth and the built-in camera on my Mac. Crop the picture tightly around the number, so it looks something like:

Don’t worry if it’s a bit grainy or blurry, our classifier is going to be pretty smart.

Use ImageMagick to post-process it

The ImageMagick command line utility convert is one of those magic tools that, once you learn it, you can never imagine how you did without. It can do anything you need to an image. Anything. For instance, it can resize our image to 32×32 pixels and convert it to black and white.

(ns ocr.main
  (:use [clojure.contrib.shell-out    :only (sh)]))

(defn convert-image
  [in out]
  (sh "convert" in "-colorspace" "gray" "+dither" "-colors" "2"
      "-normalize" "-resize" "32x32!" out))

It took me a while to figure out this incantation. The user manual for quantize is probably the best reference you’ll find. Note that the exclamation mark in “32x32!” stretches the image to be square. This is desirable since most people write too skinny (and maybe some write too fat), but we need the digits to be square, otherwise everything will look like a “1”. Converting the above “5” will look like this:

I am shelling out from Clojure to transform the file. There are two other options: JMagick, which uses the C API directly using JNI, and im4java which still shells out but gives you a nice interface over the top of it. I couldn’t get the first one working (it looks like a pretty dead project, no updates for a few years), and the latter wouldn’t give me anything helpful in this case.

Convert the image into a text format

The convert program automatically chooses the output format based on the file extension, so you can easily convert between any graphic formats you choose. For instance, convert JPG to PNG:

convert myfile.jpg myfile.png

As well as graphic formats though, it also supports the txt format, which looks like this:

# ImageMagick pixel enumeration: 32,32,255,rgb
0,0: (255,255,255)  #FFFFFF  white
1,0: (  0,  0,  0)  #000000  black
# etc...

That’s handy, because it can be easily translated into a bitmap with “1” representing black and “0” representing white. The “5” from above will look like this:

10000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000001111111111
00000000000000111111111111111111
00000000000011111111111111111111
00000000000011111111111111111110
00000000000111111111100000000000
00000000000111100000000000000000
00000000001111100000000000000000
00000000001111000000000000000000
00000000011110000000000000000000
00000000111110000000000000000000
00000000111110000000000000000000
00000000111110000000000000000000
00000000111111111000000000000000
00000000111111111000000000000000
00000000001111111100000000000000
00000000000111111110000000000000
00000000000001111111000000000000
00000000000000111111000000000000
00000000000000011111000000000000
00000000000000001111000000000000
00000000000000000111100000000000
00000000000000000111100000000000
00000000000000011111000000000000
00011111111111111111000000000000
00011111111111111110000000000000
00011111111111111100000000000000
00000111111111111000000000000000
00000000001110000000000000000000
00000000000000000000000000000000
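The translation from the txt format to this bitmap is simple enough to sketch in a few lines of Ruby. The function name text_image_to_bitmap is hypothetical. The sketch mirrors the Clojure below: it drops the header line and maps each pixel line to "0" if its last token is white, and "1" otherwise. It assumes pixels are listed row by row with x varying fastest, as in the sample output shown earlier.

```ruby
# Turns ImageMagick "txt:" pixel enumeration lines into rows of "0"/"1"
# characters: "0" for a white pixel, "1" for anything else.
# Assumes pixels are listed row by row, x varying fastest.
def text_image_to_bitmap(lines, width = 32)
  pixels = lines.drop(1).map do |line| # drop the "# ImageMagick ..." header
    # "0,0: (255,255,255)  #FFFFFF  white" -> last token is "white"
    line.split(/[,:\s]+/).last == "white" ? "0" : "1"
  end
  pixels.each_slice(width).map(&:join)
end
```

For example, text_image_to_bitmap(File.readlines("/tmp/clj-converted.txt", chomp: true)) would work on a file freshly written by convert.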

I used the duck-streams library found in clojure.contrib to read and write the file from disk, and applied some light processing to get the data into the required format. I also used a temporary file on disk to store the data. I’m pretty sure there is a way to get convert to write to STDOUT and then process that in memory, but I didn’t figure it out. It’s handy for debugging to have the file there anyway.

(ns ocr.main
  (:use [clojure.contrib.shell-out    :only (sh)]
        [clojure.contrib.duck-streams :only (read-lines write-lines)]
        [clojure.contrib.math         :only (sqrt)]
        [clojure.string               :only (split trim)]))

(defn read-text-image-line [line]
  (if (= "white" (last (split line #"[,:\s]+"))) "0" "1"))

(defn load-text-image
  [filename]
  (let [lines (vec (drop 1 (read-lines filename)))
        converted (map read-text-image-line lines) ]
    (map #(apply str %) (partition 32 converted))))

(defn convert-image
  [in out]
  (sh "convert" in "-colorspace" "gray" "+dither" "-colors" "2"
      "-normalize" "-resize" "32x32!" out)
  (write-lines out (load-text-image out)))

(def temp-outfile "/tmp/clj-converted.txt")

One more function is needed to be able to load that file up again into memory. This one doesn’t need to use read-lines, since the desired format for the classification below is actually just a vector of ones and zeros, so slurp is a quick alternative which is in the core libraries.

(defn load-char-file [file]
  (let [filename (.getName file)
        tokens   (split filename #"[_\.]")
        label    (first tokens)
        contents (parse-char-row (slurp file))]
    [label contents]))

Fetch some training data

The University of California, Irvine provides some sweet datasets if you’re getting into machine learning. In particular, the Optical Recognition of Handwritten Digits Data Set contains nearly 2000 labeled digits provided in the 32×32 text format the snapshot is now in. All digits are in one file, with a few header rows that can be dropped and ignored.

wget http://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits-orig.tra.Z
gunzip optdigits-orig.tra.Z
(defn parse-char-row [row]
  (map #(Integer/parseInt %) (filter #(or (= % "1") (= % "0")) (split row #""))))

(defn parse-char-data [element]
  (let [label (trim (last element))
        rows  (take 32 element)]
    [label (vec (flatten (map parse-char-row rows)))]))

(defn load-training-data
  [filename]
  (let [lines (drop 21 (read-lines filename))
        elements (partition 33 lines)]
    (map parse-char-data elements)
  ))

(def training-set (load-training-data "optdigits-orig.tra"))

This code returns a sequence of all the training data. Each element is itself a pair: the first part is a label (“0”, “1”, “2”, etc.) and the second is a vector of all the data (newlines are ignored, since they’re not important).
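The record layout is easier to see in a small Ruby sketch of the same parsing. The helper names are hypothetical. The 21 header lines and the 33-line records (32 rows of bits plus one label line) are taken from the Clojure above. Like Clojure’s partition, the sketch drops any incomplete trailing record.

```ruby
# Parses the optdigits-orig layout: after a header, each digit is 32 lines
# of "0"/"1" characters followed by one line holding its label.
ROWS_PER_DIGIT = 32

def parse_digit(block)
  rows  = block.first(ROWS_PER_DIGIT)
  label = block.last.strip
  # Flatten all 32 rows into one vector of integers; line breaks are dropped.
  vector = rows.flat_map { |row| row.scan(/[01]/).map(&:to_i) }
  [label, vector]
end

def load_training_data(lines, header_lines: 21)
  lines.drop(header_lines)
       .each_slice(ROWS_PER_DIGIT + 1)
       .select { |block| block.size == ROWS_PER_DIGIT + 1 } # drop a short tail
       .map { |block| parse_digit(block) }
end
```

A call such as load_training_data(File.readlines("optdigits-orig.tra", chomp: true)) would yield [label, vector] pairs, each vector holding 1024 integers.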

Note that I’m using vec throughout. This forces lazy sequences to be evaluated, which is a necessary performance optimization here; otherwise the program won’t finish calculating.

Classify our digit

This is the exciting part! I won’t go into the algorithm here (buy the Machine Learning book!), but it’s called K Nearest Neighbour, and while it’s not particularly fancy, it works surprisingly well. If you read my last blog post, you’ll note I’ve dropped the Incanter library. It was too much mucking about and didn’t provide any value for this project. Reading datasets is pretty easy with Clojure anyway.

(defn minus-vector [& args]
  (map #(apply - %) (apply map vector args)))

(defn sum-of-squares [coll]
  ;; Start from 0 so that the first element is squared too.
  (reduce (fn [a v] (+ a (* v v))) 0 coll))

(defn calculate-distances [in]
  (fn [row]
    (let [vector-diff (minus-vector (last in) (last row))
          label       (first row)
          distance    (sqrt (sum-of-squares vector-diff))]
    [label distance])))

(defn classify [in]
  (let [k                  10
        diffs              (map (calculate-distances in) training-set)
        nearest-neighbours (frequencies (map first (take k (sort-by last diffs))))
        classification     (first (last (sort-by second nearest-neighbours)))]
    classification))
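For readers who would rather see K Nearest Neighbour without the Clojure, here is a minimal Ruby sketch of the same steps: compute the Euclidean distance to every training example, keep the k closest, and take a majority vote on their labels. The method names are hypothetical. Unlike the Clojure version, classify takes a bare vector as input rather than a [label, vector] pair.

```ruby
# k-nearest-neighbour classification: find the k training examples closest
# to the input (by Euclidean distance) and return the most common label.
def euclidean_distance(a, b)
  # Sum the squared differences, starting from zero, then take the root.
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# training_set is an array of [label, vector] pairs; input is a bare vector.
def classify(input, training_set, k: 10)
  nearest = training_set.min_by(k) { |_label, vector| euclidean_distance(input, vector) }
  votes = nearest.map(&:first).tally
  # Majority vote; ties are not handled specially.
  votes.max_by { |_label, count| count }.first
end
```

With 0/1 pixel vectors, the squared distance is simply the number of pixels that differ. The square root therefore does not change which neighbours are nearest; it is kept only to match the Clojure code.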

Now to tie it all together with a main function that converts and classifies all the snapshots you pass in as arguments.

(defn classify-image [filename]
  (convert-image filename temp-outfile)
  (classify (load-char-file (java.io.File. temp-outfile))))

(defn -main [& args]
  (doseq [filename args]
    (println "I think that is the number" (classify-image filename))))

That’s the lot. Use it like so:

> lein run myDigits/5_0.jpg
I think that is the number 5

Hooray! Here is the full script as a gist. Let me know if you do anything fun with it.

Profiling Clojure

Tonight I was so impressed by how easy it was to profile some Clojure code using built-in JVM tools that I had to share:

Profiling Clojure.

Today I also learned more about the Incanter API, and wrote some good code to transform columns, among other things.

Exploring data with Clojure, Incanter, and Leiningen

I’m working through Machine Learning in Action at the moment, and it’s done in Python. I don’t really know Python, but I’d prefer to learn Clojure, so I’m redoing the code samples.

This blog post shows how to read a CSV file, manipulate it, then graph it. Turns out Clojure is pretty good for this, in combination with the Incanter library (think R for the JVM). It took me a while to get an environment set up since I’m unfamiliar with basically everything.

Install Clojure

I already had it installed so can’t remember if there were any crazy steps to get it working. Hopefully this is all you need:

brew install clojure

Install Leiningen

Leiningen is a build tool which does many things, but most importantly for me, it manages the classpath. I was jumping through all sorts of hoops trying to get Incanter running without it.

There are easy-to-follow instructions in the README.

UPDATE: As suggested in the comments, you can probably just `brew install lein` here and that will get you Leiningen and Clojure in one command.

Create a new project

lein new hooray-data && cd hooray-data

Add Incanter as a dependency to the project.clj file, and also a main target:

(defproject hooray-data "1.0.0-SNAPSHOT"
  :description "FIXME: write"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]
                 [incanter "1.2.3-SNAPSHOT"]]
  :main hooray_data.core)

Add some Incanter code to src/hooray_data/core.clj:

(ns hooray_data.core
  (:gen-class)
  (:use (incanter core stats charts io datasets)))

(defn -main [& args]
  (view (histogram (sample-normal 1000))))

Then fire it up:

lein deps
lein run

If everything runs to plan you’ll see a pretty graph.

Code

First, a simple categorized scatter plot. read-dataset works with both URLs and files, which is pretty handy.

(ns hooray_data.core
  (:use (incanter core stats charts io)))

; Sample data set provided by Incanter
(def plotData (read-dataset 
            "https://raw.github.com/liebke/incanter/master/data/iris.dat" 
            :delim \space 
            :header true))

(def plot (scatter-plot
            (sel plotData :cols 0)
            (sel plotData :cols 1)
            :x-label "Sepal Length"
            :y-label "Sepal Width"
            :group-by (sel plotData :cols 4)))

(defn -main [& args]
  (view plot))

Second, the same data but normalized. The graph will look the same, but the underlying data is now ready for some more math.

(ns hooray_data.core
  (:use (incanter core stats charts io)))

; Sample data set provided by Incanter
(def data (read-dataset 
            "https://raw.github.com/liebke/incanter/master/data/iris.dat" 
            :delim \space 
            :header true))

(defn extract [f]
  (fn [data]
     (map #(apply f (sel data :cols %)) (range 0 (ncol data)))))

(defn fill [n row] (map (fn [x] row) (range 0 n)))

(defn matrix-row-operation [operand row matrix] 
  (operand matrix 
    (fill (nrow matrix) row)))

; Probably could be much nicer using `reduce`
(defn normalize [matrix]
  (let [shifted (matrix-row-operation minus ((extract min) matrix) matrix)]
   (matrix-row-operation div ((extract max) shifted) shifted)))

(def normalized-data
  (normalize (to-matrix (sel data :cols [0 1]))))

(def normalized-plot (scatter-plot
            (sel normalized-data :cols 0)
            (sel normalized-data :cols 1)
            :x-label "Sepal Length"
            :y-label "Sepal Width"
            :group-by (sel data :cols 4)))

(defn -main [& args]
  (view normalized-plot))

I was kind of hoping the normalize function would have already been written for me in a standard library, but I couldn’t find it.
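For comparison, min-max normalization is small enough to write by hand. This plain-Ruby sketch of a hypothetical normalize follows the same two steps as the Clojure above: subtract each column’s minimum, then divide by each shifted column’s maximum (the column’s range). The guard against a constant column is an addition; the Clojure version does not have one.

```ruby
# Min-max normalization: shift each column so its minimum is 0, then divide
# by the column's range so every value falls between 0.0 and 1.0.
def normalize(rows)
  columns = rows.transpose
  mins = columns.map(&:min)
  ranges = columns.each_with_index.map { |col, i| col.max - mins[i] }
  rows.map do |row|
    row.each_with_index.map do |value, i|
      # A constant column has a range of 0; map it to 0.0 instead of dividing.
      ranges[i].zero? ? 0.0 : (value - mins[i]).fdiv(ranges[i])
    end
  end
end
```

For example, normalize([[1, 10], [3, 20], [5, 30]]) returns [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]].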

I’ll report back if anything else of interest comes up as I’m working through the book.

Interface Mocking

UPDATE: This is a gem now: rspec-fire. The code in the gem is better than that presented here.

Here is a screencast I put together in response to a recent Destroy All Software screencast on test isolation and refactoring, showing off an idea I’ve been tinkering around with for automatic validation of your implicit interfaces that you stub in tests.

Interface Mocking screencast.

Here is the code for InterfaceMocking:

module InterfaceMocking

  # Returns a new interface double. This is equivalent to an RSpec double,
  # stub or, mock, except that if the class passed as the first parameter
  # is loaded it will raise if you try to set an expectation or stub on
  # a method that the class has not implemented.
  def interface_double(stubbed_class, methods = {})
    InterfaceDouble.new(stubbed_class, methods)
  end

  module InterfaceDoubleMethods

    include RSpec::Matchers

    def should_receive(method_name)
      ensure_implemented(method_name)
      super
    end

    def should_not_receive(method_name)
      ensure_implemented(method_name)
      super
    end

    def stub!(method_name)
      ensure_implemented(method_name)
      super
    end

    def ensure_implemented(*method_names)
      if recursive_const_defined?(Object, @__stubbed_class__)
        recursive_const_get(Object, @__stubbed_class__).
          should implement(method_names, @__checked_methods__)
      end
    end

    def recursive_const_get object, name
      name.split('::').inject(Object) {|klass,name| klass.const_get name }
    end

    def recursive_const_defined? object, name
      !!name.split('::').inject(Object) {|klass,name|
        if klass && klass.const_defined?(name)
          klass.const_get name
        end
      }
    end

  end

  class InterfaceDouble < RSpec::Mocks::Mock

    include InterfaceDoubleMethods

    def initialize(stubbed_class, *args)
      args << {} unless Hash === args.last

      @__stubbed_class__ = stubbed_class
      @__checked_methods__ = :public_instance_methods
      ensure_implemented *args.last.keys

      # __declared_as copied from rspec/mocks definition of `double`
      args.last[:__declared_as] = 'InterfaceDouble'
      super(stubbed_class, *args)
    end

  end
end

RSpec::Matchers.define :implement do |expected_methods, checked_methods|
  match do |stubbed_class|
    unimplemented_methods(
      stubbed_class,
      expected_methods,
      checked_methods
    ).empty?
  end

  def unimplemented_methods(stubbed_class, expected_methods, checked_methods)
    implemented_methods = stubbed_class.send(checked_methods)
    unimplemented_methods = expected_methods - implemented_methods
  end

  failure_message_for_should do |stubbed_class|
    "%s does not publicly implement:\n%s" % [
      stubbed_class,
      unimplemented_methods(
        stubbed_class,
        expected_methods,
        checked_methods
      ).sort.map {|x|
        "  #{x}"
      }.join("\n")
    ]
  end
end

RSpec.configure do |config|

  config.include InterfaceMocking

end
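The core check does not depend on RSpec. Here is a stripped-down sketch in plain Ruby, using a hypothetical InterfaceDoubleSketch class rather than the rspec-fire API. Stubbing a method raises unless the named class publicly implements that method. Like the code above, it skips the check when the class is not loaded, so isolated tests still run.

```ruby
# A stripped-down interface double: it refuses to stub a method unless the
# named class publicly implements it. If the class is not loaded, it behaves
# like a plain stub. Hypothetical names; this is not the rspec-fire API.
class InterfaceDoubleSketch
  def initialize(class_name, stubs = {})
    @class_name = class_name
    @stubs = {}
    stubs.each { |name, value| stub(name, value) }
  end

  def stub(method_name, value)
    ensure_implemented(method_name)
    @stubs[method_name.to_sym] = value
    self
  end

  def method_missing(name, *args, &block)
    @stubs.key?(name) ? @stubs[name] : super
  end

  def respond_to_missing?(name, include_private = false)
    @stubs.key?(name) || super
  end

  private

  def ensure_implemented(method_name)
    klass = resolve_constant(@class_name)
    return if klass.nil? # class not loaded yet: skip the check
    return if klass.public_method_defined?(method_name)

    raise NoMethodError, "#{@class_name} does not publicly implement: #{method_name}"
  end

  # Walks "A::B::C" one segment at a time; nil if any segment is undefined.
  def resolve_constant(name)
    name.split("::").reduce(Object) do |scope, part|
      return nil unless scope.const_defined?(part, false)

      scope.const_get(part, false)
    end
  end
end
```

Taking the class name as a string, rather than the class itself, is what allows the check to be skipped when the real class has not been loaded.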

Static Asset Caching on Heroku Cedar Stack

UPDATE: This is now documented at Heroku (thanks Nick)

I recently moved this blog over to Heroku, and in the process added some proper HTTP caching headers. The dynamic pages use the built-in fresh_when and stale? Rails helpers, combined with Rack::Cache and the free memcached plugin available on Heroku. That was all pretty straightforward; what was more difficult was configuring Heroku to serve all static assets (such as images and stylesheets) with a far-future max-age header so that they will be cached for eternity. What I’ve documented here is somewhat of a hack, and hopefully Heroku will provide a better way of doing this in the future.

By default Heroku serves everything in public directly via nginx. This is a problem for us since we don’t get a chance to configure the caching headers. Instead, use the Rack::StaticCache middleware (provided in the rack-contrib gem) to serve static files, which by default adds far-future max-age cache control headers. The files need to be served out of a different directory from public, since there is no way to disable the nginx serving. I renamed my public folder to public_cached.

# config/application.rb
config.middleware.use Rack::StaticCache, 
  urls: %w(
    /stylesheets
    /images
    /javascripts
    /robots.txt
    /favicon.ico
  ),
  root: "public_cached"
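The header part of what Rack::StaticCache does can be sketched as a tiny Rack-style middleware. Only plain Ruby is needed, since a Rack app is just an object whose call(env) returns [status, headers, body]. FarFutureCacheHeaders is a hypothetical name, and the one-year max-age is an assumed value. Unlike the real middleware, this sketch does not serve files itself; it only decorates responses.

```ruby
# A minimal Rack-style middleware (hypothetical, not Rack::StaticCache itself)
# that adds a far-future Cache-Control header to responses whose path starts
# with one of the given URL prefixes.
class FarFutureCacheHeaders
  ONE_YEAR = 365 * 24 * 60 * 60 # seconds; assumed "far future" value

  def initialize(app, urls:)
    @app = app
    @urls = urls
  end

  def call(env)
    status, headers, body = @app.call(env)
    path = env["PATH_INFO"].to_s
    # Simple prefix match on the request path.
    if @urls.any? { |prefix| path.start_with?(prefix) }
      headers = headers.merge("Cache-Control" => "public, max-age=#{ONE_YEAR}")
    end
    [status, headers, body]
  end
end
```

Any response under /images or /favicon.ico would carry "Cache-Control: public, max-age=31536000", while dynamic pages pass through unchanged.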

I also disabled the built-in Rails serving of static assets in development mode, so that it didn’t interfere:

# config/environments/development.rb
config.serve_static_assets = false

In the production config, I set the x_sendfile_header option to “X-Accel-Redirect”, which is what nginx understands. It was “X-Sendfile”, the header that Apache (via mod_xsendfile) expects, and it was causing nginx to hang (Heroku would never actually serve the assets to the browser).

# config/environments/production.rb
config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect'

A downside of this approach is that if you have a lot of static assets, they all have to hit the Rails stack in order to be served. If you only have one dyno (the free plan) then the initial load can be slower than it otherwise would be if nginx was serving them directly. As I mentioned in the introduction, hopefully Heroku will provide a nicer way to do this in the future.

Speeding up Rails startup time

In which I provide easy instructions to try a new patch that drastically improves the startup time of Ruby applications, in the hope that with wide support it will be merged into the upcoming 1.9.3 release. Skip to the bottom for instructions, or keep reading for the narrative.

UPDATE: If you have trouble installing, grab a recent copy of rvm: rvm get head.

Background

Recent releases of MRI Ruby have introduced some fairly major performance regressions when requiring files:

For reference, our medium-sized Rails application requires around 2200 files, off the right-hand side of this graph. This is problematic: on 1.9.2 it takes 20s to start up, and on 1.9.3 it takes 46s. Both are far too long.

There are a few reasons for this, but the core of the problem is the basic algorithm which looks something like this:

def require(file)
  $loaded.each do |x|
    return false if x == file
  end
  load(file)
  $loaded << file
end

That loop is no good, and gets worse the more files you have required. I have written a patch for 1.9.3 which changes this algorithm to:

def require(file)
  return false if $loaded[file] 
  load(file)
  $loaded[file] = true
end

That gives you a performance curve that looks like this:

Much nicer.

That’s just a synthetic benchmark, but it works in the real world too. My main Rails application now loads in a mite over 10s, down from the 20s it was taking on 1.9.2. A blank Rails app loads in 1.1s, which is even faster than on 1.8.7.
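
To see why the original loop hurts, here is a toy model of the two algorithms above that simply counts the work done. It is a simulation and not MRI’s actual implementation. The class names and the require_file method are made up for illustration. With an array, each new require scans every previously loaded file, so loading n distinct files costs n(n-1)/2 comparisons in total. The hash version does one lookup per require.

```ruby
# Toy model of the array-based algorithm: each require scans the whole list.
class ArrayLoadedFeatures
  attr_reader :comparisons

  def initialize
    @loaded = []
    @comparisons = 0
  end

  def require_file(file)
    @loaded.each do |x|
      @comparisons += 1
      return false if x == file
    end
    @loaded << file
    true
  end
end

# Toy model of the patched algorithm: a single hash lookup per require.
class HashLoadedFeatures
  attr_reader :lookups

  def initialize
    @loaded = {}
    @lookups = 0
  end

  def require_file(file)
    @lookups += 1
    return false if @loaded[file]
    @loaded[file] = true
    true
  end
end

files = (1..2200).map { |i| "file_#{i}.rb" } # roughly our app's 2200 files
array_model = ArrayLoadedFeatures.new
hash_model  = HashLoadedFeatures.new
files.each { |f| array_model.require_file(f); hash_model.require_file(f) }

puts array_model.comparisons # 2200 * 2199 / 2 = 2418900
puts hash_model.lookups      # 2200
```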

Getting the fix

Here is how you can try out my patch right now in just ten minutes using RVM.

# First get a baseline measurement
cd /your/rails/app
time script/rails runner "puts 1"

# Install a patched ruby
curl https://gist.github.com/raw/996418/e2b346fbadeed458506fc69ca213ad96d1d08c3e/require-performance-fix-r31758.patch > /tmp/require-performance-fix.patch
rvm install ruby-head --patch /tmp/require-performance-fix.patch -n patched
# ... get a cup of tea, this took about 8 minutes on my MBP

# Get a new measurement
cd /your/rails/app
rvm use ruby-head-patched
gem install bundler --no-rdoc --no-ri
bundle
time script/rails runner "puts 1"

How you can help

I need a lot more eyeballs on this patch before it can be considered for merging into trunk. I would really appreciate any of the following:

Next steps

I imagine there will be a bit more work to get this into Ruby 1.9.3, but after that this is just the first of many steps to try to speed up the time Rails takes to start up. Bundler and RubyGems still spend a lot of time doing … something, which I want to investigate. I also want to port these changes over to JRuby, which has similar issues (Rubinius isn’t quite as fast out of the gate, but it does not slow down in the same way as more files are required, so it would not benefit from this patch).

Thank you for your time.

Deleting duplicate data with PostgreSQL

Here is an update to a query I posted a while back for detecting duplicate data. It allows you to select all but one of the resulting duplicates, for easy deletion. It only works on PostgreSQL, but is pretty neat. It uses a window function!

DELETE FROM users 
USING (
  SELECT id, first_value(id) OVER (
    PARTITION BY name ORDER BY created_at DESC
  ) first_id
  FROM users
  WHERE name IN (
    SELECT name 
    FROM users 
    GROUP BY name 
    HAVING count(name) > 1
  )
) dups
WHERE dups.id != dups.first_id AND users.id = dups.id;

The ORDER BY is optional, but handy if you need to keep a particular row rather than an arbitrary one. With created_at DESC, the most recently created row in each group is the one kept. You need an extra sub-query because window functions can’t be used in a WHERE clause.
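
The same “keep one row per name, delete the rest” logic can be modelled in plain Ruby. This may help when reasoning about what the query will delete before you run it. It is an in-memory sketch over hypothetical row hashes whose id, name and created_at keys mirror the columns above. It does not touch a database. Picking the row with the latest created_at in each group plays the role of first_value(id) OVER (PARTITION BY name ORDER BY created_at DESC).

```ruby
# Returns the ids the DELETE above would remove: for every name that appears
# more than once, keep the most recently created row and flag the others.
def duplicate_ids_to_delete(rows)
  groups = rows.group_by { |row| row[:name] }.values    # PARTITION BY name
  duplicated = groups.select { |group| group.size > 1 } # HAVING count(name) > 1
  duplicated.flat_map do |group|
    # ORDER BY created_at DESC + first_value(id): keep the newest row
    keep = group.max_by { |row| row[:created_at] }
    group.reject { |row| row.equal?(keep) }.map { |row| row[:id] }
  end
end

rows = [
  { id: 1, name: "alice", created_at: Time.utc(2011, 1, 1) },
  { id: 2, name: "alice", created_at: Time.utc(2011, 3, 1) },
  { id: 3, name: "bob",   created_at: Time.utc(2011, 2, 1) },
]
duplicate_ids_to_delete(rows) # => [1]
```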

For more tasty PostgreSQL tricks, check out my Meet PostgreSQL screencast, a steal at only $12 plug plug plug.

New Column: Code Safari

I am writing a regular weekly column at the newly launched Sitepoint project RubySource. The column is named “Code Safari”, where I explore the jungle of ruby libraries and gems and figure out how they work. It’s an introductory series designed to not just explain how things operate, but show you the tools and techniques so that you can figure it out yourself.

Three posts have already been published:

The format is a bit different but I’m really happy with how it is working so far. Let me know what you think.

  • Posted on April 18, 2011
  • Tagged code, ruby

YAML Tutorial

Many years ago I wrote a tutorial on using YAML in ruby. It still sees the most google traffic of any post, by far. So people want to know about YAML? I’ll help them out.

What is YAML?

YAML is a flexible, human readable file format that is ideal for storing object trees. YAML stands for “YAML Ain’t Markup Language”. It is easier to read (by humans) than JSON, and can contain richer metadata. It is far nicer than XML. There are libraries available for all mainstream languages including Ruby, Python, C++, Java, Perl, C#/.NET, JavaScript, PHP and Haskell. It looks like this:

--- 
- name: Xavier
  country: Australia
  age: 24
- name: Don
  country: US

That is a simple array of hashes. You can nest any combination of these simple data structures however you like. Most parsers will also detect the 24 as an integer. Quoting strings is optional, and was omitted in this example.
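
Loading that document in Ruby gives back plain arrays and hashes. This sketch assumes a modern Ruby (2.6 or later) with Psych as the YAML engine. YAML.safe_load only builds basic types such as strings, integers, arrays and hashes, which is all this document needs.

```ruby
require 'yaml'

doc = <<~YAML
  ---
  - name: Xavier
    country: Australia
    age: 24
  - name: Don
    country: US
YAML

people = YAML.safe_load(doc)
people.first["age"]     # => 24, an Integer rather than the string "24"
people.last.key?("age") # => false; Don simply has no age key
```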

YAML allows you to add tags to your objects, which is extra meta-data that your application can use to deserialize portions into complex data structures. For instance, in ruby if you serialize a set object it looks like this:

# Set.new([1,2]).to_yaml
--- !ruby/object:Set 
hash: 
  1: true
  2: true

Notice that ruby has added the ruby/object:Set tag so that the correct object can be instantiated on deserialization, while maintaining a human readable rendition of a set. These tags can be anything you like, ruby just happens to use that particular format.

You can remove duplication from YAML files by using anchors (&) and aliases (*). You typically see this in configuration files, such as:

defaults: &defaults
  adapter:  postgres
  host:     localhost

development:
  database: myapp_development
  <<: *defaults

test:
  database: myapp_test
  <<: *defaults

& sets up the name of the anchor (“defaults”), << means “merge the given hash into the current one”, and * includes the named anchor (“defaults” again). The expanded version looks like this:

defaults:
  adapter:  postgres
  host:     localhost

development:
  database: myapp_development
  adapter:  postgres
  host:     localhost

test:
  database: myapp_test
  adapter:  postgres
  host:     localhost

Note that the defaults hash hangs around, even though it isn’t really required anymore.
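
You can check that expansion from Ruby. This is a sketch assuming Ruby 2.6 or later. Recent versions of Psych refuse aliases in safe_load unless you pass aliases: true, and Psych 4 applies the same default to YAML.load. That means the flag is needed for a file like this one.

```ruby
require 'yaml'

config = YAML.safe_load(<<~YAML, aliases: true)
  defaults: &defaults
    adapter:  postgres
    host:     localhost

  development:
    database: myapp_development
    <<: *defaults
YAML

config["development"]["adapter"] # => "postgres", merged in from defaults
config.key?("defaults")          # => true: the anchored hash is still there
```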

YAML generators use this technique to correctly serialize repeated references to the same object, and even cyclic references. That’s pretty clever.
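
For example, in Ruby (a sketch assuming 2.6 or later), dumping an array that contains the same object twice emits an anchor and an alias. Loading it back with aliases enabled restores a single shared object rather than two copies. The exact anchor labels Psych chooses (such as &1) are an implementation detail.

```ruby
require 'yaml'

shared = ["red", "blue"]
yaml = [shared, shared].to_yaml
puts yaml # the second element is written as an alias to the first

loaded = YAML.safe_load(yaml, aliases: true)
loaded[0].equal?(loaded[1]) # => true: both slots point at one Array
```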

Flow style

YAML has an alternate syntax called “flow style” that allows arrays and hashes to be written inline, using square brackets and curly brackets respectively, without relying on indentation.

--- 
# Arrays
colors:
  - red
  - blue
# in flow style...
colors: [red, blue]

# Hashes
- name: Xavier
  age: 24
# in flow style...
- {name: Xavier, age: 24}

This has the curious effect of making YAML a superset of JSON: as of the YAML 1.2 specification, a valid JSON document is also a valid YAML document.
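
Both claims are easy to check from Ruby. This sketch assumes Ruby 2.6 or later with the standard yaml and json libraries. The block and flow forms load to identical structures. A simple JSON document also loads through the YAML parser to the same result as JSON.parse.

```ruby
require 'json'
require 'yaml'

block_style = YAML.safe_load(<<~YAML)
  colors:
    - red
    - blue
YAML
flow_style = YAML.safe_load("colors: [red, blue]")
block_style == flow_style # => true

# Hashes inside a list, block style versus flow style
YAML.safe_load("- {name: Xavier, age: 24}") == YAML.safe_load("- name: Xavier\n  age: 24\n") # => true

json = '{"name": "Xavier", "age": 24, "colors": ["red", "blue"]}'
YAML.safe_load(json) == JSON.parse(json) # => true
```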

Performance

Given YAML’s richness and human readability, you would expect it to be slower than native serialization or JSON. This would be correct. My brief testing shows it is about an order of magnitude slower than Marshal, and roughly four times slower than JSON. For the typical configuration use-case this is irrelevant, but it is worth keeping in mind if you are doing something crazy. Remember to run your own benchmarks that represent your specific need.

                     user       system     total    real
Marshal serialize    0.090000   0.000000   0.090000 (  0.091822)
Marshal deserialize  0.090000   0.000000   0.090000 (  0.092186)
JSON serialize       0.480000   0.010000   0.490000 (  0.480291)
JSON deserialize     0.130000   0.010000   0.140000 (  0.134860)
YAML serialize       2.040000   0.020000   2.060000 (  2.065693)
YAML deserialize     0.520000   0.010000   0.530000 (  0.526048)
Psych serialize      2.530000   0.030000   2.560000 (  2.565116)
Psych deserialize    1.510000   0.120000   1.630000 (  1.622601)

Curiously, the new YAML parser Psych included in ruby 1.9.2 appears significantly slower than the old one. Not sure what is going on there.
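
If you want to run your own comparison, here is a sketch using Ruby’s standard Benchmark library. The payload and iteration count are placeholders, so swap in data representative of your application. The numbers it prints will differ from the table above. YAML.safe_load stands in for whatever loading method your app actually uses.

```ruby
require 'benchmark'
require 'json'
require 'yaml'

# Placeholder payload; replace with data shaped like your own.
SAMPLE = Array.new(200) { |i| { "id" => i, "name" => "user#{i}", "tags" => %w[a b c] } }

# Each entry maps a format name to a [serialize, deserialize] pair of lambdas.
SERIALIZERS = {
  "Marshal" => [->(data) { Marshal.dump(data) },  ->(str) { Marshal.load(str) }],
  "JSON"    => [->(data) { JSON.generate(data) }, ->(str) { JSON.parse(str) }],
  "YAML"    => [->(data) { data.to_yaml },        ->(str) { YAML.safe_load(str) }],
}.freeze

def benchmark_serializers(data, iterations: 10)
  Benchmark.bm(20) do |x|
    SERIALIZERS.each do |name, (dumper, loader)|
      serialized = dumper.call(data)
      x.report("#{name} serialize")   { iterations.times { dumper.call(data) } }
      x.report("#{name} deserialize") { iterations.times { loader.call(serialized) } }
    end
  end
end

benchmark_serializers(SAMPLE) if __FILE__ == $PROGRAM_NAME
```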

Reading YAML from a file with ruby

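require 'yaml'

parsed = begin
  # load_file opens, reads and closes the file for us
  YAML.load_file("/tmp/test.yml")
rescue Psych::SyntaxError => e
  # Psych (the default YAML engine since Ruby 1.9.3) raises this on malformed
  # input; the older Syck engine raised ArgumentError instead
  puts "Could not parse YAML: #{e.message}"
end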

Writing YAML to a file with ruby

require 'yaml'

data = {"name" => "Xavier"}
File.open("path/to/output.yml", "w") {|f| f.write(data.to_yaml) }

Anything else you’d like to know? Leave a comment.

Psych YAML in ruby 1.9.2 with RVM and Snow Leopard OSX

Note that you must have libyaml installed before you compile ruby, so this probably means you’ll need to recompile your current version.

brew install libyaml
rvm install ruby-1.9.2 --with-libyaml-dir=/usr/local
ruby -rpsych -e 'puts Psych.load("win: true")'

Ordering by a field in a join model with DataMapper

The public interface for DataMapper 1.0.3 does not support ordering by a column in a joined model on a query. The core of DataMapper does support this, though, so we can use some hacks to make it work, as the following code demonstrates.

require 'rubygems'
require 'dm-core'
require 'dm-migrations'

DataMapper::Logger.new($stdout, :debug)
DataMapper.setup(:default, 'postgres://localhost/test') # createdb test

class User
  include DataMapper::Resource

  property :id, Serial

  has 1, :user_profile

  def self.ranked
    order = DataMapper::Query::Direction.new(user_profile.ranking, :desc) 
    query = all.query # Access a blank query object for us to manipulate
    query.instance_variable_set("@order", [order])

    # Force the user_profile model to be joined into the query
    query.instance_variable_set("@links", [relationships['user_profile'].inverse])

    all(query) # Create a new collection with the modified query
  end
end

class UserProfile
  include DataMapper::Resource

  property :user_id, Integer, :key => true
  property :ranking, Integer, :default => 0

  belongs_to :user
end

DataMapper.finalize
DataMapper.auto_migrate!

User.create(:user_profile => UserProfile.new(:ranking => 2))
User.create(:user_profile => UserProfile.new(:ranking => 5))
User.create(:user_profile => UserProfile.new(:ranking => 3))

puts User.ranked.map {|x| x.user_profile.ranking }.inspect

Padrino, MongoHQ and Heroku

Next time I google for this I’ll find the answer waiting:

# config/database.rb
if ENV['MONGOHQ_URL']
  uri = URI.parse(ENV['MONGOHQ_URL'])
  MongoMapper.connection = Mongo::Connection.from_uri(ENV['MONGOHQ_URL'], :logger => logger)
  MongoMapper.database = uri.path.gsub(/^\//, '')
else
  MongoMapper.connection = Mongo::Connection.new('localhost', nil, :logger => logger)
  MongoMapper.database = "myapp_#{Padrino.env}"
end
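
The database name comes from the path component of the MONGOHQ_URL connection string. Here is that step in isolation. A made-up URL stands in for the real environment variable, and only Ruby’s standard uri library is used.

```ruby
require 'uri'

# Hypothetical connection string in the same shape as MONGOHQ_URL.
url = "mongodb://user:secret@db.example.com:27017/app123"

uri = URI.parse(url)
database_name = uri.path.gsub(/^\//, '') # strip the leading slash: "app123"
```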

Also I’ll write MongoDB here for google. Nicked from Fikus.

Rails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Tutorial

This took me a while to figure out, especially since I’m not so great with either Windows or SQL Server, but in the end the process isn’t so difficult.

Rails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Screencast

The steps covered in this screencast are:

  1. Create user
  2. Create database
  3. Give user permissions
  4. Create DSN
  5. Install ruby
  6. Install devkit (needed to compile native extensions for ODBC)
  7. Create a new rails app
  8. Add activerecord-sqlserver-adapter and ruby-odbc to Gemfile
  9. Customize config/database.yml
# config/database.yml
development:
  adapter: sqlserver
  dsn: testdsn_user
  mode: odbc
  database: test
  username: xavier
  password:

Some errors you may encounter:

“The specified module could not be found – odbc.so”: you have likely copied odbc.so from i386-msvcrt-ruby-odbc.zip. This is for 1.8.7, and does not work for 1.9. Remove the .so file, and install ruby-odbc as above.

“The specified DSN contains an architecture mismatch between the Driver and the Application”: perhaps you have created a system DSN. Try creating a user DSN instead. I also found some suggestions that you need to use a different version of the ODBC configuration panel, but this wasn’t relevant for me.

Transactional before all with RSpec and DataMapper

By default, before(:all) in rspec executes outside of any transaction, meaning that you can’t really use it for creating objects. Normally this should go in a before(:each), but for a spec with simple creation and a large number of assertions this is terribly inefficient.

Let’s fix it!

This code assumes you are using DataMapper, and that your database supports some form of nested transactions (at the very least faking them with savepoints – see nested transactions in postgres with datamapper). It wraps each before/after :all and :each in its own transaction.

RSpec.configure do |config|
  [:all, :each].each do |x|
    config.before(x) do
      repository(:default) do |repository|
        transaction = DataMapper::Transaction.new(repository)
        transaction.begin
        repository.adapter.push_transaction(transaction)
      end
    end

    config.after(x) do
      repository(:default).adapter.pop_transaction.rollback
    end
  end

  config.include(RSpecExtensions::Set)
end

See that RSpecExtensions::Set include? That’s a version of the lovely let helpers that works with before(:all) setup. Props to pcreux for this:

module RSpecExtensions
  module Set

    module ClassMethods
      # Generates a method whose return value is memoized
      # in before(:all). Great for DB setup when combined with
      # transactional before alls.
      def set(name, &block)
        define_method(name) do
          __memoized[name] ||= instance_eval(&block)
        end
        before(:all) { __send__(name) }
        before(:each) do
          __send__(name).tap do |obj|
            obj.reload if obj.respond_to?(:reload)
          end
        end
      end
    end

    module InstanceMethods
      def __memoized # :nodoc:
        @__memoized ||= {}
      end
    end

    def self.included(mod) # :nodoc:
      mod.extend ClassMethods
      mod.__send__ :include, InstanceMethods
    end

  end
end

Fast specs make me a happy man.

Nested Transactions in Postgres with DataMapper

Hacks to get nested transactions support for Postgres in DataMapper. Not extensively tested, more a proof of concept. It re-opens the existing Transaction class to add a check for whether we need a nested transaction or not, and adds a new NestedTransaction transaction primitive that issues savepoint commands rather than begin/commit.

I put this code in a Rails initializer.

# Hacks to get nested transactions in Postgres
# Not extensively tested, more a proof of concept
#
# It re-opens the existing Transaction class to add a check for whether
# we need a nested transaction or not, and adds a new NestedTransaction
# transaction primitive that issues savepoint commands rather than begin/commit.

module DataMapper
  module Resource
    def transaction(&block)
      self.class.transaction(&block)
    end
  end

  class Transaction
    # Overridden to allow nested transactions
    def connect_adapter(adapter)
      if @transaction_primitives.key?(adapter)
        raise "Already a primitive for adapter #{adapter}"
      end

      primitive = if adapter.current_transaction
        adapter.nested_transaction_primitive
      else
        adapter.transaction_primitive
      end

      @transaction_primitives[adapter] = validate_primitive(primitive)
    end
  end

  module NestedTransactions
    def nested_transaction_primitive
      DataObjects::NestedTransaction.create_for_uri(normalized_uri, current_connection)
    end
  end

  class NestedTransactionConfig < Rails::Railtie
    config.after_initialize do
      repository.adapter.extend(DataMapper::NestedTransactions)
    end
  end
end

module DataObjects
  class NestedTransaction < Transaction

    # The host name. Note, this relies on the host name being configured
    # and resolvable using DNS
    HOST = "#{Socket::gethostbyname(Socket::gethostname)[0]}" rescue "localhost"
    @@counter = 0

    # The connection object for this transaction - must have already had
    # a transaction begun on it
    attr_reader :connection
    # A unique ID for this transaction
    attr_reader :id

    def self.create_for_uri(uri, connection)
      uri = uri.is_a?(String) ? URI::parse(uri) : uri
      DataObjects::NestedTransaction.new(uri, connection)
    end

    #
    # Creates a NestedTransaction bound to an existing connection
    #
    def initialize(uri, connection)
      @connection = connection
      @id = Digest::SHA256.hexdigest(
        "#{HOST}:#{$$}:#{Time.now.to_f}:nested:#{@@counter += 1}")
    end

    def close
    end

    def begin
      run %{SAVEPOINT "#{@id}"}
    end

    def commit
      run %{RELEASE SAVEPOINT "#{@id}"}
    end

    def rollback
      run %{ROLLBACK TO SAVEPOINT "#{@id}"}
    end

    private
    def run(cmd)
      connection.create_command(cmd).execute_non_query
    end
  end
end

I wrote code similar to this with hassox while at NZX, big ups to those guys. I’m working on a proper patch, but haven’t quite figured out the internals enough. If you know how DataMapper works, please check out and comment on this sample patch for three dm gems.
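
Stripped of DataMapper, the core idea is a stack. The outermost transaction issues BEGIN/COMMIT/ROLLBACK, and anything opened while another is active becomes a savepoint. The sketch below is a hypothetical, database-free model of that idea. The class and method names are made up, and it only records the SQL it would send. Savepoint names are numbered here rather than hashed as in the code above.

```ruby
# Hypothetical model of nested transactions via savepoints. Instead of
# talking to Postgres, it records the SQL statements it would issue.
class SavepointTransactions
  attr_reader :log

  def initialize
    @log = []
    @depth = 0
    @counter = 0
  end

  # Runs the block inside a transaction (or a savepoint, if one is already
  # open). Commits/releases on success, rolls back and re-raises on error.
  def transaction
    savepoint = @depth.zero? ? nil : "sp_#{@counter += 1}"
    @log << (savepoint ? %{SAVEPOINT "#{savepoint}"} : "BEGIN")
    @depth += 1
    begin
      result = yield
    rescue StandardError
      @log << (savepoint ? %{ROLLBACK TO SAVEPOINT "#{savepoint}"} : "ROLLBACK")
      raise
    else
      @log << (savepoint ? %{RELEASE SAVEPOINT "#{savepoint}"} : "COMMIT")
      result
    ensure
      @depth -= 1
    end
  end
end

db = SavepointTransactions.new
db.transaction do
  db.transaction { :inner_ok }
  begin
    db.transaction { raise "inner failure" }
  rescue RuntimeError
    # The inner failure only rolls back to its savepoint; the outer
    # transaction carries on and commits.
  end
end
db.log
# => ["BEGIN", "SAVEPOINT \"sp_1\"", "RELEASE SAVEPOINT \"sp_1\"",
#     "SAVEPOINT \"sp_2\"", "ROLLBACK TO SAVEPOINT \"sp_2\"", "COMMIT"]
```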

Why I Rewrote Chronic

It seems like a pretty epic yak shave. If you want to parse natural language dates in ruby, you use Chronic. That’s just how it is. (There’s also Tickle for recurring dates, which is similar, but based on Chronic anyways.) It’s the standard, everyone uses it, so why oh why did I write my own version from scratch?

Three reasons I can see.

Chronic is unmaintained. Check the network graph for Chronic. A more avid historian could turn this into an epic teledrama, but for now here’s the summary: The main repository hasn’t had a commit since late 2008. Evaryont made a valiant attempt to take the reins, but his stamina only lasted an extra year to August 2009. Since then numerous people have forked his efforts, mostly to add 1.9 support. These efforts are fragmented though. The inertia of such a large project with no clear leadership sees every man running for himself.

Further, the new maintainers aren’t providing a rock solid base. From Evaryont’s README:
I decided on my own volition that the 40-some (as reported by Github) network should be merged together. I got it to run, but quite haphazardly. There are a lot of new features (mostly undocumented except the git logs) so be a little flexible in your language passed to Chronic. [emphasis mine]

This does not fill me with confidence.

Chronic has a large barrier to entry. Natural date parsing is a big challenge. In the original README, there are ~50 examples of formats it supports, and that is excluding all of the features added in forks in the last two years. The result is a large code base which is intimidating for a newcomer, especially with no high level guidance as to how everything fits together. On a project of this size, “the documentation is in the specs” is insufficient. I know what it does, I need to know how it does it.

Chronic solves the wrong problem. I want an alternative to date pickers. As such, I don’t need time support, and I only need very simple day parsing. Chronic seems geared towards a calendar type application (“tomorrow at 6:45pm”), but also parses many expressions which simply are not useful in a real application either because they are obtuse - “7 hours before tomorrow at noon” - or just not how users think about dates - “3 months ago saturday at 5:00 pm”. (Note the last assertion is a totally unsubstantiated claim with no user research to support it.)

Further, it is not hard to find simple examples that Chronic doesn’t support. Omitting a year is an easy one: 14 Sep, April 9.

So what to do?

Chronic needs a leader. Chronic needs a hero. One man to reunite the forks, document the code, and deliver it to the promised land.

I am not that man.

I sketched out the formats I actually needed to support for my application, looked at it and thought “really it can’t be that hard”. Natural date parsing is hard; parsing only the dates your application requires is easy. One hour later I had a gem that not only had 100% support for all of the Chronic features I had been using, but also covered some extra formats I wanted (“14 Sep”), and could also convert a date back into a human readable description. That’s less time than I had already sunk into trying to get Chronic working.

Introducing Kronic.

Less than 100 lines of code, totally specced, totally solved my problem. Ultimately, I don’t want to deal with this problem, so I wanted the easiest solution. While patching Chronic would intuitively appear to be pragmatic, a quick spike in the other direction turned out to be worthwhile. Sometimes 80% just isn’t that hard.
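
To give a feel for how small “only the dates you need” can be, here is a hypothetical sketch in the same spirit. It is not Kronic’s code or API. The method names, the supported phrases (today, yesterday, tomorrow, “14 Sep”, “April 9”) and the rule that an omitted year means the current year are all choices made for this example.

```ruby
require 'date'

# Lower-cased month names and three-letter abbreviations => month number.
MONTHS = Date::MONTHNAMES.compact.each_with_index.flat_map { |name, i|
  [[name.downcase, i + 1], [name[0, 3].downcase, i + 1]]
}.to_h

# Parses a handful of formats; returns nil for anything else.
def parse_simple_date(text, today: Date.today)
  input = text.strip.downcase
  return today     if input == "today"
  return today - 1 if input == "yesterday"
  return today + 1 if input == "tomorrow"

  if (m = input.match(/\A(\d{1,2})\s+([a-z]+)\z/))    # "14 Sep"
    day, month = m[1].to_i, MONTHS[m[2]]
  elsif (m = input.match(/\A([a-z]+)\s+(\d{1,2})\z/)) # "April 9"
    month, day = MONTHS[m[1]], m[2].to_i
  end
  return nil unless month && Date.valid_date?(today.year, month, day)

  Date.new(today.year, month, day) # no year given: assume the current one
end

# The reverse direction: turn a date back into a short description.
def describe_date(date, today: Date.today)
  case (date - today).to_i
  when 0  then "Today"
  when -1 then "Yesterday"
  when 1  then "Tomorrow"
  else date.strftime("%-d %B %Y")
  end
end
```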

Build time graph with buildhawk

How long your build took to run, in a graph, on a webpage. That’s pretty fantastic. You need to be storing your build time in git notes, as I wrote about a few weeks back. Then simply:

gem install buildhawk
buildhawk > report.html
open report.html

This is a simple gem I hacked together today that parses git log and stuffs the output into an ERB template that uses TufteGraph and some subtle jQuery animation to make it look nice. For extra prettiness, I use the Monofur font, but maybe you are content with your default monospace. If you want to poke around the internals (there’s not much!) have a look on Github.
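
buildhawk’s own template isn’t reproduced here, but the general approach (data in, ERB template out) looks something like this sketch. The build data, the table markup and the render_build_report name are hypothetical. It uses only Ruby’s standard erb library, and the trim_mode: keyword assumes Ruby 2.6 or later.

```ruby
require 'erb'

# Hypothetical template: one table row per build.
TEMPLATE = ERB.new(<<~HTML, trim_mode: "-")
  <table>
  <%- builds.each do |sha, seconds| -%>
    <tr><td><%= ERB::Util.h(sha) %></td><td><%= format("%.1f", seconds) %></td></tr>
  <%- end -%>
  </table>
HTML

# builds: array of [commit sha, build time in seconds] pairs.
def render_build_report(builds)
  TEMPLATE.result_with_hash(builds: builds)
end

puts render_build_report([["a1b2c3d", 312.4], ["e4f5a6b", 298.0]])
```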

Six best talks from LSRC 2010

I wrote this last fortnight, but was waiting for videos. Still missing a few, but it’s a start. Enjoy!

I am just finishing up a week in Austin, Texas. I was here for Lone Star Ruby Conference, at which I ran both my Database Is Your Friend Training, and also a full day introduction to MongoDB course. I was then free to enjoy the talks for the remaining two days. Here are my top picks.

Debugging Ruby

Aman Gupta gave a fantastic overview of the low level tools available for debugging ruby applications, including perf-tools, strace, gdb, bleak-house, and some nice ruby wrappers he has written around them. I had heard of these tools before, but was never sure when to use them or where to start if I wanted to use them. Aman’s presentation was the hook I needed to get into these tools, giving plenty of real examples of where they had been useful and how he used them.

Slides

Seven Languages in Seven Weeks

Bruce Tate gave an entertaining talk in which he compared seven languages to movie characters. It was a great narrative, and his energy and excitement about the languages were infectious. He has written a book on the same topic, which I plan on purchasing when I make some time to work through it. There are some sample chapters available at the pragprog site.

Book

Greasing Your Suite

I had seen the content of Nick’s talk “Greasing Your Suite” before in slide format, and it was just as excellent live. Nick takes the run time of a rails test suite from 13 minutes down to eighteen seconds. An incredible effort. While watching his talk I installed and set up his hydra gem, and it was dead simple to get my tests running in parallel. I only added a rake task and a tiny yml file (no other setup required), and I got a significant speed up even on trivial test suites. I was impressed at how easy it was to get going, and I’ll be using it on all my apps from now on.

Video (From Goruco, but he gave the same talk)

Deciphering Yehuda

Gregg Pollack’s talk on how some of the techniques used in the internals of rails and bundler work was excellent. While the content wasn’t new to me, I was impressed at Gregg’s ability to explain code on slides, a task difficult to do well. If you ever plan to present you should watch this to pick up some of Gregg’s techniques. I am going to be checking out his Introduction to Rails 3 screencasts for the same reason.

Video

Real Software Engineering

Glenn Vanderburg opened the conference with a fantastic talk on the history of software engineering. This answered a lot of questions that have been floating around my mind, especially to do with the misleading comparisons often made to other engineering disciplines. Give a civil engineer the ability to quickly prototype bridges for little cost, and they are going to do a lot less modelling. A mathematical model is simply a way to reduce costs. And cost is always an object. Watch the talk, it’s brilliant.

Video

Keynote

The best overall talk was Tom Preston-Werner’s keynote on Friday evening. His mix of story, humour, and inspiration was perfect for a keynote, and his delivery was excellent. He pitched his content expertly, and though there was no specific item I hadn’t heard before, it has had a significant impact on my thoughts over the past few days. Hopefully a video is up soon.

Speeding Up Rails Rake

On a brand new rails project (this article is rails 3, but the same principle applies to rails 2), rake --tasks takes about a second to run. This is just the time it takes to load all the tasks; as a result, any task you define will take at least this amount of time to run, even if it has nothing to do with rails. Tab completion is slow. That makes me sad.

The issue is that since rails and gems can provide rake tasks for your project, the entire rails environment has to be loaded just to figure out which tasks are available. If you are familiar with the tasks available, you can hack around things to wring some extra speed out of your rake.

WARNING: Hacks abound beyond this point. Proceed at own risk.

Below is my edited Rakefile. Narrative continues in the comments below.

# Rakefile
def load_rails_environment
  require File.expand_path('../config/application', __FILE__)
  require 'rake'
  Speedtest::Application.load_tasks
end

# By default, do not load the Rails environment. This allows for faster
# loading of all the rake files, so that getting the task list, or kicking
# off a spec run (which loads the environment by itself anyways) is much
# quicker.
if ENV['LOAD_RAILS'] == '1'
  # Bypass these hacks that prevent the Rails environment loading, so that the
  # original descriptions and tasks can be seen, or to see other rake tasks provided
  # by gems.
  load_rails_environment
else
  # Create a stub task for all Rails provided tasks that will load the Rails
  # environment, which in turn will append the real definition of the task to
  # the end of the stub task, so it will be run directly afterwards.
  #
  # Refresh this list with:
  # LOAD_RAILS=1 rake -T | ruby -ne 'puts $_.split(/\s+/)[1]' | tail -n+2 | xargs
  %w(
    about db:create db:drop db:fixtures:load db:migrate db:migrate:status 
    db:rollback db:schema:dump db:schema:load db:seed db:setup 
    db:structure:dump db:version doc:app log:clear middleware notes 
    notes:custom rails:template rails:update routes secret stats test 
    test:recent test:uncommitted time:zones:all tmp:clear tmp:create
  ).each do |task_name|
    task task_name do
      load_rails_environment
      # Explicitly invoke the rails environment task so that all configuration
      # gets loaded before the actual task (appended on to this one) runs.
      Rake::Task['environment'].invoke
    end
  end

  # Create an empty task that will show up in rake -T, instructing how to
  # get a list of all the actual tasks. This isn't necessary but is a courtesy
  # to your future self.
  desc "!!! Default rails tasks are hidden, run with LOAD_RAILS=1 to reveal."
  task :rails
end

# Load all tasks defined in lib/tasks/*.rake
Dir[File.expand_path("../lib/tasks/", __FILE__) + '/*.rake'].each do |file|
  load file
end

Now rake --tasks executes near instantaneously, and tasks will generally kick off faster (including rake spec). Much nicer!

This technique has the added benefit of hiding all the built-in tasks. Depending on your experience this may not be a win, but since I already know the rails ones by heart, I’m usually only interested in the tasks specific to the project.

I don’t pretend this is a pretty or permanent solution, but I share it here because it has made my life better in recent times.
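
The stub tasks in that Rakefile rely on a Rake behaviour: defining a task that already exists adds another action to it rather than replacing it, and the actions run in definition order. Here is a small standalone sketch showing that. It assumes the rake gem, which ships with standard Ruby installs, and uses a fresh Rake::Application so it doesn’t touch any real Rakefile. The task and action names are made up.

```ruby
require 'rake'

def demonstrate_appended_actions
  Rake.application = Rake::Application.new # isolated task registry
  calls = []

  # First definition: the stub, like the ones generated in the Rakefile.
  Rake::Task.define_task(:stats) { calls << :load_rails_environment }
  # Second definition, as Rails would make when its tasks get loaded.
  Rake::Task.define_task(:stats) { calls << :real_stats_task }

  Rake::Task[:stats].invoke
  calls
end

p demonstrate_appended_actions # => [:load_rails_environment, :real_stats_task]
```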

Storing build time in git notes with zsh

I’ve been playing around with git notes, having seen them on the GitHub blog. I needed to update to git 1.7.2 (Homebrew has it). The following shell command stores the run time of your specs inside a note on the latest commit:

{time rake spec} 2> >(tail -n 1 | cut -f 10 -d ' ' - |  git notes --ref=buildtime add -F - -f )

Breaking down the tricky bits:

{time rake spec} Honestly, I cargo culted the curly braces at first. They group the command so that the 2> redirection which follows applies to the whole group, including the report printed by time itself, rather than only to rake spec. Without them, time’s output wasn’t captured.

2> time prints its output to STDERR, 2> redirects STDERR to the next argument. It is kind of like |, but for STDERR rather than STDOUT.

{time sleep 0.1} 2> /tmp/time.log

>( ... ) This is process substitution: rather than redirecting STDERR to a file, it allows us to pipe it into more commands.

tail -n 1 rake spec also prints to STDERR, so pipe through tail to grab only the last line (which will be from time)

cut -f 10 -d ' ' - Split the line on a space character, choose the tenth column of the output from time, which is the total time taken. The trailing - says “read from STDIN”.

git notes --ref=buildtime add -F - -f Add a note to the latest commit (HEAD is default) in the buildtime namespace. -F - reads the note content from STDIN, which by now is only the final time taken for the spec run, and -f forces an update of the note if it already exists.
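
If you would rather do the extraction in Ruby (say, in a script that writes the note), the tail/cut step translates directly. This sketch assumes zsh’s default TIMEFMT, where the report reads like “rake spec  1.23s user 0.45s system 95% cpu 2.345 total”. The sample output and method name are made up. Like cut, it counts the empty field produced by the double space. Also like cut, it breaks if the job name has a different number of words.

```ruby
# Hypothetical helper: pull the total elapsed time out of zsh's `time`
# report, assuming zsh's default TIMEFMT ("%J  %U user %S system %P cpu %*E total").
def total_time_from_zsh_time(stderr_output)
  last_line = stderr_output.lines.last.to_s.chomp # like `tail -n 1`
  fields = last_line.split(/ /, -1)               # like `cut -d ' '` (keeps empty fields)
  fields[9]                                       # like `cut -f 10` (cut counts from 1)
end

sample = "lots of spec output\nrake spec  1.23s user 0.45s system 95% cpu 2.345 total\n"
total_time_from_zsh_time(sample) # => "2.345"
```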

  • Posted on September 06, 2010
  • Tagged code, git, zsh

Duplicate Data

UPDATE: If you are on PostgreSQL, check this updated query, it’s more useful.

Forgotten to back validates_uniqueness_of with a unique constraint in your database? Oh no! Here is some SQL that will pull out all the duplicate records for you.

User.find_by_sql <<-EOS
  SELECT * 
  FROM users 
  WHERE name IN (
    SELECT name 
    FROM users 
    GROUP BY name 
    HAVING count(name) > 1);
EOS
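
The GROUP BY ... HAVING count(name) > 1 step has a direct in-memory analogue. It can be handy for previewing duplicates in a small data set before deciding how to resolve them. This is a sketch over hypothetical hashes standing in for user rows.

```ruby
# In-memory analogue of the query above: every row whose name occurs
# more than once, grouped by that name.
def duplicates_by_name(rows)
  rows.group_by { |row| row[:name] }.select { |_name, group| group.size > 1 }
end

users = [
  { id: 1, name: "alice" },
  { id: 2, name: "bob" },
  { id: 3, name: "alice" },
]
duplicates_by_name(users).keys # => ["alice"]
```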

You will need your own strategy for resolving the duplicates, since it is totally dependent on your data. Some ideas:

  • Arbitrarily delete one of the records, perhaps based on the latest update time. Don’t forget about child records! If you have forgotten a uniqueness constraint it is likely you have also forgotten a foreign key, so you will have to delete child records manually.
  • Merge the records, including child records.
  • Manually resolve the conflicts on a case by case basis. Possible if there are not too many duplicates.

STI is the global variable of data modelling

A Single Table Inheritance table is really easy to both update and query. This makes it ideal for rapid prototyping: just throw some extra columns on it and you are good to go! This is why STI is so popular, and it fits perfectly into the Rails philosophy of getting things up and running fast.

Fast coding techniques do not always transfer into solid, maintainable code however. It is really easy to hack something together with global variables, but we eschew them when writing industry code. STI falls into the same category. I have written about the downsides of STI before: it clutters your data model, weakens your data integrity, and can be difficult to index. STI is a fast technique to get started with, but is not necessarily a great option for maintainable applications, especially when there are other modelling techniques such as class table inheritance available.

Updating Class Table Inheritance Tables

My last post covered querying class table inheritance tables; this one presents a method for updating them. Having set up our ActiveRecord models using composition, we can use a standard Rails method, accepts_nested_attributes_for, to allow easy one-form updating of the relationship.

class Item < ActiveRecord::Base
  validates_numericality_of :quantity

  SUBCLASSES = [:dvd, :car]
  SUBCLASSES.each do |class_name|
    has_one class_name
  end

  accepts_nested_attributes_for *SUBCLASSES
end

@item = Item.create!(:quantity => 1)
Dvd.create!(:item => @item, :title => 'The Matix')

@item.update_attributes(
  :quantity => 2,
  :dvd_attributes => {
    :id    => @item.dvd.id,
    :title => 'The Matrix'})

This issues the following SQL to the database:

UPDATE "items" SET "quantity" = 2 WHERE ("items"."id" = 12)
UPDATE "dvds" SET "title" = 'The Matrix' WHERE ("dvds"."id" = 12)

Note that depending on your application, you may need some extra locking to ensure this method is safe under concurrent access, for example if you allow items to change type. Be sure to read the accepts_nested_attributes_for documentation for the full API.
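To show what the single hash passed to update_attributes is doing, here is a hypothetical plain-Ruby sketch of the routing idea: keys ending in _attributes are handed to the associated record, and everything else is set on the parent. This is not how Rails implements accepts_nested_attributes_for, and the id check is this sketch’s own choice.

```ruby
# Hypothetical plain-Ruby illustration only; NOT Rails' implementation.
Dvd = Struct.new(:id, :title)

class Item
  attr_accessor :quantity, :dvd

  def initialize(quantity, dvd)
    @quantity = quantity
    @dvd = dvd
  end

  # Route each key either to the parent or to the associated record.
  def update_attributes(attrs)
    attrs.each do |key, value|
      name = key.to_s
      if name.end_with?('_attributes')
        # e.g. :dvd_attributes goes to the record returned by #dvd
        associated = send(name.sub(/_attributes\z/, ''))
        unless value[:id] == associated.id
          raise ArgumentError, "#{name} id does not match the associated record"
        end
        value.each { |attr, v| associated.send("#{attr}=", v) unless attr == :id }
      else
        send("#{name}=", value)
      end
    end
    self
  end
end

item = Item.new(1, Dvd.new(12, 'The Matix'))
item.update_attributes(
  :quantity       => 2,
  :dvd_attributes => { :id => 12, :title => 'The Matrix' })
```

Both the parent attribute and the nested record change in one call, which is what makes the one-form update convenient.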

I talk about this sort of thing in my “Your Database Is Your Friend” training sessions. They are happening throughout the US and UK in the coming months. One is likely coming to a city near you. Head on over to www.dbisyourfriend.com for more information and free screencasts.

Class Table Inheritance and Eager Loading

Consider a typical class table inheritance table structure with items as the base class and dvds and cars as two subclasses. In addition to what is strictly required, items also has an item_type column. This denormalization is usually a good idea; I will save the justification for another post, so please take it for granted for now.

The easiest way to map this relationship with Rails and ActiveRecord is to use composition, rather than trying to hook into the class loading code. Something akin to:

class Item < ActiveRecord::Base
  SUBCLASSES = [:dvd, :car]
  SUBCLASSES.each do |class_name|
    has_one class_name
  end

  def description
    send(item_type).description
  end
end

class Dvd < ActiveRecord::Base
  belongs_to :item

  validates_presence_of :title, :running_time
  validates_numericality_of :running_time

  def description
    title
  end
end

class Car < ActiveRecord::Base
  belongs_to :item

  validates_presence_of :make, :registration

  def description
    make
  end
end

A naive way to fetch all the items might look like this:

Item.all(:include => Item::SUBCLASSES)

This will issue one initial query, then one for each subclass. (Since Rails 2.1, eager loading is done like this rather than joining.) This is inefficient, since at the point we preload the associations we already know which subclass tables we should be querying. There is no need to query all of them. A better way is to hook into the Rails eager loading ourselves to ensure that only the tables required are loaded:

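# Inside class Item: only preload the subclass tables actually present
def self.preloaded(opts = {})
  all(opts).tap do |items|
    preload_associations(items, items.map(&:item_type).uniq)
  end
end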

Wrapping that up in a class method on items is neat because we can then use it as a kicker at the end of named scopes or associations – person.items.preloaded, for instance.
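The saving is easy to see by working out which queries each approach needs. A plain-Ruby sketch, assuming the items have already been fetched and that each subclass table is named by adding an s to item_type (dvd to dvds); preload_queries is a hypothetical helper, not part of ActiveRecord.

```ruby
# Hypothetical stand-in for rows already fetched from the items table.
Item = Struct.new(:id, :item_type)

# One query per subclass table that actually appears in the result set.
# Assumes table names are item_type plus a trailing "s".
def preload_queries(items)
  items.group_by(&:item_type).map do |type, group|
    "SELECT * FROM #{type}s WHERE item_id IN (#{group.map(&:id).join(', ')})"
  end
end

items = [Item.new(1, 'dvd'), Item.new(2, 'dvd')]
p preload_queries(items)
# prints ["SELECT * FROM dvds WHERE item_id IN (1, 2)"]
```

With only DVDs in the result, the cars table is never touched, whereas the naive :include always queries every subclass table.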

Here are some tests demonstrating this:

require 'test/test_helper'

class ItemTest < ActiveRecord::TestCase
  setup do
    item = Item.create!(:item_type => 'dvd')
    dvd  = Dvd.create!(:item => item, :title => 'Food Inc.')
  end

  test 'naive eager load' do
    items = []
    assert_queries(3) { items = Item.all(:include => Item::SUBCLASSES) }
    assert_equal 1, items.size
    assert_queries(0) { items.map(&:description) }
  end

  test 'smart eager load' do
    items = []
    assert_queries(2) { items = Item.preloaded }
    assert_equal 1, items.size
    assert_queries(0) { items.map(&:description) }
  end
end

# Monkey patch stolen from activerecord/test/cases/helper.rb
ActiveRecord::Base.connection.class.class_eval do
  IGNORED_SQL = [/^PRAGMA/, /^SELECT currval/, /^SELECT CAST/, /^SELECT @@IDENTITY/, /^SELECT @@ROWCOUNT/, /^SAVEPOINT/, /^ROLLBACK TO SAVEPOINT/, /^RELEASE SAVEPOINT/, /SHOW FIELDS/]

  def execute_with_query_record(sql, name = nil, &block)
    $queries_executed ||= []
    $queries_executed << sql unless IGNORED_SQL.any? { |r| sql =~ r }
    execute_without_query_record(sql, name, &block)
  end

  alias_method_chain :execute, :query_record
end

Last minute training in Seattle

If you or someone you know missed out on Saturday, I’ve scheduled a last minute database training for Seattle tomorrow. Register here. Last chance before I head to Chicago for a training on Friday.

Constraints assist understanding

The hardest thing for a new developer on a project to wrap his head around is not the code. For the most part, Ruby code stays the same across projects. My controllers look like your controllers, my models look like your models. What defines an application is not the code, but the domain. The business concepts, and how they are translated into code, can take weeks or months to understand cleanly. Modelling your domain so that it is easily understood is an important way to speed up this learning process.

In an application I am looking at there is an email field in the user model. It is defined as a string that allows null values. This is confusing. I need to figure out in what circumstances a null value makes sense (can they choose to withhold that piece of information? Is there a case where a new column I am adding should be null?), which is extra information I need to locate and process before I can understand the code. There is a validates_presence_of declaration on the attribute, but production data has some null values. Two parts of the application are telling me two contradictory stories about the domain.

Further, when I am tracking down a bug in the application, eliminating the possibility that a column could be null is an extra step I need to take. The data model is harder to reason about because there are more possible states than strictly necessary.

Allowing a null value in a column creates another piece of information that a developer has to process. It creates an extra question that needs to be answered when reading the code: in what circumstances is a null value appropriate? Multiply this problem out to multiple columns (and factor in other sub-optimal modeling techniques not covered here), and the time to understanding quickly grows out of hand.

Adding not-null constraints to your database is a quick and cheap way to bring your data model in line with the code that sits on top of it. In addition to cutting lines of code, cut extraneous information out of your data model. For little cost, constraints simplify your application conceptually and allow your data to be reasoned about more efficiently.

Concurrency with AASM, Isolation Levels

I’ve posted two guest articles over on the Engine Yard blog this week on database related topics:

They’re in the same vein as what I’ve been posting here, so worth a read if you’ve been digging it.
The US tour kicks off this Saturday in San Francisco, and there are still a couple of spots available. You can still register over at www.dbisyourfriend.com

“Your Database Is Your Friend” training sessions are happening throughout the US and UK in the coming months. One is likely coming to a city near you. For more information and free screencasts, head on over to www.dbisyourfriend.com

Relational Or NoSQL With Rails?

With all the excitement in the Rails world about “NoSQL” databases like MongoDB, CouchDB, Cassandra and Redis, I am often asked why I am running a course on relational databases.

The “database is your friend” ethos is not about relational databases; it’s about finding the sweet spot compromise between the tools you have available to you. Typically the database has been underused in Rails applications—to the detriment of both quality and velocity—and my goal is to provide tools and understanding to ameliorate this neglect, no matter whether you are using Oracle or Redis.

The differences between relational and NoSQL databases have been documented extensively. To quickly summarize the stereotypes: relational gives you solid transactions and joins, NoSQL is fast and scales. In addition, the document oriented NoSQL databases (NoSQL is a bit of a catch-all: there’s a big difference between key/value stores and document databases) enable you to store “rich” documents, a powerful modelling tool.

That’s a naive summary, but it gives you a general idea of the ideologies. To make a fair comparison between the two you need to understand both camps. If you don’t know what a relational database can do for you in terms of transactional support or data integrity, you will not know what you are losing when choosing NoSQL. Conversely, if you are not familiar with document modelling techniques and why denormalization isn’t so scary, you are going to underrate NoSQL technologies and handicap yourself with a relational database.

For example, representing a many-to-many relationship in a relational database might look something like:

Posts(id, title, body)
PostTags(post_id, tag_id)
Tags(id, name)

This is a standard normalization, and relational databases are tuned to deal with this scenario using joins and foreign keys. In a document database, the typical way to represent this is:

{
  title: 'My Post',
  body: 'This post has a body',
  tags: ['ruby', 'rails']
}

Notice the denormalization of tags so that there is no longer a table for it, creating a very nice conceptual model—everything to do with a post is included in the one object. The developer only superficially familiar with document modelling will quickly find criticisms, however. To choose just one, how do you get a list of all tags? This specific problem has been addressed by the document crowd, but not in a way that relational developers are used to thinking: map/reduce.

db.runCommand({
  mapreduce: 'posts',
  map: function() { 
    for (var i = 0; i < this.tags.length; i++) {
        emit(this.tags[i], 1);
    }
  },
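  reduce: function(key, values) {
    // sum the 1s emitted for this tag, giving a usage count
    var total = 0;
    values.forEach(function(v) { total += v; });
    return total;
  },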
  out: "tags"
})

This function can be run periodically to create a tags collection from the posts collection. It’s not quite real-time, but will be close enough for most uses. (Of course if you do want real-time, there are other techniques you can use.) Yes, the query is more complicated than just selecting out of Tags, but inserting and updating an individual post (the main use case) is simpler.
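To make the two phases concrete, here is the same computation emulated in Ruby over an in-memory array of post documents. It is only a sketch of what the map and reduce functions above do, using hypothetical data; it is not MongoDB’s API and it runs everything in one process.

```ruby
# Hypothetical post documents, shaped like the MongoDB example above.
POSTS = [
  { :title => 'My Post',      :tags => ['ruby', 'rails'] },
  { :title => 'Another Post', :tags => ['ruby'] }
]

def tag_counts(posts)
  # Map: emit a [tag, 1] pair for every tag on every post.
  emitted = posts.flat_map { |post| post[:tags].map { |tag| [tag, 1] } }

  # Group the emitted values by key, as the database does between phases.
  grouped = emitted.group_by { |tag, _count| tag }

  # Reduce: sum the values emitted for each key.
  counts = {}
  grouped.each do |tag, pairs|
    counts[tag] = pairs.inject(0) { |sum, (_tag, count)| sum + count }
  end
  counts
end

counts = tag_counts(POSTS)
```

The keys of the result are the list of all tags; the values are a bonus usage count.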

I’m not arguing one side or another here. This is just one simplistic example to illustrate my point that if you don’t know how to use document database specific features such as map/reduce, or how to model your data in such a way as to take advantage of them, you won’t be able to adequately evaluate those databases. Similarly, if you don’t know how to use pessimistic locking or referential integrity in a relational database, you will not see how much time and effort it could be saving you over trying to implement such robustness in a NoSQL database that wasn’t designed for it.

No matter which technology you ultimately choose for your application (or even if you mix the two!), it is imperative that you understand both sides thoroughly so that you can accurately weigh up the costs and benefits of each.

The pitch

This is why I’m excited to announce a brand new training session on MongoDB. For the upcoming US tour, this session will only be offered once, exclusively at the Lone Star Ruby Conference. The original relational training is the day before the conference (at the same venue), to create a two day database training bonanza: relational on Wednesday 25th August, MongoDB on Thursday 26th.

We’ll be adding MongoDB to an existing site—Spacebook, the social network for astronauts!—to not only learn MongoDB in isolation, but practically how to integrate it into your existing infrastructure. The day starts with the basics: What it is, what it isn’t, how to use it, how to integrate with Rails, and we’ll build and investigate some of the typical MongoDB use cases like analytics tracking. As we become comfortable, we will move into some more advanced querying and data modelling techniques that MongoDB excels at to ensure we are getting the most out of the technology, and discuss when such techniques are appropriate.

Since I am offering the MongoDB training in association with the Lone Star Ruby Conference, you will have to register for the conference to attend. At only an extra $175 above the conference ticket, half the normal price, the Lone Star Ruby Conference MongoDB session is the cheapest this training will ever be offered, not to mention all the win of the rest of the conference! Aside from the training, it has a killer two-day line-up of talks which are going to be awesome. I’m especially excited about the two keynotes by Tom Preston-Werner and Blake Mizerany, and there are some good database-related talks to get along to: Adam Keys is giving the lowdown on the new ActiveModel in Rails 3, Jesse Wolgamott is comparing different NoSQL technologies, and Bernerd Schaefer will be talking about what Mongoid (the ORM we’ll be using with Spacebook) is doing to stay at the head of the pack. I’ll certainly be hanging around.

Register for the relational training separately. There’s a $50 early bird discount for the next week (in addition to the half price Mongo training), but if you miss that and are attending both sessions get in touch and I’ll extend the offer for you. This is probably going to send me broke, but I really just want to get this information out there. Cheaper, higher quality software makes our industry better for everyone.

Five Tips For Adding Foreign Keys To Existing Apps

You’re convinced foreign keys are a good idea, but how should you retroactively add them to your production application? Here are some tips to help you out.

Identify and fix orphan records. If orphan records exist, creating a foreign key will fail. Use the following SQL to identify children that reference a parent that doesn’t exist:

SELECT children.* FROM children LEFT JOIN parents ON children.parent_id = parents.id WHERE children.parent_id IS NOT NULL AND parents.id IS NULL

Begin with new or unimportant relationships. With any new change, it’s best to walk before you run. Targeting the most important relationships in your application head on can quickly turn into a black hole. Adding foreign keys to new or low value relationships first means you have a smaller code base that is affected, and allows you to test your test suite and plugins for compatibility over a smaller area. Get this running in production early, so any issues will crop up early on low value code where they’ll be easier to fix. Be agile in your approach and iterate.

Move away from fixtures and mocking in your tests. Rails fixture code is not designed to work well with foreign keys. (Fixtures are generally not a good idea regardless.) Also, the intense stubbing of models that was in vogue back when rspec first came on the scene doesn’t play nice either. The current best practice is to use object factories (such as Machinist) to create your test data, and this works well with foreign keys.

Use restrict rather than cascade for ON DELETE. You still want to keep destroy callbacks in your models, so even if conceptually a cascading delete makes sense, implement it using the :dependent => :destroy option to has_many, with a restrict option at the database level to ensure all cascading deletes run through your callbacks.
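To see why this combination routes deletes through your callbacks, here is an in-memory sketch. The Store class and its methods are hypothetical stand-ins for the database and for :dependent => :destroy, not ActiveRecord code. A direct delete of a parent is refused while children exist, so the only way through is to destroy each child first, which is where the callbacks run.

```ruby
class RestrictViolation < StandardError; end

class Store
  attr_reader :parents, :children, :log

  def initialize(parents, children)
    @parents  = parents   # { parent_id => name }
    @children = children  # { child_id => parent_id }
    @log      = []
  end

  # What ON DELETE RESTRICT enforces: no deleting a parent with children.
  def delete_parent(id)
    if @children.values.include?(id)
      raise RestrictViolation, "parent #{id} still has children"
    end
    @parents.delete(id)
  end

  # Stand-in for destroying a child model, callbacks included.
  def destroy_child(id)
    @log << "destroy callback ran for child #{id}"
    @children.delete(id)
  end

  # Stand-in for :dependent => :destroy: children first, then the parent.
  def destroy_parent(id)
    child_ids = @children.select { |_child_id, parent_id| parent_id == id }.keys
    child_ids.each { |child_id| destroy_child(child_id) }
    delete_parent(id)
  end
end
```

If some code path tries to skip the callbacks and delete the parent directly, the restrict constraint turns that into an error rather than silently orphaning or cascading.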

Be pragmatic. Ideally every relationship will have a foreign key, but for that model filled with weird hacks and supported by a massive old school test suite, it may be just too much effort to get everything working smoothly with database constraints. In this case, set up a test suite that runs over your production data regularly to quickly identify any data problems that arise (see the SQL above).

Foreign keys give you confidence and peace of mind about your data and your application. Rails may be afraid of them, but that doesn’t mean you have to be.

July through September I am running full day training sessions in the US and UK on how to make use of your database and write solid Rails code, increasing your quality without compromising your velocity. Chances are I’m coming to your city, so check it out at http://www.dbisyourfriend.com

acts_as_state_machine is not concurrent

Here is a short 4 minute screencast in which I show you how the acts as state machine (AASM) gem fails in a concurrent environment, and also how to fix it.


(If embedding doesn’t work or the text is too small to read, you can grab a high resolution version direct from Vimeo)

It’s a pretty safe bet that you want to obtain a lock before all state transitions, so you can use a bit of method aliasing to do just that. This gives you much neater code than the quick fix I show in the screencast; just make sure you understand what it is doing!

class ActiveRecord::Base
  def self.obtain_lock_before_transitions
    AASM::StateMachine[self].events.keys.each do |t|
      define_method("#{t}_with_lock!") do |*args|
        transaction do
          lock!
          send("#{t}_without_lock!", *args)
        end
      end
      alias_method_chain "#{t}!", :lock
    end
  end
end

class Tractor < ActiveRecord::Base
  # ...

  aasm_event :buy do
    transitions :to => :bought, :from => [:for_sale]
  end

  obtain_lock_before_transitions
end
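The race the screencast demonstrates can be reproduced without a database. Below is a hypothetical plain-Ruby sketch in which a Mutex stands in for the row lock that transaction plus lock! obtains; this Tractor class, its buy! method and the obtain_lock_before helper are illustrative only, not AASM’s API.

```ruby
class Tractor
  attr_reader :state, :buyers

  def initialize
    @state  = :for_sale
    @buyers = []
    @lock   = Mutex.new # plays the role of the database row lock
  end

  # Check-then-act: without a lock, two callers can both see :for_sale.
  def buy!(buyer)
    return false unless @state == :for_sale
    sleep 0.01 # widen the race window so the problem is easy to reproduce
    @buyers << buyer
    @state = :bought
    true
  end

  # Same aliasing trick: keep the original as *_without_lock! and redefine
  # the event method to call it while holding the lock.
  def self.obtain_lock_before(*events)
    events.each do |event|
      alias_method "#{event}_without_lock!", "#{event}!"
      define_method("#{event}!") do |*args|
        @lock.synchronize { send("#{event}_without_lock!", *args) }
      end
    end
  end

  obtain_lock_before :buy
end

tractor = Tractor.new
threads = (1..5).map { |i| Thread.new { tractor.buy!("buyer #{i}") } }
results = threads.map(&:value)
```

With the wrapper in place exactly one thread succeeds; comment out the obtain_lock_before line and several buyers can "buy" the same tractor.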

This is a small taste of my DB is your friend training course, which helps you build solid Rails applications by finding the sweet spot between stored procedures and treating your database as a hash. July through September I am running full day sessions in the US and UK. Chances are I’m coming to your city. Check it out at http://www.dbisyourfriend.com

Ultimate NYTimes jQuery Slidebox

The New York Times has a pretty fancy box that slides out when you hit the bottom of an article. It draws attention without being too distracting. Very nice. Here’s how you can do it yourself with all the trendiest bells and whistles: CSS transitions (with a jQuery fallback for crippled browsers) and Google Analytics tracking. See it in the wild over at my other blog TwoShay, or jump straight to the demo to grab the code.

To start with, some basic skeleton code. I’m using the new HTML5 elements; you can just use divs if you’re not that cool.

<section id='slidebox'>
  <a name='close' class='close'></a>
  <h1>Related Reading</h1>
  <div class='related'>
    <h2>Sense and Sensibility</h2>
    <p class='desc'>
      Another book by Jane Austen you will enjoy
      <a href='#' rel='related' class='more'>Read »</a> 
    </p>
  </div>
</section>

/* Just the important styles - see the demo source for a fuller account */
#slidebox {
  position:fixed;
  width:400px;
  right: -430px;
  bottom:20px;

  -webkit-transition: right 100ms linear;
  transition: right 100ms linear;
}

#slidebox.open { 
  right: 0px; 
  -webkit-transition: right 300ms linear;
  transition: right 300ms linear;
}

This sets up a fixed-position box, hidden off to the right of the screen. Adding a class of open to the box using jQuery will trigger a 300ms CSS transition to slide the box in, nice and smooth. The correct time to do this is when the user scrolls to the last bit of content on the page. What this content is will depend on your site, but whatever it is, give it an id of last. The following JavaScript is all we need:

(function ($) {
  /* Add a function to jQuery to slidebox any elements */
  jQuery.fn.slidebox = function() {
    var slidebox = this;
    var originalPosition = slidebox.css('right');
    var boxAnimations = {
      open:  function() { slidebox.addClass('open'); },
      close: function() { slidebox.removeClass('open'); }
    };

    $(window).scroll(function() {
      var distanceTop = $('#last').offset().top - $(window).height();

      if ($(window).scrollTop() > distanceTop) {
        boxAnimations.open();
      } else {
        boxAnimations.close();
      }
    });
  }

  $(function() { /* onload */
    $('#slidebox').slidebox();
  });
})(jQuery);

That’s it! Everything from here on is gravy.

To deal with browsers that don’t support CSS transitions yet, provide a jQuery animation fallback, using Modernizr to detect the browser’s capabilities:

/* replacing the boxAnimations definition above */
var boxAnimations;
if (Modernizr.csstransitions) {
  boxAnimations = {
    open:  function() { slidebox.addClass('open'); },
    close: function() { slidebox.removeClass('open'); }
  };
} else {
  boxAnimations = {
    open: function() {
      slidebox.animate({
        'right': '0px'
      }, 300);
    },
    close: function() {
      slidebox.stop(true).animate({
        'right': originalPosition
      }, 100);
    }
  }
}

A close button is polite, allowing the user to dismiss the slidebox if they are not interested:

slidebox.find('.close').click(function() {
  $(this).parent().remove();
});

And finally, no point adding all this shiny without knowing whether people are using it! Google Analytics allows us to track custom JavaScript events, which is a perfect tool for gaining insight into how the slidebox is performing. It’s easy to use: simply push a _trackEvent method call to the _gaq variable (defined in the analytics snippet you copy and paste into your layout) and Google takes care of the rest. Observe the full JavaScript code, with tracking added:

(function ($) {
  jQuery.fn.slidebox = function() {
    var slidebox = this;
    var originalPosition = slidebox.css('right');
    var open = false;

    /* GA tracking */
    var track = function(label) {
      return _gaq.push(['_trackEvent', 'Slidebox', label]);
    }

    var boxAnimations;
    if (Modernizr.csstransitions) {
      boxAnimations = {
        open:  function() { slidebox.addClass('open'); },
        close: function() { slidebox.removeClass('open'); }
      };
    } else {
      boxAnimations = {
        open: function() {
          slidebox.animate({
            'right': '0px'
          }, 300);
        },
        close: function() {
          slidebox.stop(true).animate({
            'right': originalPosition
          }, 100);
        }
      }
    }

    $(window).scroll(function() {
      var distanceTop = $('#last').offset().top - $(window).height();

      if ($(window).scrollTop() > distanceTop) {
        /* Extra protection necessary so we don't send multiple open events to GA */
        if (!open) {
          open = true;
          boxAnimations.open();
          track("Open");
        }
      } else {
        open = false;
        boxAnimations.close();
      }
    });

    slidebox.find('.close').click(function() {
      $(this).parent().remove();
      track("Close");
    });
    slidebox.find('.related a').click(function() {
      track("Read More");
    });
  }

  $(function() {
    $('#slidebox').slidebox();
  });
})(jQuery);

/* Google analytics code provides this variable */
var _gaq = _gaq || [];

Tasty. For the entire code and complete styles, see the demo page.

Kudos to http://tympanus.net for getting the ball rolling.

Three Reasons Why You Shouldn't Use Single Table Inheritance

It creates a cluttered data model. Why don’t we just have one table called objects and store everything as STI? STI tables have a tendency to grow and expand as an application develops, and become intimidating and unwieldy as it isn’t clear which columns belong to which models.

It forces you to use nullable columns. A comic book must have an illustrator, but regular books don’t have an illustrator. Subclassing Book with Comic using STI forces you to allow illustrator to be null at the database level (for books that aren’t comics), and pushes your data integrity up into the application layer, which is not ideal.

It prevents you from efficiently indexing your data. Every index has to reference the type column, and you end up with indexes that are only relevant for a certain type.

The only time STI is the right answer is when you have models with exactly the same data, but different behaviour. You don’t compromise your data model, and everything stays neat and tidy. I have yet to see a case in the wild where this rule holds, though.

If you are using STI (or inheritance in general) to share code, you’re doing it wrong. Having many tables does not conflict with the Don’t-Repeat-Yourself principle. Ruby has modules, use them. (I once had a project where a 20 line hash drove the creation of migrations, models, data loaders and test blueprints.)
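A small sketch of sharing behaviour through a module instead of a common superclass, reusing the description idea from the class table inheritance examples. The Describable module and its summary method are hypothetical; each class stays free to map to its own table.

```ruby
# Shared behaviour lives in a module rather than a common superclass.
module Describable
  def summary
    "#{self.class.name}: #{description}"
  end
end

class Dvd
  include Describable
  attr_reader :title

  def initialize(title)
    @title = title
  end

  def description
    title
  end
end

class Car
  include Describable
  attr_reader :make

  def initialize(make)
    @make = make
  end

  def description
    make
  end
end
```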

What you should be doing is using Class Table Inheritance. Rails doesn’t “support it natively”, but that doesn’t particularly mean much since it’s a simple pattern to implement yourself, especially if you take advantage of named scopes and delegators. Your data model will be much easier to work with, easier to understand, and more performant.
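Delegation is what makes the composition approach pleasant to use. A plain-Ruby sketch using the standard library’s Forwardable, with a Struct standing in for the items row (the class shapes are hypothetical, not ActiveRecord models): the DVD object answers quantity as if it owned the column, while the data lives on the item.

```ruby
require 'forwardable'

# Hypothetical stand-in for a row in the items (base) table.
Item = Struct.new(:id, :quantity)

# A dvds row keeps only DVD-specific columns plus a reference to its item,
# and delegates the shared attributes instead of inheriting them.
class Dvd
  extend Forwardable
  def_delegators :@item, :quantity, :quantity=

  attr_reader :title

  def initialize(item, title)
    @item  = item
    @title = title
  end

  def description
    title
  end
end

item = Item.new(1, 3)
dvd  = Dvd.new(item, 'Food Inc.')
dvd.quantity = 5 # writes through to the item
```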

I expand on this topic and guide you through a sample implementation in my DB is your friend training course. July through September I am running full day sessions in the US and UK. Chances are I’m coming to your city. Check it out at http://www.dbisyourfriend.com

Debugging Deadlocks In Rails

Here is a 13 minute long screencast in which I show you how to go about tracking down a deadlock in a Ruby on Rails application. I make two assumptions:

  1. You are using MySQL
  2. You know the difference between shared and exclusive locks (in short: a shared lock allows other transactions to read the row, an exclusive lock blocks out everyone else)


(If embedding doesn’t work or the text is too small to read, you can grab a high resolution version direct from Vimeo)

This is only one specific example of a deadlock; in reality there are many ways this can occur. The process for tracking them down is always the same, though. If you get stuck, read through the InnoDB documentation again. Something normally jumps out. If you are not sure what Ruby code is generating what SQL, the query trace plugin is excellent. It gives you a stack trace for every single SQL statement ActiveRecord generates.

This is a small taste of the type of thing I cover in my DB is your friend training course. July through September I am running full day sessions in the US and UK. Chances are I’m coming to your city. Check it out at http://www.dbisyourfriend.com

acts_as_list will break in production

acts_as_list doesn’t work in a typical production deployment. It pretends to for a while, but every application will eventually have issues with it that result in real problems for your users. Here is a short 4 minute long screencast showing you how it breaks, and also a quick fix which will prevent your data from becoming corrupted.

(View it over at Vimeo if embedding doesn’t work for you)

Here is the “quick fix” I apply in the screencast. It’s ugly, but it will work.

def move_down
  Tractor.transaction do
    Tractor.connection.execute("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE")
    @tractor = Tractor.find(params[:id])
    @tractor.move_to_bottom
  end
  redirect_to(tractors_path)
end

Some things to note when fixing your application in a nicer way:

  1. This is not MySQL specific, all databases will exhibit this behaviour.
  2. The isolation level needs to be set as the first statement in the transaction (or globally, but you don’t want serializable globally!)
  3. For bonus points, add a unique index to the position column, though you’ll have to re-implement most of acts_as_list to make it work.
  4. It’s possible to do this under read committed, but it’s pretty complicated and optimised for concurrent access rather than individual performance.
  5. Obtaining a row lock before moving will fix this specific issue, but won’t address all the edge cases.
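However you fix it, it is worth knowing whether your data has already been corrupted. A minimal sketch of a check you could run over production data, assuming rows have been loaded as [list_id, position] pairs and that positions should run contiguously from 1 within each list; the data shape and the corrupt_list_ids helper are hypothetical.

```ruby
# Returns the ids of lists whose positions are not exactly 1..n,
# i.e. lists with duplicate or missing positions.
def corrupt_list_ids(rows)
  rows.group_by(&:first).select { |_list_id, entries|
    positions = entries.map(&:last).sort
    positions != (1..positions.size).to_a
  }.keys
end

rows = [[1, 1], [1, 2], [1, 3], [2, 1], [2, 1], [3, 1], [3, 3]]
p corrupt_list_ids(rows) # prints [2, 3]
```

List 2 has a duplicated position (the classic result of two concurrent moves) and list 3 has a gap.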

This is a small taste of the type of thing I cover in my DB is your friend training course. July through September I am running full day sessions in the US and UK. Chances are I’m coming to your city. Check it out at http://www.dbisyourfriend.com

Rails DB Training - US/UK Tour

You may not know, but I run an advanced Rails training session titled “Your Database Is Your Friend”. Previously, I have only done this in Australia. Starting late July, I will be running this session throughout the United States and the United Kingdom. I’m still planning dates and venues; if you or someone you know is interested in hosting a session, please get in touch.

For details, see the DB is your friend rails training website.

Nanoc3 with Rack::StaticCache

There is a neat piece of middleware introduced in rack-contrib 0.9.3 called Rack::StaticCache. It allows you to version your static assets (images, css) so that you can set infinite expires headers on them. All you need is a version number trailing your file name, and it is routed through to the underlying file. Whenever you change the file, you change the version.

/img/lolcat-1.jpg -> /img/lolcat.jpg
/img/lolcat-2.jpg -> /img/lolcat.jpg
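Both halves of the scheme are plain string rewrites. A rough sketch, assuming versions are integers placed immediately before the file extension; the method names are hypothetical and the regular expressions are illustrative, not the ones rack-contrib uses internally.

```ruby
# Insert a version number before the extension:
#   /img/lolcat.jpg -> /img/lolcat-1.jpg
def versioned_path(path, version)
  path.sub(/(\.[^.\/]+)\z/, "-#{version}\\1")
end

# Strip a trailing -<digits> before the extension, which is what lets
# every versioned URL resolve to the same file on disk:
#   /img/lolcat-2.jpg -> /img/lolcat.jpg
def unversioned_path(path)
  path.sub(/-\d+(\.[^.\/]+)\z/, '\1')
end
```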

The URLs go to the same place, but since they are different you can cache them indefinitely and change all the referencing URLs in your code when you change the asset. That’s annoying if you’re trying to do it by hand, but that’s why we have code eh. I wrote a nanoc3 after filter that parses the HTML using nokogiri, and replaces any reference to any image or stylesheet with a reference versioned using the last modified timestamp of that asset. It automatically updates! This is particularly neat because you can link in images in markdown without ever worrying about versioning.

# lib/static_cache_filter.rb
require 'nokogiri'

class StaticCacheFilter < Nanoc3::Filter
  identifier :static_cache

  def run(content, params = {})
    doc = Nokogiri::HTML::Document.parse(content)
    add_version = lambda {|attr| lambda {|x|
      src = x[attr]
      return if src.nil? # e.g. an <img> without a src attribute
      item = @items.detect {|y| y.identifier == "#{src.gsub(/\..+$/, '')}/" }
      if item
        version = item.mtime.to_i
        tokens = src.split('.')
        src = tokens[0] + "-#{version}." + tokens[1..-1].join('.')
        x[attr] = src
      end
    }}
    doc.css('img'                 ).each(&add_version['src'])
    doc.css('link[rel=stylesheet]').each(&add_version['href'])
    doc.to_html
  end
end

# Rules
compile '/' do
  filter :haml
  layout 'home'
  filter :static_cache
end

# config.ru
require 'rack/contrib'
use Rack::StaticCache, :urls => ['/img','/css'], :root => "public"
run Rack::Directory.new("public")

Nanoc3 and CoffeeScript

Nanoc3 is a pretty awesome static site generator. It works by running your content through “filters” to create the final static site. It comes with a lot of built-in filters – Haml, Sass, rubypants, markdown, and more! Nothing for JavaScript though, which is sad because I really like CoffeeScript. It’s ok! I wrote my own filter, shared here for your enjoyment.

Bang this in your lib folder:

require 'open3'
require 'win32/open3' if RUBY_PLATFORM =~ /win32/

class CoffeeFilter < Nanoc3::Filter
  identifier :coffee

  def run(content, params = {})
    output = ''
    error = ''
    command = 'coffee -s -p -l'
    Open3.popen3(command) do |stdin, stdout, stderr|
      stdin.puts content
      stdin.close
      output = stdout.read.strip
      error = stderr.read.strip
      [stdout, stderr].each { |io| io.close }
    end

    if error.length > 0
      raise("Compilation error:\n#{error}")
    else
      output
    end
  end
end

To use it, a compilation rule like the following is pretty neat:

# Compile both coffee and js, co-mingled in the same directory
compile '/js/*' do
  case item[:extension]
    when 'coffee'
      filter :coffee
    when 'js'
      # Nothing
  end
end

Don’t forget to add ‘coffee’ to the list of text extensions in your config.yaml!

Protip: You can use the above pattern to filter content through any command line program. Figlet anyone?
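As a sketch of that generalisation, here is a hypothetical pipe_through helper. It sends content to any command on stdin and returns the command's stdout. It uses Open3.capture3 (Ruby 1.9.2 and later), which feeds stdin and drains stdout and stderr concurrently. The popen3 approach above writes everything and then reads each stream in turn, so it can in principle deadlock on large output; capture3 avoids that. A nanoc filter's run method could then be little more than pipe_through('figlet', content).

```ruby
require 'open3'

# Hypothetical helper: run +command+ with +input+ on stdin, return its stdout,
# and raise with the captured stderr if the command exits unsuccessfully.
def pipe_through(command, input)
  stdout, stderr, status = Open3.capture3(command, :stdin_data => input)
  raise "#{command} failed:\n#{stderr}" unless status.success?
  stdout
end

puts pipe_through('tr a-z A-Z', 'hello')  # prints HELLO
```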

Ruby debugging with puts, tap and Hirb

I use puts heaps when debugging. Combined with tap, it’s pretty handy. You can jump right into the middle of a method chain without having to move things around into variables.

x = long.chain.of.methods.tap {|x| puts x }.to.do.something.with
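Here is a concrete, runnable version of that chain, with plain Array methods standing in for the placeholder ones:

```ruby
# tap yields the receiver to the block, then returns the receiver unchanged,
# so the chain carries on exactly as if the tap were not there.
result = [3, 1, 2].sort.tap { |x| puts x.inspect }.map { |x| x * 10 }
# prints [1, 2, 3]
puts result.inspect  # prints [10, 20, 30]
```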

I thought, hey, why don’t I merge the two? And for bonus points, add in Hirb’s table display to format my models nicely. These are fairly personal customizations, and aren’t specific to a project, so I put them in my own ~/.railsrc file rather than in each project.

# config/initializers/developer_specific_customizations.rb
if %w(development test).include?(Rails.env)
  railsrc = "#{ENV['HOME']}/.railsrc"
  load(railsrc) if File.exist?(railsrc)
end

# ~/.railsrc
require 'hirb'

Hirb.enable :pager => false

class Object
  def tapp(prefix = nil, &block)
    block ||= lambda {|x| x }

    tap do |x|
      value = block[x]
      value = Hirb::View.formatter.format_output(value) || value.inspect

      if prefix
        print prefix
        if value.lines.count > 1
          print ":\n"
        else
          print ": "
        end
      end
      puts value
    end
  end
end

# Usage (in your spec files, perhaps?)
"hello".tapp           # => "hello"
"hello".tapp('a')      # => a: "hello"
"hello".tapp(&:length) # => 5
MyModel.first.tapp # =>
#  +----+-------------------------+
#  | id | created_at              |
#  +----+-------------------------+
#  | 7  | 2009-12-29 00:15:56 UTC |
#  +----+-------------------------+
#  1 row in set

Full stack testing rack applications

Herein is described a method for full stack testing CloudKit apps. The same techniques could easily be applied to any other rack web application or framework, which covers pretty much all the ruby ones these days (rails, sinatra, pancake, etc.). This method is ideal for non-HTML services. For HTML you’re probably better off just using webrat/selenium.

There are two external services that make up our stack:

  • CloudKit application
  • OpenID server

Both of these are rack applications, so we can start them up using the same method in our spec helper.

require 'spec'
require 'pathname'
require Pathname(__FILE__).dirname + 'support/application_server'
require Pathname(__FILE__).dirname + 'support/tcp_socket'

TEST_PORTS = {
  :app    => 9293,
  :openid => 9294
}

$servers = nil
Spec::Runner.configure do |config|
  config.before(:all) do
    $servers ||= Support::ApplicationServer.multi_boot(
      {
        :config    => File.expand_path(Dir.pwd + '/config.ru'),
        :port      => TEST_PORTS[:app],
        :daemonize => true
      },
      {
        :config    => File.expand_path(Dir.pwd + '/spec/support/rack_my_id.rb'),
        :port      => TEST_PORTS[:openid],
        :daemonize => true
      }
    )
  end
end

You need some support files (support/application_server.rb, support/tcp_socket.rb and support/rack_my_id.rb). The first two are based heavily on code from webrat, and the last is a dead simple OpenID server that I wrote specifically for testing.

A global variable is required here, since before(:all) in rspec runs once per describe block rather than once per test run. An at_exit hook is used to shut down the services after the test run.
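Here is the once-per-run pattern in isolation, as a sketch. boot_servers is a hypothetical stand-in that only counts calls rather than starting processes. The at_exit hook just prints at the point where the real one would stop the services.

```ruby
# Hypothetical stand-in for Support::ApplicationServer.multi_boot: it only
# counts calls instead of starting real servers.
$boot_count = 0
def boot_servers
  $boot_count += 1
  [:app_server, :openid_server]
end

# Same shape as the spec helper: the global survives across describe blocks,
# even though before(:all) fires once per block.
$servers = nil
def servers
  $servers ||= begin
    booted = boot_servers
    # Registered once; runs when the whole test process exits.
    at_exit { $stderr.puts "shutting down #{booted.inspect}" }
    booted
  end
end

3.times { servers }  # three describe blocks, each running before(:all)
puts $boot_count     # prints 1
```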

You need a way of resetting your data between test runs. The default CloudKit::MemoryTable does not provide a mechanism for this – any deleted resource will exist in the version history of that resource (and will respond with a 410 rather than 404). By subclassing MemoryTable, we can provide a purge method that does what we need:

# A custom storage adapter that allows a total purge of a collection
# This is handy in test mode to clear out data between specs
class PurgeableTable < CloudKit::MemoryTable
  # Remove all resources in a collection.
  # Unlike a normal delete, which versions the resource (and sets up a 410 response),
  # this method removes all trace of the resource (it will 404).
  #
  # Example:
  #   CloudKit.setup_storage_adapter(adapter = PurgeableTable.new)
  #   adapter.purge('/items')
  def purge(collection)
    query {|q|
      q.add_condition('collection_reference', :eql, collection)
    }.each do |item|
      @hash.delete(@keys.delete(item[:pk]))
    end
  end
end
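To see why a plain delete is not enough, here is a toy in-memory store that keeps version history in the way described above. It is hypothetical and reflects neither CloudKit's API nor its internals. Deleting leaves history behind, so the URI answers 410 Gone; purging removes every trace, so it answers 404.

```ruby
# Toy store illustrating 410 vs 404; hypothetical, not CloudKit's internals.
class ToyStore
  def initialize
    @live    = {}
    @history = {}  # uri => every version ever stored
  end

  def put(uri, doc)
    (@history[uri] ||= []) << doc
    @live[uri] = doc
  end

  # A normal, versioned delete: the resource goes away but its history stays.
  def delete(uri)
    @live.delete(uri)
  end

  # Remove every trace of every resource in the +collection+.
  def purge(collection)
    [@live, @history].each do |table|
      table.delete_if { |uri, _| uri.start_with?("#{collection}/") }
    end
  end

  def status(uri)
    return 200 if @live.key?(uri)
    @history.key?(uri) ? 410 : 404
  end
end

store = ToyStore.new
store.put('/items/1', 'name' => 'Hello')
store.delete('/items/1')
puts store.status('/items/1')  # prints 410
store.purge('/items')
puts store.status('/items/1')  # prints 404
```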

Since we’ll be testing the CloudKit app from a separate process, we also need a way of triggering a purge. An easy way is some custom rack middleware that provides a URL we can hit to reset the app. Clearly, we only want to enable this in test mode.

class ResetApp
  def initialize(app, options = {})
    @app = app
    @options = options
  end

  def call(env)
    request = Rack::Request.new(env)
    if request.path == '/test_reset' && request.request_method == 'POST'
      @options[:adapter].purge('/items')
      return Rack::Response.new([], 200).finish
    else
      @app.call(env)
    end
  end
end
# config.ru
CloudKit.setup_storage_adapter(adapter = PurgeableTable.new)

if ENV["RACK_ENV"] == 'test'
  use ResetApp, :adapter => adapter
end

Now that all the infrastructure is set up, we can test the CloudKit app using familiar ruby HTTP libraries:

require 'httparty'
require 'mechanize'
require 'json'
require 'oauth'

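describe 'OAuth + OpenID' do
  include HTTParty
  base_uri "localhost:#{TEST_PORTS[:app]}"

  before(:each) do
    # self.class is this example group, which has the base_uri set above;
    # calling HTTParty.post directly would ignore it
    self.class.post("/test_reset").code.should == 200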
  end

  specify 'Registering for an oauth token' do
    @consumer = OAuth::Consumer.new('cloudkitconsumer','',
      :site               => "http://localhost:#{TEST_PORTS[:app]}",
      :authorize_path     => "/oauth/authorization",
      :access_token_path  => "/oauth/access_tokens",
      :request_token_path => "/oauth/request_tokens"
    )
    @request_token = @consumer.get_request_token

    agent = WWW::Mechanize.new
    page = agent.get(@request_token.authorize_url)
    login_form = page.forms.first
    login_form.field_with(:name => "openid_url").value = "localhost:#{TEST_PORTS[:openid]}"
    page = agent.submit(login_form)

    oauth_form = page.forms.first
    page = agent.submit(oauth_form, oauth_form.button_with(:value => "Approve"))

    # Get access token
    @access_token = @request_token.get_access_token

    # Update an item
    result = @access_token.put("/items/12345", {:name => "Hello"}.to_json)
    result.code.should == "201"
  end
end

There’s a lot of code and not much supporting text here. I’m hoping it all just clicks together pretty easily. Hit me up with any questions.

BacktraceCleaner and gems in rails

UPDATE: Fixed the monkey-patch to match the latest version of the patch, and to explicitly require Rails::BacktraceCleaner before patching it to make sure it has been loaded

If there’s one thing my mother taught me, it’s that if you’re going to clean something up, you may as well do it properly. Be thorough, cover every surface.

Rails::BacktraceCleaner is a bit sloppy when it comes to gem directories. It misses all sorts of dust – hyphens, underscores, upper case letters, numbers. That’s not going to earn any pocket money. Let’s teach it a lesson.

# config/initializers/this_is_what_a_gem_looks_like.rb
require 'rails/backtrace_cleaner'

module Rails
  class BacktraceCleaner < ActiveSupport::BacktraceCleaner
    private
      GEM_REGEX = "([A-Za-z0-9_-]+)-([0-9.]+)"

      def add_gem_filters
        Gem.path.each do |path|
          # http://gist.github.com/30430
          add_filter { |line| line.sub(/(#{path})\/gems\/#{GEM_REGEX}\/(.*)/, '\2 (\3) \4')}
        end

        vendor_gems_path = Rails::GemDependency.unpacked_path.sub("#{RAILS_ROOT}/",'')
        add_filter { |line| line.sub(/(#{vendor_gems_path})\/#{GEM_REGEX}\/(.*)/, '\2 (\3) [v] \4')}
      end
  end
end
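A quick way to convince yourself that the pattern copes with hyphens, upper case and digits is to run the same substitution over some made-up backtrace lines. clean_gem_line and the gem path below are hypothetical. The Regexp.escape call is an addition not present in the initializer above; it makes dots in the path match literally.

```ruby
# The gem name/version pattern from the initializer above.
GEM_REGEX = "([A-Za-z0-9_-]+)-([0-9.]+)"

# Hypothetical helper mirroring the add_filter block for one Gem.path entry.
def clean_gem_line(line, gem_path)
  line.sub(/(#{Regexp.escape(gem_path)})\/gems\/#{GEM_REGEX}\/(.*)/, '\2 (\3) \4')
end

gem_path = "/usr/lib/ruby/gems/1.8"  # hypothetical Gem.path entry
puts clean_gem_line("#{gem_path}/gems/aws-s3-0.6.2/lib/aws/s3.rb:10", gem_path)
# prints aws-s3 (0.6.2) lib/aws/s3.rb:10
puts clean_gem_line("#{gem_path}/gems/RedCloth-4.2.2/lib/redcloth.rb:5", gem_path)
# prints RedCloth (4.2.2) lib/redcloth.rb:5
```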

I’ve submitted a patch to rails; please review it if you like.

Kudos to Matthew Todd for pairing with me on this.

Benchmarks for creating a new array

require 'benchmark'

n = 1000
m = 50000
blank = [0] * m
Benchmark.bm(7) do |x|
  x.report(".new with block:") { (0..n).collect { Array.new(m) { 0 } }}
  x.report("  .new no block:") { (0..n).collect { Array.new(m, 0) }}
  x.report("        [0] * x:") { (0..n).collect { [0] * m }}
  x.report("           #dup:") { (0..n).collect { blank.dup }}
end
$ ruby19 benchmark.rb 
             user     system      total        real
.new with block: 10.180000   0.210000  10.390000 ( 10.459538)
  .new no block:  3.690000   0.210000   3.900000 (  3.915348)
        [0] * x:  4.280000   0.210000   4.490000 (  4.505334)
           #dup:  0.000000   0.000000   0.000000 (  0.000491)

Know your constructors! What is #dup doing? I think it’s cheating.
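One plausible explanation is that MRI's Array#dup does not copy the elements up front. The copy may share the original's storage and only get its own buffer once one side is written to (copy-on-write). If that is what is happening, writing a single element after dup should make the cost reappear. The smaller sketch below checks this, with sizes reduced so it runs quickly; timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

n = 100
m = 20_000
blank = [0] * m

Benchmark.bm(14) do |x|
  x.report("#dup:")         { (0..n).collect { blank.dup } }
  # Writing one element forces the copy to own its buffer if dup shared storage.
  x.report("#dup + write:") { (0..n).collect { blank.dup.tap { |a| a[0] = 1 } } }
end
```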

Acts_as_state_machine locking

Consider the following!

class Door < ActiveRecord::Base
  acts_as_state_machine :initial => :closed

  state :closed
  state :open, :enter => :say_hello

  event :open do
    transitions :from => :closed, :to => :open
  end

  def say_hello
    puts "hello"
  end
end

door = Door.create!

fork do
  Door.transaction do
    door.open!
  end
end

door.open!

# >> hello
# >> hello

It’s broken: you can only open a door once, yet it said hello twice. This is a classic double-update problem. One way to solve it is with pessimistic locking. I wrote some code that automatically locks any object when you call an event on it.

class ActiveRecord::Base
  # Forces all state transition events to obtain a DB lock
  def self.obtain_lock_before_all_state_transitions
    event_table.keys.each do |transition|
      define_method("#{transition}_with_lock!") do
        self.class.transaction do
          lock!
          send("#{transition}_without_lock!")
        end
      end
      alias_method_chain "#{transition}!", :lock
    end
  end
end

class Door < ActiveRecord::Base
  # ... as before

  obtain_lock_before_all_state_transitions
end

Beware! Your state transitions can now throw ActiveRecord::RecordNotFound errors (from lock!), since the object may have been deleted before you got a chance to play with it.

If you’re not using any locking in your web app, you’re probably doing it wrong. Just sayin’.
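The essential move is to re-check the state and perform the transition while holding a lock. lock! does this for a single row: it reloads the record with a locking SELECT inside the transaction, so the event sees the latest state. The sketch below shows the same pattern as an in-process analogy only. ToyDoor is hypothetical, and a Mutex is not a database lock, so it would do nothing across the forked processes in the example above.

```ruby
require 'thread'

# Hypothetical in-process analogy: a Mutex stands in for the row lock that
# lock! takes. It does not protect anything across processes (e.g. fork).
class ToyDoor
  attr_reader :state, :hellos

  def initialize
    @state  = :closed
    @hellos = 0
    @lock   = Mutex.new
  end

  def open!
    @lock.synchronize do
      # Check the state only after acquiring the lock, as lock! does by reloading.
      return false unless @state == :closed
      @state = :open
      @hellos += 1  # stands in for the say_hello enter action
      true
    end
  end
end

door = ToyDoor.new
10.times.map { Thread.new { door.open! } }.each(&:join)
puts door.hellos  # prints 1
```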

Range#include? in ruby 1.9

Range#include? behaviour has changed in ruby 1.9 for non-numeric ranges. Rather than a greater-than/less-than check against the min and max values, the range is iterated over from min until the test value is found (or max). This is necessary to cover some edge cases of ranges which are incorrect in 1.8.7, as demonstrated by the following example:

class EvenNumber < Struct.new(:value)
  def <=>(other)
    puts "#{value} <=> #{other.value}"
    value <=> other.value
  end

  def succ
    puts "succ: #{value}"
    EvenNumber.new(value + 2)
  end
end

puts (EvenNumber.new(2)..EvenNumber.new(6)).include?(EvenNumber.new(5))


# 1.8.7
#   2 <=> 6
#   2 <=> 5
#   5 <=> 6
#   true # buggy!
# 1.9.1 
#   2 <=> 6
#   2 <=> 6
#   succ: 2
#   4 <=> 6
#   succ: 4
#   6 <=> 6
#   false # correct!

This makes sense for the conceptual range, but has a performance impact especially on large ranges. #include? has gone from O(1) to O(N). This is most likely to crop up when checking time ranges – Time#succ returns a time one second in the future.

(Time.utc(1999)..Time.utc(2001)).include?(Time.utc(2000))

# 1.8.7
#   true
# 1.9.1
#   Don't wait for this to finish...

Workarounds

Ruby 1.9 introduces a new method, Range#cover?, that implements the old include? behaviour; however, this method isn’t available in 1.8.7.

puts (EvenNumber.new(2)..EvenNumber.new(6)).cover?(EvenNumber.new(5))

# 1.8.7
#   undefined method `cover?' for #<struct EvenNumber value=2>..#<struct EvenNumber value=6> (NoMethodError)
# 1.9.1
#   2 <=> 6
#   2 <=> 5
#   5 <=> 6
#   true

Another alternative, if it makes sense for your range, is to define the to_int method, which ruby will use to do a straight comparison against your min/max values.

class EvenNumber < Struct.new(:value)
  # ... as before

  def to_int
    value
  end
end

puts (EvenNumber.new(2)..EvenNumber.new(6)).include?(EvenNumber.new(5))

# 1.8.6 and 1.9.1
#   2 <=> 6
#   2 <=> 5
#   5 <=> 6
#   true

Personally, I’ve monkey-patched Range in 1.8.* to alias cover? to include?. That’s it. May your test suites not appear to hang.
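Here is a minimal sketch of that patch. It assumes you only want cover? to be added on 1.8 and left alone on 1.9 and later, where Range already defines it.

```ruby
class Range
  # On 1.8, include? compares against the endpoints (see the 1.8.7 output
  # above), so aliasing gives 1.8 a cover? that behaves like 1.9's.
  # Skipped entirely wherever cover? already exists.
  alias_method :cover?, :include? unless method_defined?(:cover?)
end

puts((Time.utc(1999)..Time.utc(2001)).cover?(Time.utc(2000)))  # prints true, without iterating
```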

Selenium, webrat and the firefox beta

UPDATE: The latest version of webrat (0.5.3, maybe earlier) includes a fixed version of selenium, so you shouldn’t need this hack.

I needed a few hacks to get selenium running with webrat.

First, make sure you are running at least 0.4.4 of webrat. Don’t make the mistake I did of upgrading the gem but not the plugin installed in vendor/plugins.

gem install webrat
gem install selenium-client
gem install bmabey-database_cleaner --source=http://gems.github.com

There is a trick to get the Firefox 3.5 beta working. The selenium server packaged with webrat 0.4.4 only supports FF 3.0.*. Follow these instructions, patching the jar that is packaged with webrat (vendor/selenium-server.jar) so that the extensions selenium uses will be valid for the new FF.

cd vendor/ # In webrat dir
jar xf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/readystate@openqa.org/install.rdf
jar xf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/{538F0036-F358-4f84-A764-89FB437166B4}/install.rdf
jar xf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/\{503A0CD4-EDC8-489b-853B-19E0BAA8F0A4\}/install.rdf 
jar xf selenium-server.jar \
customProfileDirCUSTFF/extensions/readystate\@openqa.org/install.rdf 
jar xf selenium-server.jar \
customProfileDirCUSTFF/extensions/\{538F0036-F358-4f84-A764-89FB437166B4\}/install.rdf

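# replace is a utility that ships with MySQL; any in-place search-and-replace (e.g. perl -pi -e) works too
replace "3.0.*" "3.*" -- `find . | grep rdf`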

jar uf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/readystate@openqa.org/install.rdf
jar uf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/{538F0036-F358-4f84-A764-89FB437166B4}/install.rdf
jar uf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/\{503A0CD4-EDC8-489b-853B-19E0BAA8F0A4\}/install.rdf 
jar uf selenium-server.jar \
customProfileDirCUSTFF/extensions/readystate\@openqa.org/install.rdf 
jar uf selenium-server.jar \
customProfileDirCUSTFF/extensions/\{538F0036-F358-4f84-A764-89FB437166B4\}/install.rdf

(hat tip to Space Vatican)

I haven’t been able to get Safari working yet.

I want to run selenium tests alongside normal webrat tests, so I created a new environment, “acceptance”, that I can run tests under. Modify your test helper file:

# test/test_helper.rb
ENV["RAILS_ENV"] ||= "test"
raise "Can't run tests in #{ENV['RAILS_ENV']} environment" unless %w(test acceptance).include?(ENV["RAILS_ENV"])

require 'webrat'
require "test/env/#{ENV["RAILS_ENV"]}"

# ...
# test/env/test.rb
require 'webrat/rails'

Webrat.configure do |config|
  config.mode = :rails
  config.open_error_files = false
end
# test/env/acceptance.rb
require 'webrat/selenium/silence_stream'
require 'webrat/selenium'
require 'test/selenium_helpers'
require 'test/element_helpers'

# Required because we aren't isolating tests inside a transaction
require 'database_cleaner'
DatabaseCleaner.strategy = :truncation

Webrat.configure do |config|
  config.mode = :selenium
end

class ActionController::IntegrationTest
  self.use_transactional_fixtures = false # Necessary, otherwise selenium will never see any changes!

  setup do |session|
    session.host! "localhost:3001"
  end

  teardown do
    DatabaseCleaner.clean
  end
end

# Hack: webrat requires this, even though we're not using rspec
module Spec
  module Expectations
    class ExpectationNotMetError < Exception
    end
  end
end
# lib/tasks/test.rake
namespace :test do
  task :force_acceptance do
    ENV["RAILS_ENV"] = 'acceptance'
  end

  Rake::TestTask.new(:acceptance => :force_acceptance) do |t|
    t.test_files = FileList['test/acceptance/*_test.rb']
    t.verbose = true
  end
end

Notes

  • selenium and javascript helpers are from pivotallabs pat; they’re really handy for testing visibility of DOM elements
  • there’s some magic in webrat to conditionally require silence_stream based on something in active_support. I don’t understand it well enough, but requiring it explicitly was necessary to get things running for me
  • webrat/selenium assumes some classes are loaded that are only loaded if you’re using rspec. I’m not, so I stubbed out ExpectationNotMetError (it is only referred to in a rescue block).
  • rake test:acceptance runs the selenium tests. Running acceptance tests directly as a ruby script runs them using normal webrat – this is actually handy when writing tests because you get a quicker turnaround.
  • to pause selenium mid test run (to see wtf is going on), just add gets at the appropriate line in your test

Faster rails testing with ruby_fork

A long running test suite isn’t the problem. Your build server can take care of that. A second or two here or there, no one notices.

The killer wait is in the red/green/refactor loop. You’re only running one or two tests, and an extra second can mean the difference between getting into flow or switching to twitter. And you know what kills you in rails?

$ time ruby -e '' -r config/environment.rb

real    0m3.784s
user    0m2.707s
sys     0m0.687s

Yep, the environment. That’s a lot of overhead to be waiting for every time you run a test, especially since it’s the same code every time! You fix this with a clever script called ruby_fork that’s included in the ZenTest package. It loads up your environment, then just chills out, waiting. You send a ruby file to it, and it forks itself (the process containing the environment) to execute that file. The beauty of this is that forking is really quick, and it leaves a pristine copy of the environment around for the next test run.

‘Environment’ doesn’t have to be just environment.rb; for bonus points you can load up test_helper.rb, which will also load your testing framework into memory. In fact, you can preload any ruby code at all – ruby_fork isn’t rails specific.
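Below is a minimal sketch of the preload-then-fork idea in plain Ruby. It is not ruby_fork's actual code, which also listens on a TCP port for clients. It assumes a platform with fork (so not Windows), and PRELOADED and run_in_fork are hypothetical names. The parent loads state once and each job runs in a forked child. Whatever the child changes is thrown away when it exits, so the parent's copy stays pristine.

```ruby
# Stands in for an expensive require such as config/environment.rb.
PRELOADED = { :loaded_at => Time.now }

# Hypothetical helper: run the block in a forked child and return its result
# (which must be Marshal-able) to the parent through a pipe.
def run_in_fork(&job)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(Marshal.dump(job.call))
    writer.close
    exit!(0)  # skip the parent's at_exit handlers in the child
  end
  writer.close
  result = Marshal.load(reader.read)
  reader.close
  Process.wait(pid)
  result
end

# The child sees the preloaded state and dirties it; the parent is unaffected.
puts run_in_fork { PRELOADED[:dirty] = true; PRELOADED.key?(:dirty) }  # prints true
puts PRELOADED.key?(:dirty)                                          # prints false
```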

$ ruby_fork -r test/test_helper.rb &
/opt/local/bin/ruby_fork Running as PID 526 on 9084

$ time ruby_fork_client -r test/unit/your_test.rb
Started
...
Finished in 0.565636 seconds. # Aside: this time is bollocks

3 tests, 4 assertions, 0 failures, 0 errors

real    0m0.972s # This is the time you're interested in
user    0m0.225s
sys     0m0.035s

That’s fantastic, though you’ll notice in newer versions of rails your application code is not reloaded. By default your test environment caches classes – which normally isn’t a problem except that newer rails versions also eager load those classes (so they’re loaded when you load environment.rb). You can fix this by clearing out the eager load paths in your test environment file:

# config/environments/test.rb
config.eager_load_paths = []

On my machine this gets individual test runs down from about 4 seconds to less than 1 second. You can sell that to your boss as a four-fold productivity increase.

Testing Glue Code

db2s3 combines together 3 external dependencies – your database, the filesystem, and Amazon’s S3 service. It has 1 conditional in the main code path (and it’s not even an important one). The classic unit testing approach of “stub everything” provides little benefit.

Unit testing is good for ensuring complex code paths execute properly, that edge cases are properly explored, and for answering the question “what broke?”. For trivial glue code, none of these are of particular benefit. There are no complex code paths or edge cases, and it will be quickly obvious what broke. In fact, the most likely thing to “break” (or change) over time isn’t your code, but the external services it is sticking together, which stubs cannot protect you from. Considering the high relative cost of stubbing out all your dependencies, unit testing becomes an expensive way of testing something quite simple.

For glue code, integration tests are the best solution. Glue code needs to stick, and integration tests ensure that it does. Here is the only test that matters from db2s3:

it 'can save and restore a backup to S3' do
  db2s3 = DB2S3.new
  load_schema
  Person.create!(:name => "Baxter")
  db2s3.full_backup
  drop_schema
  db2s3.restore
  Person.find_by_name("Baxter").should_not be_nil
end

This test costs money to run since it hits the live S3 service, but only in the academic sense. The question you need to ask is “would I pay one cent to have confidence my backup solution works?”

Always remember why you are testing. Unit tests are a focussed tool, and not always necessary.

Backup MySQL to S3 with Rails

UPDATE: This code is too old. Use db2fog instead. It does the same thing but better.

Here is some code I wrote over the weekend – db2s3. It’s a rails plugin that provides rake tasks for backing up your database and storing it on Amazon’s S3 cloud storage. S3 is a trivially cheap offsite backup solution – for small databases it costs about 4 cents per month, even if you’re sending full backups every hour.

There are many scripts around that do this already, but they fail to address the biggest actual problem. The aws-s3 gem provides a really nice ruby interface to S3, and dumping a backup then storing it really isn’t that hard. The real problem is that I really hate system administration. I want to spend as little time as possible and I want things to Just Work.

A script is great, but there are still too many things for me to do. Where does it go in my project? How do I set my credentials? How do I call it?

That’s why a plugin was needed. It’s as little work as possible for a rails developer to backup their database, so they can get back to making their app awesome.

db2s3. Check it out.

Singleton resource, pluralized controller fix for rails

map.resource still looks for a pluralized controller. This has always bugged me. Here’s a quick monkey patch to fix. Tested on rails 2.2.2.

# config/initializers/singleton_resource_fix.rb
module ActionController
  module Resources
    class SingletonResource < Resource #:nodoc:
      def initialize(entity, options)
        @singular = @plural = entity
        # options[:controller] ||= @singular.to_s.pluralize
        options[:controller] ||= @singular.to_s # This is the only line to change
        super
      end
    end
  end
end
# config/routes.rb
# before fix
map.resource :session, :controller => 'sessions'

# after fix
map.resource :session

Evolution of a graph

Recently I have wanted to chart some cost data I collected on various foods. As a baseline for discussion, here is a very vanilla Excel-type graph, reminiscent of ones I am certain you have seen in PowerPoint presentations:

This is not a good graph, for several reasons:

  • Only provides a general overview of the data – some foods are cheaper, some more expensive, so what?
  • Labels feel cramped and ugly.
  • The grid is too prominent and distracting, without being very helpful – you can’t read accurate values from it.

The biggest problem is that it doesn’t “invite the eye to compare”. It doesn’t leave an impact. The first step to addressing this is to revisit the data – it’s quite possible you just have boring data. In this case, I improved the data by coding it according to whether it is vegetarian or not.

Version 2

For the next iteration of this graph, I colored the graph to highlight the vegetarian aspect of the food. To address the other issues, I moved the labels into the legend, and completely removed the grid, instead displaying the values directly on the graph. This technique works due to the low number of data points. You can think of it as “enhancing” the table rather than displaying a high-level overview of it. Also, a serif font (Georgia) was used.

This is certainly an improvement, but it still has its flaws:

  • 8 different colors, which distracts from the data, and the vegetarian data is muted.
  • It is much harder to identify the food with the data point, now that the labels have been moved into the legend.

Final

I iterated again, moving the labels back down to the x-axis, which in addition to solving the identification problem, allowed me to drop back down to 2 colours. In our initial graph this felt cramped, so I added some more whitespace and also kept the serif font from the last iteration.

This version of the graph speaks much louder. It’s easier on the eye, and the conclusion I want to draw from the data is clearly expressed. I am using this graph (with proper references and notes) on a new information site I’m working on – it’s far from complete but you can follow along on github if you’re interested.

Tools

The first graph was made with OpenOffice spreadsheet, the second with a hacked version of flot for jQuery. The final graph was made with a new jQuery plugin I wrote called tufte-graph. There is a meta-lesson here: I spent hours hacking different JS libraries to try and get them working exactly how I wanted, and in the end the quickest solution was to just write what I needed.

I use Colour Lovers to find nice colour palettes. It works much better than trying random RGB codes.

Final word

Spend time on your graphs. A picture is worth a thousand words. They are too often neglected, and it doesn’t take much effort to make them really shine.

inject and collect with jQuery

You know, I would have thought someone had already made an enumerable plugin for jQuery. Maybe someone has. Mine is better.

  • Complete coverage with screw-unit
  • Interface so consistent with jQuery you’ll think it was core
squares = $([1,2,3]).collect(function () {
  return this * this;
});
squares // => [1, 4, 9]

It’s on github. It deliberately doesn’t have the kitchen sink – fork it and add the methods you need; there’s enough code that the correct way to do it should be obvious.

As an aside, it’s really hard to spec these methods concisely. I consulted the rubyspec project and it turns out they had trouble as well, check out this all encompassing spec for inject: “Enumerable#inject: inject with argument takes a block with an accumulator (with argument as initial value) and the current element. Value of block becomes new accumulator”. Bit of a mouthful eh.
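Stripped of spec-speak, that sentence describes something simple. Here is a small trace using Ruby's own Enumerable#inject, recording what the block sees at each step:

```ruby
# Record the (accumulator, element) pair the block sees at each step.
steps = []
total = [1, 2, 3].inject(10) do |acc, x|
  steps << [acc, x]
  acc + x  # this value becomes the next accumulator
end

p steps  # prints [[10, 1], [11, 2], [13, 3]]
p total  # prints 16
```

The argument (10) is the first accumulator, and each block result is passed back in as the next one.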

Post your improvements in the comments.

Code for Christmas

Developers don’t have enough time.

We’re all too busy working our day job, or looking after our better half, to give our pet projects the attention they deserve.

That makes time the most valuable thing we can give. This year for Christmas, why not give a fellow developer some?

Ticking off an amazon wishlist never really resonated with me, so this year here is what we are all doing instead:

  1. Find someone’s pet open source project – I’d start at github
  2. Contribute! It doesn’t have to be much – a spec or two, some documentation, or even just a “hey it works on my box”. Fork, commit, pull request.
  3. Wish them a Merry Christmas!

That shouldn’t take you more than an hour. It’s a total win all around – you get to hone your chops, they get some love on their project, and the open source ecosphere is improved. If you’re feeling generous, or don’t have any friends, there’s no shortage of projects that I’m sure would welcome some support.

My wishlist is any of the ruby midi projects out there.

Unique data in dm-sweatshop

dm-sweatshop is how you set up test data for your datamapper apps. Standard practice is to generate random data that follows a pattern:

User.fix {{
  :login  => /\w+/.gen
}}

new_user = User.gen

Let’s not now debate whether or not random data in tests is a good idea. What’s more important is that the above code should make you uneasy if login is supposed to be unique. There was a hack in sweatshop that would try recreating the data if you had a uniqueness constraint on login and it was invalid, but it was exactly that: a hack. As of a few days ago (what will be 0.9.7), you need to be more explicit if you want unique data. It’s pretty easy:

include DataMapper::Sweatshop::Unique

User.fix {{
  :login  => unique { /\w+/.gen }
}}

Tada! You can also easily get non-random unique data by providing a block with one parameter. Check the README for this and other cool things you can do.
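For the curious, here is a minimal sketch of the idea behind a unique wrapper. It remembers every value handed out and keeps calling the generator until it produces a new one, giving up after a bounded number of attempts. UniqueGenerator is hypothetical and is not dm-sweatshop's implementation. It also does not cover the one-parameter-block form mentioned above.

```ruby
require 'set'

# Hypothetical sketch of unique-value generation; not dm-sweatshop's code.
class UniqueGenerator
  def initialize(max_attempts = 100)
    @seen = Set.new
    @max_attempts = max_attempts
  end

  # Call the block until it returns a value not handed out before.
  def unique
    @max_attempts.times do
      value = yield
      return value if @seen.add?(value)  # add? returns nil if already present
    end
    raise "could not generate a unique value after #{@max_attempts} attempts"
  end
end

gen = UniqueGenerator.new
logins = 5.times.map { gen.unique { "user#{rand(10)}" } }
p logins.uniq.size  # prints 5
```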

Introducing SocialBeat (screencast)

Here is a screencast of socialbeat in which you will note:

  1. I don’t appear drunk
  2. I don’t reveal intra-company communications
  3. I show off the full gamut of socialbeat’s awesomeness in under 3 minutes

In these ways you may find it superior to other screencasts you may have seen on the matter.


Introducing SocialBeat

If you are behind the times – socialbeat is some code that lets you live code OpenGL visualizations to MIDI tracks.

Comparing lambdas in ruby

to_ruby is a really convenient way to compare the equality of two lambdas. It’s a bit slow though. If we get our hands dirty (only a little!) with ParseTree, we can get a result 2 orders of magnitude quicker. I’d be interested to see if these benchmarks differ significantly on other versions of ruby.

~ $ ruby -v
ruby 1.8.6 (2007-09-23 patchlevel 110) [i686-darwin8.11.1]
require 'benchmark'
require 'parse_tree'
require 'ruby2ruby'

def gen_lambda
  lambda {|x| x + 1 }
end

Parser = ParseTree.new(false)

# This only requires parse tree, not ruby2ruby
def proc_identity(block)
  klass = Class.new
  name = "myproc"
  klass.send(:define_method, name, &block)

  # .last ignores the method name and definition - they're irrelevant
  Parser.parse_tree_for_method(klass, name).last 
end

n = 1000
Benchmark.bmbm do |x|
  x.report("#to_ruby") { n.times { gen_lambda.to_ruby == gen_lambda.to_ruby }}
  x.report("#to_sexp") { n.times { gen_lambda.to_sexp == gen_lambda.to_sexp }}
  x.report("manual")   { n.times { proc_identity(gen_lambda) == proc_identity(gen_lambda) }}
end
               user     system      total        real
#to_ruby   4.460000   0.220000   4.680000 (  4.695327)
#to_sexp   0.920000   0.190000   1.110000 (  1.110214)
manual     0.030000   0.000000   0.030000 (  0.032768)

In case you were wondering, I was playing around with this while implementing unique data generation for dm-sweatshop.

Integration testing with Cucumber, RSpec and Thinking Sphinx

Ideally you would want to include sphinx in your integration tests. It’s really just like your database. In practice, this is problematic. Ensuring the DB is started and triggering a re-index after each model load is doable, if slow, with a small bit of hacking of thinking sphinx (hint – change the initializer for the ThinkingSphinx::Configuration to allow you to specify the environment). Here’s the rub though – if you’re using transactional fixtures the sphinx indexer won’t be able to see any of your data! Turning that off can really slow down your tests, and once you add in the re-indexing time you’re going to be making a few cups of coffee while they run.

One approach I’ve been taking is to stub out the search methods with RR. I know, I know, stubbing in your integration tests is evil. I’m being pragmatic here. For most applications your search is trivial (find me results for this keyword), and if you unit test your define_index block you’re pretty well covered. To go one step further you could unit test your controllers with an expect on the search method, or have a separate suite of non-transactional integration tests running against sphinx. I like the latter, but haven’t done it yet.

Enough talk! Here’s the magic you need to get it working with cucumber:

# features/steps/env.rb
require 'rr'
Cucumber::Rails::World.send(:include, RR::Adapters::RRMethods)

# features/steps/*_steps.rb
Given /a car with model '(\w+)' exists/ do |model|
  car = Car.create!(:model => model)
  stub(Car).search(model) { [car] }
end

Capturing output from rake

Rake has an annoying habit of putting its own diagnostic line on the first line of output. You can strip that out with tail.

rake my_report:xml | tail -n+2 > output.xml
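One caveat: `tail -n+2` drops the first line whether or not it is the announcement. Rake's `--silent` flag suppresses the `(in /path)` line entirely, and in that case tail would throw away real output. If you'd rather capture the output from Ruby, here is a small sketch that strips the line only when it is present. The helper name is hypothetical.

```ruby
# Hypothetical helper: drop rake's "(in /path/to/project)" line only when it
# is actually there, so genuine output is never thrown away.
def strip_rake_announcement(output)
  lines = output.lines.to_a
  lines.shift if lines.first =~ /\A\(in .*\)\s*\z/
  lines.join
end

# Usage (assumes the my_report:xml task from above):
#   File.open("output.xml", "w") { |f| f << strip_rake_announcement(`rake my_report:xml`) }
```

The regular expression assumes the announcement has the `(in /path/to/project)` shape that rake printed at the time.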

You don't need view logic in models

Jake Scruggs wrote about moving view logic into his models.

It’s hard to tell without knowing the full dataset, but my approach to these sort of problems is to reduce the data down to the simplest possible form (usually a hash), and then use an algorithm to extract what I need.

One commenter tried this and I think it’s heading in the right direction. There is potentially quite a lot of duplication here – the repetition of the layouts and scripts. To ease this it can help to invert the keys and values, for a more concise representation. You could reduce this even further if there were sensible defaults (if 90% of cars used a two_column layout, for instance) – just replace the raise in the following code.

# See original post for context
# Data
layouts = {
  'two_column'   => [Toyota, Saturn],
  'three_column' => [Hyundai],
  'ford'         => [Ford]
}

scripts = {
  'discount' => [Hyundai, Ford],
  'poll'     => [Saturn]
}

# Algorithm
find_key = lambda {|hash, car| 
  (
    hash.detect {|key, types| 
      types.any? {|type| car.is_a?(type)}
      # types.include?(car.class) if you're not using inheritance
    } || raise("No entry for car: #{car}")
  ).first
}

layout = find_key[layouts, @car]
script = find_key[scripts, @car]

@stylesheets += ['layout', 'theme'].collect {|suffix| "#{layout}_#{suffix}.css" }
@scripts     += ["#{script}.js"]

render :action => find_view, :layout => layout

This is preferable to putting this data in your object hierarchy for all the normal reasons, especially since it keeps view logic where you expect to find it and doesn’t muddy up your models.
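To make the sensible-defaults idea concrete, here is a self-contained sketch that takes a default argument in place of the raise, so `two_column` no longer needs an entry of its own. The car classes are hypothetical stand-ins for the ones in the original post. `Mustang` is only there to show that `is_a?` also matches subclasses.

```ruby
# Hypothetical car classes standing in for the models in the original post.
class Car; end
class Toyota < Car; end
class Hyundai < Car; end
class Ford < Car; end
class Mustang < Ford; end

LAYOUTS = {
  'three_column' => [Hyundai],
  'ford'         => [Ford]
}

# Returns the first key whose class list matches the car (subclasses count,
# thanks to is_a?), or `default` when nothing matches instead of raising.
def find_key(hash, car, default)
  match = hash.detect { |_key, types| types.any? { |type| car.is_a?(type) } }
  match ? match.first : default
end

find_key(LAYOUTS, Hyundai.new, 'two_column') # => "three_column"
find_key(LAYOUTS, Mustang.new, 'two_column') # => "ford"
find_key(LAYOUTS, Toyota.new,  'two_column') # => "two_column"
```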

Speeding up Rails Initialization

Chad Woolley just posted a tip to get rails starting up faster. It works, except when you’re using ActiveScaffold. This is due to a load ordering problem – ActiveScaffold monkey patches the Resource class used by routes after routes have been parsed the first time, and relies on the re-parsing triggered by the inflections change.

To fix this, you can explicitly require the monkey patch just before you draw your routes (it doesn’t depend on anything else in ActiveScaffold).

# config/routes.rb
ActionController::Routing::Routes.draw do |map|
  # Explicitly require this, otherwise it won't get loaded before we parse our resources
  require 'vendor/plugins/active_scaffold/lib/extensions/resources.rb'

  # Your routes go here...
end

Yes, it’s a hack on top of a hack, but I get my console 30% quicker, so I’m running with it.

Tested on Rails 2.0.2.

Rake tab completion with caching and namespace support

UPDATE: It now invalidates the cache if you touch lib/tasks/*.rake, for those using it with rails (like me).

There are a few articles on the net about rake tab completion; I had to combine a few of them to get what I wanted:

#!/usr/bin/env ruby

# Complete rake tasks script for bash
# Save it somewhere and then add
# complete -C path/to/script -o default rake
# to your ~/.bashrc
# Xavier Shay (http://rhnh.net), combining work from
#   Francis Hwang ( http://fhwang.net/ ) - http://fhwang.net/rb/rake-complete.rb
#   Nicholas Seckar <nseckar@gmail.com>  - http://www.webtypes.com/2006/03/31/rake-completion-script-that-handles-namespaces
#   Saimon Moore <saimon@webtypes.com>

require 'fileutils'

RAKEFILES = ['rakefile', 'Rakefile', 'rakefile.rb', 'Rakefile.rb']
exit 0 unless RAKEFILES.any? { |rf| File.file?(File.join(Dir.pwd, rf)) }
exit 0 unless /^rake\b/ =~ ENV["COMP_LINE"]

after_match = $'
task_match = (after_match.empty? || after_match =~ /\s$/) ? nil : after_match.split.last
cache_dir = File.join( ENV['HOME'], '.rake', 'tc_cache' )
FileUtils.mkdir_p cache_dir
rakefile = RAKEFILES.detect { |rf| File.file?(File.join(Dir.pwd, rf)) }
rakefile_path = File.join( Dir.pwd, rakefile )
cache_file = File.join( cache_dir, rakefile_path.gsub( %r{/}, '_' ) )
if File.exist?( cache_file ) &&
   File.mtime( cache_file ) >= (Dir['lib/tasks/*.rake'] << rakefile).collect {|x| File.mtime(x) }.max
  task_lines = File.read( cache_file )
else
  task_lines = `rake --silent --tasks`
  File.open( cache_file, 'w' ) do |f| f << task_lines; end
end
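# Only keep "rake task_name  # description" lines, whether or not an
# "(in /path)" announcement was printed (--silent suppresses it)
tasks = task_lines.split("\n").grep(/^rake /).collect {|line| line.split[1]}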
tasks = tasks.select {|t| /^#{Regexp.escape task_match}/ =~ t} if task_match

# handle namespaces
if task_match =~ /^([-\w:]+:)/
  upto_last_colon = $1
  after_match = $'
  tasks = tasks.collect { |t| (t =~ /^#{Regexp.escape upto_last_colon}([-\w:]+)$/) ? "#{$1}" : t }
end

puts tasks
exit 0
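The namespace handling at the end exists because bash treats `:` as a word break by default (it is part of `COMP_WORDBREAKS`). When you complete `db:mi`, bash replaces only the text after the last colon, so the candidates must be returned without their namespace prefix. Here is a sketch of that filtering step pulled out into a pure, testable method. The `completions` name is hypothetical and not part of the script above.

```ruby
# Filters task names by the word being completed. When that word contains a
# namespace (e.g. "db:mi"), everything up to and including the last colon is
# stripped, because bash only replaces the text after that colon.
def completions(tasks, partial)
  return tasks if partial.nil? || partial.empty?
  matches = tasks.select { |task| task.start_with?(partial) }
  prefix = partial[/\A(.*:)/, 1] # greedy, so this ends at the last colon
  prefix ? matches.map { |task| task[prefix.length..-1] } : matches
end

TASKS = %w[db:migrate db:migrate:redo db:seed test]

completions(TASKS, "db:mi")       # => ["migrate", "migrate:redo"]
completions(TASKS, "db:migrate:") # => ["redo"]
```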

Finding related content with Sphinx

Previous efforts to find related posts with the classifier gem yielded no fruit, so I tried another approach using sphinx. Turned out to be a winner.

The basic theory is to index all posts by tag, then find related posts by using the current post’s tags as a search string. Remember to exclude the current post from the search results. For this blog, I use tags for the main categories, which were corrupting the results – almost everything is tagged ‘Ruby’, so it doesn’t add any value in determining likeness. So rather than indexing all tags, I excluded some of the main ones.

class Post < ActiveRecord::Base
  has_many :searchable_tags, 
           :through    => :taggings,
           :source     => :tag,
           :conditions => "tags.name NOT IN ('Ruby', 'Code', 'Life')"
  
  def related_posts(number = 3)
    Post.search(:limit => number + 1, :conditions => {
      :tag_list => tag_list.join("|")
    }).reject {|x| x == self }.first(number)
  end

  define_index do
    indexes searchable_tags(:name), :as => :tag_list
    # If you want to use this for normal search as well you'll have to 
    # add in title/body here as well
  end
end

For a more complete example, see the relevant RHNH commits: cdc0bf and d4d844

Showing links to related content is a good way to stop the bottom of your page from being a ‘dead end’. In the event that no related posts are found, I’m linking to the archives instead.
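The same idea can be sketched without Sphinx by scoring every other post on the number of non-ignored tags it shares with the current one. `BlogPost` and `IGNORED_TAGS` are hypothetical stand-ins for the ActiveRecord model and the `NOT IN` condition above. This only approximates the Sphinx version, which uses Sphinx's own ranking.

```ruby
require 'set'

# Hypothetical stand-in for the Post model: just an id and tag names.
BlogPost = Struct.new(:id, :tags)

# Mirrors the NOT IN condition on searchable_tags above.
IGNORED_TAGS = Set.new(%w[Ruby Code Life])

# Ranks other posts by how many non-ignored tags they share with `post`,
# dropping the post itself and any post with no overlap.
def related_posts(post, all_posts, number = 3)
  wanted = Set.new(post.tags) - IGNORED_TAGS
  candidates = all_posts.reject { |other| other.id == post.id }
  scored = candidates.map { |other| [other, (Set.new(other.tags) & wanted).size] }
  scored.select { |_, score| score > 0 }
        .sort_by { |other, score| [-score, other.id] }
        .first(number)
        .map(&:first)
end
```

An empty result is the cue to fall back to linking the archives.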

Hash trumps case

# Two equivalent functions
def rgb(color)
  case color
    when :red   then 'ff0000'
    when :green then '00ff00'
    when :blue  then '0000ff'
    else             '000000' # Default to black
  end
end

def rgb2(color)
  {
    :red   => 'ff0000',
    :green => '00ff00',
    :blue  => '0000ff'
  }[color] || '000000'
end

Even though these functions are equivalent, the second carries more semantic weight – it maps a symbol directly to a color. The case sample makes no such guarantees since you can execute any arbitrary code in the then block. In addition, a hash is easier to work with – you can easily iterate over the keys, extract it to another method if you need reuse, or query it for other properties (for example, how many colors are available). It is also easier to read – both aesthetically and because it contains fewer tokens. In almost all circumstances I will prefer a hash over a case statement.

Relationships in data are easier to comprehend and manipulate than relationships in code.
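Pulling the hash from `rgb2` out into a constant shows those extra properties in practice: the mapping can be reused, iterated and queried. `COLORS` and `rgb3` are hypothetical names. `Hash#fetch` with a second argument supplies the default. Unlike `|| '000000'`, it only falls back when the key is missing, not when a value is nil.

```ruby
# The mapping from rgb2, extracted so other code can reuse and query it.
COLORS = {
  :red   => 'ff0000',
  :green => '00ff00',
  :blue  => '0000ff'
}.freeze

def rgb3(color)
  COLORS.fetch(color, '000000') # default to black
end

COLORS.keys # => [:red, :green, :blue]
COLORS.size # => 3
```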

Contextual Composition With Delegation

I’ve had some models getting rather large recently. This makes them hard to comprehend and makes the source difficult to browse. A lot of the time, a big chunk of functionality is fairly context specific – it is only relevant to one particular part of my application (reporting, data integration, etc…). Thoughtbot recently presented one way to split this functionality out: adding methods to the model that return another model with the extra goodness.

That’s not bad, but it still pollutes the class with methods that most users won’t care about. Instead, we can decorate the object with extra methods at the time (context) that we need them. My first go at doing this used the extend method:

class PurchaseOrder
  attr_reader :id
end

module Reports; end

module Reports::PurchaseOrderMethods
  def description
    "A Purchase Order"
  end
end

class ReportMakerWithExtend
  def self.report_for(po)
    po.extend(Reports::PurchaseOrderMethods)
    "#{po.id}: #{po.description}"
  end
end

This has a few edge case problems though.

  1. It can potentially override methods in our base class. If PurchaseOrder#description were already defined (even as a private method), our module would override that definition, probably breaking something.
  2. It is inelegant to test – extend will override any existing stubs, so you need to stub out extend itself. This is unintuitive and may have unintended consequences, for instance if the class is also using extend in a manner that doesn’t interfere with your stubs.
# Testing extended PurchaseOrder is inelegant
describe 'ReportMakerWithExtend#report_for' do
  it 'returns a line containing both ID and description' do
    po = stub(
      :id          => 1,
      :description => "hello",
      :extend      => nil # :(
    )
    ReportMakerWithExtend.report_for(po).should == "1: hello"
  end
end

Ruby provides another way to achieve what we want in the form of SimpleDelegator. Basically, it passes on any methods not defined on itself to the object specified in the constructor. This way we can wrap another object without fear of interfering with its internals or our stubs.

require 'delegate'

module Reports; end

class Reports::PurchaseOrder < SimpleDelegator
  def description
    "A Purchase Order"
  end
end

class ReportMaker
  def self.report_for(po)
    po = Reports::PurchaseOrder.new(po)
    "#{po.id}: #{po.description}"
  end
end

Much nicer. Of course, we would have specs for Reports::PurchaseOrder in addition to PurchaseOrder – this split allows us to keep our tests focussed and easy to read. Using delegation to split up your models allows you to separate code into areas where it is most relevant – helping keep both your models and your tests easy to read and maintain.
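Here is a quick way to confirm that the wrapper leaves the original object untouched. It uses a hypothetical `PurchaseOrder` with an `id`, and the class is defined inside `module Reports` so the snippet runs on its own.

```ruby
require 'delegate'

# Hypothetical model with just an id.
class PurchaseOrder
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

module Reports
  # Report-only behaviour lives on the wrapper, not on PurchaseOrder.
  class PurchaseOrder < SimpleDelegator
    def description
      "A Purchase Order"
    end
  end
end

po        = PurchaseOrder.new(42)
report_po = Reports::PurchaseOrder.new(po)

report_po.id                    # => 42 (delegated to the wrapped object)
report_po.description           # => "A Purchase Order"
po.respond_to?(:description)    # => false (the original is untouched)
report_po.__getobj__.equal?(po) # => true
```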

What's new in Enki - Admin Interface

I’ve just finished up a fairly major overhaul of the Enki admin area, finally throwing away the ugly SimpleLog stylings. Features include:

  1. New visual style, heavily inspired by the new Habari Monolith look
  2. New dashboard, with space to add your own data (feedburner subscribers? analytics data?)
  3. Nicer forms (thanks formtastic!)
  4. AJAX goodness for UI snappiness
  5. Undo for item deletion (no more alert boxes!)

Screens:

Enki - Admin dashboard

Enki - Admin posts list

Of course there’s still more I’d like to add (in particular to do with tags), but isn’t that always the case? I think it’s pretty swish. If you’ve already got an install, just pull from master; if you think you might like an install, head over to the Enki website.

  • Posted on May 03, 2008
  • Tagged code, enki

Testing flash.now with RSpec

flash.now has always been a pain to test. The traditional rails approach is to use assert_select and find it in your views. This clearly doesn’t work if you want to test your controller in isolation.

Other folks have found workarounds to the problem, including mocking out the flash or monkey patching it.

These solutions feel a bit like using a sledgehammer to me. If you’re going to monkey patch/mock something, you want it to be as discreet as possible, so as to minimize the chance of the implementation changing underneath you and also to reduce the effect on other areas of your application. Also, why duplicate perfectly good code that is provided elsewhere?

The real problem with testing flash.now is that it gets cleaned up (via #sweep) at the end of the action before you get to test anything. So let’s solve that problem and that problem only: disable sweeping of flash.now:

# spec/spec_helper.rb
module DisableFlashSweeping
  def sweep
  end
end

# A spec
describe BogusController, "handling GET to #index" do
  it "sets flash.now[:message]" do
    @controller.instance_eval { flash.extend(DisableFlashSweeping) }
    get :index
    flash.now[:message].should_not be_nil
  end
end

instance_eval is used to access the flash, since it’s a protected method, and we extend with the minimum possible code to do what we want – blanking out the sweep method. This should not cause problems because sweeping is only relevant across multiple requests, which we shouldn’t be doing in our controller specs.
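The trick works because `extend` places the module ahead of the object's class in method lookup, so the empty `sweep` wins for that one object only. Here is a plain-Ruby illustration with a hypothetical `FakeFlash` class. It is not Rails' real flash implementation.

```ruby
# Hypothetical stand-in for the flash: a Hash whose sweep empties it.
class FakeFlash < Hash
  def sweep
    clear
  end
end

module DisableFlashSweeping
  def sweep
  end
end

normal = FakeFlash.new
normal[:message] = "hi"
normal.sweep
normal[:message] # => nil

# extend returns the object, now with the no-op sweep in front of FakeFlash's.
patched = FakeFlash.new.extend(DisableFlashSweeping)
patched[:message] = "hi"
patched.sweep
patched[:message] # => "hi"
```

Other `FakeFlash` instances are unaffected, which is what keeps the patch discreet.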

Classifier gem rubbish for recommending posts

While we were chatting today, Tim suggested that Classifier::LSI might be a cool way to offer ‘related posts’ suggestions for a blog.

Not really knowing anything about it, I whipped up a prototype rake task. It creates the index then marshals it to disk because it takes ages to create and it’s not much fun to play with when you have to wait minutes each time. It then presents 3 related suggestions for each post.

require 'classifier'

namespace :lsi do
  task :test => :environment do
    # Building the index is slow, so cache it on disk (binary mode for Marshal)
    if File.exist?("lsidata.dump")
      lsi = File.open("lsidata.dump", "rb") {|f| Marshal.load(f) }
    else
      lsi = Classifier::LSI.new
      Post.find(:all, :order => 'published_at DESC').each do |post|
        text = post.body
        categories = post.tags.collect(&:name)
        puts "Indexing " + post.title
        lsi.add_item(text, *categories)
      end
      File.open("lsidata.dump", "wb") {|f| Marshal.dump(lsi, f) }
    end

    Post.find(:all).each do |post|
      puts post.title
      puts lsi.find_related(post.body, 3).collect {|i| Post.find_by_body(i).title }.inspect
    end
  end
end

Here’s the data for my last 5 articles. I don’t know what I was expecting, but this just doesn’t seem very helpful. I don’t have a very rich set of tags on my posts, so that probably has something to do with it. I was kind of hoping it would just look at the text and all just work *waves hands*.

Seagate 500Gb FreeAgent Pro external drive - first impressions
  - Building Firefox Extensions
  - The Colemak Diaries
  - Counting ActiveRecord associations: count, size or length?
Coconut Oats
  - The Colemak Diaries
  - Summertime Tagliarini
  - Mary Iron Chef - Chocolate Jaffa Boxes
Mary Iron Chef - Chocolate Jaffa Boxes
  - The Colemak Diaries
  - Building Firefox Extensions
  - Summertime Tagliarini
Paypal IPN fails date standards
  - Building Firefox Extensions
  - Straight Sailing with Magellan
  - The Colemak Diaries
I'm number 8!
  - Extending Rails
  - Practical Hpricot: SVG
  - Day of days

Next step is to try tagging my stuff better and seeing if that helps out.

Getting classifier working

Quick side note – pure ruby classifier doesn’t work out of the box with rails because it also redefines Array#sum. If you install the GSL lib and the ruby bindings (see classifier docs), you’ll still need this one-line patch to classifier to get it to work:

Index: lib/classifier/lsi.rb
===================================================================
--- lib/classifier/lsi.rb       (revision 31)
+++ lib/classifier/lsi.rb       (working copy)
@@ -25,6 +25,8 @@
   # please consult Wikipedia[http://en.wikipedia.org/wiki/Latent_Semantic_Indexing].
   class LSI
     
+    include GSL if $GSL
+    
     attr_reader :word_list
     attr_accessor :auto_rebuild

UPDATE: I’ve forked classifier on github, so you can just grab that version if you like.

Nginx, OpenID delegation and YADIS

Typically OpenID delegation reads delegation information out of link tags in the head of your home page:

<link rel="openid.server" href="http://server.myid.net/server" />
<link rel="openid.delegate" href="http://xaviershay.myid.net/" />

The problem with this is that any client trying to discover this information needs to fetch your entire home page. If that client is your page (commenting on your own entry, for instance), that request can get queued up behind the same mongrel that was serving the original request, which of course now won’t complete until the OpenID delegation request times out.

There is another way to provide delegation information. Clients will request your home page with an accept header of application/xrds+xml – and you can use that information to serve up a static YADIS file rather than your home page. Mine looks like this:

<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns="xri://$xrd*($v*2.0)"
      xmlns:openid="http://openid.net/xmlns/1.0">
  <XRD>

    <Service priority="1">
      <Type>http://openid.net/signon/1.0</Type>
      <URI>https://server.myid.net/server</URI>
      <openid:Delegate>https://xaviershay.myid.net/</openid:Delegate>
    </Service>

  </XRD>
</xrds:XRDS>

And I serve it up with this Nginx rewrite rule:

if ($http_accept ~* application/xrds\+xml) {
  rewrite (.*) $1/yadis.xrdf break;
}

Try it in the comfort of your own home:

curl -H 'Accept: application/xrds+xml' http://rhnh.net
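The same content negotiation from Ruby looks like this, as a sketch using the standard library's `Net::HTTP`. The `yadis_request` helper is hypothetical. It only builds the request, and the actual network call is left in a comment.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build the GET a Yadis-aware client would send.
def yadis_request(url)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Accept'] = 'application/xrds+xml' # ask for the XRDS document
  request
end

# Sending it (performs a real network request):
#   uri = URI.parse("http://rhnh.net")
#   response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(yadis_request(uri.to_s)) }
#   puts response.body
```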

Ref: OpenID for non-SuperUsers

Powered by Enki

Finally got this blog switched over to Enki. The main feed has moved to FeedBurner. Please report any weirdness to the relevant authorities.

For some extra content, here’s what’s happening in the Enki world:

  • Moved to github (keeping gitorious as a mirror)
  • Tim has a functional multiple authors fork
  • API is functional if you want to kick the tyres a bit, still needs some work though. Here is some code to publish from VIM
  • Posted on April 12, 2008
  • Tagged code, enki

Paypal IPN fails date standards

Paypal Instant Payment Notification lets you know when you have received a paypal payment. Presumably, you then mark an order as paid or something. Do not use the current time as the paid_at date – despite the ‘instant’ in the title it can be many days later. You should use the payment_date provided by paypal. Your accountant will thank you.

But here’s the rub. From the IPN spec, payment_date is:

  Time/Date stamp generated by PayPal system [format: “18:30:30 Jan 1, 2000 PST”]

Seen that date format before? No? Didn’t think so. That’s no RFC I’ve seen before. The popular Paypal gem uses Time.parse, but this is incorrect (as of 2.0.0). Observe:

>> Time.parse("18:30:30 Mar 28, 2008 PST")
=> Fri Mar 28 18:30:30 +1100 2008 # Good
>> Time.parse("18:30:30 Feb 28, 2008 PST")
=> Fri Mar 28 18:30:30 +1100 2008 # FAIL

Also, Time only has a range of about a week, so that could screw you over come any major system failures (either you or paypal). Also note the payment_date is in PST, which unless you’re on the right side of the US is fairly useless. I recommend the following:

>> DateTime.strptime("18:30:30 Jan 1, 2000 PST", "%H:%M:%S %b %e, %Y %Z").new_offset(0)
=> Sun, 02 Jan 2000 02:30:30 +0000

The un-intuitive new_offset converts to UTC. Patch submitted. I hate you, Paypal.
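Wrapping the call in a helper keeps the format string in one place. The sketch below uses hypothetical method and constant names. Ruby's `%Z` parsing also understands abbreviations such as `PDT`, which matters if the timestamps switch with daylight saving. The February example keeps its month, unlike the `Time.parse` output above.

```ruby
require 'date'

# Format of PayPal's payment_date, e.g. "18:30:30 Jan 1, 2000 PST".
PAYPAL_DATE_FORMAT = "%H:%M:%S %b %e, %Y %Z"

# Parses payment_date and converts it to UTC (new_offset(0)).
def parse_paypal_date(payment_date)
  DateTime.strptime(payment_date, PAYPAL_DATE_FORMAT).new_offset(0)
end

parse_paypal_date("18:30:30 Feb 28, 2008 PST").to_s # => "2008-02-29T02:30:30+00:00"
parse_paypal_date("18:30:30 Jul 4, 2008 PDT").to_s  # => "2008-07-05T01:30:30+00:00"
```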

Absence, with suitable recompense

I’m going on holidays until the end of January. The offline kind of holiday where I don’t see a computer. So sad.

So here is a tasty treat for you to devour until I return. A sneak preview of a Fashionable New Blogging App™ named Enki. It is an alternative to Mephisto and SimpleLog that is built on the principles espoused in my prior writings. The website is built using Enki itself, and the port of this site from Mephisto is just about finished, so you know you’re getting code that’s got a real life application. There are still a few rough edges, but it’s ready enough to start building something with if you don’t mind getting your hands a little dirty. I’ve set up a mailing list for it which I’ll be catching up on once I get back.

Unobtrusive live comment preview with jQuery

Live preview is shiny. First get yourself a URL that renders a comment. In rails it might look something like the following.

def new
  @comment = Comment.build_for_preview(params[:comment])

  respond_to do |format|
    format.js do
      render :partial => 'comment.html.erb'
    end
  end
end

Now you should have a form or div with an ID something like “new_comment”. Just drop in the following JS (you may need to customize the submit_url).

$(function() { // onload
  var comment_form = $('#new_comment')
  var input_elements = comment_form.find(':text, textarea')
  var submit_url = '/comments/new'  
  
  var fetch_comment_preview = function() {
    jQuery.ajax({
      data: comment_form.serialize(),
      url:  submit_url,
      timeout: 2000,
      error: function() {
        console.log("Failed to submit");
      },
      success: function(r) { 
        if ($('#comment-preview').length == 0) {
          comment_form.after('<h2>Your comment will look like this:</h2><div id="comment-preview"></div>')
        }
        $('#comment-preview').html(r)
      }
    })
  }

  input_elements.keyup(function () {
    fetch_comment_preview.only_every(1000);
  })
  if (input_elements.any(function() { return $(this).val().length > 0 }))
    fetch_comment_preview();
})

The only_every function is the key to this piece – it ensures that an AJAX request will be sent at most once a second, so you don’t overload your server or your client’s connection.

Obviously you’ll need jQuery; less obviously, you’ll also need these support functions:

// Based on http://www.germanforblack.com/javascript-sleeping-keypress-delays-and-bashing-bad-articles
Function.prototype.only_every = function (millisecond_delay) {
  if (!window.only_every_func)
  {
    var function_object = this;
    window.only_every_func = setTimeout(function() { function_object(); window.only_every_func = null}, millisecond_delay);
   }
};

// jQuery extensions
jQuery.prototype.any = function(callback) { 
  return (this.filter(callback).length > 0)
}

Voilà, now you’re shimmering in awesomeness.
Demo up soon, but it’s similar to what you see on this blog (though this blog is done with inline prototype).

AtomFeedHelper produces invalid feeds

Summary: atom_feed is broken until changeset 8529

# http://api.rubyonrails.org/classes/ActionView/Helpers/AtomFeedHelper.html#M000931
atom_feed do |feed|
  feed.title("My great blog!")
  feed.updated((@posts.first.created_at))

  for post in @posts
    feed.entry(post) do |entry|
      entry.title(post.title)
      entry.content(post.body, :type => 'html')

      entry.author do |author|
        author.name("DHH")
      end
    end
  end
end

This produces the following feed (Rails 2.0.2):

<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <id>tag:localhost:posts</id>
  <link type="text/html" rel="alternate" href="http://localhost:3000"/>
  <title>My great blog!</title>
  <updated>2007-12-23T04:23:07+11:00</updated>
  <entry>
    <id>tag:localhost:3000:Post1</id>
    <published>2007-12-23T04:23:07+11:00</published>
    <updated>2007-12-30T15:29:55+11:00</updated>
    <link type="text/html" rel="alternate" href="http://localhost:3000/posts/1"/>
    <title>First post</title>
    <content type="html">Check out the first post</content>
    <author>
      <name>DHH</name>
    </author>
  </entry>
</feed>

Let’s run that through the feed validator:

line 3, column 25: id is not a valid TAG
line 2, column 0: Missing atom:link with rel="self"
line 8, column 32: id is not a valid TAG

Oh dear. Not a happy result. Let’s fix it.

Problem the first is the feed ID tag. It doesn’t include a date, which the Tag URI specification (RFC 4151) requires. This is a little bit tricky – you can’t just add Time.now.year as a default because that will change every year, and we need IDs to stay the same. We will provide an option to the user to specify the schema date, and produce a warning if they do not (as much as I’d like to just break it, the pragmatic side of me keeps backwards compatibility in).

The entry tag has the same problem, but you’ll also note it concatenates the class and the ID with no separator to create the ID. While it’s an edge case, this will break if you have a class name ending in a number, so we need to add in a separator. I vote for a slash. Also, the port in the entry’s tag URI is inconsistent with the feed ID (which has no port), so remove it.
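The ID rules above can be sketched as a pair of hypothetical helpers. These are not the Rails API. Following RFC 4151, a tag URI is made of five parts: `tag:`, an authority, a comma and a date, a colon, and a specific part. The host is passed without a port. The slash stops a class name that ends in a digit from colliding with another class's IDs.

```ruby
# Hypothetical helpers illustrating the ID scheme; not the Rails API.
# RFC 4151: "tag:" authority "," date ":" specific
def feed_tag_id(host, schema_date, path)
  "tag:#{host},#{schema_date}:#{path}"
end

def entry_tag_id(host, schema_date, record_class, record_id)
  # The slash keeps the class name and the record ID apart.
  "tag:#{host},#{schema_date}:#{record_class}/#{record_id}"
end

feed_tag_id("localhost", 2008, "/posts")    # => "tag:localhost,2008:/posts"
entry_tag_id("localhost", 2008, "Post", 1)  # => "tag:localhost,2008:Post/1"

# Without the slash, both of these would come out as "...:Post11".
entry_tag_id("localhost", 2008, "Post1", 1) # => "tag:localhost,2008:Post1/1"
entry_tag_id("localhost", 2008, "Post", 11) # => "tag:localhost,2008:Post/11"
```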

For further reading, I recommend How to make a good ID in Atom.

The missing self link is just your garden variety bug – the documentation says it should be provided by default, but the code does not.

I went ahead and fixed these problems in changeset 8529. With the call changed to atom_feed(:schema_date => 2008), the example above now produces this:

<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <id>tag:localhost,2008:/posts</id>
  <link type="text/html" rel="alternate" href="http://localhost:3000"/>
  <link type="application/atom+xml" rel="self" href="http://localhost:3000/posts.atom"/>
  <title>My great blog!</title>
  <updated>2007-12-23T04:23:07+11:00</updated>
  <entry>
    <id>tag:localhost,2008:Post/1</id>
    <published>2007-12-23T04:23:07+11:00</published>
    <updated>2007-12-30T15:29:55+11:00</updated>
    <link type="text/html" rel="alternate" href="http://localhost:3000/posts/1"/>
    <title>First post</title>
    <content type="html">HOORAY. About ruby.</content>
    <author>
      <name>DHH</name>
    </author>
  </entry>
</feed>

mmm, semantic goodness

I don't want preferences

Or why I’m writing another blog engine for ruby

I’ve been running this site on Mephisto for a number of months now. It is fantastic at what it does, but I’ve just recently realised it’s not what I want.

I want to configure my blog by hacking code

I don’t want preferences or theme support – I want to edit code. Mephisto isn’t great for this: it uses non-standard routing (everything goes through dispatch) and Liquid templates. I feel like I have to learn Mephisto to hack it.
SimpleLog is another rails option, but it sucks because it reads like a PHP app, and I don’t want to be hacking that. It’s built to be configured, not to be hacked.

So here is my grand plan.

An opinionated blog engine that does things my way. OpenID login, XHTML valid default template, RESTful stuff, code highlighting in comments, etc…
To install, you branch my master git repo and customize away. You can just keep rebasing to get all the trunk updates. You can publish a ‘theme’ in the form of a patch against trunk. The code is going to be lean since I don’t need to accommodate 5, 10 or 15 articles per page, so it will be easy to comprehend.

Basically, it’s so you can write your own blog without having to worry about boring stuff like admin, defensio integration, and OpenID auth.

I wonder what I’ll call it.

UPDATE: Look I made it – Enki

Don't use pagination on your blog

What problem are you trying to solve? In my case, I don’t want the bottom of the page to be a dead end. Paging would appear to be a good solution – click next page, get more content. Alas, it has issues:

  • When you post a new article, it changes the content of all your pages. Google doesn’t like this – search traffic to your blog will suffer since people will click through expecting an older version of the page.
  • Invalidates your entire cache when you post something new. Admittedly not a problem for most of us, but worth considering.

Archives solve my problem – not wanting a dead end, while avoiding the two problems with pagination mentioned above. It is harder to get your window size right though (you don’t want 2 or 200 articles per page).

For bonus points, add something like the Humanized Reader. Javascript fetches the next article when you’re near the bottom of the page, seamlessly adding it to the bottom of the page so the user can just keep on reading.

I’ve just added archives to this site – an interim fix to tide me over until I do it right.

Thanks to Rick Olson for telling me I didn’t need paging.

Lesstile - A yuletide present

Textile is great for formatting articles. But comments aren’t articles, and I have always felt that textile was overkill. Do you really need nested headings and subscript in comments? No.

Also! And more importantly, textile doesn’t output valid XHTML. Consider the following textile code:

<b>
Hello

This is broken
</b>

Converts to:

<p><b>
Hello</p>
<p>This is broken</b></p>

That sucks if your blog happens to be XHTML strict, because then your site is broken :( So I made an alternative. I offer it as a present to you: Lesstile.
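A deliberately small format can guarantee valid output because it never passes the commenter's markup through. The sketch below is a hypothetical illustration of that principle, not Lesstile's actual implementation. It escapes everything first, then wraps the paragraphs that blank lines separate.

```ruby
require 'cgi'

# Hypothetical illustration, not Lesstile's code: every character of the
# input is escaped before paragraphs are wrapped, so stray or unbalanced
# tags can never produce malformed XHTML.
def paragraphs_as_xhtml(text)
  text.strip.split(/\n{2,}/).map { |para| "<p>#{CGI.escapeHTML(para)}</p>" }.join("\n")
end

paragraphs_as_xhtml("<b>\nHello\n\nThis is broken\n</b>")
# => "<p>&lt;b&gt;\nHello</p>\n<p>This is broken\n&lt;/b&gt;</p>"
```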

Try it out, it’s pretty neat:

gem install lesstile
require 'lesstile'

Lesstile.format_as_xhtml <<-EOS
Wow this is ace!

--- Ruby
def some_code
  "yay code"
end
---
EOS

It supports code blocks, and that’s it. You can easily pass it through CodeRay to get syntax highlighting if you want – see the docs. In the future it may also support hyperlinking. That’s all I suppose commenters on this blog need; maybe you will tell me otherwise. Try it out on this post.

As a special extra treat, I added live preview to this blog, so you can see what your comment is going to look like as you write. It’s just like the future!

Please comment with code to say hi.

Tail call optimization in erlang

fact(1) -> 1;
fact(N) -> N * fact(N - 1).

You’ve all seen the classic recursive factorial definition. Problem is, it’s not really usable. 50000 factorial, anyone? It needs to create a new stack frame for each recursive call, very quickly blowing out your memory usage. Let’s look at a classic erlang structure, a message processing loop:

loop() ->
  receive
    hello -> io:format("hello"),
             loop()
  end.

That looks mighty recursive also – one would be inclined to think that saying hello a couple of thousand times would quickly chew through memory. Happily, this is not the case! The reason is tail call optimization.

As you can see, the above hello program is really just a loop. Note that when we call loop(), there’s no reason to maintain the stack for the current call, because there is no more processing to be done. It just needs to pass on the return value. The erlang compiler recognises this, and so can optimize the above code by doing just that – throwing away the stack (or transforming it into a loop, whichever you prefer).

With the factorial example, optimization cannot be done because each call needs to wait for the return value of fact(N-1) to multiply it by N – extra processing that depends on the call’s stack.

Tail call optimization can only be done when the recursive call is the last operation in the function.

With this knowledge, we can rewrite our factorial function to include an accumulator parameter, allowing us to take advantage of the optimization.

fact(N)    -> fact(N, 1).
fact(1, T) -> T;
fact(N, T) -> fact(N - 1, T * N).
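To make “throwing away the stack” concrete, here is a Ruby sketch of the accumulator version next to the loop that tail call optimization effectively turns it into. It’s only an illustration: MRI doesn’t perform this optimization by default, so in Ruby it’s the loop that is safe for very large n. One small liberty: the base case is n <= 1 rather than exactly 1, so 0 terminates too.

```ruby
# Accumulator-style factorial, a direct translation of the Erlang fact/2 above.
# The recursive call is the last thing the method does, so nothing in the
# current frame is needed once it has been made.
def fact_acc(n, acc = 1)
  return acc if n <= 1
  fact_acc(n - 1, acc * n)
end

# What tail call optimization effectively turns fact_acc into: instead of
# pushing a new frame, rebind the parameters and jump back to the top.
def fact_loop(n)
  acc = 1
  while n > 1
    acc *= n
    n -= 1
  end
  acc
end

puts fact_acc(5)  # prints 120
puts fact_loop(5) # prints 120
```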

Or since we recognise that you can redo this with a loop, you could always just write it that way yourself.

fact(N) -> lists:foldl(fun(X, T) -> X * T end, 1, lists:seq(1, N)).

I haven’t used erlang enough to make a call as to which is nicer. Probably the first one. I’m a ruby guy at heart, so for old times’ sake here’s a version you can use in ruby, which I think is quite pretty (be warned: ruby doesn’t do tail call optimization by default).

def fact(n)
  (1..n).inject(1) {|t, i| t * i}
end

Test setup broken in Rails 2.0.2

Some changes went into rails 2.0.2 that mean the setup method in test subclasses won’t get called. Here’s how it went down:

  • 8392 broke it
  • 8430 tagged 2.0.2
  • 8442 reverted 8392
  • 8445 added a test so it doesn’t break again

You can see some code illustrating the problem in 8445. This affects two plugins that we’re using – helper_test and activemessaging.

For helper_test, the workaround is to rename your helper test setup methods to setup_with_fixtures:

def setup_with_fixtures
  super
end

For activemessaging, add the following line to the setup of your functionals that are failing (from the mailing list):

ActiveMessaging.reload_activemessaging

Understanding the Y Combinator

Many people have written about this, but it still took me a long while to figure it out. It’s a bit of a mindfuck. So here is me rehashing what other people have said in a way that makes sense to me.

The Problem

I’ll start with the same example of hash autovivification (that’s what perl calls it) used by Charles Duan in his article.
We want the following code to work:

hash = Hash.new {|h, k| h[k] = default } # We need to implement default later, read on!
hash[1][2][3][4][5] = true
hash # => {1=>{2=>{3=>{4=>{5=>true}}}}}

To do this, we need to specify an appropriate default value for the hash. If we set the default to {}, we only get one level of autovivification.

hash = Hash.new {|h, k| h[k] = {} }
hash[1]    # => {} 
hash[1][2] # => nil

Clearly we need a recursive function to support infinite depth, which we can do with a normal ruby method.

def make_hash
  Hash.new {|h, k| h[k] = make_hash }
end  

hash = make_hash
hash[1][2][3][4][5] # => {}

The problem here is we’ve introduced a new method into the namespace (make_hash), which isn’t really necessary. The Y Combinator allows us to achieve the same result, without a named method or variable.

The Solution

We can avoid the need for a named method by wrapping the Hash creation code in an anonymous lambda that passes in the callback as an argument.

lambda {|callback| Hash.new {|h, k| h[k] = callback.call }}.call(some_callback)

We just need a way to pass in a callback function that is the same as the initial function. If you try to copy and paste in the hash maker code, you’ll find it doesn’t quite work because we then need a way to get a callback for that callback.

lambda {|callback|
  Hash.new {|h, k| h[k] = callback.call }
}.call(
  lambda {
    Hash.new {|h, k| h[k] = callback.call }
  }
) # fails because the second callback isn't defined

But we’re getting closer. What if we pass in our initial callback function as a parameter to itself? Then it will know how to call itself over and over again. This is pretty tricky – the first example illustrates the concept using a named method for clarity, the second example is what we actually want.

# With named method
def make_hash(x) 
  Hash.new {|h,k| h[k] = x.call(x)}
end 
hash = make_hash(method(:make_hash))

# With lambdas
hash = lambda {|callback| 
  Hash.new {|h, k| h[k] = callback.call(callback) }
}.call(
  lambda {|callback| 
    Hash.new {|h, k| h[k] = callback.call(callback) }
  })
hash[1][2][3][4][5] # => {}, hooray!

And that’s really the guts of it. If you understand that you’ve pretty much got it. From here on in it’s just extra credit.

Making it DRY

The previous code repeats itself somewhat – you copy and paste the hash maker function into two spots. Basically, the code is hash = x.call(x). So let’s use another lambda to express it as such.

lambda {|x| x.call(x) }.call(
  lambda {|callback| 
    Hash.new {|h, k| h[k] = callback.call(callback) }
  })

Making it work for callbacks with an arbitrary number of parameters

By passing in the callback to itself, we’re restricting ourselves to a callback with no parameters. You’ll notice we’re not able to pass in any parameters to the hash maker above. As you may have guessed, we add another level of abstraction with a lambda that passes in a callback_maker function.

hash = lambda {|x| x.call(x) }.call(lambda {|callback_maker| 
  lambda {|*args| 
    callback = callback_maker.call(callback_maker)
    Hash.new {|h, k| h[k] = callback.call(*args) }
  }
}).call("an argument!")

So yes, that example is kind of useless because we don’t use the arguments. Let’s try something a bit meatier, say a factorial function.

lambda {|x| x.call(x) }.call(lambda {|callback_maker| 
  lambda {|*args| 
    callback = callback_maker.call(callback_maker)
    v = args.first
    return v == 1 ? 1 : v * callback.call(v - 1)
  }
}).call(5) # => 120

Making it generic and pretty

def y_combinator(&generator)
  lambda {|x| x.call(x) }.call(lambda {|callback_maker|
    lambda {|*args|
      callback = callback_maker.call(callback_maker)
      generator.call(callback).call(*args)
    }
  })
end

y_combinator {|callback|
  lambda {|v|
    return v == 1 ? 1 : v * callback.call(v - 1)
  }
}.call(5) # => 120

And let’s make it a bit less ugly by doing what Tom Mortel did and using [] instead of call (they’re equivalent), and moving the callback_maker inline.

def y_combinator(&f)
  lambda {|x| x[x] } [
    lambda {|maker| lambda {|*args| f[maker[maker]][*args] }}
  ]
end
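Back to where we started: here is the compact version building the autovivifying hash from The Problem, plus the factorial from earlier. Nothing here is new API; make_hash and recurse are just block parameter names I picked, standing for the callback the combinator hands in.

```ruby
def y_combinator(&f)
  lambda {|x| x[x] }[
    lambda {|maker| lambda {|*args| f[maker[maker]][*args] }}
  ]
end

# The hash maker with no named method: make_hash is the callback the
# combinator passes in, and calling it builds another autovivifying hash.
hash_maker = y_combinator {|make_hash|
  lambda { Hash.new {|h, k| h[k] = make_hash[] } }
}
hash = hash_maker[]
hash[1][2][3] = true
hash # => {1=>{2=>{3=>true}}}

# The factorial example, now via y_combinator.
fact = y_combinator {|recurse| lambda {|n| n <= 1 ? 1 : n * recurse[n - 1] } }
fact[5] # => 120
```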

Thus ends my exploration of the Y Combinator. Practically useless in any language you’d be using today, but hey, don’t you feel smarter?

UPDATE: Added dmh’s suggestion from the comments.

Rails devs, reclaim your harddrive

cd code-dir
find . | egrep "(development|test)\\.log" | grep -v .svn | xargs rm

I’d forgotten to clear out my logs for a long while. This found me 9.5GB!
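If you’d like to see how much you’ll get back before anything is deleted, here is a Ruby sketch of the same clean-up. reclaim_logs is a hypothetical helper (not part of Rails), and it only deletes when you pass delete: true.

```ruby
require "fileutils"
require "tmpdir"

# Sum up (and optionally delete) development.log / test.log files under root.
# Like the grep -v .svn in the one-liner, anything inside a .svn directory is skipped.
# Returns [paths, total_bytes].
def reclaim_logs(root, delete: false)
  paths = Dir.glob(File.join(root, "**", "{development,test}.log"))
             .reject {|path| path.split("/").include?(".svn") }
  total = paths.sum {|path| File.size(path) }
  paths.each {|path| File.delete(path) } if delete
  [paths, total]
end

# Example on a throwaway tree: one real log, and one copy inside .svn to skip.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "log"))
  FileUtils.mkdir_p(File.join(dir, ".svn"))
  File.write(File.join(dir, "log", "development.log"), "x" * 10)
  File.write(File.join(dir, ".svn", "test.log"), "x" * 10)
  paths, bytes = reclaim_logs(dir)
  puts "#{paths.size} file(s), #{bytes} bytes" # prints "1 file(s), 10 bytes"
end
```

Point it at your code directory (reclaim_logs("code-dir")) for a dry run first, then add delete: true.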

  • Posted on December 17, 2007
  • Tagged code, rails

Making cerberus more fun

And throughout the lands of the Greek empire, he was known and feared as Cerberus, the original three-headed party dog from hell

Here is a patch to the cerberus campfire publisher that enables it to prepend a funny image to its messages. I’ve submitted it to core; I guess it depends on how much of a sense of humour the author has.

Someone let GIS know it’s about to be thrashed by queries for train wrecks and hi fives.

Index: lib/cerberus/config.example.yml
===================================================================
--- lib/cerberus/config.example.yml     (revision 167)
+++ lib/cerberus/config.example.yml     (working copy)
@@ -17,6 +17,11 @@
 #    channel: cerberus
 #  campfire:
 #    url: http://someemail:password@cerberustool.campfirenow.com/room/51660
+#    preamble: 
+#      # Posts content before the main message based on the build state. Perfect for amusing images.
+#      # Valid states are: setup, broken, failed, revival, successful
+#      broken:  http://mydomain.com/broken.jpg
+#      revival: http://mydomain.com/fixed.jpg
 #  rss:
 #    file: /usr/www/rss.xml
 #builder:
@@ -26,4 +31,4 @@
 #hook:
 #  rcov:
 #    on_event: successful, setup #by default - run hook for any state
-#    action: 'export CERBERUS_HOME=/home/anatol && sudo chown www-data -R /home/anatol/cerberus && rcov' #Add here any hook you want
\ No newline at end of file
+#    action: 'export CERBERUS_HOME=/home/anatol && sudo chown www-data -R /home/anatol/cerberus && rcov' #Add here any hook you want
Index: lib/cerberus/publisher/campfire.rb
===================================================================
--- lib/cerberus/publisher/campfire.rb  (revision 167)
+++ lib/cerberus/publisher/campfire.rb  (working copy)
@@ -3,8 +3,10 @@
 class Cerberus::Publisher::Campfire < Cerberus::Publisher::Base
   def self.publish(state, manager, options)
     url = options[:publisher, :campfire, :url]
+    preamble = options[:publisher, :campfire, :preamble, state.current_state]
     
     subject,body = Cerberus::Publisher::Base.formatted_message(state, manager, options)
+    Marshmallow.say(url, preamble) unless preamble.nil?
     Marshmallow.say(url, subject)
     Marshmallow.paste(url, body)
   end
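Stripped of cerberus and Marshmallow, the patch boils down to a per-state lookup that is skipped when nothing is configured. A plain-Ruby sketch of that logic: campfire_messages is a hypothetical helper, and cerberus itself reads its settings through its own options object rather than Hash#dig.

```ruby
require "yaml"

# Parsed the same way as the preamble section of config.example.yml above.
config = YAML.safe_load(<<~YML)
  publisher:
    campfire:
      preamble:
        broken:  http://mydomain.com/broken.jpg
        revival: http://mydomain.com/fixed.jpg
YML

# Messages to post for a build state: the preamble (if one is configured
# for that state) followed by the usual subject line.
def campfire_messages(config, state, subject)
  preamble = config.dig("publisher", "campfire", "preamble", state)
  [preamble, subject].compact
end

campfire_messages(config, "broken", "Build broken")
# => ["http://mydomain.com/broken.jpg", "Build broken"]
```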

Props to grant for the inspiration and finding of the title photo

Formatting ruby hashes in VIM

I’ve been meaning to write this script for a while. If you’re anal about your whitespace (like me), you’ll often pretty up your ruby hashes to make them easy to read by padding the keys with whitespace so the => line up. I wrote a ruby script to do this automatically!

#!/usr/bin/env ruby

# format_hash.rb
#
# Formats ruby hashes
# a => 1
# ab => 2
# abc => 3
#
# becomes
# a   => 1
# ab  => 2
# abc => 3
#
# http://rhnh.net

lines = []
while line = gets
  lines << line
end

indent = lines.first.index(/[^\s]/)

# Massage into an array of [key, value]
lines.collect! {|line| 
  line.split('=>').collect {|line| 
    line.gsub(/^\s*/, '').gsub(/\s*$/, '') 
  }
}

max_key_length = lines.collect {|line| line[0].length}.max

# Pad each key with whitespace to match length of longest key
lines.collect! {|line|
  line[0] = "%#{indent}s%-#{max_key_length}s" % ['', line[0]]
  line.join(' => ')
}

print lines.join("\n")

Put that in your path, then in VIM you can run the following command to format the current selection:

:'<,'>!format_hash.rb

Or map F2 to do it for you:

:vmap <F2> !format_hash.rb<CR>
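If you want to try the alignment outside VIM, here is the same idea as a method you can call and test directly (align_hash_lines is my name for it, not part of the script). One small difference from the script: it splits on the first => only, so a value that itself contains => comes through untouched.

```ruby
# Align the => of a run of hash lines, keeping the first line's indent.
def align_hash_lines(lines)
  indent = lines.first[/\A\s*/]
  pairs  = lines.map {|line| line.split("=>", 2).map(&:strip) }
  width  = pairs.map {|key, _| key.length }.max
  pairs.map {|key, value| "#{indent}#{key.ljust(width)} => #{value}" }
end

puts align_hash_lines(["  :a => 1,", "  :ab => 2,", "  :abc => 3"])
# prints:
#   :a   => 1,
#   :ab  => 2,
#   :abc => 3
```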

Logging SQL statistics in rails

When your sysadmin comes to you whinging with a valid concern that your app is reading 60 gazillion records from the DB, you kinda wish you had a bit more information than % time spent in the DB. So I wrote a plugin that counts both the number of selects/updates/inserts/deletes and the number of records affected. [This plugin is no longer available; the code is below for posterity.]

That does the counting; you need to decide how to log it. I am personally quite partial to adding it to the request log line, thus getting stats per request:

# vendor/rails/actionpack/lib/action_controller/benchmarking.rb:75
log_message << " | Select Records: #{ActiveRecord::Base.connection.select_record_count}"
log_message << " | Selects: #{ActiveRecord::Base.connection.select_count}"

ActiveRecord::Base.connection.reset_counters!

Don’t forget the last line, otherwise you get cumulative numbers. That may be handy, but I doubt it. We’re only logging selects because that’s all we care about at the moment. I am sure this will change in time.

UPDATE: Moved to github, bzr repo is no longer available

UPDATE 2: Pasted code inline below, it’s way old and probably doesn’t work anymore.

module ActiveRecord::ConnectionAdapters
  class MysqlAdapter
    class << self
      def counters
        @counters ||= []
      end

      def attr_accessor_with_default(name, default)
        attr_accessor name
        define_method(name) do
          instance_variable_get(:"@#{name}") || default 
        end
      end

      def define_counter(name, record_func = lambda {|ret| ret })
        attr_accessor_with_default("#{name}_count", 0)
        attr_accessor_with_default("#{name}_record_count", 0)

        define_method("#{name}_with_counting") do |*args|
          ret = send("#{name}_without_counting", *args)
          send("#{name}_count=", send("#{name}_count") + 1)
          send("#{name}_record_count=", send("#{name}_record_count") + record_func[ret])
          ret
        end
        alias_method_chain name, :counting

        self.counters << name
      end
    end

    define_counter :select, lambda {|ret| ret.length }
    define_counter :update
    define_counter :insert
    define_counter :delete

    def reset_counters!
      self.class.counters.each do |counter|
        self.send("#{counter}_count=", 0)
        self.send("#{counter}_record_count=", 0)
      end
    end
  end
end
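On a current Ruby, Module#prepend (Ruby 2.0+) does this kind of wrapping without alias_method_chain, which was deprecated in Rails 5. A minimal sketch of the select counter against a fake adapter; FakeAdapter and SelectCounting are made-up names, not Rails classes.

```ruby
# Hypothetical stand-in for a database adapter; select returns row hashes.
class FakeAdapter
  def select(_sql)
    [{ "id" => 1 }, { "id" => 2 }]
  end
end

# Counts selects and the rows they return, like define_counter :select above.
module SelectCounting
  attr_writer :select_count, :select_record_count

  def select_count
    @select_count ||= 0
  end

  def select_record_count
    @select_record_count ||= 0
  end

  # Prepending puts this method in front of FakeAdapter#select,
  # so super reaches the original one.
  def select(*args)
    rows = super
    self.select_count += 1
    self.select_record_count += rows.length
    rows
  end

  def reset_counters!
    self.select_count = 0
    self.select_record_count = 0
  end
end

FakeAdapter.prepend(SelectCounting)
```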

Tiny doc patch wins hearts

Rails patch accepted after just 44 minutes: r8379

A result of moving our app from preview 1-ish on to 2-stable this morning. Only other issues were a test that was expecting a ProtectedAttributeAssignmentError – now the attribute just doesn’t get set (a good change), and some small changes where we were doing stupid things with view paths.

  • Posted on December 13, 2007
  • Tagged code, ruby

exception_notifiable and ruby 1.8.6 p110

ruby 1.8.6 p110 has recently come out in ports. If you’re using the exception_notifiable plugin to let you know about errors, make sure you update it to at least r8191, otherwise it will break when you update ruby. And you won’t know about it, because it can’t email you.

Things that aren't subversion

Here are the slides for the talk I gave at the Melbourne Ruby Meetup last Thursday night. It was a little bit rambling, but my basic point was: You should try using bazaar instead of subversion because it’s more awesome. Number one question was “so should I use bazaar or git?”, to which I unfortunately don’t have a good answer. I personally haven’t used either enough to give an unequivocal recommendation, and there are heavyweights in both corners (ubuntu, linux kernel). My initial impression is bazaar is easier, git more powerful. There are also other options such as darcs and mercurial.

For the curious, I’d say start with bazaar because it has the smallest learning curve from svn (see the slides). It seems that most non-svn ruby projects are on git, so you’ll get to know that eventually :)

Hash#translate_keys_and_values

module CoreExtensions
  module Hash 
    def translate_keys_and_values(&block)
      inject({}) {|a, (key, value)| a.update(block.call(key) => block.call(value))}
    end
  end
end

Hash.send(:include, CoreExtensions::Hash)

It’s like symbolize_keys but a bit more flexible. It calls the block for every key and value in the hash. Of course you could tune it to just do keys or values if you wanted. I do not want!

{"1" => "2"}.translate_keys_and_values(&:to_i)  # => {1 => 2}
{1 => 2}.translate_keys_and_values {|x| x + 1 } # => {2 => 3}
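These days core Ruby covers this without a patch. A sketch of two built-in routes; note the version requirements in the comments, and that to_h with a block lets keys and values be translated differently.

```ruby
h = { "1" => "2" }

# Ruby 2.6+: to_h with a block maps each pair to a new [key, value] pair.
by_block = h.to_h {|key, value| [key.to_i, value.to_i] } # => {1 => 2}

# Ruby 2.5+: transform keys and values separately.
by_transform = h.transform_keys(&:to_i).transform_values(&:to_i) # => {1 => 2}
```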

Array#collapse

module CoreExtensions
  module Array
    def collapse
      self.inject([]) do |a, v|
        if existing = a.find {|o| o.eql?(v)}
          yield(existing, v)
        else
          a << v
        end
        a
      end
    end
  end
end

Array.send(:include, CoreExtensions::Array)

Kind of handy for reporting, where you need to collapse line items into a summary. This example may make it clear:

class Item < Struct.new(:code, :quantity)
  def eql?(b)
    code == b.code
  end

  alias_method :==, :eql?

  def hash
    code.hash
  end

  def to_s
    "#{code} - #{quantity}"
  end
end  

summary = [Item.new("a", 1), Item.new("a", 2), Item.new("b", 5)].collapse {|a, b| a.quantity += b.quantity}
summary.collect(&:to_s) # => ["a - 3", "b - 5"]
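On Ruby 2.4 or later, group_by and sum get you the same summary without a core extension or the eql?/hash overrides. A sketch; unlike collapse, it builds new Items rather than mutating the first one in each group.

```ruby
Item = Struct.new(:code, :quantity)

items = [Item.new("a", 1), Item.new("a", 2), Item.new("b", 5)]

# group_by keeps groups in order of first appearance; sum adds up each
# group's quantities (Enumerable#sum needs Ruby 2.4+).
summary = items.group_by(&:code).map {|code, group| Item.new(code, group.sum(&:quantity)) }
summary.map {|item| "#{item.code} - #{item.quantity}" } # => ["a - 3", "b - 5"]
```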

Maintaining a stable branch

Part one of my VCS ninja skills program.

A common scenario for a production application is to have a trunk for development, and a stable branch that is deployed to production. This is what we do at RedBubble, and here I share how to complete some common tasks with subversion.

Push out a new release

It might seem like a good idea to merge trunk into stable. Not so! Trunk is the code that you’ve been working with and testing with; merging it into another branch introduces the risk of either hard conflicts (not so bad, you can fix them) or the scarier Bodgy Merge (technical term), where subversion thinks it has merged everything correctly but hasn’t. We blow away our stable branch and just copy over trunk. It takes less time, and we’re more confident in the result. Here’s an example from our release notes:

svn delete -m "Removed previous stable branch" svn+ssh://example.com/home/svn/branches/stable
svn copy -m "Ice T Release - Iteration 2 : trunk to stable (r1234)" svn+ssh://example.com/home/svn/trunk svn+ssh://example.com/home/svn/branches/stable

We also tag the release in tags/ (just another copy), but to this day we have never checked out one of the tags, so maybe that isn’t worthwhile. You can always check out a specific revision anyway.

Patch a bug fix into stable

Oh noes! Production is broken! Code Red! Hopefully you release often enough that trunk and stable are similar enough that you can apply the same patch to both of them. This is the case 99% of the time for us, so when something is broken we fix it in trunk, then merge the patch across to stable to release.

# trunk fix was r100
cd branches/stable
svn merge -r99:100 svn+ssh://example.com/home/svn/trunk .
svn st   # Always check!
svn diff # Always check!
svn ci -m "Merge r100 from trunk (my awesome bug fix)"

That’ll get it done, but we don’t want to be just competent. Ninjas aren’t just ‘competent’.

#!/usr/bin/env ruby
ARGV.collect {|x| x.to_i }.each do |revision|
  cmd = "svn merge -r#{revision-1}:#{revision} svn+ssh://example.com/home/svn/trunk ."
  puts `#{cmd}`
end

Put that in your bin folder – mine’s called rbm (RedBubble Merge – yay for obscure shortcuts) – and you can now patch with rbm 100 105. It’s so quick, there have been reports of patches getting merged before they’re even committed to trunk.

UPDATE: Multi-param version of rbm
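Subversion 1.4 and later also accept -c REV as shorthand for -r REV-1:REV. Here is a sketch of rbm along those lines that also stops at the first failed merge and rejects non-numeric revisions (Integer raises where to_i would quietly give 0). merge_commands is my name for the helper, and the URL is the same example one as above.

```ruby
#!/usr/bin/env ruby
# Same example repository URL as above.
TRUNK = "svn+ssh://example.com/home/svn/trunk"

# One svn merge per revision; -c REV means the same as -r REV-1:REV.
def merge_commands(revisions, trunk = TRUNK)
  revisions.map {|rev| ["svn", "merge", "-c", Integer(rev).to_s, trunk, "."] }
end

if $PROGRAM_NAME == __FILE__
  merge_commands(ARGV).each do |cmd|
    # Passing an array to system skips the shell; stop at the first failure.
    system(*cmd) or abort("rbm: failed: #{cmd.join(' ')}")
  end
end
```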

Facets patch

$ svn log svn://rubyforge.org/var/svn/facets/trunk -r 383 -v
---------------------------------------------------------------------

r383 | transami | 2007-11-03 23:31:54 +1100 (Sat, 03 Nov 2007) | 2 lines
Changed paths:
M /trunk/lib/core/facets/hash/op.rb
M /trunk/test/unit/hash/test_op.rb

Fixed bug in Hash#- Thanks to Xavier Shay.

require 'facets/hash/op'
{:a => 1, :b => 2, :c => 3} - [:a, :b]            # => {:c => 3}
{:a => 1, :b => 2, :c => 3} - {:a => 1, :b => 99} # => {:b => 2, :c => 3}

It may be small, but it’s authentic. In the 2.0.5 gem.
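For the curious, here is the behaviour those two examples show, as a standalone method rather than a core patch. hash_minus is a hypothetical name, and this is a sketch of the semantics, not Facets’ actual implementation.

```ruby
# An array removes those keys; a hash removes only the pairs whose
# key and value both match.
def hash_minus(hash, other)
  if other.is_a?(Hash)
    hash.reject {|key, value| other.key?(key) && other[key] == value }
  else
    hash.reject {|key, _| other.include?(key) }
  end
end

hash_minus({:a => 1, :b => 2, :c => 3}, [:a, :b])            # => {:c => 3}
hash_minus({:a => 1, :b => 2, :c => 3}, {:a => 1, :b => 99}) # => {:b => 2, :c => 3}
```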

Introducing Clerk Simon

Someone sends you an email and you want to add them to your LDAP address book, but your email client doesn’t support it *cough*thunderbird*cough*. If you think the next best way would be to just forward that email somewhere and have someone else take care of it, then allow me to introduce Clerk Simon. He’s quite attentive when it comes to such matters, and fully certified to boot. Full details at that link, check it out.

bzr co http://code.rhnh.net/clerk_simon/
cd clerk_simon
cp config.sample.yml config.yml # Edit to taste
bin/clerk_simon config.yml

LDAP Address Book with FreeBSD and SSL

First you need to install and configure the OpenLDAP server. Clearly you won’t want to use rhnh.net – just substitute in your own domain.

sudo pkg_add -r openldap24-server
sudo pkg_add -r openssl

sudo cp /usr/local/openssl/openssl.cnf.sample /usr/local/openssl/openssl.cnf 
# Generate a self signed certificate
sudo openssl req -newkey rsa:1024 -x509 -nodes -out server.pem -keyout server.pem -days 3650
sudo mkdir /usr/local/etc/ldap
sudo mv server.pem /usr/local/etc/ldap

# /etc/rc.conf
slapd_enable="YES"
slapd_flags='-h "ldaps://rhnh.net/"'

# /usr/local/etc/openldap/ldap.conf
# Add these same settings not just on the server but for each client
BASE dc=rhnh, dc=net
URI ldaps://rhnh.net/
TLS_REQCERT allow

# /usr/local/etc/openldap/slapd.conf:
# Add
include     /usr/local/etc/openldap/schema/cosine.schema
include     /usr/local/etc/openldap/schema/inetorgperson.schema

TLSCipherSuite HIGH:MEDIUM:-SSLv2
TLSCACertificateFile /usr/local/etc/ldap/server.pem
TLSCertificateFile /usr/local/etc/ldap/server.pem
TLSCertificateKeyFile /usr/local/etc/ldap/server.pem

require authc

# Modify these properties from their defaults
suffix          "dc=rhnh,dc=net"
rootdn          "cn=xavier,dc=rhnh,dc=net"
# Use slappasswd to generate your own password (the hash below is for "secret")
rootpw          {SSHA}Iogj+Awafoj9FP5IdLVy1DmFaASDw1P5

Start up the server to make sure everything is apples:

sudo /usr/local/etc/rc.d/slapd start
openssl s_client -connect rhnh.net:636 -showcerts

Load up a directory structure to hold your address book entries, plus an example entry:

# directory.ldif
dn: dc=rhnh, dc=net
objectClass: top
objectClass: dcObject
objectClass: organization
dc: rhnh
o:  Robot Has No Heart

dn: ou=people, dc=rhnh, dc=net
objectClass: top
objectClass: organizationalUnit
ou: people

# contact.ldif
dn: cn=Xavier Shay, ou=people, dc=rhnh, dc=net
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
cn: Xavier Shay
gn: Xavier
sn: Shay
mail: contact@rhnh.net
ou: people
mobile: 0400-123-456

ldapadd -x -D 'cn=xavier,dc=rhnh,dc=net' -W -f directory.ldif
ldapadd -x -D 'cn=xavier,dc=rhnh,dc=net' -W -f contact.ldif
ldapsearch -x -D 'cn=xavier,dc=rhnh,dc=net' -W # Check everything worked
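Writing LDIF by hand gets old quickly. Here is a small Ruby sketch that generates an entry shaped like contact.ldif, ready to redirect into a file for ldapadd. contact_ldif is a hypothetical helper, and it deliberately ignores LDIF’s escaping rules (see the comment).

```ruby
# Build an inetOrgPerson entry shaped like contact.ldif above.
# Assumes plain ASCII values without commas: LDIF needs base64 ("attr:: ...")
# for some values, and a comma in a cn must be escaped in the dn.
def contact_ldif(given:, surname:, mail:, mobile: nil, base: "ou=people, dc=rhnh, dc=net")
  cn = "#{given} #{surname}"
  lines = ["dn: cn=#{cn}, #{base}"]
  %w[top person organizationalPerson inetOrgPerson].each {|klass| lines << "objectClass: #{klass}" }
  lines << "cn: #{cn}" << "gn: #{given}" << "sn: #{surname}" << "mail: #{mail}" << "ou: people"
  lines << "mobile: #{mobile}" if mobile
  lines.join("\n") + "\n"
end

puts contact_ldif(given: "Xavier", surname: "Shay", mail: "contact@rhnh.net", mobile: "0400-123-456")
```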

To configure Thunderbird to use your address book, go to Edit - Preferences... - Composition - Edit Directories... and follow the bouncing ball. Thunderbird can’t write to the directory, which is kind of a pain. Maybe you could use Evolution, which I think works. Maybe you could write an app that monitors a drop box and updates your directory for you. Maybe you could assume I’ve already done what I suggested and wait for me to release it in the very near future.

Tested on FreeBSD 6.2-stable


Gutsy upgrade

Just upgraded Ubuntu from feisty (7.04) to gutsy (7.10). It was a bit touch and go for a moment: I got a filesystem check failure on reboot. Miamoto_musashi, my knight in shining armour from #roro, saved the day. ls -l /dev/disk/by-id/ revealed that my HDs had been remapped from /dev/hd to /dev/sd. The gutsy upgrade had modified all of the standard partitions to use UUIDs, but had failed to update a custom mount I had (/data -> /dev/hdc1). Changed that over in /etc/fstab, rebooted, hooray we have a winner.

Get with the times: IMAP

Last night I bought a fastmail account that I can use to host my rhnh.net mail. It supports IMAP. I really should have found out about this a long time ago, I’ve been living in the POP dark ages. I’m not going to list the benefits – many have done so before. I’m just going to say: If you’re still using POP, stop kidding yourself and get on to IMAP. Your quality of life will improve.

  • Posted on October 31, 2007
  • Tagged code, email

Sinatra deserves an encore

I’m putting together a small site for a dancing troupe I’m involved with. Index page, bio pages, that’s about it. I want basic templating so I can keep my HTML DRY. Initially I tried rolling my own solution with ERB and rake to generate HTML, but that was shit, so I found Sinatra and found it much tastier. It’s kind of like camping but without all the weird meta-fu. Also, it has a sweet name and sweet copy:

$ ruby app.rb 
== Sinatra has taken the stage on port 4567!
GET / | Status: 200 | Params: {:format=>"html"}
== Sinatra has ended his set (crowd applauds)

My app, sans views and data (use your imagination):

['sinatra', 'yaml'].each {|x| require x }

# This complex bit just loads up a YAML file and indexes an array of hashes
# by their name. Also, it symbolizes keys because strings are for losers
symbolize_keys = lambda {|a,v| a.update(v[0].intern => v[1]) }
Data = YAML.load(File.open('data/performers.yml')).inject({}) {|a, v| a.update(v["name"].downcase => v.inject({}, &symbolize_keys))}

layout do
  File.open('views/main.erb').read
end

helpers do
  def dancer
    data = Data[params[:id].downcase]
    data[:bio] = erb(:"dancers/#{params[:id]}")
    data
  end
end

get '/' do
  erb :index
end

get '/dancers/:id' do
  if dancer
    erb :dancer
  else
    status(404)
  end
end

static '/static', 'static'
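The “complex bit” at the top of app.rb reads more easily as a method. A sketch using Ruby 2.5+’s transform_keys; index_performers is my name for it, and the sample YAML is made-up data in the shape app.rb expects (a list of hashes, each with a "name").

```ruby
require "yaml"

# Index performer hashes by lowercased name, symbolizing each record's keys.
def index_performers(yaml_text)
  YAML.safe_load(yaml_text).each_with_object({}) do |performer, index|
    index[performer["name"].downcase] = performer.transform_keys(&:to_sym)
  end
end

# Made-up sample data.
sample = <<~YML
  - name: Alice
    style: tap
  - name: Bob
    style: swing
YML

performers = index_performers(sample)
performers["alice"] # => {:name => "Alice", :style => "tap"}
```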

When I need to deploy to some cheap-cheap-we-support-nothing host I can just spider wget the whole site and FTP it up. For the complete integrated coding experience may I recommend Mr. Sinatra live with The Count Basie Band.

Enumerable#inject is my favourite method

Combines the elements of enum by applying the block to an accumulator value (memo) and each element in turn. At each step, memo is set to the value returned by the block. – RubyDoc

It just doesn’t sound very helpful. I must confess, it isn’t something I use every day. But I love that when you do want to use it, it is oh so sweet. The canonical example is summing the elements in an array:

[1,2,3].inject(0) {|sum, n| sum + n} # => 6

Probably the most used pattern is converting an array to a hash:

[1,2,3].inject({}) {|a, v| a.update(v => v * 2)} # => {1 => 2, 2 => 4, 3 => 6}
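Two related tricks worth knowing: each_with_object hands the same hash to every iteration, so the block doesn’t have to return the accumulator the way inject’s does, and inject itself will take a method name in place of a block.

```ruby
# each_with_object: the block mutates the hash and needn't return it.
doubles = [1, 2, 3].each_with_object({}) {|v, h| h[v] = v * 2 } # => {1 => 2, 2 => 4, 3 => 6}

# inject with a symbol instead of a block (same result as the sum example).
total = [1, 2, 3].inject(:+) # => 6
```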

Someone in IRC today wanted a nested send, something like "string".send("strip.downcase"):

"strip.downcase".split('.').inject("HELLO  ") {|obj, method| obj.send(method)} # => "hello"

What do you inject?

Extending Rails

Previously, I extended rails by monkey patching stuff in lib/. This was good because it kept vendor/rails clean.

I have changed my mind!

I now just patch vendor/rails directly with a comment prefixed by RBEXT explaining why. This means that when I piston update rails, I get notified of any conflicts immediately, rather than having to remember what was in lib. It’s also much easier and quicker than monkey patching. Theoretically, I could also run the rails tests to make sure everything is still kosher, but I must confess I haven’t gotten around to patching the tests as well…

And the comments are ace because I can use this sweet rake task to see what rb-rails currently looks like:

desc "Show all RB extensions in vendor/"
task :core_extensions do
  FileList["vendor/**/*.rb"].egrep(/RBEXT/)
end
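If you want the results back as data rather than printed by rake, here is a plain-Ruby sketch of the same search. find_rbext is a hypothetical helper, not part of rake.

```ruby
# List every line under root tagged RBEXT as [path, line_number, text].
def find_rbext(root = "vendor")
  Dir.glob(File.join(root, "**", "*.rb")).sort.flat_map do |path|
    File.foreach(path).with_index(1)
        .select {|line, _number| line.include?("RBEXT") }
        .map {|line, number| [path, number, line.strip] }
  end
end

find_rbext.each {|path, number, text| puts "#{path}:#{number}: #{text}" }
```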

How we use the Presenter pattern

FAKE EDIT: I wrote this article just after RailsConf but have just got around to publishing it. Jay has since written a follow up which is worthwhile reading.

I may have been zoning out during Jay Fields’ talk at RailsConf (not sleeping for a few days will do that to you), but I think I got the gist of his presentation: “Presenter” isn’t really a pattern because its use is too specific and there isn’t anything that can be generalized from it. Now, I’m not going to argue with Jay, but I thought it may be helpful to give an example of how we’re using this “pattern” and how it is helpful for us at redbubble.

Uploading a piece of work to redbubble requires us to create two different models, a work and a storage, and associate them with each other. Initially, this logic was simply in the create method of one of our controllers. My problem with this was it obscured the intent of the controller. To my mind a controller is responsible for the flow of the application (the logic governing which page the user is directed to next) and for kicking off any changes that need to happen at the model layer. In this case the controller was also dealing with the exact associations between the models and the rollback conditions: code that, as we will see, wasn’t actually specific to the controller. In addition, passing validation errors through to the views was hard because errors could exist on one or more of the models. So we introduced a pseudo-model that handled the aggregation of the models for us; it looks something like this:

class UploadWorkPresenter < Presenter
  include Validatable

  attr_reader :storage
  attr_reader :work

  delegate_attributes :storage, :attributes => [:file]
  delegate_attributes :work,    :attributes => [:description]

  include_validations_for :storage
  include_validations_for :work

  def initialize(work_type, user, attributes = {})
    @work_type = work_type
    @work = work_type.new(:user => user, :publication_state => Work::PUBLISHED)
    @storage = work_type.storage_type.new

    initialize_from_hash(attributes)
  end

  def save
    return false if !self.valid?

    if @storage.save
      @work.storage = @storage
      if @work.save
        return true
      else
        @storage.destroy
      end
    end

    return false
  end
end

We have neatly encapsulated the logic of creating a work in a nice testable class that not only slims our controller, but can be reused. This came in handy when our UI guy thought it would be awesome if we could allow a user to sign up and upload a work all on the same screen:

class SignupWithImagePresenter < UploadWorkPresenter
  attr_reader :user

  delegate_attributes :user, :attributes => [:user_name, :email_address]

  include_validations_for :user

  def initialize(attributes)
    @user = User.new
    super(ImageWork, @user, attributes)
  end

  def save
    return false if !self.valid?

    begin
      User.transaction do
        raise(Validatable::RecordInvalid.new(self)) unless @user.save && super
        return true
      end
    rescue Validatable::RecordInvalid
      return false
    end
  end
end

So why does Jay think this is such a bad idea? I think it stems from a terminology issue. Presenters on Jay’s project were cloudy with their responsibilities: handling aggregation, helper functions, and navigation. As you can see, the Presenters we use solely deal with aggregation, keeping their responsibility narrow.

For reference, here is our base Presenter class:

class Presenter
  extend Forwardable
  
  def initialize_from_hash(params)
    params.each_pair do |attribute, value| 
      self.send :"#{attribute}=", value
    end unless params.nil?
  end
  
  def self.protected_delegate_writer(delegate, attribute, options)
    define_method "#{attribute}=" do |value|
      self.send(delegate).send("#{attribute}=", value) if self.send(options[:if])
    end
  end
  
  def self.delegate_attributes(*options)
    raise ArgumentError, "Must specify both a delegate and an attribute list" if options.size != 2
    delegate = options[0]
    options = options[1]
    prefix = options[:prefix].blank? ? "" : options[:prefix] + "_"
    options[:attributes].each do |attribute|
      def_delegator delegate, attribute, "#{prefix}#{attribute}"
      def_delegator delegate, "#{attribute}=".to_sym, "#{prefix}#{attribute}=".to_sym
      def_delegator delegate, "#{attribute}?".to_sym, "#{prefix}#{attribute}?".to_sym
    end
  end
end
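To see the delegation on its own, here is a stand-alone sketch that runs without Rails. It is a simplified take on delegate_attributes above, not our actual class: `blank?` (an ActiveSupport method) is swapped for a plain nil/empty check, the `?` delegators are left out, and `User` is a hypothetical Struct standing in for the ActiveRecord model.

```ruby
require "forwardable"

class Presenter
  extend Forwardable

  # Simplified delegate_attributes: forwards a reader and a writer for each
  # attribute to the object returned by the `delegate` method.
  def self.delegate_attributes(delegate, options)
    prefix = options[:prefix].to_s.empty? ? "" : "#{options[:prefix]}_"
    options[:attributes].each do |attribute|
      def_delegator delegate, attribute, "#{prefix}#{attribute}"
      def_delegator delegate, "#{attribute}=", "#{prefix}#{attribute}="
    end
  end
end

# Hypothetical stand-in for the ActiveRecord User model.
User = Struct.new(:user_name, :email_address)

class SignupPresenter < Presenter
  attr_reader :user
  delegate_attributes :user, :attributes => [:user_name, :email_address]

  def initialize
    @user = User.new
  end
end

presenter = SignupPresenter.new
presenter.user_name = "xavier"
presenter.user.user_name # => "xavier"
```

The presenter never stores user_name itself; every read and write goes straight through to the wrapped object, which is what keeps its responsibility limited to aggregation.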

Object#send_with_default

Avoid those pesky whiny nils! send_with_default won’t complain.

"hello".send_with_default(:length, 0)      # => 5
    nil.send_with_default(:length, 0)      # => 0
"hello".send_with_default(:index, -1, 'e') # => 1

So sending parameters is a little clunky, but I don’t reckon you’ll want to do that much. Here is the extension you want:

module CoreExtensions
  module Object
    def send_with_default(method, default, *args)
      !nil? && respond_to?(method) ? send(method, *args) : default
    end
  end
end

Object.send(:include, CoreExtensions::Object)
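The nil check is not redundant: nil responds to plenty of methods (to_s, for one), so respond_to? alone would let a nil receiver through. This self-contained sketch repeats the extension and shows the guard doing its job; the method names in the calls are just examples.

```ruby
module CoreExtensions
  module Object
    # Call `method` with `args`, or return `default` when the receiver is nil
    # or does not respond to the method.
    def send_with_default(method, default, *args)
      !nil? && respond_to?(method) ? send(method, *args) : default
    end
  end
end

Object.send(:include, CoreExtensions::Object)

nil.respond_to?(:to_s)                    # => true
nil.send_with_default(:to_s, "(none)")    # => "(none)"
"hello".send_with_default(:upcase, "")    # => "HELLO"
"hello".send_with_default(:frobnicate, 0) # => 0
```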

I'm a rails contributor

Allow me to gloat for a moment. Please turn your attention to changeset 7692, where you’ll notice my name in the credits. It’s not much, but there’s a certain amount of geek cred there.

Counting ActiveRecord associations: count, size or length?

Short answer: size. Here’s why.

length will fall through to the underlying array, which will force a load of the association

>> user.posts.length
  Post Load (0.620579)   SELECT * FROM posts WHERE (posts.user_id = 1321) 
=> 162

This is bad. You loaded 162 objects into memory, just to count them. The DB can do this for us! That’s what count does.

>> user.posts.count
  SQL (0.060506)   SELECT count(*) AS count_all FROM posts WHERE (posts.user_id = 1321) 
=> 162

Now we’re on to something. The problem is, count will always issue a count to the DB, which is kind of redundant if you’ve already loaded the association. That’s where size comes in. It’s got smarts. Observe!

>> User.find(1321).posts.size
  User Load (0.003610)   SELECT * FROM users WHERE (users.id = 1321) 
  SQL (0.000544)   SELECT count(*) AS count_all FROM posts WHERE (posts.user_id = 1321) 
=> 162
>> User.find(1321, :include => :posts).posts.size 
  User Load Including Associations (0.124950)   SELECT ...
=> 162

Notice it uses count, but if the association is already loaded (i.e. we already know how many objects there are), it uses length, for optimum DB usage.

But that’s not all. There’s always more. If you also store the number of posts on the user object, as is common for performance reasons, size will use that too. Just make sure the column is named after the association with a _count suffix (e.g. posts_count).

>> User.columns.collect(&:name).include?("posts_count")
=> true
>> User.find(1321).posts.size
  User Load (0.003869)   SELECT * FROM users WHERE (users.id = 1321) 
=> 162
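The order in which size picks a strategy can be modelled without a database. This is a toy, not ActiveRecord’s code: FakeAssociation is hypothetical, an array stands in for the posts table, an optional integer stands in for the posts_count column, and each “query” is only logged so you can see which strategy ran.

```ruby
# Toy model of the strategy choice; no ActiveRecord involved.
class FakeAssociation
  attr_reader :queries

  def initialize(rows, counter_cache: nil)
    @rows = rows                   # stands in for the posts table
    @counter_cache = counter_cache # stands in for a users.posts_count column
    @target = nil                  # nil until the association is loaded
    @queries = []                  # log of the "SQL" that would have run
  end

  def loaded?
    !@target.nil?
  end

  # length: loads every row (once), then counts in Ruby.
  def length
    load_target.length
  end

  # count: always asks the database.
  def count
    @queries << "SELECT count(*)"
    @rows.length
  end

  # size: counter cache if present, else the loaded rows, else a COUNT.
  def size
    return @counter_cache unless @counter_cache.nil?
    return @target.length if loaded?
    count
  end

  private

  def load_target
    @queries << "SELECT *" unless loaded?
    @target ||= @rows.dup
  end
end

posts = FakeAssociation.new([1, 2, 3])
posts.size    # => 3, via a COUNT
posts.length  # => 3, by loading the rows
posts.size    # => 3, from the loaded rows; no new query
posts.queries # => ["SELECT count(*)", "SELECT *"]

cached = FakeAssociation.new([1, 2, 3], counter_cache: 3)
cached.size    # => 3
cached.queries # => []
```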

The bad news

Now that you’re all excited, I’d better tell you why this is only fantastic until you start using has_many :through.

Now, the situation is slightly different between 1.2.x (r4605) and edge (r7639), so I’ll start with stable. They may look the same, but a normal has_many association and one with the :through option are actually implemented by two entirely separate classes under the hood. And it so happens that the has_many :through version doesn’t have quite the same smarts: it loads up the association just as length does (then falls through to Array#size). Edge is sharp enough to use a count, but still doesn’t know about any counter cache you may be using. This was committed in r7237, so it’s pretty easy to patch in to stable. Or you can use this extension on either branch (here is the Trac ticket). UPDATE: this patch was added to edge in r7692.

module CoreExtensions::HasManyThroughAssociation
  def size
    return @owner.send(:read_attribute, cached_counter_attribute_name) if has_cached_counter?
    return @target.size if loaded?
    return count
  end

  def has_cached_counter?
    @owner.attribute_present?(cached_counter_attribute_name)
  end

  def cached_counter_attribute_name
    "#{@reflection.name}_count"
  end
end

ActiveRecord::Associations::HasManyThroughAssociation.send(:include, CoreExtensions::HasManyThroughAssociation)

How it doesn’t work

user.posts.find(:all, :conditions => ["reply_count > ?", 50]).size

size normally works because associations use a proxy – when I call user.posts it won’t actually load any posts until I call a method that requires them. So user.posts.size can work without ever loading the posts because they aren’t required for the operation. The above code won’t work well because find does not use a proxy – it will straight away load the requested posts from the DB, without size getting a chance to send a COUNT instead. You may be better off moving this finder logic into an association so that size will work as expected. This also has the benefit that if you decide to add a counter cache later on you won’t have to change any code to use it.

has_many :popular_posts, :class_name => "Post", :foreign_key => "user_id", :conditions => ["reply_count > ?", 50]

So use size when counting associations unless you have a good reason not to. Most importantly, though, keep watching your development log so you’re aware of what SQL your app is generating.

UPDATE: Added link to my patch on trac

UPDATE 2: … which is now closed, see r7692

Practical Hpricot: CruiseControl.rb results

require 'hpricot'
require 'open-uri'

url = "http://mydomain.com/builds/myapp/#{ARGV[0]}"
doc = Hpricot(open(url))

puts (doc/"div#build_details h1").first.inner_text.gsub(/^\s*/, '')
(doc/"div.test-results").each do |results|
  puts results.inner_html
end

Grabs the current build status from CruiseControl.rb. Especially handy since our build server isn’t sending emails at the moment.

Data is fun

This is a story about a graph.

Inspiration struck just before sunrise one Sunday morning. 8 of us, too tired to sleep, decided to construct a relationship map of the local swing dancing scene. Naturally, the discussion turned to relationships on a micro level … who dances with who, who asks who, and the like, a topic quickly abandoned since gossip is much more readily available data at 5am. But the seed was sown and my mind was compelled to tend it. On Monday I borrowed a copy of Tufte’s The Visual Display of Quantitative Information from work and, well, if you don’t feel like drawing a graph after reading that book there is something wrong with you.

Collection

On the following Wednesday I packed up my laptop and set off to brat pack (my performance troupe) rehearsal. Innocuously planted in the line of other machines waiting to play music or show off videos, my iSight went unnoticed as it snapped a picture of the dance floor every second during social dancing, weaving them together into a little over 1 minute of footage.

That Friday, after a few too many post-work beers at the local and in an appropriate data collection mood, I reviewed the footage and created a two-column table: lead on the left, follow on the right, one row per song. The low quality of the iSight made identifying couples towards the rear of the hall tricky, but the tendency of dancers to wear distinctly colored clothes made it possible.
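Once the footage is transcribed, the counts behind a graph like this are simple tallies. A minimal sketch, assuming the table is an array of [lead, follow] pairs; the dancers’ names are made up, and the measures the real graph used may differ.

```ruby
# Hypothetical transcription: one [lead, follow] row per song.
rows = [
  ["Alice", "Ben"],
  ["Carl", "Dana"],
  ["Alice", "Dana"],
  ["Alice", "Ben"]
]

# How often each couple danced together (e.g. for the connecting lines).
partnerships = Hash.new(0)
rows.each { |lead, follow| partnerships[[lead, follow]] += 1 }

# How many songs each dancer danced (e.g. for per-dancer histograms).
songs_danced = Hash.new(0)
rows.flatten.each { |dancer| songs_danced[dancer] += 1 }

partnerships[["Alice", "Ben"]] # => 2
songs_danced["Alice"]          # => 3
songs_danced["Dana"]           # => 2
```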

Presentation

A brief stint of research led me to Processing, a Java environment for creating neat data visuals. I would have preferred something with Ruby, but you take what you can get. My Java was a bit rusty, and the collection handling was downright clumsy compared to what I was used to in Ruby, but after a Saturday of hacking I had something I’m quite proud of. Behold, the dancing network of Brat Pack for the 15th of August:

Brat Pack Dancing Network

I tried to apply many of Tufte’s ideas in the creation of this graph. It was initially presented vertically, but I rotated it so it was easier to compare the histograms. Chart-junk is kept to a minimum: only the horizontal lines representing each dancer are non-data carrying, and the connecting lines were deliberately thinned and lightened to make interpreting the myriad of partnerships easier. Labels use a serif font, also provide scale information, and, except for one, are all presented horizontally.

Looking forward, I’d like to collect some richer data – both more of the same and also extra information like tempo of song – to incorporate into the graph. I suspect the best way to do this would be to record normal video rather than timelapse, to both grab the audio and also make identifying partnerships easier.

This is stupid: Hash#select vs reject

A little consistency would be nice… (this is Ruby 1.8; from Ruby 1.9 on, Hash#select returns a Hash)

{1=>1, 2=>2, 3=>3}.reject {|key, value| key != 1 } # => {1=>1}
{1=>1, 2=>2, 3=>3}.select {|key, value| key == 1 } # => [[1, 1]]

Convert M4A to WAV in Ubuntu

mplayer -ao pcm:file=targetfile.wav sourcefile.m4a

The Switch to VIM

I’ve been meaning to try out Vim for a while, especially since I now use two different platforms/editors for development (Mac/TextMate at work, Linux/jEdit at home). I finally got some time to try it out this weekend, and initial reports are positive! What strikes me most is how quickly you can navigate and select things without using the mouse. Vim’s navigation shortcuts are like CTRL-(LEFT|RIGHT) on crack: regex search forward and back, move by multiple lines, seek to the next/previous character. I haven’t internalized this navigation yet and already I’m loving it. Give me some experience and I’ll become an absolute machine. It’s a bit weird because I’m using Colemak, so the “stick to the home row” mantra doesn’t really apply, but overall it’s still quite bearable. I need to figure out how to replicate TextMate’s Apple-T shortcut (quick switch to file), and then I think I’ll be sold. I’ll use it at work for the week and see how things go. For reference, I found the tutorial on the vi-improved site to be quite helpful.

Practical Hpricot: SVG

Inkscape does a pretty good job of creating plain SVG files, but they could be nicer. A particular file I was working on had many path elements, all with the same style attribute that I wanted to move into a parent tag (or external style or whatever). What better way to strip them out than Hpricot?

require 'hpricot'

doc = open(ARGV[0]) { |f| Hpricot.XML(f) }

(doc/:path).each do |path|
  [:id, :style].each do |attr| 
    path.remove_attribute(attr)
  end
end

puts doc

And you get the benefit of prettier formatting!

Practical Hpricot: XML to INI

require 'hpricot'
require 'open-uri'

# Build one INI section; "\n" (not "\\n") so real newlines are emitted
def ini_entry(url, name)
  buffer = "[#{url}]\n"
  buffer += "name = #{name}\n"
  buffer += "\n"
  buffer
end

doc = Hpricot(open("http://www.byteclub.net/testsite/getFeeds.php"))

(doc/"blog").each do |elem|
  url  = (elem/"url")
  name = (elem/"name")
  comments = (elem/"comments")
  
  if name.length > 0
    puts ini_entry(url.inner_text, name.inner_text) if url.inner_text.length > 0
    puts ini_entry(comments.inner_text, name.inner_text + " Comments") if comments.inner_text.length > 0
  end
end

Planet coming soon!

Let's go bowling with OO

To compare with my previous post: bowling_scorer_oo.rb

I don’t like this version as much.

How would YOU do it?

Let's go bowling

class BowlingScorer
  def score(balls, frames = 10)
    return frames == 0 ? 0 : score_function(balls[0], balls[1]).call(balls) + score(balls, frames - 1)
  end
  
protected
  Component       = Struct.new(:condition, :number_to_score, :number_to_shift)
  ConditionIsTrue = lambda {|x| x[0].call }
  
  def score_function(s1, s2)
    p = Component.new *[
      [ lambda { s1 == 10},      3, 1], # Strike
      [ lambda { s1 + s2 == 10}, 3, 2], # Spare
      [ lambda { true },         2, 2]  # Default
    ].find(&ConditionIsTrue)
    return join_return_first(score_frame(p.number_to_score), multi_shift(p.number_to_shift))
  end
  
  def score_frame(n)
    lambda {|balls| n ? balls[0..n-1].inject(0) {|a, g| a + g } : 0 }
  end
  
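  def multi_shift(count)
    lambda {|x| count.times { x.shift } }
  end

  # join_return_first lives in the full source (bowling_scorer.rb); this is a
  # minimal stand-in so the listing runs on its own: call both lambdas on the
  # same argument, in order, and return the first one's result.
  def join_return_first(first, second)
    lambda {|x| result = first.call(x); second.call(x); result }
  end
end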

scorer = BowlingScorer.new
scorer.score([10] * 12) # => 300 (10 strikes plus 2 bonus balls)
scorer.score([5] * 21)  # => 150

Full source and tests – bowling_scorer.rb

EDIT: Refactored BowlingScorer#score_function

Eating with functions

# 3 Tasty treats, all the same!
edibles.each do |edible|
  edible.eat! if likes?(edible) || edible.is_healthy?
end

condition = lambda {|edible| likes?(edible) || edible.is_healthy?}
edibles.select(&condition).each(&:eat!)

edibles.select(&disjoin(method(:likes?), :is_healthy?.to_proc)).each(&:eat!)

Help: &:eat!, disjoin
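The third version needs a helper that ORs predicates together. The linked disjoin is not reproduced here; this is a hypothetical minimal version that takes any callables (a Method, or a Proc from Symbol#to_proc) and returns a lambda, plus some made-up edibles to try it on.

```ruby
# Hypothetical disjoin: returns a lambda that is true when any predicate is.
def disjoin(*predicates)
  lambda { |x| predicates.any? { |predicate| predicate.call(x) } }
end

# Made-up data to exercise it.
Edible = Struct.new(:name, :healthy) do
  def is_healthy?
    healthy
  end
end

def likes?(edible)
  edible.name == "chocolate"
end

edibles = [
  Edible.new("chocolate", false),
  Edible.new("kale", true),
  Edible.new("lard", false)
]

chosen = edibles.select(&disjoin(method(:likes?), :is_healthy?.to_proc))
chosen.map(&:name) # => ["chocolate", "kale"]
```

Taking plain callables rather than blocks is what makes this work: a Ruby method can only receive one block, so two `&` arguments in the same call are not valid syntax.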

Snippets in SVN

Added a snippets section to my public subversion repository, to hold random ruby goodies that I happen to have found useful at some point or another. Maybe you too will find something enticing?

  • Posted on February 13, 2007
  • Tagged code

No Audio in Ubuntu

Just a quick one – for some reason my sound stopped working in Ubuntu. To fix it, right-click the volume icon (once you’ve re-added it to the panel if it’s not usually there), select “Open Volume Control” and ensure that PCM is not muted.

Also, to allow sounds from multiple sources to play simultaneously, go to System → Preferences → Sound and select ESD for your output device and ensure “Enable Software Sound Mixing (ESD)” is selected. Not sure why this wasn’t working as a default for me.

Mode Errors in Mobile Phones

A recent post on the humanized weblog has got me thinking about mode errors in software I use often.

One particularly nasty one occurs on my Nokia 2100 phone when sending SMS messages. After selecting “send”, a box is displayed to enter the destination number. To select a contact from your address book, you press button A. However, if a number is already present in the entry box (if you are replying to a message, and in other circumstances whose criteria I am uncertain of), the same button A sends the message!

The implication? My reflex action is to press button A immediately after selecting “send” to go to my address book. Twice in the past two days there has already (unexpectedly!) been a number there, causing me to send a message to the wrong person.

What potentially embarrassing or disastrous mode errors do you deal with regularly?

Gmail and PGP

Recently I set myself up to be able to use PGP signing and encryption with Thunderbird. Privacy: it’ll cure what ails ya. That’s all well and great when I’m at home, but it’s kind of hard to use my desktop when I’m roaming the wild savannah of Africa. The two webmail products I use don’t support PGP (gmail and the one provided by my hosting). So I’ve started work on a mouseHole script – PgPirate – that checks the code before it hits my browser and processes all the PGP stuff for me. Next step is to get it installed on a USB flash drive with ProxyLike so I can use it on most any other computer I happen to find myself using.

Passwordless Login

I’ve been typing in SSH passwords forever. For some reason I just assumed passwordless login was a pain to set up. Wrong! It took me about 10 minutes. Ubuntu already has all the tools you need.

ssh-keygen -t rsa -f ~/.ssh/id_rsa -C "xavier@home"
ssh-add ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub | ssh xavier@remote_host 'cat - >> ~/.ssh/authorized_keys'

Repeat the last step for each remote host.

It just works. And you know how much I like that.

Reference

Hack the Planet

  1. Vegetarian – battery farms lose!
  2. Buy organic, fairtrade and/or local where possible
  3. No car, use public transport and feet, except where not possible (Geelong)
  4. Refuse plastic bags, although I think perversely our excessive number of green bags at home is soon to become an environmental risk
  5. I plan to vote, haven’t had the opportunity yet
  6. Spread the love. Bring politics into conversations. Getting people talking and thinking is the first step.

The last one is important. Preaching at people will never work – global awareness must come from within. We must provide the support and encouragement. Lead by example. It can be tough sometimes. I almost hit intolerable despair last night. Startling, raw realisations: The pope – the most important man in Christianity – is a political retard, the most powerful man in the world is widely regarded as an idiot, and you couldn’t have pulled the recent Naomi Robson story from Frontline… Politics, Religion, Media, the triple crown. The world is loco.

In other news, I’ve just committed some C# tools to Ruby Rant, if you’re interested in a sweet build tool that lets you use Ruby (XML loses!). I’m using it for a fairly decent-sized project at work (multiple projects, resources, unit tests, etc.) and find it a pleasure to work with. Note that I’m talking about a replacement for the deprecated method described in the current documentation. I’m going to get that updated, but for now check the mailing list for info.

HAML Tutorial

HAML is, and is an acronym for, an HTML Abstraction Markup Language. It is a replacement for the RHTML templates we are so used to in rails applications. If you are interested in why one would need such a thing, please read John Philip Green’s excellent HAML introduction. If you are more interested in how one would use such a thing, read on!

Table of Contents

  1. Installation
  2. Fundamentals
  3. XHTML techniques
  4. Ruby techniques
  5. Conclusion

Installation

First things first, install the plugin:

./script/plugin install -x svn://hamptoncatlin.com/haml/trunk haml

This gives you a library to parse HAML templates, and also registers the .haml extension with rails. What this means is that to start using HAML you only need to rename your template from ‘index.rhtml’ to ‘index.haml’. Do that now (in a new test app, an existing app, whatever), as we are about to get our first taste of ham … (l).

Fundamentals

%h1 HAML Example
%div
  %blockquote 
    Farewell, Emily. It was fun, but you were a robot. 
    You had no heart. 

In the same vein as YAML and Python, *indentation matters* in HAML. It allows the parser to cleverly close our tags without being explicitly told to do so, which means less typing for us lazy sloths. The rule is 2 spaces per indent. The first non-whitespace character of each line decides how the line is parsed. As may be evident, the % character indicates an XHTML tag. There are only 5 others, which we will cover in due course. Lines that do not begin with a special character are treated as normal text.
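To see why indentation alone is enough to close tags, here is a toy converter in Ruby. It is an illustration only, not HAML’s parser: it understands just %tag, %tag text and plain-text lines at two spaces per level, and keeps a stack of open tags, closing every one that is indented as deep as or deeper than the current line.

```ruby
# Toy converter, not HAML's parser. Supports only "%tag", "%tag text" and
# plain-text lines, indented 2 spaces per level.
def toy_haml(source)
  out = []
  open_tags = [] # stack of [depth, tag_name] for tags still waiting to close

  source.each_line do |raw|
    next if raw.strip.empty?
    depth = raw[/\A */].length / 2
    line = raw.strip

    # A line at depth d closes every open tag at depth d or deeper.
    while open_tags.any? && open_tags.last[0] >= depth
      out << "</#{open_tags.pop[1]}>"
    end

    if line.start_with?("%")
      name, text = line[1..-1].split(" ", 2)
      if text
        out << "<#{name}>#{text}</#{name}>" # tag with inline content
      else
        out << "<#{name}>"                 # tag whose content is nested below
        open_tags.push([depth, name])
      end
    else
      out << line                          # plain text
    end
  end

  # End of input closes whatever is still open, innermost first.
  out << "</#{open_tags.pop[1]}>" while open_tags.any?
  out.join("\n")
end

html = toy_haml(<<~HAML)
  %h1 HAML Example
  %div
    %blockquote
      You had no heart.
HAML
puts html
```

Nothing in the input says where div or blockquote ends; the closing tags come purely from the drop in indentation (here, the end of input).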

XHTML techniques

Outputting XHTML is a prime requirement of a templating language, and it is as simple as you would expect. I’m not even going to write a full paragraph; this annotated listing should suffice:

/ The slash character specifies an XHTML comment,
/  but if after a tag name it self closes that tag
%br/

/ Attributes are specified by a hash provided directly after 
/ the tag name. There is NO SPACE between the tag and the hash
%a{"name" => "top"}

/ "class" is such a common attribute that it has a shortcut syntax
%span.important Tada!

/ Combine the two to impress your friends
%span.extra{"style"=>"color: red"} Tada! Tada!

/ A div with id is also common, so it too has a shortcut syntax
#content
  This is a div with id "content"

/ As does a div with class
.fancy
  This is a div with class "fancy"

The one curly aspect of generating XHTML you only need to deal with once – the doctype. You can use three exclamation marks on the first line of a template (hopefully a layout template) to output a doctype declaration. The problem is that it makes your document XHTML 1.0 transitional. Always. It also forgets to give you an XML prolog, so for now I specify these without using HAML, which brings up another point – you can mix normal XHTML tags and HAML code (although why you would want to outside of this fix eludes me).

!!!
%html{"xmlns"=>"http://www.w3.org/1999/xhtml", "xml:lang"=>"en"}
  %head
    %title Layout Example
  %body= @content_for_layout

Instead, I write the XML prolog and doctype by hand and let HAML handle the rest:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%html{"xmlns"=>"http://www.w3.org/1999/xhtml", "xml:lang"=>"en"}
  %head
    %title Layout Example
  %body= @content_for_layout

Those on the edge may want to keep an eye on this ticket, which proposes a fix.

Ruby techniques

= link_to 'Home', :controller => 'home'
= 1 + 2 # => 3
%span= 1 + 2

Text after an equals sign is evaluated as Ruby code. It is roughly equivalent to <%= 1 + 2 %>, but with one fairly major caveat: each line is evaluated independently of the rest of the template. This means the following will not work, because the first line is evaluated as an entire Ruby snippet and does not find the end it requires to be valid.

= for i in (1..10)
= i
= end

There is currently no way around this. There is a ticket on the HAML Trac with a proposed fix, but at the time of writing the patch has not been attached. This is not as shocking as it may first appear. Ask yourself why you are using a loop or an if block in your template. If it cannot be reduced to a one-liner, maybe it should be moved out into a partial.

= (1..10).inject('') { |buffer, i| buffer + i.to_s }
= render :partial => 'secret', :collection => @secrets if cia?

An alternative way to evaluate Ruby code is to use a tilde instead of an equals sign. This searches the evaluated string and replaces every newline found inside pre, code or textarea tags with an XML character reference (&#x000A;). This allows you to create neat markup even when displaying large chunks of preformatted text.

~ "<textarea>\n\n\n\n\n\nYo</textarea>"
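The tilde’s newline handling can be approximated in a few lines of Ruby. This is a toy sketch of the idea, not HAML’s implementation (and a regular expression is no substitute for a real HTML parser): newlines inside pre, code and textarea elements become &#x000A;, and everything else is left alone.

```ruby
# Toy sketch, not HAML's implementation: inside <pre>, <code> and <textarea>
# elements, replace each newline with the character reference &#x000A; so
# indentation added to the surrounding markup cannot change the element's text.
def preserve_newlines(html)
  html.gsub(%r{<(pre|code|textarea)([^>]*)>(.*?)</\1>}m) do
    tag, attributes, body = $1, $2, $3
    "<#{tag}#{attributes}>#{body.gsub("\n", "&#x000A;")}</#{tag}>"
  end
end

preserve_newlines("<textarea>\n\nYo</textarea>") # => "<textarea>&#x000A;&#x000A;Yo</textarea>"
preserve_newlines("<p>\nhi</p>")                  # => "<p>\nhi</p>" (left alone)
```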

Keep in mind that your ruby expression must not span more than one line – only the first line will be parsed and the rest will be treated as plain text. There is a proposed fix (that makes 3! I want a pony) on the HAML Trac, if you are in to that sort of thing.

Conclusion

HAML may not be quite as powerful as RHTML yet, but it drastically reduces the size of your views while greatly increasing readability and the quality of the markup. The best part is you can mix and match – you can start writing HAML templates in your existing project right now and keep all your old RHTML code hanging around.

DOM Quirks

Unobtrusive JavaScript is undoubtedly the nicest way to add JavaScript behaviours to a web page. It keeps the HTML clean and (hopefully) ensures it will degrade properly in older browsers. That said, the methods you generally use for this type of design (see Unobtrusive Javascript for an excellent introduction) have a number of quirks you should be aware of, and this article addresses a few of them: in particular, unexpected or non-obvious behaviour in createElement, appendChild, and getElementsByTagName.

Table of Contents

  1. Creating Elements
  2. Appending Elements
  3. Finding Elements
  4. Conclusion

Creating Elements

The createElement function allows the dynamic creation of HTML elements. It takes one parameter: the type of element to create. It is used in conjunction with setAttribute to modify the attributes of a new element. Elements created in this way will not actually be displayed in the document until added with appendChild, insertBefore or replaceChild. The following code creates an image (but does not display it):

var element = document.createElement("img");
element.setAttribute("src", "img1.jpg");

While support for this is good in the major browsers, there is a small quirk in IE that can cause some pain when creating forms. To quote MSDN:

Attributes can be included with the sTag as long as the entire string is valid HTML. You should do this if you wish to include the NAME attribute at run time on objects created with the createElement method.

What this means is that in IE, you can do the following (which is equivalent to the above snippet of code):

var str = '<img src="img1.jpg" />';
var element = document.createElement(str);

While IE supports the first method shown for most attributes, if you want to set the “name” attribute of an element you must use the second method. This is a problem since Mozilla will throw an exception on the latter. Thankfully, we can use exception handling for an easy workaround:

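var element;
try {
  // IE: the name attribute must be part of the tag string
  var str = "<input name='aradiobutton' type='radio' />";
  element = document.createElement(str);
} catch (e) {
  // Mozilla throws on the tag string, so fall back to setAttribute
  element = document.createElement("input");
  element.setAttribute("name", "aradiobutton");
  element.setAttribute("type", "radio");
}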

Appending Elements

Using appendChild (or replaceChild) is the “correct” way to add content to a DOM, rather than the more popular innerHTML property.

When using this function to add rows to a table, you should add the rows to a tbody or equivalent tag inside the table, not the table tag itself. Mozilla and Opera will pick up the new rows if you add them directly to the table tag, whereas IE will not.

Finding Elements

You can get a collection of all tags of a specific type using the getElementsByTagName function. Not only is this handy for standard unobtrusive javascript behaviours, you can also use it to do cool things like automatically process all elements in a form.

function showData(form) {
  // Collect name=value pairs for every input in the form
  var inputs = form.getElementsByTagName("input");
  var buffer = "";
  for (var i = 0; i < inputs.length; i++)
    buffer += inputs[i].name + "=" + inputs[i].value + "\n";

  alert(buffer);
}

Although it may appear to act like an array, it is very important to remember that the returned object is actually an HTMLCollection. It does not support any array-like functions (concat, splice, etc…) bar those presented above. This is because the HTMLCollection is a live representation of the page’s HTML, and such functions would interfere.

// Assume an empty document
var images = document.getElementsByTagName("img");
// images.length == 0
addImgElementToDocument(); // function implemented elsewhere
// images.length == 1: the collection is live

This can be an annoyance when we know that the HTML structure will not be changing, and is easily worked around:

// Copy a live HTMLCollection into a static Array
function collectionToArray(col) {
  var a = new Array();
  for (var i = 0; i < col.length; i++)
    a[a.length] = col[i];
  return a;
}

function showData(form) {
  var elems = form.getElementsByTagName("input");
  var inputs = collectionToArray(elems);
  elems = form.getElementsByTagName("select");
  inputs = inputs.concat(collectionToArray(elems));
  var buffer = "";
  for (var i = 0; i < inputs.length; i++)
    buffer += inputs[i].name + "=" + inputs[i].value + "\n";

  alert(buffer);
}

It would be nice if the collectionToArray function above could be added to HTMLCollection’s prototype; however, for some reason it is read-only.

Conclusion

These quirks may be minor and their solutions trivial, but it helps to be aware of them when coding any sort of unobtrusive javascript as it can reduce the amount of time you spend debugging seemingly illogical behaviour.

Building Firefox Extensions

This article will introduce the basics of Ruby Rant by creating a Rantfile to build Firefox extensions. You don’t actually need to know anything about extensions to follow along, but if you are interested may I recommend this tutorial by roachfiend. You will note that that article (and many others on the same topic) use a batch file to build their extensions. While this is quick to set up for simple development, a build file saves time and effort in the long run, and gives more flexibility.

I assume you at least know what Rant is – a replacement for Rake – and have it installed and working. Please visit their website for more information on this topic. This is also not a build file tutorial – you should know what a task and a dependency are.

Table of Contents

  1. Extension Basics
  2. Rant
  3. Making the JAR
  4. Cleaning
  5. Making the XPI
  6. Final Touches
  7. The Completed Rantfile

Extension Basics

The first step is to decide on directory structure for your project. Firefox extensions are comprised of two main portions – the install instructions, and the actual content of the extension. A Firefox extension (an XPI file) is really just a zip file with a different extension. You can open it up using your favourite archive manager and see the following structure:

myextension.xpi/
  install.js
  install.rdf
  chrome/
    myextension.jar/
      ... myextension content ...

Likewise, the JAR file is also a zip file with an alternate extension. We can see that there are two major portions of the extension that need building, the JAR and the XPI (which contains the JAR). As such, we will use a source structure that looks like this (download the source code):

myextension/
  Rantfile
  src/
    install/
    jar/

Clearly, the install folder will only contain our install.js and install.rdf files, and the jar folder will contain the contents of our jar.

Rant

Enough introduction, let’s get started with Rant. Rant is a replacement for Rake. I won’t go into detail here, but one of the advantages for our purposes is portable zip creation without the need for external libraries. Rant is similar to Rake in that you define all your build tasks in a file in your root directory – the Rantfile. We will create 3 tasks – package, clean, and clobber. The first obviously packages up our extension into a zip file and gives it a .xpi extension. “clean” removes temporary files used to package the extension, and “clobber” removes all generated artefacts (basically the same as clean but also removes the XPI file).

Making the JAR

Baby steps, though – first of all we want to create the JAR file for our extension. We can do this using the Archive::Zip generator provided by Rant:

import "archive/zip"
require "archive_rootdir_fix"

gen Archive::Zip, "build/helloworld", 
                  :files     => sys["src/jar/**/*"],
                  :rootdir   => "src/jar",
                  :extension => ".jar"

This generator creates a task called “build/helloworld.jar” that creates exactly that archive, containing all the files from src/jar. “**/*” tells rant to recursively add all files. The rootdir parameter is necessary so that the generator knows where to start adding files. Without it, the created JAR will have the “src/jar” folders inside it, which is undesirable.
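Conceptually, rootdir just strips a prefix from each stored path. Here is that idea illustrated with Ruby’s standard Pathname; it is not Rant’s implementation, and archive_entry_names and the file names are hypothetical.

```ruby
require "pathname"

# Illustration only: the entry name stored in the archive is each file's path
# relative to the root directory, so "src/jar" itself never appears.
def archive_entry_names(files, rootdir)
  root = Pathname.new(rootdir)
  files.map { |file| Pathname.new(file).relative_path_from(root).to_s }
end

archive_entry_names(["src/jar/content/overlay.xul", "src/jar/skin/style.css"], "src/jar")
# => ["content/overlay.xul", "skin/style.css"]
```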

I draw your attention to the archive_rootdir_fix file that is being required. Support for the rootdir parameter is currently not in Rant. I’ve submitted a patch, but until it is accepted, you need this particular file. It is included in the example source code for your convenience.

The generated task name is quite cumbersome, but it is quite trivial to create an alias to it using a blank task with a sole dependency. But what happens when we change our extension name or build directory? We also have to recode our alias task. Thankfully, the generator returns an object with information about the generated task, so that we can use it later in our Rantfile:

import "archive/zip"
require "archive_rootdir_fix"

jar_t = gen Archive::Zip, "build/helloworld", 
                  :files     => sys["src/jar/**/*"],
                  :rootdir   => "src/jar",
                  :extension => ".jar"

task :build_jar => jar_t.path

Cleaning

Before we proceed, let us quickly set up our clean and clobber tasks, as they are required for the next section. Rant makes this trivially easy, so I’m just going to show you some code and move on.

import "clean"

gen Clean, :clean
var[:clean] << "build"

gen Clean, :clobber
var[:clobber] << "build"
var[:clobber] << "bin"

Making the XPI

As you can imagine, the next step – packaging up the XPI file – is more of the same. A small amount of trickery is required to get the JAR file into the chrome directory: we move files around and prepare the XPI contents in the build directory, so that our zip task only has to zip that single directory. You can do this using methods of the sys object. Since they mirror standard shell commands they are fairly self-explanatory, as you’ll see in the following example. Note that we can keep using the jar_t object throughout the build file.

xpitask = gen Archive::Zip, "bin/helloworld",
                            :version   => "1.0.0",
                            :files     => sys["build/**/*"],
                            :rootdir   => "build",
                            :extension => ".xpi"
task :build_xpi => xpitask.path           

task :prepare => [:build_jar] do |t|
  sys.mkdir_p "build/chrome"
  sys.mv jar_t.path, "build/chrome/helloworld.jar"
  sys.cp sys["src/install/**/*"], "build"
end

task :package => [:prepare, :build_xpi]

Note that we’ve added a version parameter to the zip task – this automatically appends a version string to our output file.

Final Touches

Now we just need to add the finishing touches to our build file. For maintainability, we will extract common names (such as the “helloworld” title and the “build” directory) into variables, so that changing them once will change them throughout the entire build file. You can use normal Ruby variables for this, but it is preferable to use the “var” construct since it gives you the option of using them in Command generators later on (maybe I will cover that in another tutorial). It is more verbose, however, so you may choose not to use it in your own projects.

Finally, we move our public tasks to the top of the file for readability and give them descriptions so they are displayed when executing “rant -T”. And there you have it, folks: an automated build script for Firefox extensions. Please download the source code to peruse at your leisure.

The Completed Rantfile

# Rantfile for building Firefox Extension
# Xavier Shay (xshay@rhnh.net), July 2006

import "archive/zip"
require "archive_rootdir_fix"
import "clean"

# Configuration
var :title   => "helloworld"
var :version => "1.0.0"
var :build_dir => "build"
var :bin_dir => "bin"
var :src_dir => "src"

# Primary tasks
desc "Package up the XPI file for release"
task :package => [:prepare, :build_xpi]

desc "Cleanup temporary files"
gen Clean, :clean
var[:clean] << "build"

desc "Cleanup all generated artifacts"
gen Clean, :clobber
var[:clobber] << "build"
var[:clobber] << "bin"

# Support tasks
jar_t = gen Archive::Zip, "#{var :build_dir}/#{var :title}", 
                  :files     => sys["#{var :src_dir}/jar/**/*"],
                  :rootdir   => "#{var :src_dir}/jar",
                  :extension => ".jar"
task :build_jar => jar_t.path

xpi_t = gen Archive::Zip, "#{var :bin_dir}/#{var :title}",
                  :version   => "#{var :version}",
                  :files     => sys["#{var :build_dir}/**/*"],
                  :rootdir   => "#{var :build_dir}",
                  :extension => ".xpi"
task :build_xpi => xpi_t.path           

task :prepare => [:clean, :build_jar] do |t|
  sys.mkdir_p "#{var :build_dir}/chrome"
  sys.mv jar_t.path, "#{var :build_dir}/chrome/#{var :title}.jar"
  sys.cp sys["#{var :src_dir}/install/**/*"], "#{var :build_dir}"
end

YAML in Ruby Tutorial

UPDATE 2011-01-31: I have posted a newer tutorial which is probably going to be more useful to you than this one: YAML Tutorial

So you’ve got all these tasty Ruby objects lying around in memory, and they’re going to be lost when your program ends. Such a tragic end. What’s a robot to do? Why, store them to disk in a language-agnostic format, of course! Enter YAML, a language perfectly suited to the task, more so than its heavier sibling, XML. YAML support comes built into Ruby, and it couldn’t be easier to use: once you require the right file, every object automagically gets a to_yaml method that returns a string containing the appropriate YAML markup.

require 'yaml' # Assumed in future examples

puts "hello".to_yaml

Of course this works for any object, using some of that oh-so-sweet reflection. to_yaml recursively calls itself on all of your instance variables, and knows how to handle complex structures like arrays and hashes. It even copes with cyclic references! How’s that for value?

class Square
  attr_reader :width, :height   # lets new_obj.width work after loading
  def initialize width, height
    @width = width
    @height = height
    @bonus = ['yo', {:msg => 'YAML 4TW'}]
    @me = self
  end
end

puts Square.new(2, 2).to_yaml

Now that you’ve got a handy YAML string you can do whatever you like with it: write it to disk, store it in a database, email it to your cousin Benny. But Benny is going to spin out: how does he reproduce your shiny Ruby objects? Thoughtfully, Ruby makes it just about as easy to create an object from YAML markup, in other words to go the other way. The YAML::load method takes either a string or an IO object and gives you back an object, ready to use. It’s worth noting that the initialize method is not called on the new object, a fact that will become pertinent later.

serialized = Square.new(2, 2).to_yaml
new_obj = YAML::load(serialized)
puts new_obj.width
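On current Ruby versions the YAML library is Psych, and since Psych 4 (bundled with Ruby 3.1) a plain YAML.load refuses arbitrary classes and aliases by default. The sketch below assumes trusted input and uses unsafe_load where it exists; it also checks the two claims above: the self-reference in @me comes back pointing at the loaded copy, and initialize is not run during loading. The initialize_calls counter and load_trusted helper are added purely for illustration.

```ruby
require 'yaml'

class Square
  attr_reader :width, :height, :me

  class << self
    attr_accessor :initialize_calls   # illustration only: counts initialize runs
  end
  self.initialize_calls = 0

  def initialize(width, height)
    self.class.initialize_calls += 1
    @width = width
    @height = height
    @me = self   # cyclic reference, emitted as a YAML anchor/alias
  end
end

# Only for trusted input: unsafe_load (newer Psych) restores arbitrary objects
def load_trusted(yaml)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
end

copy = load_trusted(Square.new(2, 2).to_yaml)
puts copy.width               # 2
puts copy.me.equal?(copy)     # true: the cycle points back at the copy
puts Square.initialize_calls  # 1: loading did not call initialize
```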

Transience

The YAML serializer works in essentially the same manner as a sledgehammer. There’s no finesse – it will serialize all of your instance variables. Always. This is generally not a problem, but every now and then for reasons of space, security, beauty or public health you will have a transient variable that you really just don’t want to be serialized. There is no neat way in the supplied library to do this. You could override to_yaml and blank out the transient fields before you call super, but then you need to restore them afterwards. And what if those fields were calculated on initialization – how do you restore them when the object is deserialized?

Not to worry, our gallant hero (yours truly) has created a helper script that allows you to specify which fields are to be persisted in a declarative manner using a class attribute.

require 'rhnh/yaml_helper' # Assumed in future examples

class Square
  persistent :width, :height
  
  def initialize width, height
    @width = width
    @height = height
    @me = self        # @me will not be serialized
  end
end

The script also provides a post_deserialize hook that is called, not surprisingly, after deserialization. It essentially acts as initialize for deserialized objects. No setup is necessary to use this hook; its mere presence will attract enough attention.

class OnTheBall
  def post_deserialize
    puts "I'm awake!"
  end
end

YAML::load(OnTheBall.new.to_yaml)
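For comparison, Psych (Ruby’s default YAML library since 1.9.3) has built-in hooks that cover both features described above: encode_with lets a class choose which fields are written, and init_with is called instead of initialize when an object is loaded, much like post_deserialize. This is a sketch of that approach, not of how yaml_helper works; the transient @area field is a made-up example.

```ruby
require 'yaml'

class Square
  attr_reader :width, :height, :area

  def initialize(width, height)
    @width = width
    @height = height
    compute_area
  end

  # Psych calls this when dumping: only width and height are persisted
  def encode_with(coder)
    coder['width'] = @width
    coder['height'] = @height
  end

  # Psych calls this instead of initialize when loading
  def init_with(coder)
    @width = coder['width']
    @height = coder['height']
    compute_area   # rebuild the transient field, like post_deserialize
  end

  private

  def compute_area
    @area = @width * @height   # transient: never written to YAML
  end
end

yaml = Square.new(2, 2).to_yaml
puts yaml        # contains width and height, but no area
copy = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
puts copy.area   # 4
```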

In closing

YAML is an excellent choice for serializing your Ruby objects. Its brevity and readability give it the edge over both XML and Marshal, and with the addition of YAML Helper it becomes more flexible as well.

Resources

Straight Sailing with Magellan

Magellan is a Ruby on Rails plugin that provides a framework for abstracting navigation logic out of your views and controllers, allowing you to write neater, more reusable code.

Table of Contents

  1. Using Magellan
    1. Dynamic Links
    2. State
    3. Testing
  2. Extra Morsels
  3. Conclusion
  4. Footnotes
  5. Bonus Material

Why should I use Magellan?

The short answer is you probably shouldn’t. Sorry, thanks for stopping by, please visit the gift shop. To elaborate, many applications don’t actually have complex navigational requirements. They are more generally of the type “go from page A to page B, then from there to page C”, and that’s that. While of course Magellan can neatly express these relationships, it adds a layer of complexity to your application for questionable benefit.

Where Magellan excels is in expressing more complex requirements: “go from page A to page B, unless it’s a Thursday, in which case go to page C. If we got to page C from page A, then go to page B, otherwise go to page A”. Urgh. Where do you put this logic in a traditional Rails app? You don’t want this kind of logic in your views, and if you put it in your controllers you’ll end up duplicating code. You need a better solution.

You need Magellan.

Using Magellan

To use Magellan you need to understand three concepts:

  1. Pages
  2. Links
  3. State

State is a more advanced topic, so we’ll go over that bit later on. You covered the first two in Web Coding 101, so I’ll go over them first. The only difference in Magellan’s usage of the terms “page” and “link” is a level of abstraction. Simply, a Magellan page represents a URL (Rails or otherwise). Drop the following code into your environment.rb:

RHNH::Magellan::Navigator.draw do |map|
  map.add_page :home, {:controller => 'home', :action => 'list'}
end

Easy. To link to this page in a view, we use the nav_link_to helper in our .rhtml file instead of link_to. The first parameter is the name of the page we are currently on – in this case it is not strictly required and could be set to nil.

nav_link_to :current_page, :home

That in and of itself isn’t particularly exciting. Where things get tasty is when we start using links. Now, in basic usage a link acts the same way as a page¹. We can create a next link that is different depending on which page you are on.

RHNH::Magellan::Navigator.draw do |map|
  map.add_page :home1 do |p|
    p.url = { :controller => 'home1' }
    p.add_link :next, :home2
  end
  
  map.add_page :home2 do |p|
    p.url = { :controller => 'home2' }
    p.add_link :next, :home1
  end
end

# Then in both home1.rhtml and home2.rhtml
# @current_page is either :home1 or :home2
nav_link_to @current_page, :next

As you can see we have de-coupled our navigation from the page itself. If we wanted to we could change the next link for home2 to home3 without having to change any of the code associated with home2. This makes our pages more modular and reusable, which is generally a Good Thing.

Let’s go back to our original example. I want the next link on page A to go to page B, except on Thursdays, when it should go to page C. The trick here is that in addition to accepting a symbol for the link name (a “static link”), add_link can also accept a lambda that is evaluated at runtime. This is a little more convoluted: the lambda needs to return not a link name, but the actual page we want to go to. While initially slightly unintuitive, this allows for more flexibility and less code than having to specify extra links.

RHNH::Magellan::Navigator.draw do |map|
  map.add_page :page_a do |p|
    p.url = { :controller => 'page_a' }
    p.add_link :next, lambda {|pages, state|
      # Time#wday counts from Sunday = 0, so Thursday is 4
      Time.now.wday == 4 ? pages[:page_c] : pages[:page_b]
    }
  end

  map.add_page :page_b, { :controller => 'page_b' }
  map.add_page :page_c, { :controller => 'page_c' }
end
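To show what the lambda form buys you, here is a tiny, self-contained model of link resolution. It is a hypothetical illustration, not Magellan’s code: MiniNavigator, Page and get_url are names made up for this sketch, and the weekday is read from the state hash (rather than Time.now) so the logic can be exercised for any day.

```ruby
# Hypothetical model of static vs dynamic links; not Magellan's implementation.
Page = Struct.new(:name, :url)

class MiniNavigator
  def initialize
    @pages = {}
    @links = Hash.new { |hash, page| hash[page] = {} }  # page => { link => target }
  end

  def add_page(name, url)
    @pages[name] = Page.new(name, url)
  end

  # target is either a page name (static link) or a lambda (dynamic link)
  def add_link(from, link, target)
    @links[from][link] = target
  end

  def get_url(from, link, state = {})
    target = @links[from].fetch(link)
    page = target.respond_to?(:call) ? target.call(@pages, state) : @pages.fetch(target)
    page.url
  end
end

nav = MiniNavigator.new
nav.add_page :page_a, { :controller => 'page_a' }
nav.add_page :page_b, { :controller => 'page_b' }
nav.add_page :page_c, { :controller => 'page_c' }

# Dynamic link: the lambda returns the page itself, not a link name
nav.add_link :page_a, :next, lambda { |pages, state|
  state[:wday] == 4 ? pages[:page_c] : pages[:page_b]   # 4 = Thursday
}

p nav.get_url(:page_a, :next, :wday => 4)   # Thursday: page_c's URL
p nav.get_url(:page_a, :next, :wday => 1)   # Monday: page_b's URL
```

The static case is just a lookup; the dynamic case hands the whole page table plus state to the lambda, which is why it returns a page rather than a name.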

State

State is just like session storage for your navigation logic. In fact, it actually uses a subset of session storage². The reason we differentiate it from normal session variables is simply to keep a neat separation between our navigation logic and other modules that may require the session. In typical usage, you modify the state in your controller (using set_nav_state), and then make a decision based on that state in your navigation logic (using the state parameter). A simple example is to have a dynamic back link depending on the previous page.

# Both page A and B have a link to page C (controller actions)
# Symbols are stored so they match the keys of the pages hash
def page_a; set_nav_state :back_page => :page_a; end
def page_b; set_nav_state :back_page => :page_b; end

# Page C (view)
nav_link_to 'Back', :page_c, :back

# environment.rb
RHNH::Magellan::Navigator.draw do |map|
  map.add_page :page_a, { :controller => 'page_a' }
  map.add_page :page_b, { :controller => 'page_b' }

  map.add_page :page_c do |p|
    p.url = { :controller => 'page_c' }
    p.add_link :back, lambda {|pages, state|
      pages[state[:back_page]]
    }
  end
end

Testing your navigation

As with any code, it is important to test your navigation logic. There are many ways to do this, depending on the requirements and complexity of your application. I recommend at least one class of unit tests for your logic, and also to add code to your functional tests to ensure your controllers are setting the correct state. Magellan provides one helper function here – nav_state – which returns a hash of the current state.

class UnitTest < Test::Unit::TestCase
  def setup
    @nav = RHNH::Magellan::Navigator.instance
  end
  
  def test_back_link
    state = { :homepage => :home1 }
    expected = { :controller => 'example', :action => 'home1' }
      
    assert_equal expected, @nav.get_url(:page1, :back, state)
  end
end
class FunctionalTest < Test::Unit::TestCase
  # Standard functional test setup code...
  
  def test_index
    get 'index'
    
    assert_equal :home1, nav_state[:homepage]
  end
end

The tests included with the example that comes with Magellan provide a more complex example of navigation testing. I highly recommend you look over them.

Extra morsels

You can specify a default link by adding a link to the map rather than a page. For instance, to specify a default :back link:

RHNH::Magellan::Navigator.draw do |map|
  map.add_page :home, { :controller => 'home' }
  map.add_link :back, :home
end

To be extra fancy, you can return extra parameters from your navigation logic that are added to the :params hash of the url. This is done by returning an array with both the page and the parameters in it.

RHNH::Magellan::Navigator.draw do |map|
  map.add_page :home, { :controller => 'home' }
  map.add_link :back, lambda { |pages, state|
    [pages[:home], {:message => 'You just hit a default link'}]
  }
end

To conclude

Magellan is a great way of managing the complexity of larger projects. By abstracting navigation logic out of your controllers and views you make your project much more modular and reusable. It can even be introduced incrementally – all your old link_to calls will still work.

Footnotes

¹ To be technically correct, a page acts like a link. Magellan creates default links to pages with the same name as the page. For instance, unless you specify otherwise, :home is actually a link to the page :home.

² Magellan uses session[:rhnh_navigator_state], so you may want to steer clear of that to avoid stepping on anyone’s toes.

Rails XHTML Validation with LibXML/HTML Tidy

I improved upon the XHTML validation technique I showed yesterday to add nicer error messages, and also support for local testing via HTML Tidy. HTML Tidy isn’t quite as good as W3C – for example it missed a label that was pointing to an invalid ID, but it runs hell fast. For W3C testing I’m now using libXML to parse the response to actually list the errors rather than just tell you they exist.

And it’s all customizable by setting the MARKUP_VALIDATOR environment variable. Options are: w3c, tidy, tidy_no_warnings. Tidy is the default.

def assert_valid_markup(markup=@response.body)
  ENV['MARKUP_VALIDATOR'] ||= 'tidy'
  case ENV['MARKUP_VALIDATOR']
  when 'w3c'
    # Thanks http://scottraymond.net/articles/2005/09/20/rails-xhtml-validation
    require 'net/http'
    require 'cgi'
    require 'xml/libxml'   # libxml-ruby; provides XML::Parser
    response = Net::HTTP.start('validator.w3.org') do |w3c|
      query = 'fragment=' + CGI.escape(markup) + '&output=xml'
      w3c.post2('/check', query)
    end
    if response['x-w3c-validator-status'] != 'Valid'
      error_str = "XHTML Validation Failed:\n"
      parser = XML::Parser.new
      parser.string = response.body
      doc = parser.parse

      doc.find("//result/messages/msg").each do |msg|
        error_str += "  Line %i: %s\n" % [msg["line"], msg]
      end

      flunk error_str
    end

  when 'tidy', 'tidy_no_warnings'
    require 'tidy'
    errors = []
    Tidy.open(:input_xml => true) do |tidy|
      tidy.clean(markup)
      errors.concat(tidy.errors)
    end
    Tidy.open(:show_warnings=> (ENV['MARKUP_VALIDATOR'] != 'tidy_no_warnings')) do |tidy|
      tidy.clean(markup)
      errors.concat(tidy.errors)
    end
    if errors.length > 0
      error_str = ''
      errors.each do |e|
        error_str += e.gsub(/\n/, "\n  ")
      end
      error_str = "XHTML Validation Failed:\n  #{error_str}"
      
      flunk error_str
    end    
  end
end

Getting Tidy to work was an ordeal; the Ruby documentation is rather lacking. It also behaves in weird ways: the call to errors returns a one-element array, with all the errors bundled together in a single string.

LibXML was a little tricky: there’s no obvious way to parse an XML document in memory. You’d think XML::Document.new(xml) would do the trick, since there’s an XML::Document.file(filename) method, but that actually uses the entire XML document as the version string. Not so handy. It turns out you need to create an XML::Parser object instead, as I’ve done above. The docs don’t mention this (anywhere obvious, that is); I found the answer in a thread on the LibXML mailing list.

Testing rails

I was working on creating functional tests for some of my code today, a task made ridiculously easy by rails. To add extra value, I added an assertion (from Scott Raymond) to validate my markup against the w3c online validator:

def assert_valid_markup(markup=@response.body)
  if ENV["TEST_MARKUP"]
    require "net/http"
    response = Net::HTTP.start("validator.w3.org") do |w3c|
      query = "fragment=" + CGI.escape(markup) + "&output=xml"
      w3c.post2("/check", query)
    end
    assert_equal "Valid", response["x-w3c-validator-status"]
  end
end

The ENV test means it isn’t run by default since it slows down my tests considerably, but I don’t want to move markup checks out of the functional tests because that’s where they belong. Next step is to validate locally, which I’ve heard you can do with HTML Tidy.

Another problem is testing code that relies on DateTime.now, since this is a singleton call and not easily mockable.

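require 'date'

def pin_time
  # Drop sub-second precision so the pinned value survives the to_s/parse round trip
  time = DateTime.parse(DateTime.now.to_s)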
  DateTime.class_eval <<-EOS
    def self.now
      DateTime.parse("#{time}")
    end
  EOS
  yield time
end

# Usage
pin_time do |test_time|
  assert_equal test_time, DateTime.now
  sleep 2
  assert_equal test_time, DateTime.now
end

I haven’t found a neat way of resetting the behaviour of now. Using load 'date.rb' works but produces warnings for redefined constants. I couldn’t get aliasing the original method, undefining the new one, or even just calling Date.now to work.

UPDATE: Ah, how young I was. A better way to do this is to use a library like mocha.
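For the record, on current Ruby versions the reset can be done by aliasing the original class method before overriding it and restoring it in an ensure clause. This is a sketch rather than a replacement for a mocking library; the __unpinned_now alias name is arbitrary, and nested calls are not supported.

```ruby
require 'date'

# Pins DateTime.now to a single value for the duration of the block, then
# restores the original class method, even if the block raises.
def pin_time
  time = DateTime.now
  singleton = DateTime.singleton_class
  singleton.send(:alias_method, :__unpinned_now, :now)    # keep the original
  singleton.send(:define_method, :now) { |*_args| time }  # closure over time
  yield time
ensure
  if singleton && singleton.method_defined?(:__unpinned_now)
    singleton.send(:alias_method, :now, :__unpinned_now)  # put it back
    singleton.send(:remove_method, :__unpinned_now)
  end
end

pinned = nil
pin_time do |test_time|
  pinned = test_time
  sleep 0.1
  puts DateTime.now == test_time   # true: still pinned
end
sleep 0.1
puts DateTime.now > pinned         # true: the real clock is back
```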

YAML persistence

Fixed up my persistence code so that variables no longer have to be specified as an array, and committed my changes to CVS. Funny that on the day I got developer access to clxmlserial, I switched it out of my project in favour of YAML. Of course, I need to add a persistent attribute to that also, but it works a little differently from XML:

class Object
  def self._persist klass
    begin
      @@persist
    rescue
      @@persist = {}
    end
    @@persist[klass] = [] if !@@persist[klass]
    @@persist[klass]
  end

  def self._persist_with_parent klass
    begin
      @@persist
    rescue
      @@persist = {}
    end
    p = nil
    while (!p) && klass
      p = @@persist[klass.to_s]      
      klass = klass.superclass
    end
    p
  end

  def self.persistent *var
    p = self._persist(self.to_s)
    for i in (0..var.length-1)
      var[i] = var[i].to_s
    end
    p.concat(var)
  end

  def to_yaml(opts = {})
    p = self.class._persist_with_parent(self.class)

    # _persist_with_parent returns nil when no class in the hierarchy declared fields
    if p && p.size > 0
      YAML::quick_emit( object_id, opts ) do |out|
        out.map( taguri, to_yaml_style ) do |map|
          p.each do |m|
            map.add( m, instance_variable_get( '@' + m ) )
          end
        end
      end
    else
      YAML::quick_emit( object_id, opts ) do |out|
        out.map( taguri, to_yaml_style ) do |map|
          to_yaml_properties.each do |m|
            map.add( m[1..-1], instance_variable_get( m ) )
          end
        end
      end
    end
  end

  def save(filename)
    File.open( filename + '.yaml', 'w' ) do |out|
      YAML.dump( self, out )
    end
  end
end

XML Serialization and Persistence

I’ve been using cl/xmlserial to save/load my levels. Unfortunately, it doesn’t have a good mechanism for making variables transient; it just dumps every instance variable you’ve got. UNACCEPTABLE. So I patched it a bit. Now we can do something like this:

class Actor
  include XmlSerialization
  attr_accessor :name, :location, :last_location

  persistent [:name, :location]
end

I needed a bit of metaprogramming to get that persistent attribute to work properly. It basically adds a class method ‘persistent’ to any class that includes XmlSerialization, and then provides an accessor for use in the instance_data_to_xml method:

def XmlSerialization.append_features(includingClass)
  includingClass.class_eval <<-EOS
      def self.persistent var
        var = [var] if !var.kind_of?(Array)
          
        @@persist = var
        for i in (0..@@persist.length-1)
          @@persist[i] = @@persist[i].to_s
        end
      end
  
      def self.persist
        @@persist
      end
    EOS

# Rest snipped

I’d like to get rid of the array in the call (so you can just keep adding on parameters like attr_accessor), but I’m not sure how to do it. Unfortunately I couldn’t figure out a good way to ensure that @@persist was defined for all classes, so I’ve currently just wrapped the access in a begin/rescue (thinking now … I could do this in persistent to remove the array thing … hmmmm)

module XmlSerialization
def instance_data_to_xml(element)    
  begin    
    p = self.class.persist
  rescue
    p = nil
  end

  instance_variables.each do |instanceVarName|
    if !p || p.include?(instanceVarName[1..instanceVarName.length])

# Rest snipped
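Both wishes (no array in the call, and no begin/rescue) can be met with a splat parameter and a lazily created class-level instance variable. The following is a sketch of that idea in plain Ruby, not a patch to xmlserial: the Persistence module, persisted_fields and persistent_data are hypothetical names, and persistent_data stands in for the filtering done in instance_data_to_xml.

```ruby
# Hypothetical mixin: `persistent` takes any number of names, like attr_accessor.
module Persistence
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def persistent(*names)
      own_persisted_fields.concat(names.map(&:to_s))
    end

    # Fields declared on this class plus any declared on its superclasses
    def persisted_fields
      inherited = superclass.respond_to?(:persisted_fields) ? superclass.persisted_fields : []
      inherited + own_persisted_fields
    end

    private

    # Lazily created per class, so no begin/rescue is needed
    def own_persisted_fields
      @own_persisted_fields ||= []
    end
  end

  # Instance variables to serialize: the declared ones, or all if none declared
  def persistent_data
    fields = self.class.persisted_fields
    names = instance_variables.map { |ivar| ivar.to_s.delete_prefix('@') }
    names = names.select { |name| fields.include?(name) } unless fields.empty?
    names.to_h { |name| [name, instance_variable_get("@#{name}")] }
  end
end

class Actor
  include Persistence
  attr_accessor :name, :location, :last_location

  persistent :name, :location
end

actor = Actor.new
actor.name = 'Bob'
actor.location = [1, 2]
actor.last_location = [0, 0]
p actor.persistent_data   # only "name" and "location"
```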

One other small addition is the calling of a post_from_xml instance method (if it exists) after deserialization, to allow the object to do extra initialization, since the constructor has already been called and the instance vars are populated directly (doesn’t use accessor methods).

At some point I’ll have to write up some proper tests and submit it back to the author. I think it’s a worthwhile addition to the code, at least in idea if not implementation.

This morning I added moving platforms and coins (collectibles) that can be attached to those moving platforms, fixed up the XML code as detailed above, and tweaked the collision response to feel a bit nicer.

Link of the day goes to Dwemthy’s Array, a fun Ruby adventure. I’ve linked to the poignant guide before, but I feel it’s worth another mention.

Formatting numbers in ruby

Just for my own reference, this is how you format numbers in Ruby:

puts "%.2f (float), %d (decimal)" % [1.23456, 5]
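A few more directives in the same vein, for the same reference purposes (String#% and Kernel#format share the printf-style syntax); the expected output is shown in the comments.

```ruby
puts "%05.1f" % 3.14159          # 003.1  (zero-padded to width 5, one decimal)
puts "%+d" % 7                   # +7     (always show the sign)
puts "%x %o %b" % [255, 8, 5]    # ff 10 101 (hex, octal, binary)
puts "%-6s|" % "ab"              # "ab    |" (left-justified in width 6)
puts format("%.3e", 12345.678)   # 1.235e+04 (scientific notation)
```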

OpenGL Text with Imlib2

Getting text into your OpenGL apps is simple with the imlib2 library (developed by the Enlightenment team). If you have the good fortune of working on a Debian system, the libraries are in apt:

sudo apt-get install libimlib2-ruby

The examples on the Ruby bindings webpage show the basics of loading an image and writing text. All that remains is converting an Imlib2::Image into an OpenGL texture, which just means switching the data around from BGRA to RGBA:

class Imlib2::Image
  # Convert data (a binary String in BGRA byte order) to the RGBA order OpenGL expects
  def rgba_data
    bytes = data.unpack('C*')   # work on integers, not 1-char strings
    new_data = Array.new(bytes.size)
    (0...bytes.size / 4).each do |i|
      new_data[i*4]   = bytes[i*4+2]   # red
      new_data[i*4+1] = bytes[i*4+1]   # green
      new_data[i*4+2] = bytes[i*4]     # blue
      new_data[i*4+3] = bytes[i*4+3]   # alpha
    end
    new_data.pack('C*')
  end
end
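The byte shuffle itself doesn’t depend on Imlib2, so here is a stand-alone sketch of it working on a plain binary String, which makes it easy to check in isolation (bgra_to_rgba is a hypothetical name).

```ruby
# Swap the blue and red channels of packed BGRA pixel data, returning RGBA.
def bgra_to_rgba(bgra)
  # binary String -> [b, g, r, a, b, g, r, a, ...]
  bytes = bgra.unpack('C*')
  # one group of four bytes per pixel, reordered to r, g, b, a
  rgba = bytes.each_slice(4).flat_map { |b, g, r, a| [r, g, b, a] }
  rgba.pack('C*')
end

pixel = [10, 20, 30, 255].pack('C*')   # one pixel: blue=10 green=20 red=30 alpha=255
p bgra_to_rgba(pixel).unpack('C*')     # => [30, 20, 10, 255]
```

OpenGL 1.2 and later also accept GL_BGRA as the pixel format for glTexImage2D, which would avoid the conversion entirely; whether your Ruby OpenGL binding exposes that constant is worth checking.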

… and you can pass it straight into GL::TexImage2D. Below is the OpenGLTextManager class I wrote tonight. I still haven’t quite mastered imlib2: note the resize hack to get the correct format. If anyone has any suggestions I’m all ears.

require 'imlib2'
 
class OpenGLTextManager
  def initialize
    @textures = Hash.new
 
    blank_filename = 'res/img/blank.png' # 1x1 png image
    @blank = Imlib2::Image::load(blank_filename)
 
    # Probably better to copy the font locally and load it from there
    Imlib2::Font::add_path '/usr/share/fonts/truetype/ttf-bitstream-vera'
    fontname = 'Vera/10'
    
    @font = Imlib2::Font.new(fontname)
  end
  
  def render text, x, y
    texture = @textures[text]
    texture = create_texture(text) if texture == nil
 
    # Draw a quad with the text texture
    # Looks best with Ortho 1:1 projection
    GL::Enable(GL::TEXTURE_2D);
    GL::LoadIdentity();
    GL.BindTexture(GL::TEXTURE_2D, texture.ogl);
    GL::Begin(GL::QUADS);
        GL::TexCoord(0.0, 0.0); GL::Vertex(x, y)
        GL::TexCoord(0.0, 1.0); GL::Vertex(x, texture.height + y)
        GL::TexCoord(1.0, 1.0); GL::Vertex(texture.width + x, texture.height + y)
        GL::TexCoord(1.0, 0.0); GL::Vertex(texture.width + x, y)
    GL::End()
  end
 
  def create_texture text
    fw, fh = @font.size(text)
 
    # This is a hack
    # Image.new doesn't have the right color format (or something),
    # so just resize a preloaded png
    image = @blank.clone
    image.crop_scaled! 0,0,image.width, image.height, fw, fh
    image.fill_rect [0,0], [image.w, image.h], Imlib2::Color::RgbaColor.new(0,0,0,255)
 
    image.draw_text @font, text, 0, 0, Imlib2::Color::WHITE
 
    texture = TextTexture.new
    texture.ogl = GL::GenTextures(1)[0];
    GL.BindTexture(GL::TEXTURE_2D, texture.ogl);
    GL.TexParameteri(GL::TEXTURE_2D, GL::TEXTURE_WRAP_S, GL::CLAMP);
    GL.TexParameteri(GL::TEXTURE_2D, GL::TEXTURE_WRAP_T, GL::CLAMP);
    GL.TexParameteri(GL::TEXTURE_2D, GL::TEXTURE_MAG_FILTER,GL::LINEAR);
    GL.TexParameteri(GL::TEXTURE_2D, GL::TEXTURE_MIN_FILTER,GL::LINEAR);
    GL.TexImage2D(GL::TEXTURE_2D, 0, GL::RGBA, image.width,
                  image.height, 0, GL::RGBA, GL::UNSIGNED_BYTE, image.rgba_data);
    texture.width = image.width
    texture.height = image.height
    image.delete!
    @textures[text] = texture
    return texture
  end
 
  def get_texture text
    texture = @textures[text]
    texture = create_texture(text) if texture == nil
    texture
  end
end
 
class TextTexture
  attr_accessor :ogl
  attr_accessor :width
  attr_accessor :height
end
 
class Imlib2::Image
  # Convert data (a binary String in BGRA byte order) to the RGBA order OpenGL expects
  def rgba_data
    bytes = data.unpack('C*')   # work on integers, not 1-char strings
    new_data = Array.new(bytes.size)
    (0...bytes.size / 4).each do |i|
      new_data[i*4]   = bytes[i*4+2]   # red
      new_data[i*4+1] = bytes[i*4+1]   # green
      new_data[i*4+2] = bytes[i*4]     # blue
      new_data[i*4+3] = bytes[i*4+3]   # alpha
    end
    new_data.pack('C*')
  end
end
# Usage
# ... Inside draw loop ...
GL::MatrixMode(GL::PROJECTION);
GL::LoadIdentity()
GL::Ortho(0,@viewport.x,@viewport.y,0,-1.0,1.0)
                
GL::MatrixMode(GL::MODELVIEW);
GL::LoadIdentity()
GL::Disable(GL::LIGHTING);
GL::Disable(GL::DEPTH_TEST);
   
GL::Color(1.0, 1.0, 1.0, 0.7);
@text_manager ||= OpenGLTextManager.new  # create once, so cached textures are reused across frames
@text_manager.render 'hello', 0, 0

LDAP Authentication

Spent the better part of the evening setting up LDAP authentication for my boxen. The portage issue I mentioned previously was because I hadn’t updated portage for like 8 months … my bad. Slapd installed without a hitch on my Gentoo server, and I was even able to set it up with an SSL certificate. The problems came getting pam_ldap set up on my Ubuntu client. I’m not really sure what I did, but part of my problem was installing all the packages a few days ago, then changing my mind on the configuration today without reinstalling the packages. As such, I learnt a handy new command to reconfigure a package without reinstalling it:

dpkg-reconfigure libpam-ldap
dpkg-reconfigure libnss-ldap

I’d also warn against using the libnss-ldap sample nsswitch.conf without a contingency plan: I wasn’t able to execute commands (ls, sudo) after using it, and my machine wouldn’t reboot properly, even in recovery mode. Moral of the story: always have a LiveCD handy!

I’m at the point now where everything seems to work … except passwd. When changing my password I get “passwd: Authentication information cannot be recovered”. Posted something on the forums, hopefully someone helps me out.

Laggy Donuts - Client Side Prediction

Obviously, you have to synchronize your client data with the server. How do you do this when the client is entering data? A naive solution might send the client data and refresh the UI only when a confirmation or update message is received from the server. This works great in local testing, but what happens when you are getting lag? The user will notice delays between doing something and the UI reflecting the change, which is a Bad Thing. The UI must be responsive at all times. The way around this problem is client-side prediction. In most cases, you can assume that the user will enter “correct” data. Think about it: how many times do you enter invalid data into a game? This means that in most cases, we can update the UI to reflect the change as soon as it is entered, then double-check that our assumption was correct when confirmation from the server is received, and if not, roll back the changes.

Unfortunately, it’s not quite that simple. It will work if the confirmation consists of a complete data refresh, but often this is not the case. For reasons of bandwidth, usually only incremental changes are sent to clients. Consider the following example, where we are using client-side prediction: there are 4 donuts on a table. Clients can pick up a donut, but they may only hold one at a time. If they try to pick up another, their current donut is put back on the table. There is lag between the server and clients. Imagine the following chain of events:

  1. Client picks up iced donut
  2. Client picks up cinnamon donut (iced donut goes back on table)
  3. Client receives iced donut picked up message
  4. Client receives iced donut drop message
  5. Client receives cinnamon donut picked up message

If we are blindly trusting updates from the server, by the end of this sequence our display will be correct. But imagine there is considerable lag between steps 3 and 4. At this point, the client will be showing that we have picked up both donuts, yet we know this to (most likely) not be the case!

One method of dealing with this problem involves keeping track of an “expected reply” from the server. This method assumes that a full data refresh is sent from the server on error (reasonable, since errors are infrequent, and it removes the need for sophisticated event histories). If we receive a message from the server telling us we have done something we know we have since made obsolete (it does not match the expected reply), we can safely ignore it. In the event that an error occurs (say, someone picked up the cinnamon donut before our message arrived at the server), we can roll back to the data provided by the server.

It is impossible to have the UI be correct at every single point in time, but using this method we have reduced the time to only when an erroneous action is made.
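Here is a minimal sketch of the expected-reply bookkeeping for the donut example. Networking is left out, and the class and method names (DonutClient, on_picked_up, on_dropped, on_full_refresh) are hypothetical. While a local prediction is outstanding, incremental updates that don’t match it are treated as obsolete; a full refresh from the server always wins.

```ruby
# Hypothetical client-side model for the donut example (networking omitted).
class DonutClient
  attr_reader :held   # the donut the UI currently shows as held

  def initialize
    @held = nil
    @expected = nil   # the pickup we are still waiting for the server to confirm
  end

  # Local action: update the UI immediately (client-side prediction)
  def pick_up(donut)
    @held = donut
    @expected = donut
  end

  # Incremental update: ignore it if it doesn't match the expected reply
  def on_picked_up(donut)
    return if @expected && donut != @expected
    @held = donut
    @expected = nil   # prediction confirmed
  end

  # While a prediction is outstanding, our local view already reflects any drop
  def on_dropped(donut)
    return if @expected
    @held = nil if @held == donut
  end

  # Sent by the server after an error: roll back to the authoritative state
  def on_full_refresh(held)
    @held = held
    @expected = nil
  end
end

client = DonutClient.new
client.pick_up(:iced)
client.pick_up(:cinnamon)      # iced goes back on the table locally
client.on_picked_up(:iced)     # obsolete, ignored: the UI never shows two donuts
client.on_dropped(:iced)
client.on_picked_up(:cinnamon) # matches the expected reply
p client.held                  # => :cinnamon
```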

An easy way to test these sort of UI issues is to add a sleep(500) before the line that dispatches your messages in the server code.

  • Posted on July 27, 2005
  • Tagged code, lag, ui

A Banana a Day

SQL Optimization

Consider the following join (a is a char(32), b is varchar):

select * from t1, t2 where instr(t1.a, t2.b) > 0

The actual code I worked with was a bit more complex, but essentially that is it. It works fine; however, performance is rather lacking. This is because for every row in t2, the DBMS must evaluate instr() against every row in t1, checking whether t2.b occurs within t1.a, until it finds a match.

If a field is used in a function, any indexes on that field cannot be used

That’s bad. In this case, since a is fixed length, we can rewrite the query thusly:

select * from t1, t2 where substr(t1.a,0,32) = t2.b

This way, substr() is only evaluated once for each row of t1, and the result can be checked quickly against an index on t2.b.
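To see why this matters, here is a toy Ruby model of the two queries. Arrays stand in for the tables, a Hash stands in for the index on t2.b, and the data is made up; the sketch just counts how often the "function" gets called. It assumes the b values are also 32-character IDs, so the containment test and the equality test pick out the same pairs.

```ruby
# Toy model: arrays stand in for tables, a Hash stands in for an index.
# t1 holds the fixed-length (32-character) a values, t2 the b values.

# Before: the condition applies a function to columns from both tables,
# so every (t1, t2) pair has to be tested.
def join_scan(t1, t2)
  calls = 0
  pairs = t2.flat_map do |b|
    t1.filter_map do |a|
      calls += 1
      [a, b] if a.include?(b) # instr()-style containment test
    end
  end
  [pairs, calls]
end

# After: the function only touches t1.a, so it runs once per t1 row,
# and its result is looked up in the "index" on t2.b.
def join_indexed(t1, t2)
  index = t2.group_by(&:itself) # b value => rows with that value
  calls = 0
  pairs = t1.flat_map do |a|
    calls += 1
    key = a[0, 32] # substr(t1.a, 0, 32)
    index.fetch(key, []).map { |b| [a, b] }
  end
  [pairs, calls]
end

t1 = (1..200).map { |i| format("%032d", i) }
t2 = (101..300).map { |i| format("%032d", i) }
p join_scan(t1, t2).last    # => 40000
p join_indexed(t1, t2).last # => 200
```

Both versions return the same 100 matching pairs; the difference is 40,000 function calls versus 200 calls plus cheap hash lookups, which is the same shape of saving an index gives the DBMS.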

In the particular case I was working on (there were multiple joins to be rewritten), this cut execution time down from 2 minutes to just under a second.

Lonely Hammer Syndrome

When all you have is a hammer, every problem looks like a nail. After optimising the query above, I found out its context. It’s a data collection problem: a server log in which each URL contains a UID, and a CSV containing details about each UID (page title, etc.). The two needed to be collated. The original process was:

  1. Import both files into an Access database (the server log into a table with a single field, whole_line)
  2. Using linked tables and SQL string functions, convert the server log into a friendlier format in a table on a local Oracle server
  3. Collate the two data sets using the SQL mentioned above
  4. Export the collated data to CSV (from Access)
  5. Append to the master file

Basically, we’re going from CSV, to Access, to Oracle, back through Access and out to CSV again. Additionally, many of these steps required manual intervention. This process has to be done every week, so how can it be optimised? Of course there are many ways. I looked into Java and JDBC to cut Access out of the loop, but thought that if I was going to the effort, I may as well cut out Oracle too. Perl is renowned for its log-munging ability, so I put together a script (rather quickly, thanks to regular expressions!) that now produces formatted data automatically in a few minutes, compared to up to 60 minutes of manual labour with the old method.
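The script itself was Perl and isn't reproduced here; what follows is only a rough Ruby sketch of the same idea. The log format, the uid= query parameter, the 32-character hex UID and the CSV header (uid,title) are all assumptions for illustration, not the real data.

```ruby
require "csv"

# Assumed log format: the UID appears in the request URL as a
# 32-character hex query parameter, e.g. "GET /page?uid=<uid> HTTP/1.1".
UID_PATTERN = /uid=(\h{32})/

# Build a UID => title lookup from the details CSV (assumed header: uid,title).
def load_details(csv_text)
  CSV.parse(csv_text, headers: true).map { |row| [row["uid"], row["title"]] }.to_h
end

# One output row per log line whose UID appears in the details.
def collate(log_lines, details)
  CSV.generate do |out|
    out << %w[uid title line]
    log_lines.each do |line|
      uid = line[UID_PATTERN, 1]   # nil if the line has no UID
      title = uid && details[uid]  # nil if we have no details for it
      next unless title
      out << [uid, title, line.chomp]
    end
  end
end

uid = "0123456789abcdef" * 2
details = load_details("uid,title\n#{uid},Home page\n")
log = ["GET /index?uid=#{uid} HTTP/1.1\n", "GET /favicon.ico HTTP/1.1\n"]
puts collate(log, details)
```

For the weekly job, the inputs would come from File.foreach and File.read, and the output would be appended to the master file, skipping the header row after the first run.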

The moral: expose yourself to as many tools as possible. You don’t need to be an expert in them (I had to google pretty much every Perl command :S), but if you know the pros and cons of each, you can substantially cut both development and operational time.

Point in triangle

New way

I’m a happy man. The elusive collision response is closer to my grasp.
