Robot Has No Heart

Xavier Shay blogs here


Rails XHTML Validation with LibXML/HTML Tidy

I improved upon the XHTML validation technique I showed yesterday, adding nicer error messages and support for local testing via HTML Tidy. HTML Tidy isn’t quite as thorough as the W3C validator (for example, it missed a label that was pointing to an invalid ID), but it runs hell fast. For W3C testing I’m now using LibXML to parse the response and list the actual errors rather than just telling you they exist.

It’s all customizable by setting the MARKUP_VALIDATOR environment variable. Options are: w3c, tidy, tidy_no_warnings. Tidy is the default.

def assert_valid_markup(markup=@response.body)
  ENV['MARKUP_VALIDATOR'] ||= 'tidy'
  case ENV['MARKUP_VALIDATOR']
  when 'w3c'
    # Thanks http://scottraymond.net/articles/2005/09/20/rails-xhtml-validation
    require 'net/http'
    require 'cgi' # for CGI.escape (Rails normally has this loaded already)
    response = Net::HTTP.start('validator.w3.org') do |w3c|
      query = 'fragment=' + CGI.escape(markup) + '&output=xml'
      w3c.post2('/check', query)
    end
    if response['x-w3c-validator-status'] != 'Valid'
      error_str = "XHTML Validation Failed:\n"
      # XML::Parser comes from libxml-ruby, assumed to be loaded elsewhere (e.g. in the test helper)
      parser = XML::Parser.new
      parser.string = response.body
      doc = parser.parse

      doc.find("//result/messages/msg").each do |msg|
        error_str += "  Line %i: %s\n" % [msg["line"], msg]
      end

      flunk error_str
    end

  when 'tidy', 'tidy_no_warnings'
    require 'tidy'
    errors = []
    Tidy.open(:input_xml => true) do |tidy|
      tidy.clean(markup)
      errors.concat(tidy.errors)
    end
    Tidy.open(:show_warnings => (ENV['MARKUP_VALIDATOR'] != 'tidy_no_warnings')) do |tidy|
      tidy.clean(markup)
      errors.concat(tidy.errors)
    end
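    unless errors.empty?
      # Tidy bundles its messages into multi-line strings; indent the continuation lines
      error_str = errors.map { |e| e.gsub(/\n/, "\n  ") }.join

      # flunk fails the test with this message, matching the w3c branch above
      flunk "XHTML Validation Failed:\n  #{error_str}"
    end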
  end
end
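
One thing to note about the case statement above: a misspelt MARKUP_VALIDATOR value matches no branch, so the assertion silently checks nothing. A minimal sketch of a stricter lookup follows. The names KNOWN_VALIDATORS and resolve_markup_validator are made up for illustration and are not part of the original helper.

```ruby
# Hypothetical helper: the option names come from the post, everything else is an assumption.
KNOWN_VALIDATORS = %w[w3c tidy tidy_no_warnings].freeze

# Returns the validator name to use, defaulting to 'tidy' when the variable is unset.
# Raises ArgumentError for anything outside the known options.
def resolve_markup_validator(env = ENV)
  name = env['MARKUP_VALIDATOR'] || 'tidy'
  return name if KNOWN_VALIDATORS.include?(name)

  raise ArgumentError,
        "Unknown MARKUP_VALIDATOR #{name.inspect}; expected one of: #{KNOWN_VALIDATORS.join(', ')}"
end
```

Inside assert_valid_markup this could stand in for the first two lines, as in `case resolve_markup_validator`, so a typo fails loudly instead of passing quietly.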

Getting Tidy to work was an ordeal; the Ruby documentation is rather lacking. It also behaves in weird ways – the call to errors returns a one-element array, with all the errors bundled together in a single string.
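
If separate messages are more useful than one bundled string, the array can be flattened in plain Ruby. This is only a sketch: split_tidy_errors is a hypothetical name, and the sample input is an invented stand-in for Tidy's output rather than something captured from a real run.

```ruby
# Hypothetical helper: turns Tidy's bundled error strings into one entry per message.
def split_tidy_errors(errors)
  errors
    .flat_map { |bundle| bundle.to_s.split("\n") } # one element per line of output
    .map(&:strip)                                  # drop stray leading/trailing whitespace
    .reject(&:empty?)                              # ignore blank lines
end

# Invented sample shaped like the one-element array described above.
sample = ["line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n" \
          "line 5 column 3 - Error: <foo> is not recognized!\n"]

split_tidy_errors(sample).each { |message| puts message }
```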

LibXML was a little tricky – there’s no obvious way to parse an XML document held in memory. You’d think XML::Document.new(xml) would do the trick, since there’s an XML::Document.file(filename) method, but that actually uses the entire XML document as the version string. Not so handy. It turns out you need to create an XML::Parser object instead, as I’ve done above. The docs don’t mention this (anywhere obvious, that is); I found the answer in a thread on the LibXML mailing list.
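
For comparison, here is the same parse-a-string-then-XPath idea sketched with REXML, which ships with Ruby, instead of LibXML. This swaps the library, so treat it as an illustration of the technique and not as the LibXML API. The sample XML is a made-up stand-in shaped to fit the //result/messages/msg XPath used above, not a captured validator response, and validation_messages is a hypothetical name.

```ruby
require 'rexml/document'

# Invented sample, shaped to match the XPath in assert_valid_markup.
SAMPLE_RESPONSE = <<~XML
  <result>
    <messages>
      <msg line="12">end tag for "div" omitted</msg>
      <msg line="30">required attribute "alt" not specified</msg>
    </messages>
  </result>
XML

# Parses an XML string held in memory and formats one line per <msg> element.
def validation_messages(xml)
  doc = REXML::Document.new(xml) # REXML accepts the document source as a string
  REXML::XPath.match(doc, '//result/messages/msg').map do |msg|
    format('  Line %d: %s', msg.attributes['line'].to_i, msg.text.to_s.strip)
  end
end

puts validation_messages(SAMPLE_RESPONSE)
```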

Testing Rails

I was working on creating functional tests for some of my code today, a task made ridiculously easy by Rails. To add extra value, I added an assertion (from Scott Raymond) to validate my markup against the W3C online validator:

def assert_valid_markup(markup=@response.body)
  if ENV["TEST_MARKUP"]
    require "net/http"
    response = Net::HTTP.start("validator.w3.org") do |w3c|
      query = "fragment=" + CGI.escape(markup) + "&output=xml"
      w3c.post2("/check", query)
    end
    assert_equal "Valid", response["x-w3c-validator-status"]
  end
end

The ENV check means validation isn’t run by default, since every call goes out to the W3C server and slows down my tests considerably. I don’t want to move markup checks out of the functional tests, though, because that’s where they belong. Next step is to validate locally, which I’ve heard you can do with HTML Tidy.

Another problem is testing code that relies on DateTime.now, since this is a singleton (class-level) method and not easily mockable.

def pin_time
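  # Round-trip through to_s to drop sub-second precision; otherwise the
  # yielded value would not compare equal to the re-parsed, pinned one
  time = DateTime.parse(DateTime.now.to_s)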
  DateTime.class_eval <<-EOS
    def self.now
      DateTime.parse("#{time}")
    end
  EOS
  yield time
end

# Usage
pin_time do |test_time|
  assert_equal test_time, DateTime.now
  sleep 2
  assert_equal test_time, DateTime.now
end

I haven’t found a neat way of resetting the behaviour of now. Using load 'date.rb' works but produces warnings for redefined constants. I couldn’t get aliasing the original method, undefining the new one, or even just calling Date.now to work.
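
One possible way to put now back afterwards is to hold on to the original method object and redefine it in an ensure block. This is a sketch under the assumption of Ruby 1.9 or later (for define_singleton_method); pin_time_with_restore is a hypothetical name and this is not the approach used above.

```ruby
require 'date'

# Hypothetical variant of pin_time that restores DateTime.now when the block ends.
def pin_time_with_restore
  # Round-trip through to_s to drop sub-second precision.
  pinned = DateTime.parse(DateTime.now.to_s)
  original_now = DateTime.method(:now) # keep a handle on the real implementation
  DateTime.define_singleton_method(:now) { pinned }
  begin
    yield pinned
  ensure
    # Put the original method back even if the block raises.
    DateTime.define_singleton_method(:now, original_now)
  end
end

# Usage
pin_time_with_restore do |test_time|
  raise 'now was not pinned' unless DateTime.now == test_time
end
```

Once the block exits, DateTime.now goes back to reporting the real clock, so later tests are unaffected.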

UPDATE: Ah, how young I was. A better way to do this is to use a library like Mocha.
