Rails XHTML Validation with LibXML/HTML Tidy
I improved upon the XHTML validation technique I showed yesterday, adding nicer error messages and support for local testing via HTML Tidy. HTML Tidy isn’t quite as thorough as the W3C validator – for example, it missed a label that was pointing to an invalid ID – but it runs hell fast. For W3C testing I’m now using LibXML to parse the response and actually list the errors, rather than just telling you they exist.
And it’s all customizable by setting the MARKUP_VALIDATOR environment variable. Options are: w3c, tidy, tidy_no_warnings. Tidy is the default.
    def assert_valid_markup(markup=@response.body)
      ENV['MARKUP_VALIDATOR'] ||= 'tidy'
      case ENV['MARKUP_VALIDATOR']
      when 'w3c'
        # Thanks http://scottraymond.net/articles/2005/09/20/rails-xhtml-validation
        require 'cgi'
        require 'net/http'
        require 'xml' # libxml-ruby; older releases load it with require 'xml/libxml'
        response = Net::HTTP.start('validator.w3.org') do |w3c|
          query = 'fragment=' + CGI.escape(markup) + '&output=xml'
          w3c.post2('/check', query)
        end
        if response['x-w3c-validator-status'] != 'Valid'
          error_str = "XHTML Validation Failed:\n"
          # Parse the validator's XML response from memory
          parser = XML::Parser.new
          parser.string = response.body
          doc = parser.parse

          doc.find("//result/messages/msg").each do |msg|
            error_str += " Line %i: %s\n" % [msg["line"], msg]
          end
          flunk error_str
        end
      when 'tidy', 'tidy_no_warnings'
        require 'tidy'
        errors = []
        # First pass: treat the markup as XML
        Tidy.open(:input_xml => true) do |tidy|
          tidy.clean(markup)
          errors.concat(tidy.errors)
        end
        # Second pass: HTML checks, with warnings unless tidy_no_warnings is set
        Tidy.open(:show_warnings => (ENV['MARKUP_VALIDATOR'] != 'tidy_no_warnings')) do |tidy|
          tidy.clean(markup)
          errors.concat(tidy.errors)
        end
        if errors.length > 0
          error_str = ''
          errors.each do |e|
            error_str += e.gsub(/\n/, "\n ")
          end
          flunk "XHTML Validation Failed:\n #{error_str}"
        end
      end
    end
Getting Tidy to work was an ordeal; the Ruby documentation is rather lacking. It also behaves in weird ways – the call to errors returns a one-element array, with all the errors bundled together in one string.
LibXML was a little tricky – there’s no obvious way to parse an XML document in memory. You’d think XML::Document.new(xml) would do the trick, since there’s an XML::Document.file(filename) method, but that actually uses the entire XML document as the version string. Not so handy. Turns out you need to create an XML::Parser object instead, as I’ve done above. The docs don’t mention this (anywhere obvious, that is); I found it in a thread on the LibXML mailing list.
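Since everything arrives bundled in a single string, it can help to split it back into one message per line before formatting. Here is a minimal sketch in plain Ruby. The helper name split_tidy_errors is made up for illustration and is not part of the tidy library, and the sample messages are invented to resemble Tidy's output rather than copied from a real run.

```ruby
# Hypothetical helper: turn the one-element array described above into
# one entry per reported problem.
def split_tidy_errors(errors)
  errors.flat_map { |bundle| bundle.split("\n") }
        .map(&:strip)
        .reject(&:empty?)
end

# Invented sample resembling the bundled string; real Tidy output may differ.
bundled = ["line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n" \
           "line 4 column 3 - Error: <foo> is not recognized!\n"]

split_tidy_errors(bundled).each { |message| puts "  #{message}" }
```

With the messages separated, each one can be indented or filtered on its own instead of patching up newlines with gsub.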
Testing Rails
I was working on creating functional tests for some of my code today, a task made ridiculously easy by Rails. To add extra value, I added an assertion (from Scott Raymond) to validate my markup against the W3C online validator:
    def assert_valid_markup(markup=@response.body)
      if ENV["TEST_MARKUP"]
        require "cgi"
        require "net/http"
        # POST the markup to the validator and check the status header
        response = Net::HTTP.start("validator.w3.org") do |w3c|
          query = "fragment=" + CGI.escape(markup) + "&output=xml"
          w3c.post2("/check", query)
        end
        assert_equal "Valid", response["x-w3c-validator-status"]
      end
    end
The ENV test means it isn’t run by default since it slows down my tests considerably, but I don’t want to move markup checks out of the functional tests because that’s where they belong. Next step is to validate locally, which I’ve heard you can do with HTML Tidy.
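The request body built inside that assertion is an ordinary form-encoded string. As a rough sketch, the same body can be produced with URI.encode_www_form from Ruby's standard library instead of concatenating CGI.escape output by hand. The helper name validator_query is hypothetical, and the fragment and output field names are simply the ones used in the assertion above.

```ruby
require 'uri'

# Hypothetical helper: builds the form-encoded POST body for the
# validator request from a string of markup.
def validator_query(markup)
  URI.encode_www_form('fragment' => markup, 'output' => 'xml')
end

puts validator_query('<p>Hello & goodbye</p>')
```

Letting the library do the encoding keeps the field separators and the escaping in one place, which matters once the markup contains ampersands of its own.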
Another problem is testing code that relies on DateTime.now, since it’s a class-level (singleton) method and not easily mockable. My workaround is to redefine it for the duration of a block:
    require 'date'

    def pin_time
      # Round-trip through a string so the pinned value equals what the
      # redefined DateTime.now parses back (fractional seconds are dropped).
      time = DateTime.parse(DateTime.now.to_s)
      DateTime.class_eval <<-EOS
        def self.now
          DateTime.parse("#{time}")
        end
      EOS
      yield time
    end

    # Usage
    pin_time do |test_time|
      assert_equal test_time, DateTime.now
      sleep 2
      assert_equal test_time, DateTime.now
    end
I haven’t found a neat way of resetting the behaviour of now. Using load 'date.rb' works but produces warnings for redefined constants. I couldn’t get any of the alternatives to work: aliasing the original method, undefining the new one, or even just calling Date.now.
UPDATE: Ah, how young I was. A better way to do this is to use a library like Mocha.
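For what it's worth, on current Ruby versions one approach that seems to handle the reset is to hold on to the original Method object before redefining now, then put it back in an ensure clause. This is only a sketch under that assumption: with_pinned_time is a made-up name, and it hasn't been checked against the Ruby version this was originally written for.

```ruby
require 'date'

# Hypothetical helper: pins DateTime.now inside the block and restores
# the original method afterwards, even if the block raises.
def with_pinned_time
  original = DateTime.method(:now)   # keep the real implementation
  pinned = original.call
  DateTime.define_singleton_method(:now) { pinned }
  yield pinned
ensure
  # Put the saved method back in place of the pinned version.
  DateTime.define_singleton_method(:now, original) if original
end

with_pinned_time do |test_time|
  sleep 0.1
  puts DateTime.now == test_time
end
puts DateTime.now
```

Because the pinned value is captured by the block rather than interpolated into a string, there is no round trip through DateTime.parse and no loss of fractional seconds.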