Rake tab completion with caching and namespace support
UPDATE: It now invalidates the cache if you touch lib/tasks/*.rake, for those using it with rails (like me)
There are a few articles on the net regarding rake tab completion; I had to combine a few of them to get what I wanted:
#!/usr/bin/env ruby

# Complete rake tasks script for bash
# Save it somewhere and then add
# complete -C path/to/script -o default rake
# to your ~/.bashrc
# Xavier Shay (http://rhnh.net), combining work from
#   Francis Hwang ( http://fhwang.net/ ) - http://fhwang.net/rb/rake-complete.rb
#   Nicholas Seckar <nseckar@gmail.com> - http://www.webtypes.com/2006/03/31/rake-completion-script-that-handles-namespaces
#   Saimon Moore <saimon@webtypes.com>

require 'fileutils'

RAKEFILES = ['rakefile', 'Rakefile', 'rakefile.rb', 'Rakefile.rb']
exit 0 unless RAKEFILES.any? { |rf| File.file?(File.join(Dir.pwd, rf)) }
exit 0 unless /^rake\b/ =~ ENV["COMP_LINE"]

after_match = $'
task_match = (after_match.empty? || after_match =~ /\s$/) ? nil : after_match.split.last
cache_dir = File.join( ENV['HOME'], '.rake', 'tc_cache' )
FileUtils.mkdir_p cache_dir
rakefile = RAKEFILES.detect { |rf| File.file?(File.join(Dir.pwd, rf)) }
rakefile_path = File.join( Dir.pwd, rakefile )
cache_file = File.join( cache_dir, rakefile_path.gsub( %r{/}, '_' ) )
if File.exist?( cache_file ) &&
  File.mtime( cache_file ) >= (Dir['lib/tasks/*.rake'] << rakefile).collect {|x| File.mtime(x) }.max
  task_lines = File.read( cache_file )
else
  task_lines = `rake --silent --tasks`
  File.open( cache_file, 'w' ) do |f| f << task_lines; end
end
tasks = task_lines.split("\n")[1..-1].collect {|line| line.split[1]}
tasks = tasks.select {|t| /^#{Regexp.escape task_match}/ =~ t} if task_match

# handle namespaces
if task_match =~ /^([-\w:]+:)/
  upto_last_colon = $1
  after_match = $'
  tasks = tasks.collect { |t| (t =~ /^#{Regexp.escape upto_last_colon}([-\w:]+)$/) ? "#{$1}" : t }
end

puts tasks
exit 0
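The cache invalidation rule in the script can be pulled out on its own. This is only a sketch: cache_fresh? is a hypothetical helper name, and unlike the script it quietly ignores source files that do not exist.

```ruby
# Hypothetical helper (not part of the script above): the cache is fresh when
# it exists and is at least as new as the newest of the source files.
def cache_fresh?(cache_file, sources)
  return false unless File.exist?(cache_file)

  # Assumption: missing source files are skipped rather than raising.
  newest = sources.select { |f| File.file?(f) }.map { |f| File.mtime(f) }.max
  newest.nil? || File.mtime(cache_file) >= newest
end

# Usage, mirroring the script:
#   sources = Dir['lib/tasks/*.rake'] << 'Rakefile'
#   task_lines = cache_fresh?(cache_file, sources) ? File.read(cache_file) : `rake --silent --tasks`
```

Comparing against the newest modification time means touching any one rake file is enough to rebuild the cache.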
Finding related content with Sphinx
Previous efforts to find related posts with the classifier gem yielded no fruit, so I tried another approach using Sphinx. It turned out to be a winner.
The basic theory is to index all posts by tag, then to find related posts just use the current post’s tags as a search string. Remember to exclude the current post from the search results. For this blog, I use tags for the main categories, which were corrupting the results – most everything is tagged ‘Ruby’ so it doesn’t add any value in determining likeness. So rather than indexing all tags I excluded some of the main ones.
class Post < ActiveRecord::Base
  has_many :searchable_tags, :through => :taggings, :source => :tag, :conditions => "tags.name NOT IN ('Ruby', 'Code', 'Life')"

  def related_posts(number = 3)
    Post.search(:limit => number + 1, :conditions => { :tag_list => tag_list.join("|") }).reject {|x| x == self }.first(number)
  end

  define_index do
    indexes searchable_tags(:name), :as => :tag_list
    # If you want to use this for normal search as well you'll have to
    # add in title/body here as well
  end
end
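To see the idea without a search daemon, here is a rough in-memory sketch of the same logic. It does not use Sphinx at all: the Post struct, EXCLUDED_TAGS and the ranking by number of shared tags are all assumptions made for illustration.

```ruby
# Hypothetical in-memory stand-in for a post; the real model is ActiveRecord.
Post = Struct.new(:title, :tags)

# The main category tags that add no value when judging likeness.
EXCLUDED_TAGS = ['Ruby', 'Code', 'Life']

# Rank other posts by how many non-excluded tags they share with this one.
def related_posts(post, all_posts, number = 3)
  wanted = post.tags - EXCLUDED_TAGS
  all_posts.
    reject  { |other| other.equal?(post) }.  # exclude the current post
    map     { |other| [other, ((other.tags - EXCLUDED_TAGS) & wanted).size] }.
    select  { |_, shared| shared > 0 }.
    sort_by { |_, shared| -shared }.
    first(number).
    map(&:first)
end
```

A post tagged only with the excluded categories never shows up as related, which is the point of leaving those tags out of the index.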
For a more complete example, see the relevant RHNH commits: cdc0bf and d4d844
Showing links to related content is a good way to stop the bottom of your page from being a ‘dead end’. In the event that no related posts are found, I’m linking to the archives instead.
Hash trumps case
# Two equivalent functions
def rgb(color)
  case color
    when :red   then 'ff0000'
    when :green then '00ff00'
    when :blue  then '0000ff'
    else '000000' # Default to black
  end
end

def rgb2(color)
  {
    :red   => 'ff0000',
    :green => '00ff00',
    :blue  => '0000ff'
  }[color] || '000000'
end
Even though these functions are equivalent, the second carries more semantic weight – it maps a symbol directly to a color. The case sample makes no such guarantees since you can execute any arbitrary code in the then block. In addition, a hash is easier to work with – you can easily iterate over the keys, extract to another method if you need reuse, or query it for other properties (for example, 3 colors are available). It is also easier to read – both aesthetically and because it contains fewer tokens. In almost all circumstances I will prefer a hash over a case statement.
Relationships in data are easier to comprehend and manipulate than relationships in code.
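As a small sketch of what "easier to work with" can look like, here the hash is extracted to a constant (COLORS is a name chosen for this example) and then queried in ways a case statement can't offer:

```ruby
# The mapping extracted into a constant so it can be reused and inspected.
COLORS = {
  :red   => 'ff0000',
  :green => '00ff00',
  :blue  => '0000ff'
}

def rgb(color)
  COLORS.fetch(color, '000000') # Default to black
end

rgb(:green)              # => "00ff00"
rgb(:purple)             # => "000000"
COLORS.keys              # => [:red, :green, :blue]
COLORS.size              # => 3
COLORS.invert['0000ff']  # => :blue
```

The reverse lookup comes for free from the data; with the case version you would have to write a second function by hand.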
Contextual Composition With Delegation
I’ve had some models getting rather large recently. This makes them hard to comprehend and makes the source difficult to browse. A lot of the time, a big chunk of functionality is fairly context specific – it is only relevant to one particular part of my application (reporting, data integration, etc…). Thoughtbot recently presented one way to deal with this: adding methods to the model that return another model with the extra goodness.
That’s not bad, but it still pollutes the class with methods that most users won’t care about. We can just decorate the class with extra methods at the time (context) that we need them. My first go at doing this used the extend method:
class PurchaseOrder
  attr_reader :id
end

module Reports::PurchaseOrderMethods
  def description
    "A Purchase Order"
  end
end

class ReportMakerWithExtend
  def self.report_for(po)
    po.extend(Reports::PurchaseOrderMethods)
    "#{po.id}: #{po.description}"
  end
end
This has a few edge case problems though.
- It can potentially override methods in our base class. Imagine if PurchaseOrder#description was defined as private: our module would override this definition, probably resulting in breakage.
- It is inelegant to test – extend will override any existing stubs, so you need to stub it out. This is unintuitive and may have unintended consequences, for instance if the class is also using extend in a manner that doesn’t interfere with your stubs.
# Testing extended PurchaseOrder is inelegant
describe 'ReportMakerWithExtend#report_for' do
  it 'returns a line containing both ID and description' do
    po = stub(
      :id          => 1,
      :description => "hello",
      :extend      => nil # :(
    )
    ReportMakerWithExtend.report_for(po).should == "1: hello"
  end
end
Ruby provides another way to achieve what we want in the form of SimpleDelegator. Basically, it passes on any methods not defined on itself to the object specified in the constructor. This way we can wrap another object without fear of interfering with its internals or our stubs.
require 'delegate'

class Reports::PurchaseOrder < SimpleDelegator
  def description
    "A Purchase Order"
  end
end

class ReportMaker
  def self.report_for(po)
    po = Reports::PurchaseOrder.new(po)
    "#{po.id}: #{po.description}"
  end
end
Much nicer. Of course, we would have specs for Reports::PurchaseOrder in addition to PurchaseOrder – this split allows us to keep our tests focussed and easy to read. Using delegation to split up your models allows you to separate code into areas where it is most relevant – helping keep both your models and your tests easy to read and maintain.
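Here is a self-contained sketch you can paste into irb to watch the delegation happen. The Struct-based PurchaseOrder is a stand-in assumed for this example, not the real model.

```ruby
require 'delegate'

# Hypothetical stand-in for the real model, just enough to have an id.
PurchaseOrder = Struct.new(:id)

module Reports
  # Wraps a purchase order and adds reporting-only behaviour.
  class PurchaseOrder < SimpleDelegator
    def description
      "A Purchase Order"
    end
  end
end

po      = PurchaseOrder.new(42)
wrapped = Reports::PurchaseOrder.new(po)

wrapped.id                      # => 42, forwarded to the wrapped object
wrapped.description             # => "A Purchase Order", defined on the wrapper
po.respond_to?(:description)    # => false, the original is untouched
wrapped.__getobj__.equal?(po)   # => true, the wrapper holds the same object
```

Unlike extend, nothing is added to po itself, so any stubs or private methods on it stay exactly as they were.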
What's new in Enki - Admin Interface
I’ve just finished up a fairly major overhaul of the Enki admin area, finally throwing away the ugly SimpleLog stylings. Features include:
- New visual style, heavily inspired by the new Habari Monolith look
- New dashboard, with space to add your own data (feedburner subscribers? analytics data?)
- Nicer forms (thanks formtastic!)
- AJAX goodness for UI snappiness
- Undo for item deletion (no more alert boxes!)
Of course there’s still more I’d like to add (in particular to do with tags), but isn’t that always the case? I think it’s pretty swish – if you’ve already got an install just pull from master, if you think you might like an install, head over to the Enki website.
Testing flash.now with RSpec
flash.now has always been a pain to test. The traditional Rails approach is to use assert_select and find it in your views. This clearly doesn’t work if you want to test your controller in isolation.
Other folks have found workarounds to the problem, including mocking out the flash or monkey patching it.
These solutions feel a bit like using a sledgehammer to me. If you’re going to monkey patch or mock something, you want it to be as discreet as possible, both to minimize the chance of the implementation changing underneath you and to reduce the effect on other areas of your application. Also, why duplicate perfectly good code that is provided elsewhere?
The real problem with testing flash.now is that it gets cleaned up (via #sweep) at the end of the action before you get to test anything. So let’s solve that problem and that problem only: disable sweeping of flash.now:
# spec/spec_helper.rb
module DisableFlashSweeping
  def sweep
  end
end

# A spec
describe BogusController, "handling GET to #index" do
  it "sets flash.now[:message]" do
    @controller.instance_eval { flash.extend(DisableFlashSweeping) }
    get :index
    flash.now[:message].should_not be_nil
  end
end
instance_eval is used to access the flash, since it’s a protected method, and we extend with the minimum possible code to do what we want – blanking out the sweep method. This should not cause problems because sweeping is only relevant across multiple requests, which we shouldn’t be doing in our controller specs.
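The reason extend is enough here is that it only touches the one object. A plain Ruby sketch, where FakeFlash is a made-up stand-in and not the Rails flash class:

```ruby
# Hypothetical stand-in for the flash: a hash that empties itself on sweep.
class FakeFlash < Hash
  def sweep
    clear
  end
end

module DisableFlashSweeping
  def sweep
  end
end

flash = FakeFlash.new
flash[:message] = "hello"
flash.extend(DisableFlashSweeping) # only this instance is affected
flash.sweep
flash[:message]                    # => "hello"

other = FakeFlash.new
other[:message] = "bye"
other.sweep                        # the class itself is unchanged
other.empty?                       # => true
```

Methods from an extended module are looked up before the object's class, so the blank sweep wins for that instance and nothing else in the application notices.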
Classifier gem rubbish for recommending posts
Chatting with Tim today he suggested maybe using Classifier::LSI would be a cool way to offer ‘related posts’ suggestions for a blog.
Not really knowing anything about it, I whipped up a prototype rake task. It creates the index then marshals it to disk because it takes ages to create and it’s not much fun to play with when you have to wait minutes each time. It then presents 3 related suggestions for each post.
require 'classifier'

namespace :lsi do
  task :test => :environment do
    if File.exist?("lsidata.dump")
      lsi = File.open("lsidata.dump") {|f| Marshal.load(f) }
    else
      lsi = Classifier::LSI.new
      Post.find(:all, :order => 'published_at DESC').each do |post|
        text = post.body
        categories = post.tags.collect(&:name)
        puts "Indexing " + post.title
        lsi.add_item(text, *categories)
      end
      File.open("lsidata.dump", "w") {|f| Marshal.dump(lsi, f) }
    end

    Post.find(:all).each do |post|
      puts post.title
      puts lsi.find_related(post.body, 3).collect {|i| Post.find_by_body(i).title }.inspect
    end
  end
end
Here’s the data for my last 5 articles. I don’t know what I was expecting, but this just doesn’t seem very helpful. I don’t have a very rich set of tags on my posts, so that probably has something to do with it. Was kind of hoping it would just look at text and all just work * waves hands *.
Seagate 500Gb FreeAgent Pro external drive - first impressions
  - Building Firefox Extensions
  - The Colemak Diaries
  - Counting ActiveRecord associations: count, size or length?
Coconut Oats
  - The Colemak Diaries
  - Summertime Tagliarini
  - Mary Iron Chef - Chocolate Jaffa Boxes
Mary Iron Chef - Chocolate Jaffa Boxes
  - The Colemak Diaries
  - Building Firefox Extensions
  - Summertime Tagliarini
Paypal IPN fails date standards
  - Building Firefox Extensions
  - Straight Sailing with Magellan
  - The Colemak Diaries
I'm number 8!
  - Extending Rails
  - Practical Hpricot: SVG
  - Day of days
Next step is to try tagging my stuff better and seeing if that helps out.
Getting classifier working
Quick side note – pure ruby classifier doesn’t work out of the box with rails because it also redefines Array#sum. If you install the GSL lib and the ruby bindings (see classifier docs) you’ll still need this one line patch to classifier to get it to work:
Index: lib/classifier/lsi.rb
===================================================================
--- lib/classifier/lsi.rb (revision 31)
+++ lib/classifier/lsi.rb (working copy)
@@ -25,6 +25,8 @@
# please consult Wikipedia[http://en.wikipedia.org/wiki/Latent_Semantic_Indexing].
class LSI
+ include GSL if $GSL
+
attr_reader :word_list
attr_accessor :auto_rebuild
UPDATE: I’ve forked classifier on github, so you can just grab that version if you like.
Nginx, OpenID delegation and YADIS
Typically OpenID delegation reads delegation information out of HTML headers on your home page:
<link rel="openid.server" href="http://server.myid.net/server" />
<link rel="openid.delegate" href="http://xaviershay.myid.net/" />
The problem with this is that any client trying to discover this information needs to fetch your entire home page. If that client is your page (commenting on your own entry, for instance), that request can get queued up behind the same mongrel that was serving the original request, which of course now won’t complete until the OpenID delegation request times out.
There is another way to provide delegation information. Clients will request your home page with an accept header of application/xrds+xml – and you can use that information to serve up a static YADIS file rather than your home page. Mine looks like this:
<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns="xri://$xrd*($v*2.0)"
           xmlns:openid="http://openid.net/xmlns/1.0">
  <XRD>
    <Service priority="1">
      <Type>http://openid.net/signon/1.0</Type>
      <URI>https://server.myid.net/server</URI>
      <openid:Delegate>https://xaviershay.myid.net/</openid:Delegate>
    </Service>
  </XRD>
</xrds:XRDS>
And I serve it up with this Nginx rewrite rule:
if ($http_accept ~* application/xrds\+xml) {
  rewrite (.*) $1/yadis.xrdf break;
}
Try it in the comfort of your own home:
curl -H 'Accept: application/xrds+xml' http://rhnh.net
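If you would rather poke at it from Ruby, a sketch of the same request using the standard library looks something like this. yadis_request is a hypothetical helper name; it only builds the request and does not send it.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a discovery request that
# asks for the YADIS document instead of the home page.
def yadis_request(url)
  request = Net::HTTP::Get.new(URI(url))
  request['Accept'] = 'application/xrds+xml'
  request
end

request = yadis_request('http://rhnh.net/')
puts request['Accept'] # application/xrds+xml
```

To actually send it, pass the request to Net::HTTP#request inside a Net::HTTP.start block for the host and check that the response body is the XRDS document rather than HTML.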
Powered by Enki
Finally got this blog switched over to Enki. Main feed has moved to feed burner. Please report any weirdness to the relevant authorities.
For some extra content, here’s what’s happening in the Enki world:
- Moved to github (keeping gitorious as a mirror)
- Tim has a functional multiple authors fork
- API is functional if you want to kick the tyres a bit, still needs some work though. Here is some code to publish from VIM
Paypal IPN fails date standards
Paypal Instant Payment Notification lets you know when you have received a paypal payment. Presumably, you then mark an order as paid or something. Do not use the current time as the paid_at date – despite the ‘instant’ in the title it can be many days later. You should use the payment_date provided by paypal. Your accountant will thank you.
payment_date is:
Time/Date stamp generated by PayPal system [format: “18:30:30 Jan 1, 2000 PST”]
Seen that date format before? No? Didn’t think so. That’s no RFC I’ve seen before. The popular Paypal gem uses Time.parse, but this is incorrect (as of 2.0.0). Observe:
>> Time.parse("18:30:30 Mar 28, 2008 PST")
=> Fri Mar 28 18:30:30 +1100 2008 # Good
>> Time.parse("18:30:30 Feb 28, 2008 PST")
=> Fri Mar 28 18:30:30 +1100 2008 # FAIL
Also, Time only has a range of about a week, so that could screw you over come any major system failures (either you or paypal). Also note the payment_date is in PST, which unless you’re on the right side of the US is fairly useless. I recommend the following:
>> DateTime.strptime("18:30:30 Jan 1, 2000 PST", "%H:%M:%S %b %e, %Y %Z").new_offset(0)
=> Sun, 02 Jan 2000 02:30:30 +0000
The un-intuitive new_offset converts to UTC. Patch submitted. I hate you, Paypal.
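Wrapped up as something you could drop into an IPN handler, a minimal sketch might look like this. parse_paypal_date and PAYPAL_DATE_FORMAT are names made up for this example, and it assumes the documented format quoted above.

```ruby
require 'date'

# The format PayPal documents for payment_date.
PAYPAL_DATE_FORMAT = "%H:%M:%S %b %e, %Y %Z"

# Hypothetical helper: parses a PayPal payment_date and converts it to UTC.
def parse_paypal_date(string)
  DateTime.strptime(string, PAYPAL_DATE_FORMAT).new_offset(0)
end

puts parse_paypal_date("18:30:30 Jan 1, 2000 PST").iso8601
# 2000-01-02T02:30:30+00:00
```

Because strptime is given the exact format, a string that does not match raises an error instead of being silently misread, which is what you want when money is involved.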
Absence, with suitable recompense
I’m going on holidays until the end of January. The offline kind of holiday where I don’t see a computer. So sad.
So here is a tasty treat for you to devour until I return. A sneak preview of a Fashionable New Blogging App™ named Enki. It is an alternative to Mephisto and SimpleLog that is built on the principles espoused in my prior writings. The website is built using Enki itself, and the port of this site from Mephisto is just about finished, so you know you’re getting code that’s got a real life application. There are still a few rough edges, but it’s ready enough to start building something with if you don’t mind getting your hands a little dirty. I’ve set up a mailing list for it which I’ll be catching up on once I get back.
Unobtrusive live comment preview with jQuery
Live preview is shiny. First get yourself a URL that renders a comment. In Rails, maybe something like the following.
def new
  @comment = Comment.build_for_preview(params[:comment])

  respond_to do |format|
    format.js do
      render :partial => 'comment.html.erb'
    end
  end
end
Now you should have a form or div with an ID something like “new_comment”. Just drop in the following JS (you may need to customize the submit_url).
$(function() { // onload
  var comment_form = $('#new_comment')
  var input_elements = comment_form.find(':text, textarea')
  var submit_url = '/comments/new'

  var fetch_comment_preview = function() {
    jQuery.ajax({
      data: comment_form.serialize(),
      url: submit_url,
      timeout: 2000,
      error: function() {
        console.log("Failed to submit");
      },
      success: function(r) {
        if ($('#comment-preview').length == 0) {
          comment_form.after('<h2>Your comment will look like this:</h2><div id="comment-preview"></div>')
        }
        $('#comment-preview').html(r)
      }
    })
  }

  input_elements.keyup(function () {
    fetch_comment_preview.only_every(1000);
  })
  if (input_elements.any(function() { return $(this).val().length > 0 }))
    fetch_comment_preview();
})
The only_every function is the key to this piece – it ensures that an AJAX request will be sent at most once a second so you don’t overload your server or your client’s connection.
Obviously you’ll need jQuery; less obviously, you’ll also need these support functions:
// Based on http://www.germanforblack.com/javascript-sleeping-keypress-delays-and-bashing-bad-articles
Function.prototype.only_every = function (millisecond_delay) {
  if (!window.only_every_func)
  {
    var function_object = this;
    window.only_every_func = setTimeout(function() { function_object(); window.only_every_func = null}, millisecond_delay);
  }
};

// jQuery extensions
jQuery.prototype.any = function(callback) {
  return (this.filter(callback).length > 0)
}
Voilà, now you’re shimmering in awesomeness. Demo up soon, but it’s similar to what you see on this blog (though this blog is done with inline prototype).
AtomFeedHelper produces invalid feeds
Summary: atom_feed is broken until changeset 8529
# http://api.rubyonrails.org/classes/ActionView/Helpers/AtomFeedHelper.html#M000931
atom_feed do |feed|
  feed.title("My great blog!")
  feed.updated((@posts.first.created_at))

  for post in @posts
    feed.entry(post) do |entry|
      entry.title(post.title)
      entry.content(post.body, :type => 'html')

      entry.author do |author|
        author.name("DHH")
      end
    end
  end
end
Produces the following feed (rails 2.0.2)
<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <id>tag:localhost:posts</id>
  <link type="text/html" rel="alternate" href="http://localhost:3000"/>
  <title>My great blog!</title>
  <updated>2007-12-23T04:23:07+11:00</updated>
  <entry>
    <id>tag:localhost:3000:Post1</id>
    <published>2007-12-23T04:23:07+11:00</published>
    <updated>2007-12-30T15:29:55+11:00</updated>
    <link type="text/html" rel="alternate" href="http://localhost:3000/posts/1"/>
    <title>First post</title>
    <content type="html">Check out the first post</content>
    <author>
      <name>DHH</name>
    </author>
  </entry>
</feed>
Let’s run that through the feed validator
line 3, column 25: id is not a valid TAG
line 2, column 0: Missing atom:link with rel="self"
line 8, column 32: id is not a valid TAG
Oh dear. Not a happy result. Let’s fix it.
Problem the first is the feed ID tag. It doesn’t include a date, as per the Tag URI specification. This is a little bit tricky – you can’t just add Time.now.year as a default because that will change every year, and we need IDs to stay the same. We will provide an option to the user to specify the schema date, and produce a warning if they do not (as much as I’d like to just break it, the pragmatic side of me keeps backwards compatibility in).
The entry tag has the same problem, but you’ll also note it concatenates the class and the ID with no separator to create the ID. While it’s an edge case, this will break if you have a class name ending in a number, so we need to add in a separator. I vote for a slash. Also, the port in the tag URI is inconsistent with the feed URI (no port), so remove it.
For further reading, I recommend How to make a good ID in Atom.
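To make the rules above concrete, here is a rough sketch of building an ID in the Tag URI shape (tag:authority,date:specific). tag_uri is a hypothetical helper written for this post, not the code that went into Rails.

```ruby
# Hypothetical helper, not the Rails implementation. Builds a Tag URI from a
# host, a fixed schema date, and the parts that identify the record.
def tag_uri(host, schema_date, *specific)
  authority = host.split(':').first # drop the port so feed and entry IDs agree
  "tag:#{authority},#{schema_date}:#{specific.join('/')}"
end

tag_uri('localhost:3000', 2008, 'Post', 1) # => "tag:localhost,2008:Post/1"
tag_uri('localhost:3000', 2008, 'Post1', 1) # => "tag:localhost,2008:Post1/1"
```

The slash is what keeps a class called Post1 with ID 1 distinct from a class called Post with ID 11, and passing the date in explicitly is what keeps the ID stable from one year to the next.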
The missing self link is just your garden variety bug – the documentation says it should be provided by default, but the code does not.
I went ahead and fixed these problems. Changeset 8529. The example above, when you change the call to atom_feed(:schema_date => 2008), looks like this.
<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <id>tag:localhost:/posts</id>
  <link type="text/html" rel="alternate" href="http://localhost:3000"/>
  <link type="application/atom+xml" rel="self" href="http://localhost:3000/posts.atom"/>
  <title>My great blog!</title>
  <updated>2007-12-23T04:23:07+11:00</updated>
  <entry>
    <id>tag:localhost:Post/1</id>
    <published>2007-12-23T04:23:07+11:00</published>
    <updated>2007-12-30T15:29:55+11:00</updated>
    <link type="text/html" rel="alternate" href="http://localhost:3000/posts/1"/>
    <title>First post</title>
    <content type="html">HOORAY. About ruby.</content>
    <author>
      <name>DHH</name>
    </author>
  </entry>
</feed>
mmm, semantic goodness
I don't want preferences
Or why I’m writing another blog engine for ruby
I’ve been running this site on Mephisto for a number of months now. It is fantastic at what it does, but I’ve just recently realised it’s not what I want.
I want to configure my blog by hacking code
I don’t want preferences or theme support – I want to edit code. Mephisto isn’t great for this – it uses non-standard routing (everything goes through dispatch) and it uses liquid templates. I feel like I have to learn Mephisto to hack it. SimpleLog is another Rails option, but it sucks because it reads like a PHP app, and I don’t want to be hacking that. It’s built to be configured, not to be hacked.
So here is my grand plan.
An opinionated blog engine that does things my way. OpenID login, XHTML valid default template, RESTful stuff, code highlighting in comments, etc… To install, you branch my master git repo and customize away. You can just keep rebasing to get all the trunk updates. You can publish a ‘theme’ in the form of a patch against trunk. The code is going to be lean since I don’t need to accommodate for 5, 10 or 15 articles per page, so it will be easy to comprehend.
Basically, it’s so you can write your own blog without having to worry about boring stuff like admin, defensio integration, and OpenID auth.
I wonder what I’ll call it.
UPDATE: Look I made it – Enki
Don't use pagination on your blog
What problem are you trying to solve? In my case, I don’t want the bottom of the page to be a dead end. Paging would appear to be a good solution – click next page, get more content. Alas, it has issues:
- When you post a new article, it changes the content of all your pages. Google doesn’t like this – search traffic to your blog will suffer since people will click through expecting an older version of the page.
- It invalidates your entire cache when you post something new. Admittedly not a problem for most of us, but worth considering.
Archives solve my problem – not wanting a dead end, while avoiding the two problems with pagination mentioned above. It is harder to get your window size right though (you don’t want 2 or 200 articles per page).
For bonus points, add something like the Humanized Reader. Javascript fetches the next article when you’re near the bottom of the page, seamlessly adding it to the bottom of the page so the user can just keep on reading.
I’ve just added archives to this site – an interim fix to tide me over until I do it right.
Thanks to Rick Olson for telling me I didn’t need paging.

