YAML in Ruby Tutorial
UPDATE 2011-01-31: I have posted a newer tutorial which is probably going to be more useful to you than this one: YAML Tutorial
So you’ve got all these tasty ruby objects lying around in memory, and they’re going to be lost when your program ends. Such a tragic end. What’s a robot to do? Why, store them to disk in a language agnostic format, of course! Enter YAML, a language perfectly suited to the task, more so than it’s heavier bretheren, XML. YAML support comes built in to the ruby language, and it couldn’t be easier to use. Every object automagically gets a to_yaml
method that returns a string containing appropriate YAML markup when you include the right file.
1 2 3 |
require 'yaml' # Assumed in future examples puts "hello".to_yaml |
Of course this works for any object, using some of that oh-so-sweet reflection. to_yaml
recursively calls itself on all of your instance variables, and even knows how to handle complex structure like arrays and hashes. It even copes with cyclic references! How’s that for value?
1 2 3 4 5 6 7 8 9 10 |
class Square def initialize width, height @width = width @height = height @bonus = ['yo', {:msg => 'YAML 4TW'}] @me = self end end puts Square.new(2, 2).to_yaml |
Now that you’ve got a handy YAML string you can do whatever you like with it: write it to disk, store it in a database, email it to your cousin Benny. But Benny is going to spin out – how does he reproduce your shiny ruby objects? Thoughtfully, ruby makes it just about as easy to create an object from YAML markup – in other words to go the other way. The YAML::load
method takes either a string or an IO object and gives you back an object, ready to use. It’s worth noting that the initialize
method is not called on the new object – a fact that will become pertinent later.
1 2 3 |
serialized = Square.new(2, 2).to_yaml new_obj = YAML::load(serialized) puts new_obj.width |
Transience
The YAML serializer works in essentially the same manner as a sledgehammer. There’s no finesse – it will serialize all of your instance variables. Always. This is generally not a problem, but every now and then for reasons of space, security, beauty or public health you will have a transient variable that you really just don’t want to be serialized. There is no neat way in the supplied library to do this. You could override to_yaml
and blank out the transient fields before you call super
, but then you need to restore them afterwards. And what if those fields were calculated on initialization – how do you restore them when the object is deserialized?
Not to worry, our gallant hero (yours truly) has created a helper script that allows you to specify which fields are to be persisted in a declarative manner using a class attribute.
1 2 3 4 5 6 7 8 9 10 11 |
require 'rhnh/yaml_helper' # Assumed in future examples class Square persistent :width, :height def initialize width, height @width = width @height = height @me = self # @me will not be serialized end end |
The script also provides a post_deserialize
hook that is called, not surprisingly, after deserialization. It essentially acts as initialize
for deserialized objects. No setup is necessary to use this hook, it’s mere presence will attract enough attention.
1 2 3 4 5 6 7 |
class OnTheBall def post_deserialize puts "I'm awake!" end end YAML::load(OnTheBall.new.to_yaml) |
In closing
YAML is an excellent choice for serializing your Ruby objects. Its brevity and readability give it the edge over both XML and Marshal, and with the addition of YAML Helper it becomes more flexible as well.