Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

YAML Tutorial

Many years ago I wrote a tutorial on using YAML in ruby. It still sees more Google traffic than any other post, by far. So people want to know about YAML? I’ll help them out.

What is YAML?

YAML is a flexible, human-readable file format that is ideal for storing object trees. YAML stands for “YAML Ain’t Markup Language”. It is easier for humans to read than JSON, and can carry richer metadata. It is far nicer than XML. There are libraries available for all mainstream languages, including Ruby, Python, C++, Java, Perl, C#/.NET, JavaScript, PHP and Haskell. It looks like this:

--- 
- name: Xavier
  country: Australia
  age: 24
- name: Don
  country: US

That is a simple array of hashes. You can nest any combination of these simple data structures however you like. Most parsers will also detect the 24 as an integer rather than a string. Quoting strings is optional, and was omitted in this example.
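Here is a quick sketch of what a parser hands back for the document above, using ruby's bundled YAML library (Psych on any current ruby). It assumes ruby 2.3 or later for the `<<~` heredoc.

```ruby
require 'yaml'

doc = <<~YAML
  ---
  - name: Xavier
    country: Australia
    age: 24
  - name: Don
    country: US
YAML

people = YAML.load(doc)

# An array of hashes with string keys comes back
people.first["name"]              # => "Xavier"
# The unquoted 24 was detected as an integer, not a string
people.first["age"].is_a?(Integer) # => true
# Don simply has no age key
people.last.key?("age")           # => false
```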

YAML allows you to add tags to your objects: extra metadata that your application can use to deserialize portions of the document into complex data structures. For instance, in ruby, a serialized Set object looks like this:

# Set.new([1,2]).to_yaml
--- !ruby/object:Set 
hash: 
  1: true
  2: true

Notice that ruby has added the ruby/object:Set tag so that the correct object can be instantiated on deserialization, while keeping the rendition of the set human-readable. These tags can be anything you like; ruby just happens to use that particular format.
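The same thing happens for your own classes. The sketch below uses a made-up Point class to show the tag ruby writes and how it is used to revive the object. It assumes the Psych engine: from Psych 4 (ruby 3.1) onwards plain YAML.load refuses arbitrary classes, so unsafe_load is used where it exists.

```ruby
require 'yaml'

# Hypothetical class with no YAML-specific code at all
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

yaml = Point.new(1, 2).to_yaml
puts yaml
# --- !ruby/object:Point
# x: 1
# y: 2

# The tag tells the loader which class to instantiate
point = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
point.class  # => Point
```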

You can remove duplication from YAML files by using anchors (&) and aliases (*). You typically see this in configuration files, such as:

defaults: &defaults
  adapter:  postgres
  host:     localhost

development:
  database: myapp_development
  <<: *defaults

test:
  database: myapp_test
  <<: *defaults

& sets up the name of the anchor (“defaults”), << means “merge the given hash into the current one”, and * includes the named anchor (“defaults” again). The expanded version looks like this:

defaults:
  adapter:  postgres
  host:     localhost

development:
  database: myapp_development
  adapter:  postgres
  host:     localhost

test:
  database: myapp_test
  adapter:  postgres
  host:     localhost

Note that the defaults hash hangs around, even though it isn’t really required anymore.
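Loading that configuration from ruby shows the merge in action. This is a sketch assuming ruby 2.6 or later, where YAML.safe_load accepts an aliases: keyword. Newer Psych versions (4+) also reject aliases in plain YAML.load unless you opt in, which is why the option is passed explicitly here.

```ruby
require 'yaml'

config = <<~YAML
  defaults: &defaults
    adapter:  postgres
    host:     localhost

  development:
    database: myapp_development
    <<: *defaults

  test:
    database: myapp_test
    <<: *defaults
YAML

# aliases: true is required for *defaults to be resolved by safe_load
settings = YAML.safe_load(config, aliases: true)

settings["development"]["adapter"]  # => "postgres"
settings["test"]["database"]        # => "myapp_test"
settings.key?("defaults")           # => true, the anchored hash is still there
```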

YAML generators use this technique to correctly serialize repeated references to the same object, and even cyclic references. That’s pretty clever.
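A small sketch of both behaviours with ruby's Psych engine, assuming ruby 2.6 or later for the aliases: keyword to YAML.safe_load. The shared and loop_array variables are made up for illustration.

```ruby
require 'yaml'

# The same array object referenced twice
shared = ["red", "blue"]
doc = { "primary" => shared, "backup" => shared }

yaml = doc.to_yaml
puts yaml  # the array is written once with an anchor (&), then referenced by an alias (*)

copy = YAML.safe_load(yaml, aliases: true)
copy["primary"].equal?(copy["backup"])  # => true, still one shared object

# A cyclic structure: an array that contains itself
loop_array = []
loop_array << loop_array
cyclic = YAML.safe_load(loop_array.to_yaml, aliases: true)
cyclic.first.equal?(cyclic)             # => true
```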

Flow style

YAML has an alternate syntax called “flow style” that allows arrays and hashes to be written inline, without relying on indentation, using square brackets and curly brackets respectively.

---
# Arrays
colors:
  - red
  - blue
---
# in flow style...
colors: [red, blue]
---
# Hashes
- name: Xavier
  age: 24
---
# in flow style...
- {name: Xavier, age: 24}

Flow style has the curious effect of making YAML a superset of JSON (formally so since YAML 1.2): a valid JSON document is also a valid YAML document.
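A quick way to see this from ruby is to feed the same string to both parsers. This is only a sketch; ruby's Psych follows YAML 1.1, so treat the superset property as a very good rule of thumb rather than a guarantee for every edge case.

```ruby
require 'yaml'
require 'json'

json = '{"name": "Xavier", "age": 24, "colors": ["red", "blue"]}'

# The same string parsed by both libraries gives the same ruby hash
YAML.load(json) == JSON.parse(json)  # => true

# Block style and flow style describe the same data
block = "colors:\n  - red\n  - blue\n"
flow  = "colors: [red, blue]\n"
YAML.load(block) == YAML.load(flow)  # => true
```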

Performance

Given YAML’s richness and human readability, you would expect it to be slower than native serialization or JSON. This would be correct. My brief testing shows it is about an order of magnitude slower. For the typical configuration use-case, this is irrelevant, but worth keeping in mind if you are doing something crazy. Remember to run your own benchmarks that represent your specific need.

                     user       system     total    real
Marshal serialize    0.090000   0.000000   0.090000 (  0.091822)
Marshal deserialize  0.090000   0.000000   0.090000 (  0.092186)
JSON serialize       0.480000   0.010000   0.490000 (  0.480291)
JSON deserialize     0.130000   0.010000   0.140000 (  0.134860)
YAML serialize       2.040000   0.020000   2.060000 (  2.065693)
YAML deserialize     0.520000   0.010000   0.530000 (  0.526048)
Psych serialize      2.530000   0.030000   2.560000 (  2.565116)
Psych deserialize    1.510000   0.120000   1.630000 (  1.622601)

Curiously, the new YAML parser Psych included in ruby 1.9.2 appears significantly slower than the old one. Not sure what is going on there.
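The script behind the table above isn't shown, so here is a hypothetical one built on ruby's standard benchmark library. The payload and iteration count are made-up placeholders; swap in data that looks like yours, and expect absolute numbers to differ from the table.

```ruby
require 'benchmark'
require 'json'
require 'yaml'

# Hypothetical payload: 1,000 small hashes shaped like the people example
PAYLOAD = Array.new(1_000) do |i|
  { "name" => "person#{i}", "country" => "AU", "age" => i }
end
ITERATIONS = 10

# Pre-serialized copies so the deserialize rows only measure parsing
marshalled = Marshal.dump(PAYLOAD)
jsoned     = JSON.generate(PAYLOAD)
yamled     = PAYLOAD.to_yaml

Benchmark.bm(20) do |x|
  x.report("Marshal serialize")   { ITERATIONS.times { Marshal.dump(PAYLOAD) } }
  x.report("Marshal deserialize") { ITERATIONS.times { Marshal.load(marshalled) } }
  x.report("JSON serialize")      { ITERATIONS.times { JSON.generate(PAYLOAD) } }
  x.report("JSON deserialize")    { ITERATIONS.times { JSON.parse(jsoned) } }
  x.report("YAML serialize")      { ITERATIONS.times { PAYLOAD.to_yaml } }
  x.report("YAML deserialize")    { ITERATIONS.times { YAML.load(yamled) } }
end
```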

Reading YAML from a file with ruby

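require 'yaml'

# load_file opens, parses and closes the file for you
parsed = begin
  YAML.load_file("/tmp/test.yml")
rescue ArgumentError, Psych::SyntaxError => e
  # Syck (the old parser) raised ArgumentError; Psych raises Psych::SyntaxError
  puts "Could not parse YAML: #{e.message}"
end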

Writing YAML to a file with ruby

require 'yaml'

data = {"name" => "Xavier"}
File.open("path/to/output.yml", "w") {|f| f.write(data.to_yaml) }
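Putting the two together, here is a sketch of a full round trip. It writes to a throwaway directory from Dir.mktmpdir instead of the placeholder path above, and the data hash is made up for illustration.

```ruby
require 'yaml'
require 'tmpdir'

data = { "name" => "Xavier", "languages" => ["ruby", "yaml"] }

# Write to a temporary directory, then read the file back in;
# mktmpdir deletes the directory and returns the block's value
loaded = Dir.mktmpdir do |dir|
  path = File.join(dir, "output.yml")
  File.open(path, "w") { |f| f.write(data.to_yaml) }
  YAML.load_file(path)
end

loaded == data  # => true
```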

Anything else you’d like to know? Leave a comment.

Psych YAML in ruby 1.9.2 with RVM and Snow Leopard OSX

Note that you must have libyaml installed before you compile ruby, so this probably means you’ll need to recompile your current version.

brew install libyaml
rvm install ruby-1.9.2 --with-libyaml-dir=/usr/local
ruby -rpsych -e 'puts Psych.load("win: true")'
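Once ruby is rebuilt, you can also ask Psych directly which versions you ended up with. This sketch targets current rubies, where the YAML constant is simply an alias for Psych; on a 1.9.2 build you would `require 'psych'` as in the command above.

```ruby
require 'yaml'

# On modern rubies the YAML constant simply points at Psych
puts YAML == Psych
# Version of the Psych library itself
puts Psych::VERSION
# Version of the libyaml C library Psych was compiled against
puts Psych::LIBYAML_VERSION
```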

YAML in Ruby Tutorial

UPDATE 2011-01-31: I have posted a newer tutorial which is probably going to be more useful to you than this one: YAML Tutorial

So you’ve got all these tasty ruby objects lying around in memory, and they’re going to be lost when your program ends. Such a tragic end. What’s a robot to do? Why, store them to disk in a language-agnostic format, of course! Enter YAML, a language perfectly suited to the task, more so than its heavier sibling, XML. YAML support comes built into the ruby language, and it couldn’t be easier to use. Once you require the right file, every object automagically gets a to_yaml method that returns a string containing the corresponding YAML markup.

require 'yaml' # Assumed in future examples

puts "hello".to_yaml

Of course this works for any object, using some of that oh-so-sweet reflection. to_yaml recursively calls itself on all of your instance variables, and even knows how to handle complex structures like arrays and hashes. It even copes with cyclic references! How’s that for value?

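class Square
  attr_reader :width, :height   # lets you call new_obj.width after loading

  def initialize width, height
    @width = width
    @height = height
    @bonus = ['yo', {:msg => 'YAML 4TW'}]
    @me = self                    # a cyclic reference back to the object itself
  end
end

puts Square.new(2, 2).to_yaml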

Now that you’ve got a handy YAML string you can do whatever you like with it: write it to disk, store it in a database, email it to your cousin Benny. But Benny is going to spin out – how does he reproduce your shiny ruby objects? Thoughtfully, ruby makes it just about as easy to create an object from YAML markup – in other words to go the other way. The YAML::load method takes either a string or an IO object and gives you back an object, ready to use. It’s worth noting that the initialize method is not called on the new object – a fact that will become pertinent later.

1
2
3
serialized = Square.new(2, 2).to_yaml
# On Psych 4+ (ruby 3.1+), YAML::load only revives plain data; use YAML.unsafe_load there
new_obj = YAML::load(serialized)
puts new_obj.width
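To see that initialize really is skipped on load, here is a small sketch that counts calls to it. The initialize_calls counter is a hypothetical addition for illustration, and the loader choice assumes Psych: from Psych 4 (ruby 3.1) onwards, plain YAML.load refuses arbitrary classes, so unsafe_load is used where it exists.

```ruby
require 'yaml'

class Square
  attr_reader :width, :height

  # Hypothetical class-level counter, only here to show when initialize runs
  @initialize_calls = 0
  class << self
    attr_accessor :initialize_calls
  end

  def initialize(width, height)
    Square.initialize_calls += 1
    @width = width
    @height = height
  end
end

serialized = Square.new(2, 2).to_yaml
new_obj = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(serialized) : YAML.load(serialized)

new_obj.width            # => 2
Square.initialize_calls  # => 1, from Square.new only; loading did not call initialize
```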

Transience

The YAML serializer works in essentially the same manner as a sledgehammer. There’s no finesse – it will serialize all of your instance variables. Always. This is generally not a problem, but every now and then for reasons of space, security, beauty or public health you will have a transient variable that you really just don’t want to be serialized. There is no neat way in the supplied library to do this. You could override to_yaml and blank out the transient fields before you call super, but then you need to restore them afterwards. And what if those fields were calculated on initialization – how do you restore them when the object is deserialized?

Not to worry, our gallant hero (yours truly) has created a helper script that allows you to specify which fields are to be persisted in a declarative manner using a class attribute.

require 'rhnh/yaml_helper' # Assumed in future examples

class Square
  persistent :width, :height
  
  def initialize width, height
    @width = width
    @height = height
    @me = self        # @me will not be serialized
  end
end

The script also provides a post_deserialize hook that is called, not surprisingly, after deserialization. It essentially acts as initialize for deserialized objects. No setup is necessary to use this hook; its mere presence will attract enough attention.

class OnTheBall
  def post_deserialize
    puts "I'm awake!"
  end
end

YAML::load(OnTheBall.new.to_yaml)
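For comparison, current Psych-based rubies offer their own hooks: an object that defines encode_with(coder) controls what gets written, and one that defines init_with(coder) is handed the loaded values instead of having initialize called. The sketch below uses those two hooks to approximate the persistent declaration and post_deserialize hook described above. It is not the rhnh/yaml_helper code; the PersistentFields module and its method names are hypothetical, and the area field stands in for a transient, calculated value.

```ruby
require 'yaml'

# Hypothetical helper built on Psych's encode_with/init_with hooks
module PersistentFields
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare which instance variables should be serialized
    def persistent(*names)
      persistent_fields.concat(names.map(&:to_s))
    end

    def persistent_fields
      @persistent_fields ||= []
    end
  end

  # Called by Psych when dumping: only write the declared fields
  def encode_with(coder)
    self.class.persistent_fields.each do |name|
      coder[name] = instance_variable_get("@#{name}")
    end
  end

  # Called by Psych when loading, in place of initialize
  def init_with(coder)
    self.class.persistent_fields.each do |name|
      instance_variable_set("@#{name}", coder[name])
    end
    post_deserialize if respond_to?(:post_deserialize, true)
  end
end

class Square
  include PersistentFields
  persistent :width, :height
  attr_reader :width, :height, :area

  def initialize(width, height)
    @width = width
    @height = height
    post_deserialize
  end

  # Recomputes the transient field; also runs after loading
  def post_deserialize
    @area = @width * @height
  end
end

yaml = Square.new(2, 3).to_yaml
copy = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
copy.area  # => 6, recalculated rather than stored
```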

In closing

YAML is an excellent choice for serializing your Ruby objects. Its brevity and readability give it the edge over both XML and Marshal, and with the addition of YAML Helper it becomes more flexible as well.


YAML persistence

Fixed up my persistence code so that variables no longer have to be specified as an array, and committed my changes to CVS. Funny that on the day I got developer access to clxmlserial, I switched it out of my project in favour of YAML. Of course, I need to add a persistent attribute to that too, but it works a little differently from XML:

require 'yaml'

# Written against the old Syck emitter API (YAML::quick_emit, out.map, taguri,
# to_yaml_style); recent Psych-based rubies no longer provide these calls.
class Object
  # Registry shared by every class: class name => names of persistent variables
  def self._persist klass
    @@persist = {} unless defined?(@@persist)
    @@persist[klass] ||= []
  end

  # Walk up the superclass chain and return the first persistent list found,
  # or nil if no class in the chain declared one
  def self._persist_with_parent klass
    @@persist = {} unless defined?(@@persist)
    vars = nil
    while !vars && klass
      vars = @@persist[klass.to_s]
      klass = klass.superclass
    end
    vars
  end

  # Usage: persistent :width, :height
  def self.persistent *vars
    _persist(self.to_s).concat(vars.map { |v| v.to_s })
  end

  def to_yaml ( opts = {} )
    vars = self.class._persist_with_parent(self.class)

    YAML::quick_emit( object_id, opts ) do |out|
      out.map( taguri, to_yaml_style ) do |map|
        if vars && !vars.empty?
          # Only emit the declared variables
          vars.each { |m| map.add( m, instance_variable_get( '@' + m ) ) }
        else
          # Nothing declared (vars may be nil): fall back to every instance variable
          to_yaml_properties.each { |m| map.add( m[1..-1], instance_variable_get( m ) ) }
        end
      end
    end
  end

  # Writes the object to "<filename>.yaml"
  def save(filename)
    File.open( filename + '.yaml', 'w' ) do |out|
      YAML.dump( self, out )
    end
  end
end