Jabberwocky

snicker-snack!

Getting Cosy With MRI Ruby

Have you ever wondered what is going on in the entrails of MRI Ruby? I certainly have. I started out studying engineering because I wanted to know how everything works. As it turned out, every fact just generated more questions, but that won’t keep me from trying.

A little bit of poring through the source code (ruby 1.9.2) yielded some tools which can provide helpful insights for this research.

You’re probably already aware that a Ruby program is executed like so:

  • parsing: the program is tokenized and parsed into an Abstract Syntax Tree (AST). This AST is basically the source code reduced to a tree form, which looks vaguely lispy in construction.
  • compilation: this tree is compiled to a set of instructions for the Ruby virtual machine.
  • execution: the instructions are then executed by the Ruby virtual machine.
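
The three phases can also be driven by hand from inside Ruby. Here is a minimal sketch, assuming the Ripper library and RubyVM::InstructionSequence that ship with MRI 1.9 and later. The helper name run_phases is made up for illustration, and Ripper's S-expression is a separate view of the parse, not the internal node tree that the interpreter itself compiles.

```ruby
require 'ripper'

# Hypothetical helper: walks one snippet through the three phases
# and returns what each phase produced.
def run_phases(source)
  tree   = Ripper.sexp(source)                          # parsing
  iseq   = RubyVM::InstructionSequence.compile(source)  # compilation
  result = iseq.eval                                    # execution
  { tree: tree, iseq: iseq, result: result }
end

phases = run_phases("answer = 42")
puts phases[:tree].first   # root node type of the S-expression
puts phases[:iseq].class   # the compiled instruction sequence object
puts phases[:result]       # value of the last expression
```

Keeping each phase's product in a hash makes it easy to poke at them separately in irb.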

Here are some tools, built into Ruby, to see what happens to a bit of code.

Parsing

ruby --dump parsetree code.rb

Say code.rb contains just one statement:

answer = 42

The corresponding AST yielded by this option:

# @ NODE_SCOPE (line: 1)
# +- nd_tbl: :answer
# +- nd_args:
# |   (null node)
# +- nd_body:
#     @ NODE_DASGN_CURR (line: 1)
#     +- nd_vid: :answer
#     +- nd_value:
#         @ NODE_LIT (line: 1)
#         +- nd_lit: 42

This gives us some insight into what constitutes an AST node in Ruby.
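
A similar tree can be inspected without leaving Ruby. The sketch below uses Ripper from the standard library and prints the type of each node with indentation, loosely imitating the dump above. Keep in mind that Ripper's node names (such as assign) are its own and do not match the NODE_* names of the internal tree. The helper node_types is a name invented for this example.

```ruby
require 'ripper'

# Hypothetical helper: collects the symbol that heads each node of a
# Ripper S-expression, paired with its nesting depth.
def node_types(sexp, depth = 0, out = [])
  return out unless sexp.is_a?(Array)
  if sexp.first.is_a?(Symbol)
    out << [depth, sexp.first]
    depth += 1
  end
  sexp.each { |child| node_types(child, depth, out) }
  out
end

# Print one node type per line, indented by depth.
node_types(Ripper.sexp("answer = 42")).each do |depth, type|
  puts "#{'  ' * depth}#{type}"
end
```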

Compilation

There are two ways to see the instructions produced by compilation.

The first one is a dump parameter like the one above:

ruby --dump insns code.rb

outputs:

== disasm: <RubyVM::InstructionSequence:<main>@test>====================
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1] s1)
[ 2] answer
0000 trace            1                                               (   1)
0002 putobject        42
0004 dup
0005 setdynamic       answer, 0
0008 leave
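
The same listing is available from within a running program through RubyVM::InstructionSequence, whose disasm method returns the text shown above. The sketch below goes one step further and pulls out just the instruction names via to_a. It assumes the instruction body is the last element of that array, which is an implementation detail rather than a documented guarantee, and the exact instruction names differ between Ruby versions. The helper instruction_names is hypothetical.

```ruby
# Hypothetical helper: returns the instruction names of a compiled
# snippet, in order.
def instruction_names(source)
  iseq = RubyVM::InstructionSequence.compile(source)
  body = iseq.to_a.last   # assumption: the body is the last element
  # Instructions are arrays; line numbers and labels are not.
  body.select { |entry| entry.is_a?(Array) }.map { |insn| insn.first }
end

puts instruction_names("answer = 42").inspect

# The human-readable form, equivalent to what --dump insns prints.
puts RubyVM::InstructionSequence.compile("answer = 42").disasm
```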

The second way to get information is somewhat more tricky: a C macro, CPDEBUG, needs to be ‘turned on’ for it to work. There is a hint in the comments of compile.c:

 * debug level:
 *  0: no debug output
 *  1: show node type
 *  2: show node important parameters
 *  ...
 *  5: show other parameters
 * 10: show every AST array

There are two ways to set CPDEBUG: you can add the parameter -DCPDEBUG=5 to the compilation flags in the makefile, or you can change the default value directly in compile.c:

#ifndef CPDEBUG
#define CPDEBUG 5
#endif

In both cases, Ruby has to be recompiled, and the rebuilt interpreter prints far more than usual. Running the same minimal program code.rb yields a whole lot of output about the generated instructions, maybe more than is useful, depending on how deep you want to go.

Execution in the VM

If you want to see output at the execution level: pointers, heaps, frame pointers, there’s another debug constant you can activate. In vm.c, change the value of PROCDEBUG to a non-zero value:

#define PROCDEBUG 1

Apparently, this causes segmentation faults out of the box, but by commenting out some stuff I was able to get it compiled. This produces output like so:

---
envptr: 0x100482f48
orphan: 0x100887358
inheap: 0x0
lfp:    0x100482f48
dfp:    0x100482f48
<internal:gem_prelude>:1:in `require': cannot load such file -- rubygems.rb (LoadError)
  from <internal:gem_prelude>:1:in `<compiled>'

OK, still not very enlightening. I’m waiting for feedback from ruby-core on that one.

Running these few commands on extremely simple programs should, it seems to me, give us better insight into how MRI works.

Note: setting tabstop=8 (in vim, or tab = 8 spaces in other editors) gives you the proper indentation for the Ruby source code. I had to find out.