Have you ever wondered what is going on in the entrails of MRI Ruby? I certainly have. I started out studying engineering because I wanted to know how everything works. As it turned out, every fact just generated more questions, but that won’t keep me from trying.
A little bit of poring through the source code (ruby 1.9.2) yielded some tools which can provide helpful insights for this research.
You’re probably already aware that a Ruby program is executed like so:
- parsing: the program is tokenized and parsed into an Abstract Syntax Tree (AST). This AST is basically the source code reduced to a tree form, which looks vaguely lispy in construction.
- compilation: this tree is compiled to a set of instructions for the Ruby virtual machine.
- execution: the instructions are then executed by the Ruby virtual machine.
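The three stages above can be roughly imitated from inside a Ruby program. This is only a sketch under some assumptions: `Ripper.sexp` returns Ripper's own s-expression, which is not the internal AST MRI builds, `RubyVM::InstructionSequence` is specific to MRI, and the method name `run_stages` and the snippet `1 + 2` are made up for illustration.

```ruby
require 'ripper'

# Hypothetical helper: push one snippet through parse, compile and execute.
def run_stages(source)
  tree   = Ripper.sexp(source)                        # parsing (Ripper's view of the tree)
  iseq   = RubyVM::InstructionSequence.compile(source) # compilation to VM instructions
  result = iseq.eval                                   # execution by the VM
  { tree: tree, listing: iseq.disasm, result: result }
end

stages = run_stages("1 + 2")
p stages[:tree]        # nested arrays, vaguely lispy
puts stages[:listing]  # human-readable instruction listing
p stages[:result]      # value produced by running the instructions
```

The exact tree and listing differ between Ruby versions, so treat what it prints as something to explore rather than a reference.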
Here are some tools, built into Ruby, to see what happens to a bit of code.
Say code.rb contains just one statement:
The corresponding AST yielded by this option:
This gives us some insight into what constitutes an AST node in Ruby.
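To get a feel for the kinds of nodes a small statement is made of, one option is to walk a parse tree and collect the node types. The sketch below uses Ripper's s-expression as a stand-in, on the assumption that it is close enough for exploration. Its node names are Ripper's, not the ones MRI uses internally, and the helper `node_types` and the statement `a = 1 + 2` are illustrative only.

```ruby
require 'ripper'

# Hypothetical helper: collect the symbols that head each nested array
# in a Ripper s-expression, depth first.
def node_types(sexp, found = [])
  return found unless sexp.is_a?(Array)
  found << sexp.first if sexp.first.is_a?(Symbol)
  sexp.each { |child| node_types(child, found) }
  found
end

tree = Ripper.sexp("a = 1 + 2")
p node_types(tree).uniq
```

Each symbol it prints corresponds to one kind of node, such as an assignment or a binary operation.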
There are two ways to see the instructions produced by compilation.
The first one is a dump parameter like the one above:
ruby --dump insns code.rb
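The compiled instructions can also be examined as plain data from within a program, which is handy when you only care about the instruction names. A minimal sketch, assuming MRI and its `RubyVM::InstructionSequence#to_a` format, where the last element holds the instruction body. The helper name `instruction_names` is hypothetical.

```ruby
# Hypothetical helper: list the names of the VM instructions
# that a snippet compiles to.
def instruction_names(source)
  iseq = RubyVM::InstructionSequence.compile(source)
  body = iseq.to_a.last                     # instruction body of the sequence
  body.select { |entry| entry.is_a?(Array) } # drop line numbers and labels
      .map(&:first)                          # keep only the instruction name
end

p instruction_names("1 + 2")
```

The array layout is an implementation detail and the instruction names vary between Ruby versions, so this is best used for poking around.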
The second way to get information is somewhat trickier: a C macro needs to be ‘turned on’ for it to work. There is a hint in the comments of compile.c:
There are two ways to set CPDEBUG: you can add the parameter -DCPDEBUG=5 to the compilation in the makefile, or you can change the default value directly in compile.c.
In both cases Ruby has to be recompiled, and the resulting build produces more output than usual. Running the same minimal program code.rb yields a whole lot of output about the generated instructions, maybe more than is useful, depending on how deep you want to go.
Execution in the VM
If you want to see output at the execution level: pointers, heaps, frame pointers, there’s another debug constant you can activate. In vm.c, change the value of PROCDEBUG to a non-zero value:
Apparently, this causes segmentation faults out of the box, but by commenting out some stuff I was able to get it compiled. The output looks like so:
OK, still not very enlightening. I’m waiting for feedback from ruby-core on that one.
Even so, running these few commands on extremely simple programs seems to me a good way to get a better insight into how Ruby works.
Note: setting tabstop=8 (in vim, or tab = 8 spaces in other editors) gives you the proper indentation for the Ruby source code. I had to find out.