Jabberwocky


Tail Call Optimization


Tail call optimization: when a method's last act is a recursive call, the VM can reuse the current stack frame for that call again and again, instead of pushing a new frame every time. This means saving memory and gaining speed, and that counts for a lot when you're, say, calculating a large Fibonacci number.
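To make the shape concrete, here is a quick sketch of my own (not from the post I link below) of factorial written both ways. Note that merely writing the tail-recursive form doesn't help in Ruby by itself — the frame can only be reused when the recursive call is the very last thing the method does, and the VM actually performs the reuse.

```ruby
# Non-tail-recursive: the multiplication happens *after* the recursive
# call returns, so every call must keep its frame alive.
def fact_plain(n)
  n < 2 ? 1 : n * fact_plain(n - 1)
end

# Tail-recursive: the intermediate result is threaded through the
# accumulator, so the recursive call is in tail position and its frame
# could, in principle, be reused.
def fact_tail(n, acc = 1)
  n < 2 ? acc : fact_tail(n - 1, n * acc)
end

puts fact_plain(10)  # => 3628800
puts fact_tail(10)   # => 3628800
```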

Now I’d been told time and again that Ruby doesn’t do that out of the box, unlike languages like Lisp and Erlang.

Today, I saw this tweet by Jim Weirich, where he pointed to a blog post implementing tail call optimization in Ruby. The code is very clever – it took me a good few minutes to parse it. Have a look for yourself, it's worth it!

Coincidentally (well, not so coincidentally after all: the person reporting the bug, Magnus Holm, is also the author of that blog post), this very morning ruby-core closed an issue showing a feature of Ruby 1.9 I didn't even know existed: the possibility to tell the VM you want tail calls!

To use the same example as the beautiful Ruby code I referred to:

```ruby
RubyVM::InstructionSequence.compile_option = {
  :tailcall_optimization => true,
  :trace_instruction => false
}

RubyVM::InstructionSequence.new(<<-EOF).eval
  class TCOTest
    # tail-recursive factorial
    def fact( n, acc = 1 )
      if n < 2 then acc else fact( n-1, n*acc ) end
    end

    # length of factorial
    def fact_size( n )
      fact( n ).size
    rescue
      $!
    end
  end
EOF
```
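To check that the flag really kicks in, here is a quick sanity check of my own (the class name `TCOCheck` and its method are mine, not from the original post): the same tail-recursive method compiled without the option blows the stack on a deep recursion, while the optimized compile runs in constant stack space.

```ruby
# Sanity check: compile the same tail-recursive method with and without
# the VM's tailcall optimization, then recurse a million levels deep.
src = <<-EOF
  class TCOCheck
    def count(n, acc = 0)
      n.zero? ? acc : count(n - 1, acc + 1)
    end
  end
EOF

# Without the option: deep recursion overflows the stack.
RubyVM::InstructionSequence.compile(src, nil, nil, 1,
  { tailcall_optimization: false, trace_instruction: false }).eval
begin
  TCOCheck.new.count(1_000_000)
rescue SystemStackError => e
  puts "plain compile: #{e.class}"
end

# With the option: the frame is reused, so it just works.
RubyVM::InstructionSequence.compile(src, nil, nil, 1,
  { tailcall_optimization: true, trace_instruction: false }).eval
puts TCOCheck.new.count(1_000_000)  # => 1000000
```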

The performance of the VM-based tail call seems to be slightly better than the pure-Ruby implementation, as one would expect.

How does it work?
The main difference seems to be in compile.c, in the function responding to the friendly name of iseq_peephole_optimize.
As far as I understand, this function deals with a linked list of binary instructions (corresponding to the program) and tries to optimize their execution by introducing shortcuts where appropriate.
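As a toy illustration of the idea (in Ruby rather than C, and with a made-up instruction representation — the real VM works on linked C structs, not arrays), a peephole pass spots the `send` + `leave` pattern and flags the `send` as a tail call:

```ruby
# Toy peephole pass: walk a list of [opcode, operands] instructions and
# flag any :send immediately followed by :leave as a tail call.
TAILCALL_BIT = 0x1  # stand-in for the real VM_CALL_TAILCALL_BIT flag

def peephole_tailcall!(instructions)
  instructions.each_cons(2) do |insn, next_insn|
    if insn[0] == :send && next_insn[0] == :leave
      insn[1][:flags] |= TAILCALL_BIT  # mark the call as a tail call
    end
  end
  instructions
end

program = [
  [:send,  { flags: 0 }],  # the recursive call
  [:leave, {}],            # return right after — so the send is a tail call
]
peephole_tailcall!(program)
puts program[0][1][:flags]  # => 1
```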

The relevant code (iobj is the linked list of instructions):

```c
if (do_tailcallopt && iobj->insn_id == BIN(leave)) {
    /*
     *  send ...
     *  leave
     * =>
     *  send ..., ... | VM_CALL_TAILCALL_BIT, ...
     *  leave # unreachable
     */
    INSN *piobj = (INSN *)get_prev_insn((INSN *)list);

    if (piobj->insn_id == BIN(send) &&
        piobj->operands[2] == 0 /* block */
        ) {
        piobj->operands[3] = FIXNUM_OR(piobj->operands[3], VM_CALL_TAILCALL_BIT);
    }
}
```

If we're in the right situation, the instruction gets a VM_CALL_TAILCALL_BIT.
We find our friend the tailcall bit in vm_insnhelper.c, in vm_setup_method:

```c
VALUE *sp, *rsp = cfp->sp - argc;
rb_iseq_t *iseq = me->def->body.iseq;
/* ... */
sp = rsp + iseq->arg_size;

if (LIKELY(!(flag & VM_CALL_TAILCALL_BIT))) {
    /* clear local variables */
    for (i = 0; i < iseq->local_size - iseq->arg_size; i++) {
        *sp++ = Qnil;
    }

    vm_push_frame(th, iseq,
                  VM_FRAME_MAGIC_METHOD, recv, (VALUE)blockptr,
                  iseq->iseq_encoded + opt_pc, sp, 0, 0);

    cfp->sp = rsp - 1 /* recv */;
}
else {
    VALUE *p_rsp;

    th->cfp++; /* pop cf */
    p_rsp = th->cfp->sp;

    /* copy arguments */
    for (i = 0; i < (sp - rsp); i++) {
        p_rsp[i] = rsp[i];
    }

    sp -= rsp - p_rsp;

    vm_push_frame(th, iseq,
                  VM_FRAME_MAGIC_METHOD, recv, (VALUE)blockptr,
                  iseq->iseq_encoded + opt_pc, sp, 0, 0);
}
```

iseq contains the method's instruction sequence. rsp is the control frame's stack pointer minus the number of arguments given. sp is rsp plus the new method's argument size.
In the tailcall-optimized case (the else branch), the caller's frame is popped and the arguments are copied down over it, so the frame gets reused for the new method's instructions; otherwise a new frame is pushed normally.
As far as I understand, anyway. I'm not blown away by the naming of the variables or the structure, though I can't claim I would have done any better a job, since I'm not that familiar with the subject matter.
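The frame reuse in the else branch can be pictured with a toy model in Ruby (a deliberately naive simulation of mine, nothing like the real VM's memory layout): instead of pushing a new frame per call, the tail call overwrites the top frame in place, so the stack depth never grows.

```ruby
# Toy model: a "stack" of frames, where a tail call replaces the top
# frame instead of pushing a new one, keeping the depth constant.
def run_tail_recursive(n, acc)
  stack = [[n, acc]]   # one frame holding [n, acc]
  max_depth = 0
  loop do
    n, acc = stack.last
    max_depth = [max_depth, stack.size].max
    break if n < 2
    stack[-1] = [n - 1, n * acc]  # the "tail call": reuse the frame
  end
  [stack.last[1], max_depth]
end

result, depth = run_tail_recursive(10, 1)
puts result  # => 3628800
puts depth   # => 1
```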

Since this tailcall optimization is an option and not a default, and since it's not widely publicized (to my knowledge — although I did find some threads when googling), I'm guessing that it's either experimental or that there are some downsides to it. Does anyone know?

Update: Magnus Holm (@judofyr) added notes about this feature to his blog post, along with benchmarks of about five ways to implement it. Nice!
