Jabberwocky

snicker-snack!

Segfault in Ruby

Note: the following applies to the C implementation of Ruby, not to JRuby or IronRuby. This is a sight most Rubyists fear: the segmentation fault. You’re running your tests quite innocently, or your web server is doing its job, until BOOM! [BUG] Segmentation fault

What just happened? A segfault means your program tried to access memory it has not been allocated, and the operating system said ‘hey you!’. When this occurs on a *nix, the process receives a signal, SIGSEGV. The program crashes and, if core dumps are enabled, leaves a core dump, which is a recording of the state of the program at the time of the crash.
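
To see the signal side of this from Ruby, here is a minimal sketch, assuming a Unix-like system with a `sleep` command on the PATH. It does not provoke a real memory error: it sends SIGSEGV to a throwaway child process and then asks the exit status which signal ended it. The `sleep 30` child is only a stand-in for a crashing program.

```ruby
# Assumption: Unix-like OS with a `sleep` binary; the child is a stand-in
# for a crashing program, not a real memory fault.
segv = Signal.list["SEGV"]            # signal number of SIGSEGV on this system

pid = Process.spawn("sleep", "30")    # harmless child process
Process.kill("SEGV", pid)             # deliver the signal a bad access would raise
_, status = Process.wait2(pid)        # reap the child and collect its status

puts "killed by a signal: #{status.signaled?}"
puts "signal number:      #{status.termsig} (SIGSEGV is #{segv})"
puts "core dumped:        #{status.coredump?}"
```

Whether the last line reports a core dump depends on how the system is configured, so it may well print false.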

Ruby traps that signal. You’ll find the corresponding code in signal.c of the Ruby source: install_sighandler(SIGSEGV, sigsegv); and the sigsegv function is:

#ifdef SIGSEGV
static RETSIGTYPE sigsegv _((int));
static RETSIGTYPE
sigsegv(sig)
  int sig;
{
  #if defined(HAVE_NATIVETHREAD) && defined(HAVE_NATIVETHREAD_KILL)
    if (!is_ruby_native_thread() && !rb_trap_accept_nativethreads[sig]) {
      sigsend_to_ruby_thread(sig);
      return;
    }
  #endif
  rb_gc_unstress();
  rb_bug("Segmentation fault");
}
#endif

The rb_bug at the bottom is responsible for the message you see appearing when a segmentation fault happens.

That’s all well, you’ll say, but how do I solve this? First off, you have to determine where the issue came from. That’s where the core dump can help you, by telling you whether the issue happened in Ruby itself or in its binding to another component, like a database or something similar.

But first, you need to recompile your Ruby. Why? Well, a normal C build strips the executable of a lot of the information linking it to the source, and optimization even changes the execution: removing variables, taking shortcuts, unrolling loops, etc. If you want to make the link between the executable and the source, you need to make sure all that information stays in the executable.

Make sure you’ve got the source code of your Ruby version. If you installed Ruby Enterprise Edition, you have it in the package you downloaded; if you use MRI, you need to get it from ruby-lang.org. Instead of using the installer, go into the ‘source’ directory and type

./configure optflags='-O0' debugflags='-ggdb'

then compile by doing

make
sudo make install

(You probably know all this, but I’m adding it for the sake of completeness.) If you suspect the segfault happens in a library you’re binding to, you might have to do the same kind of thing for that library.
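
Once the rebuilt ruby is installed, it is worth checking that it is the one actually being run. A small sketch, assuming your ruby records its build flags in `RbConfig::CONFIG` under the keys `optflags`, `debugflags` and `CFLAGS` (older versions may leave some of them empty). The flag matching is a rough heuristic, not a guarantee about the binary.

```ruby
require "rbconfig"

# Assumption: the build flags are recorded under these RbConfig keys;
# a missing key is shown as an empty string.
KEYS = %w[optflags debugflags CFLAGS].freeze

flags = KEYS.map { |key| [key, RbConfig::CONFIG[key].to_s] }.to_h

puts "ruby binary: #{RbConfig.ruby}"
flags.each { |key, value| puts format("%-10s %s", key, value) }

# Heuristic only: look for an -O1/-O2/-O3/-Os flag and for any -g flag.
all_flags = flags.values.join(" ")
optimized = !!(all_flags =~ /(?:^|\s)-O[123s]/)
has_debug = !!(all_flags =~ /(?:^|\s)-g/)
puts "optimized build: #{optimized}, debug symbols requested: #{has_debug}"
```

After the configure line above, you would hope to see -O0 and -ggdb in the output and an unoptimized build reported.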

Then, reproduce the segfault. The core dump is written to a file named ‘core’ in the directory your program was running in. To read this file, you need a program like gdb (the GNU debugger). This means installing the gdb package, whether with MacPorts or with the package tool of your distribution of choice.
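
If no ‘core’ file shows up, the core size limit of the process is a likely reason, since many systems set it to zero by default. A sketch, assuming a Unix-like system where Ruby defines `Process::RLIMIT_CORE`, that reads the limit and raises the soft limit as far as the hard limit allows (`describe` is just a helper name for this example):

```ruby
# Assumption: Unix-like OS where Process::RLIMIT_CORE is defined.
def describe(limit)
  limit == Process::RLIM_INFINITY ? "unlimited" : "#{limit} bytes"
end

soft, hard = Process.getrlimit(Process::RLIMIT_CORE)
puts "core size limit: soft=#{describe(soft)}, hard=#{describe(hard)}"

if soft.zero?
  puts "core dumps are disabled for this process and its children"
end

# Raising the soft limit up to the hard limit needs no special privileges.
# Child processes started afterwards inherit the new limit.
Process.setrlimit(Process::RLIMIT_CORE, hard, hard)
new_soft, = Process.getrlimit(Process::RLIMIT_CORE)
puts "soft limit is now #{describe(new_soft)}"
```

From a shell, running `ulimit -c unlimited` before starting the program has a similar effect for that session.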

Once you have gdb installed, go to the directory where the core is, and run

gdb ruby core

Here you’re telling gdb which program was dumped, so it can interpret the core. In our case, it’s always ruby. This opens a console, in which you can type ‘bt’ (as in backtrace)

(gdb) bt
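
The same backtrace can be collected without an interactive session, which is handy when the segfault happens on a server. A sketch, assuming gdb is on the PATH and the core file is named `core`; `gdb_command` and `backtrace_from_core` are hypothetical helper names, while `-batch` and `-ex` are standard gdb options.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: builds the gdb invocation that prints a backtrace
# and exits, for the given executable and core file.
def gdb_command(executable, core_file)
  ["gdb", "-batch", "-ex", "bt", executable, core_file]
end

# Hypothetical helper: runs gdb and returns its combined stdout and stderr.
# Assumes gdb is installed; raises Errno::ENOENT otherwise.
def backtrace_from_core(executable, core_file)
  output, _status = Open3.capture2e(*gdb_command(executable, core_file))
  output
end

# RbConfig.ruby is the path of the ruby running this script; pass the
# path of the ruby that crashed if it is a different one.
command = gdb_command(RbConfig.ruby, "core")
puts command.join(" ")
```

With a core file in the current directory, `puts backtrace_from_core(RbConfig.ruby, "core")` would print the frames discussed next.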

The first few frames just mean Ruby is trapping the segmentation fault, but after that, you might find something familiar:

#0  0x0093a422 in __kernel_vsyscall ()
#1  0x0062b4d1 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0x0062e932 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0x080c3912 in rb_bug (fmt=0x80da37c "Segmentation fault") at error.c:227
#4  0x080a2f3b in sigsegv (sig=11) at signal.c:633
#5  <signal handler called>
#6  0x00000019 in ?? ()
#7  0x0807197a in rb_hash_aref (hash=3069732200, key=1837) at hash.c:457

It gives you the source file and the line (hash.c and 457)! If you want more options, type ‘help’. Gdb, like the ruby-debugger you’re probably familiar with, lets you get a lot of information, like the value of variables in different frames.
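
If you collect many of these backtraces, picking out the interesting frame can be scripted. A rough sketch that parses the frames shown above and returns the first frame after the sigsegv handler that has a source location. `Frame`, `FRAME_PATTERN`, `parse_frames` and `crash_site` are names made up for this example, and the pattern only covers the frame shapes seen in this backtrace.

```ruby
# Frames copied from the backtrace above; frame #5 carries no function
# and is left out.
BACKTRACE = <<~'GDB'
  #0  0x0093a422 in __kernel_vsyscall ()
  #1  0x0062b4d1 in raise () from /lib/tls/i686/cmov/libc.so.6
  #2  0x0062e932 in abort () from /lib/tls/i686/cmov/libc.so.6
  #3  0x080c3912 in rb_bug (fmt=0x80da37c "Segmentation fault") at error.c:227
  #4  0x080a2f3b in sigsegv (sig=11) at signal.c:633
  #6  0x00000019 in ?? ()
  #7  0x0807197a in rb_hash_aref (hash=3069732200, key=1837) at hash.c:457
GDB

Frame = Struct.new(:number, :function, :file, :line)

# Frame number, optional address, function name, arguments, and an
# optional "at file:line" suffix.
FRAME_PATTERN = /^#(\d+)\s+(?:0x\h+ in )?(\S+) \(.*\)(?: at (\S+):(\d+))?/

def parse_frames(text)
  text.each_line.map do |line|
    match = FRAME_PATTERN.match(line)
    next unless match
    Frame.new(match[1].to_i, match[2], match[3], match[4]&.to_i)
  end.compact
end

# Skip everything up to and including Ruby's own sigsegv handler, then
# take the first frame that gdb could map to a source file.
def crash_site(frames)
  handler = frames.index { |frame| frame.function == "sigsegv" }
  candidates = handler ? frames.drop(handler + 1) : frames
  candidates.find { |frame| frame.file }
end

site = crash_site(parse_frames(BACKTRACE))
puts "#{site.function} at #{site.file}:#{site.line}"
# prints "rb_hash_aref at hash.c:457"
```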

The backtrace (bt) command tells you where it went wrong, which might give you a pointer to where to look. Even if you don’t read C, this often lets you find an answer by googling. I was able to backport a fix from Ruby 1.9 to our version of Ruby by looking around. If that fails, you’ll find many knowledgeable folks, on IRC for instance. Failing that, dig in. You now know what a segmentation fault is: pointers or array accesses on the line of the crash are probably the cause.

If you are playing around and producing segfaults on a regular basis (like I seem to do these days), you could take a shortcut and add the following line to the rb_bug function in error.c (remember, rb_bug is triggered on segfault):

...
      va_end(args);
      fprintf(out, "\n%s\n\n", ruby_description);
      rb_backtrace();   /* <- the added line */
    }
    abort();
...

I even added some code to get the ruby stack trace out of it, but it is tentative and needs some more testing, so I won’t publish it just now.

This will of course require you to recompile ruby once again. In general, I’d say: don’t be shy, look around, the ruby code is surprisingly readable. After all, digging around in code is what we do (well, besides writing it).
