Where Is the Global Interpreter Lock?


MRI Ruby and Rubinius are both blessed with what is called the Global Interpreter Lock. This means that only one thread can execute at a time per VM instance, no matter how many CPU cores are available. See also Ilya Grigorik’s and Yehuda Katz’s articles on the subject.

This is not entirely a bad thing, since it means C extensions don’t have to be thread-safe (thanks Benoit Daloze for the input). And as the Wikipedia entry says, full concurrency can be achieved by spawning separate processes, each of which has its own VM and shares no state at all with the others.
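
As a rough sketch of that process-based approach (assuming a platform where Process.fork is available, so not Windows), the snippet below forks a child that modifies a variable and reports it back through a pipe. The variable names are made up for illustration; the point is only that the parent's copy stays untouched, because each process has its own VM and memory.

```ruby
# Each forked process works on its own copy of the VM state.
counter = 0

reader, writer = IO.pipe

pid = Process.fork do
  reader.close
  counter += 100        # changes only the child's copy
  writer.puts counter   # report the child's value through the pipe
  writer.close
  exit!(0)              # leave without running at_exit hooks inherited from the parent
end

writer.close
child_value = reader.gets.to_i
reader.close
Process.wait(pid)

puts "child saw #{child_value}, parent still has #{counter}"
```

The pipe is the only channel between the two processes here, which is the price of having no shared state: anything the processes need to exchange has to be passed explicitly.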

Rubinius is now getting rid of this lock in its Hydra branch. This is pretty cool, but it also means that all the C extensions you want to run concurrently will need to be thread-safe. Locking will have to happen at a more fine-grained level: if you need to modify a shared resource, it has to be protected from nondeterministic access by different threads.
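
To make "fine-grained" a bit more concrete, here is a small Ruby-level sketch (the names balance and balance_lock are hypothetical, and this is plain Ruby, not C extension code). Instead of one lock around the whole VM, a dedicated lock guards just the one shared value that several threads update.

```ruby
balance = 0
balance_lock = Mutex.new   # guards `balance` and nothing else

threads = 4.times.map do
  Thread.new do
    1_000.times do
      # Only the read-modify-write of the shared value is serialized.
      balance_lock.synchronize { balance += 1 }
    end
  end
end
threads.each(&:join)

puts balance
```

Everything outside the synchronize block can run without holding the lock, which is what distinguishes this from a single global lock.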

But this aside, my aim in this post is to have a look at the MRI code and see which bits implement the lock. In Ruby 1.9.2, MRI uses native threads: POSIX threads on *nix (thread_pthread.c) and Windows threads on Windows (thread_win32.c). The native interfaces are wrapped in thread.c.

Native threads have a mechanism to block access to a resource, called a mutex (as in mutual exclusion).
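
Ruby exposes the same idea through its own Mutex class, which makes for a quick illustration of mutual exclusion. This is a sketch of the concept only, not of the native mutex used inside the VM: while one thread holds the mutex, another thread's attempt to take it fails.

```ruby
lock = Mutex.new
lock.lock   # the current thread now owns the mutex

# A second thread cannot take the mutex while it is held.
acquired_while_held = Thread.new { lock.try_lock }.value

lock.unlock # give it up again

# Once released, another thread can take it.
acquired_after_release = Thread.new do
  got = lock.try_lock
  lock.unlock if got
  got
end.value

puts "while held: #{acquired_while_held}, after release: #{acquired_after_release}"
```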

So, which resource is protected in this case? vm_core.h provides a hint, and in thread.c you find this comment at the top:

model 2:
    A thread has mutex (GVL: Global VM Lock or Giant VM Lock) can run.
    When thread scheduling, running thread release GVL.  If running thread
    try blocking operation, this thread must release GVL and another
    thread can continue this flow.  After blocking operation, thread
    must check interrupt (RUBY_VM_CHECK_INTS).

vm_core.h:

#if RUBY_VM_THREAD_MODEL == 2

Ruby always has a ‘current thread’.

RUBY_EXTERN rb_thread_t *ruby_current_thread;
extern rb_vm_t *ruby_current_vm;

The entry point of Ruby is, of course, main.c. Among other things, main calls ruby_init (eval.c), which in turn calls Init_BareVM. There we find rb_thread_set_current_raw(th), which is defined as:

#define rb_thread_set_current_raw(th) (void)(ruby_current_thread = (th))

The native side of the main thread is set up in Init_native_thread, called a bit lower in Init_BareVM. This calls into the native threading library to set up the thread.

Nearly every function in the VM (vm.c) starts by calling GET_THREAD(), which returns the current thread to work with:

#define GET_THREAD() ruby_current_thread
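
From Ruby code, the closest visible counterpart is Thread.current. That it corresponds to the VM's notion of a current thread is my reading rather than something the excerpts here prove, so take this as a sketch: each thread, when asked, sees itself as the current one.

```ruby
outer_thread = Thread.current

# Inside a new thread, Thread.current is that new thread.
seen_in_worker = Thread.new { Thread.current }.value

puts "worker saw itself: #{!seen_in_worker.equal?(outer_thread)}"
puts "outer thread unchanged: #{Thread.current.equal?(outer_thread)}"
```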

There are a few mutexes in the Ruby source, but it’s hard to miss the one used for the global interpreter (or VM) lock, thanks to the name of the VM struct member being locked: global_vm_lock.

At load time, the mutex in question is initialized (thread.c) in Init_Thread:

/* init thread core */
{
    /* main thread setting */
    {
        /* acquire global vm lock */
        rb_thread_lock_t *lp = &GET_THREAD()->vm->global_vm_lock;
        native_mutex_initialize(lp);
        native_mutex_lock(lp);
        native_mutex_initialize(&GET_THREAD()->interrupt_lock);
    }
}

At thread scheduling time (rb_thread_schedule_rec in thread.c), the current thread releases the lock, yields so that another thread can acquire it and run, and then reacquires the lock before continuing:

thread_debug("rb_thread_schedule\n");
if (!rb_thread_alone()) {
    rb_thread_t *th = GET_THREAD();

    thread_debug("rb_thread_schedule/switch start\n");

    RB_GC_SAVE_MACHINE_CONTEXT(th);
    native_mutex_unlock(&th->vm->global_vm_lock);
    {
        native_thread_yield();
    }
    native_mutex_lock(&th->vm->global_vm_lock);

    rb_thread_set_current(th);
    thread_debug("rb_thread_schedule/switch done\n");
    /* ... */
}

The function thread_start_func2 (in thread.c) is called at thread creation:

/* ... */
    native_mutex_lock(&th->vm->global_vm_lock);
    {
        thread_debug("thread start (get lock): %p\n", (void *)th);
        rb_thread_set_current(th);
/* ... */
/* and at the end: if the thread is not the 'main' thread, it releases the mutex */
    if (th->vm->main_thread == th) {
        ruby_cleanup(state);
    }
    else {
        thread_cleanup_func(th);
        native_mutex_unlock(&th->vm->global_vm_lock);
    }
/* ... */
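
The scheduling and thread-start excerpts above follow one pattern: take the lock before running, drop it and yield at scheduling points, take it again, and drop it for good when the thread ends. Here is a toy Ruby model of that pattern. Note that toy_gvl is an ordinary Ruby Mutex standing in for global_vm_lock, and the counters are hypothetical bookkeeping; none of this is how MRI is actually implemented.

```ruby
# Toy model only: `toy_gvl` plays the role of global_vm_lock.
toy_gvl = Mutex.new
running = 0       # workers currently inside the locked section
max_running = 0   # highest value `running` ever reached
steps_done = 0

workers = 3.times.map do
  Thread.new do
    toy_gvl.lock                # thread start: take the lock before running
    5.times do
      running += 1
      max_running = [max_running, running].max
      steps_done += 1           # "run Ruby code" while holding the lock
      running -= 1

      toy_gvl.unlock            # scheduling point: release the lock...
      Thread.pass               # ...let another thread run...
      toy_gvl.lock              # ...and take the lock back
    end
    toy_gvl.unlock              # thread end: release the lock for good
  end
end
workers.each(&:join)

puts "steps: #{steps_done}, most workers running at once: #{max_running}"
```

Because every update to the counters happens while the lock is held, max_running never exceeds one: three threads exist, but only one of them makes progress at any moment.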

Interesting factoid: it seems to be possible to run bits of code without the GVL, in a ‘blocking region’. I think this is not widely advertised because, as mentioned earlier, anything you run that way needs to be thread-safe.

/*
blocking region to release GVL
  rb_thread_blocking_region - permit concurrent/parallel execution.
 
  This function does:
    (1) release GVL.
        Other Ruby threads may run in parallel.
    (2) call func with data1.
    (3) acquire GVL.
        Other Ruby threads can not run in parallel any more.
*/
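
The effect of releasing the lock around a blocking operation can be observed from plain Ruby. In the sketch below, several threads sleep at the same time. I have not checked whether sleep goes through rb_thread_blocking_region itself or a sibling code path, so treat that link as an assumption; the observable behaviour is that the waits overlap instead of adding up.

```ruby
naps = 4
nap_length = 0.2   # seconds

started = Time.now
naps.times.map { Thread.new { sleep nap_length } }.each(&:join)
elapsed = Time.now - started

# If the lock were held during sleep, this would take naps * nap_length.
puts format("%d naps of %.1fs took %.2fs in total", naps, nap_length, elapsed)
```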

For my next blog post, I’ll try to do a similar analysis for Rubinius, and how they managed to make a branch that is GVL-free. I needn’t add that JRuby, since it benefits from the JVM threading, has never had a global interpreter lock at all.
