Of MySQL/Ruby, EventMachine, and the need for non-blocking APIs

Part of the service we’re building is a socket server which uses Flash’s XMLSocket API to push updates to clients. Initially we developed this using the excellent Twisted library in Python, but as it grew, having to duplicate some of our data model code in another language started to hurt, and it made sense for us to port it to Ruby.

Luckily by that point, the EventMachine library had sprung up, offering something very similar to Twisted for Ruby, and we’ve been using that since.

While it’s well known that Ruby’s threading is non-native and not particularly speedy, event-based libraries don’t actually require much use of threading - one is encouraged to structure one’s code as small methods which are called asynchronously and return quickly, yielding back to the event loop. For those with client-side experience, this is quite comparable with JavaScript runtimes, where there is no threading but a core event loop, the ability to register event handlers, call setTimeout, and asynchronous APIs for longer-running IO (AJAX anyone?).

For this to work well, it is essential that your event handlers do their business as quickly as possible, and yield back to the event loop - as everything else in the event queue is sat there waiting for you to finish. This is all very well, until you need to deal with IO - other things (pesky database servers and clients) have a nasty habit of taking a while to get back to you, and if the API you’re calling to communicate with them blocks you, then it’s blocking everything else in the event queue too.
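To make the cost of a slow handler concrete, here is a minimal sketch of a single-threaded event loop in plain Ruby. ToyLoop, next_tick and run_demo are names made up for this illustration and are not EventMachine's API, and the sleep merely stands in for a blocking database call.

```ruby
# A toy single-threaded event loop (illustrative only, not EventMachine).
class ToyLoop
  def initialize
    @queue = []
  end

  # Register a handler to run on a later turn of the loop.
  def next_tick(&handler)
    @queue << handler
  end

  # Run queued handlers one at a time, in order, until none are left.
  def run
    @queue.shift.call until @queue.empty?
  end
end

# Queue a slow handler followed by a quick one, and record how long after
# start-up each of them finished.
def run_demo(blocking_seconds)
  reactor = ToyLoop.new
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  delays = {}

  reactor.next_tick do
    sleep blocking_seconds # stand-in for a blocking database call
    delays[:slow] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end

  reactor.next_tick do
    delays[:quick] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end

  reactor.run
  delays
end

delays = run_demo(0.2)
puts format("quick handler ran after %.2fs", delays[:quick])
```

The quick handler does no work of its own, yet it cannot start until the slow one has returned, which is exactly the position every other event in the queue is in while a handler blocks.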

One way to get around this (despite the concurrency paradigm being based around an event loop rather than pre-emptive threading) is to have some spare threads lying around to take care of blocking API calls, and fire off an event to the core event loop thread when they’re done: a way of turning a blocking API call into a non-blocking, asynchronous one. While Ruby’s threads aren’t native or very performant, this shouldn’t matter too much in this case, as the threads aren’t really being used to do very much - just to sit around waiting for IO.
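A rough sketch of that pattern in plain Ruby, using only Thread and Queue from the standard library. DeferringLoop and its methods are hypothetical names for this illustration (EventMachine's own EM.defer plays a similar role), and sleep again stands in for the blocking call. It only helps if the blocking call really does block just its own thread.

```ruby
# A toy event loop that can push blocking work onto helper threads.
# Illustrative only: the class and method names are made up for this sketch.
class DeferringLoop
  def initialize
    @events = Queue.new # thread-safe queue of handlers to run
    @pending = 0        # deferred operations not yet delivered
  end

  # Schedule a handler to run on the loop thread.
  def next_tick(&handler)
    @events << handler
  end

  # Run blocking_op on a helper thread, then hand its result (or the
  # exception it raised) to the callback back on the loop thread.
  def defer(blocking_op, &callback)
    @pending += 1
    Thread.new do
      result = begin
        blocking_op.call
      rescue StandardError => e
        e
      end
      @events << lambda do
        @pending -= 1
        callback.call(result)
      end
    end
  end

  # Process events until the queue is empty and no deferred work remains.
  def run
    @events.pop.call until @events.empty? && @pending.zero?
  end
end

def run_demo
  reactor = DeferringLoop.new
  order = []
  slow_query = -> { sleep 0.2; :rows } # stand-in for a blocking query
  reactor.defer(slow_query) { |result| order << result }
  reactor.next_tick { order << :quick }
  reactor.run
  order
end

p run_demo
```

Here the quick handler runs while the helper thread is still waiting, and the result of the slow call arrives later as just another event. Only the loop thread touches the pending counter and the callback, so the handlers themselves need no locking.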

While this doesn’t require an asynchronous API at the Ruby level, it does at least require that the API calls only block the current Ruby thread, and don’t require an interpreter-wide lock in order to go about their business.

Unfortunately, it seems that many (most?) C-based Ruby libraries, including MySQL/Ruby (rather crucial to many), block the whole interpreter while waiting for IO, because they aren’t able to yield to Ruby’s “green” threading code while calling a blocking C API. This is hard to work around unless there’s a non-blocking C API available (which there isn’t, currently, for MySQL, but is for Postgres, hence the non-blocking Postgres Ruby library). It may be possible for the C extension to use a separate OS-level thread for the blocking API calls, but as I understand it, one has to be very careful when using multiple OS-level threads in a process which embeds the Ruby interpreter, as the interpreter is not natively thread-safe in the least.

Anyway, the unfortunate upshot of all this is that you can have as many Ruby threads as you like, but only one MySQL query will ever happen at a time. If you don’t believe me, try firing off a Thread.new { connection.execute("SELECT SLEEP(10)") } and then see if you have any joy querying MySQL in the next 10 seconds. Even with a connection pool, you’re shit outta luck.
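If you would rather measure this than eyeball it, a small timing harness along these lines can help. elapsed_for_threads is a made-up helper name, and the sleep is only a placeholder: to reproduce the experiment, swap it for a query issued through whatever database connection each thread has.

```ruby
# Run the same block on several threads at once and return the total
# wall-clock time taken. (Hypothetical helper, for illustration only.)
def elapsed_for_threads(count, &blocking_call)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Array.new(count) { Thread.new(&blocking_call) }.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# sleep blocks only the calling thread, so these three calls overlap.
elapsed = elapsed_for_threads(3) { sleep 0.2 }
puts format("3 calls of 0.2s each took %.2fs in total", elapsed)
```

A total close to the length of a single call means the calls overlapped. A total close to the sum of all the calls means something is serializing them, which is the behaviour described above for MySQL queries.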

This kind of thing rather removes the whole point and usefulness of event-loop based libraries like EventMachine when used with MySQL, and makes ActiveRecord’s specially-thread-safe “allow_concurrency” option considerably less useful when used with the MySQL adapter - if all the MySQL query grunt work ends up serialized anyway, why bother using threads?

So, there’s a real need for non-blocking APIs, and for Ruby library writers, and (perhaps more critically) those working on the new round of Ruby implementations, to get serious about this if they want Ruby libraries to be able to get anything out of sub-process-level concurrency.

There’s also a real need for an asynchronous C API for MySQL which Ruby library authors could use. This project appears to have been trying - looking forward to progress!