Archive for the 'Development' Category

Job ad: Music service developer

We’re hiring!

Playlouder MSP has been working with ISPs and the music industry to develop both an innovative business model for music consumption, and innovative user experiences around music and communication to complement ISPs’ offerings.

As a key addition to our small but growing development team, you will be critical in helping to refine, scale and roll out our application and service to white-label ISP clients.

Key requirements

  • Background in computer science, mathematics, software engineering or similar (degree or equivalent experience)
  • A solid technical all-rounder with software development experience on sizeable projects

The specifics

Our work involves all of the following, and you’ll be tackling problems involving many of them:

  • Dynamic languages (experience with Ruby, on which our current implementation is based, would be particularly desirable)
  • Rich web application development with a large Javascript-based client-side portion
  • A modular, widget-based user interface framework for the above
  • Databases (MySQL at present), data modeling and ORM tools
  • Unix-based deployment environments
  • Large volumes of media, media metadata and usage statistics
  • Web services APIs, and large-scale integration work
  • Server push technologies, and scaling applications with a live messaging component
  • Common agile software development tools, processes and techniques - source control, bug tracking, testing etc
  • A warehouse full of music geeks :)

Some other things which you might get to play with:

  • Other languages - Java, Python, C, possibly Erlang (see ‘messaging’ above), …
  • Messaging technologies like AMQP, ActiveMQ, XMPP
  • Music technology R&D projects
  • Lots more in a fast-growing company

In return we can offer challenging problems in an interesting domain, competitive pay, and a great work environment for music fans in our E2 warehouse!

Enquiries to matthew@playlouder.com

Of MySQL/Ruby, EventMachine, and the need for non-blocking APIs

Part of the service we’re building is a socket server which uses Flash’s XMLSocket API to push updates to clients. Initially we developed this using the excellent Twisted library in Python, but as it grew, having to duplicate some of our data model code in another language started to hurt, and it made sense for us to port it to Ruby.

Luckily by that point, the EventMachine library had sprung up, offering something very similar to Twisted for Ruby, and we’ve been using that since.

While it’s well known that Ruby’s threading is non-native and not particularly speedy, event-based libraries don’t actually require much use of threading - one is encouraged to structure one’s code as small methods which are called asynchronously and return quickly, yielding back to the event loop. For those with client-side experience, this is quite comparable with Javascript runtimes, where there is no threading but a core event loop, the ability to register event handlers, call setTimeout, and asynchronous APIs for longer-running IO (AJAX anyone?).

For this to work well, it is essential that your event handlers do their business as quickly as possible, and yield back to the event loop - as everything else in the event queue is sat there waiting for you to finish. This is all very well, until you need to deal with IO - other things (pesky database servers and clients) have a nasty habit of taking a while to get back to you, and if the API you’re calling to communicate with them blocks you, then it’s blocking everything else in the event queue too.

One way to get around this (despite the concurrency paradigm being based around an event loop rather than pre-emptive threading) is to have some spare threads lying around to take care of blocking API calls, and fire off an event to the core event loop thread when they’re done - a way of turning a blocking API call into a non-blocking, asynchronous one. While Ruby’s threads aren’t native or very performant, this shouldn’t matter too much in this case, as the threads aren’t really being used to do very much - just to sit around waiting for IO.

While this doesn’t require an asynchronous API at the Ruby level, it does at least require that the API calls only block the current Ruby thread, and don’t require an interpreter-wide lock in order to go about their business.
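To make the "spare threads" idea concrete, here is a minimal sketch using only Ruby’s standard library. TinyLoop, next_tick and defer are hypothetical names invented for this illustration, not EventMachine’s API (EventMachine has its own facility for this, EM.defer), and there is no error handling. The blocking call is simulated with sleep, which only blocks the calling Ruby thread.

```ruby
require 'thread'

# A toy event loop: handlers are queued and run one at a time on the loop thread.
# TinyLoop is a hypothetical stand-in for illustration, not EventMachine itself.
class TinyLoop
  def initialize
    @events  = Queue.new   # thread-safe queue of handlers waiting to run
    @pending = 0           # number of deferred operations still in flight
    @lock    = Mutex.new
  end

  # Schedule a handler to run on the loop thread.
  def next_tick(&handler)
    @events << handler
  end

  # Run a blocking operation on a helper thread, then post its result
  # back to the loop thread as an ordinary event.
  def defer(operation, &callback)
    @lock.synchronize { @pending += 1 }
    Thread.new do
      result = operation.call
      next_tick do
        @lock.synchronize { @pending -= 1 }
        callback.call(result)
      end
    end
  end

  # Process events until the queue is empty and nothing is in flight.
  def run
    loop do
      break if @events.empty? && @lock.synchronize { @pending.zero? }
      @events.pop.call   # blocks here while helper threads are still working
    end
  end
end

log = []
reactor = TinyLoop.new
slow_query = lambda { sleep 0.2; :rows }   # simulated blocking IO
reactor.defer(slow_query) { |rows| log << rows }
reactor.next_tick { log << :quick_handler }
reactor.run
```

The quick handler runs while the simulated query is still waiting, so log ends up as [:quick_handler, :rows]. This only helps if the blocking call really does release the other Ruby threads while it waits.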

Unfortunately, it seems that many (most?) C-based Ruby libraries, including MySQL/Ruby (rather crucial to many), block the whole interpreter while waiting for IO, because they aren’t able to yield to Ruby’s “green” threading code while calling a blocking C API. This is hard to work around unless there’s a non-blocking C API available (which there isn’t, currently, for MySQL, but is for Postgres, hence the non-blocking postgres Ruby library). It may be possible for the C extension to use a separate OS-level thread for the blocking API calls, but as I understand it, one has to be very careful when using multiple OS-level threads in a process which embeds the Ruby interpreter, as the interpreter is not natively thread-safe in the least.

Anyway, the unfortunate upshot of all this is that you can have as many Ruby threads as you like, but only one MySQL query will ever happen at a time. If you don’t believe me, try firing off a Thread.new { connection.execute("SELECT SLEEP(10)") } and then see if you have any joy querying MySQL in the next 10 seconds. Even with a connection pool, you’re shit outta luck.

This kind of thing rather removes the whole point and usefulness of event-loop based libraries like EventMachine when used with MySQL, and makes ActiveRecord’s specially-thread-safe “allow_concurrency” option considerably less use when used with the MySQL adapter - if all the mysql query grunt work ends up serialized anyway, why bother using threads?

So, there’s a real need for non-blocking APIs, and for Ruby library writers, and (perhaps more critically) those working on the new round of Ruby implementations, to get serious about this if they want Ruby libraries to be able to get anything out of sub-process-level concurrency.

There’s also a real need for an asynchronous C API for MySQL which Ruby library authors could use. This project appears to have been trying - looking forward to progress!

An interesting Ruby hash semantics gotcha

Thought this might amuse or perplex some Rubyists (or be useful to know - it’s been the source of a couple of hard-to-track-down bugs in the past).

>> {{} => true}[{}]
=> nil

>> {{} => true, {} => true}
=> {{}=>true, {}=>true}

but yet,

>> {} == {}
=> true

What’s going on here?

Ruby’s Hashes behave very strangely when you try to use a Hash itself as a key of a Hash.

This acts as a subtle gotcha when you try to memoize a function which takes hash arguments - and so a tricky-to-address bug in libraries like this: http://raa.ruby-lang.org/project/memoize/
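One possible workaround for memoization, sketched below, is to avoid relying on Hash#hash at all and key the cache on a canonical form of the arguments instead. Memoizer and canonical_key are hypothetical names for this illustration and are not taken from the memoize library linked above. The sketch assumes that ordering pairs by their inspect strings is good enough to canonicalise the Hashes you pass in.

```ruby
# Turn Hash arguments into sorted arrays of pairs, so that the cache key
# hashes by content regardless of key order. The :__hash__ tag keeps a Hash
# distinct from an Array that happens to contain the same pairs.
def canonical_key(arg)
  case arg
  when Hash
    pairs = arg.map { |key, value| [canonical_key(key), canonical_key(value)] }
    [:__hash__, pairs.sort_by { |pair| pair.inspect }]
  when Array
    arg.map { |item| canonical_key(item) }
  else
    arg
  end
end

# Hypothetical memoizing wrapper around a block.
class Memoizer
  attr_reader :misses

  def initialize(&fn)
    @fn     = fn
    @cache  = {}
    @misses = 0   # how many times the wrapped block actually ran
  end

  def call(*args)
    key = canonical_key(args)
    unless @cache.key?(key)
      @misses += 1
      @cache[key] = @fn.call(*args)
    end
    @cache[key]
  end
end

area   = Memoizer.new { |opts| opts[:width] * opts[:height] }
first  = area.call({:width => 2, :height => 3})
second = area.call({:height => 3, :width => 2})   # same content, different order
```

Both calls return 6, and area.misses stays at 1 because the second call is served from the cache.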

Why?

Ruby calls Object#hash on each key of a Hash, using that numeric hash (small h) to allocate that object to a bucket of the underlying hash table data structure. Equality, when it comes to Hash lookups and unique keys of a Hash, will only happen if the keys generate the same numeric hash as a result of their hash methods.

For most Ruby data structures, x.hash == y.hash is implied by x == y, and everything works fine.
But, not for Hashes themselves!

(NB. this also affects data structures like Arrays which themselves contain a Hash, since Array#hash must call hash recursively on its contents).

(Interestingly, for things like 1.0 == 1, x.hash == y.hash also fails. Note, x.hash == y.hash is always implied by x.eql?(y), but this equality isn’t a desperately useful one, and seems to have been constructed artificially as an equality for use with Hash which is consistent with .hash)
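The numeric case is easy to check for yourself. This short sketch shows the difference between == and eql? for 1 and 1.0, and what it means for Hash lookups; the variable names are just for illustration.

```ruby
int_key   = 1
float_key = 1.0

loose  = (int_key == float_key)      # true: numeric equality across types
strict = int_key.eql?(float_key)     # false: eql? also requires the same class

table    = {int_key => :found}
by_int   = table[1]                  # :found
by_float = table[float_key]          # nil: Hash lookups use eql?, not ==
```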

Why might it have been implemented this way?

Hashes are insensitive to the order of their keys - so, for example, we have:
{:a => true, :b => true} == {:b => true, :a => true}

When you’re actually being given two concrete objects to compare, you can just check that each key from the one has an equal corresponding value in the other, and vice versa.

But, when you’re asked to generate a numeric hash which is constant for the whole equivalence class, you’d have to do something to ensure the hashing isn’t order-sensitive. Like ordering the key/value pairs by their individual hashes before feeding into the hash function.

Some attempts at a fix in the form of a monkey-patched Hash#hash:

(yes, that’s pronounced ‘Hash hash hash’)

  1. Sort key/value pairs by the numeric hash of the pair first:
    class Hash
      def hash
        sort_by {|pair| pair.hash}.hash
      end

      def eql?(other)
        self == other
      end
    end
  2. Use an XOR of the hashes of the key/value pairs (XOR is order-insensitive, and should preserve entropy in the bits of the hash)
    class Hash
      def hash
        inject(0) {|hash,pair| hash ^ pair.hash}
      end

      def eql?(other)
        self == other
      end
    end

These then fix, eg:

>> {}.hash == {}.hash
=> true

>> {{} => true}[{}]
=> true

>> {{} => true, {} => false}
=> {{}=>false}

(Note, overriding eql? is required to make the last two work - it seems the Hash implementation uses eql? to do the equality comparison that follows the more approximate hash comparison)
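The same two-step lookup (numeric hash first, then eql?) applies to any object used as a key, which can be seen without patching Hash at all. LooseKey and StrictKey below are hypothetical classes made up for this illustration: both define == and hash by content, but only StrictKey also overrides eql?.

```ruby
# Defines == and hash by content, but inherits Object#eql? (identity).
class LooseKey
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def ==(other)
    other.is_a?(LooseKey) && name == other.name
  end

  def hash
    name.hash
  end
end

# Additionally makes eql? agree with ==, which is what Hash lookups need.
class StrictKey < LooseKey
  def eql?(other)
    self == other
  end
end

loose_lookup  = {LooseKey.new("a") => true}[LooseKey.new("a")]     # nil
strict_lookup = {StrictKey.new("a") => true}[StrictKey.new("a")]   # true
```

The two LooseKey instances land in the same bucket, but the lookup still misses because the default eql? only compares object identity.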

Now, I’m sure there’s a reason Matz didn’t do it this way - perhaps a performance reason, perhaps a gotcha that I haven’t noticed with my approach. Perhaps it’ll be fixed in 1.9.
But at any rate, it’s useful to be aware of the issue.

Minification

In case anyone noticed, we’ve done a bit of client-side optimization. Namely:

  • Javascript and CSS files are now ‘minified’ (I prefer ‘squished’) as part of our build process, using the handy YUI compressor - this shaves a good 40% of bloat off our Javascript and 20% off our CSS, and makes them ever so slightly quicker to parse at the other end too
  • Common Javascript and CSS includes are now packaged up into combined packages, which saves a lot of HTTP requests
  • Javascript includes have moved to the bottom of the page, meaning they won’t delay page rendering

Some stuff we were already doing:

  • Far-ahead Expires headers on all static resources, meaning they’ll be filled from the browser’s cache without any HTTP request where possible
  • Gzipping static files with lighttpd - this shaves a good 76% off our Javascript for example (and still 74% off the minified javascript). It also shaves 82% off CSS, and interestingly manages to shave even more (83%) off minified CSS - indicating that stripping syntactically-irrelevant information actually makes the remaining data more amenable to compression in this case.
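If you want to run this kind of comparison on your own files, a rough sketch along these lines may help. Everything here is an assumption for illustration: the Javascript snippet is made up, the gsub-based "minifier" is a crude stand-in for a real tool such as the YUI compressor, and Zlib::Deflate produces zlib-wrapped output, not a .gz file (it uses the same DEFLATE algorithm, so the ratios should be comparable). No particular percentages are implied.

```ruby
require 'zlib'

# Made-up sample source, repeated to give the compressor something to chew on.
snippet = <<-JS
  // Toggle a widget's visibility
  function toggleWidget(widgetElement) {
      var isHidden = widgetElement.style.display == 'none';
      widgetElement.style.display = isHidden ? '' : 'none';
  }
JS
source = snippet * 50

# Crude stand-in for a real minifier: drop // comment lines, collapse whitespace.
minified = source.gsub(%r{^\s*//.*$}, '').gsub(/\s+/, ' ').strip

# Percentage of bytes saved going from original to reduced.
def percent_saved(original, reduced)
  (100.0 * (original.bytesize - reduced.bytesize) / original.bytesize).round
end

deflated_source   = Zlib::Deflate.deflate(source)
deflated_minified = Zlib::Deflate.deflate(minified)

puts "minify only:          #{percent_saved(source, minified)}%"
puts "deflate only:         #{percent_saved(source, deflated_source)}%"
puts "deflate the minified: #{percent_saved(minified, deflated_minified)}%"
```

Swapping source for the contents of a real file (File.read) and minified for the output of your actual build step gives figures you can compare with the ones above.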

Some more still to do

  • Convert PNG24s to PNG8s with alpha. (Yes, PNG8s with more-than-just-1-bit alpha do exist, and are considerably less bulky than PNG24s! Sadly neither Photoshop nor ImageMagick can export them, but Fireworks or the pngnq utility can. They don’t work too well in IE6, but then what does…)
  • Consider using CSS sprites and background-position hackery for some of our icons, where possible, to cut down on requests
  • Serve up static files from assets1.playlouder.com and assets2.playlouder.com, to increase the number of concurrent requests browsers make (they typically limit to 2 per hostname)
  • Optimize our javascript to improve page initialization times - browser-native getElementsByClassName may help here, as may selectively delaying some DOM lookups until they’re needed, and using more bubbled-up event handling to avoid the need for more specific DOM lookups
  • Optimize the crap out of the server-side (another topic for another day…)
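For the assets1/assets2 item in the list above, one detail worth getting right is that a given file should always be served from the same hostname, otherwise browsers end up caching it twice. A minimal sketch, assuming a hypothetical asset_url helper and a made-up file path, is to pick the host from a checksum of the path:

```ruby
require 'zlib'

ASSET_HOSTS = %w[assets1.playlouder.com assets2.playlouder.com]

# Hypothetical helper: choose a host deterministically from the path, so the
# same file always maps to the same hostname and stays cacheable.
def asset_url(path)
  host = ASSET_HOSTS[Zlib.crc32(path) % ASSET_HOSTS.size]
  "http://#{host}#{path}"
end

first = asset_url("/javascripts/widgets.js")   # made-up path for illustration
again = asset_url("/javascripts/widgets.js")
```

A checksum is used here in preference to String#hash, since the latter is not guaranteed to be the same from one Ruby process to the next.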

Much as I love Ruby

And much as we bend Rails to our will, I am getting a bit jealous of these guys developing web apps with Scala - an elegant hybrid functional/object-oriented language with a powerful type-inferencing type system, Erlang-style Actors and other goodies. It compiles and runs fast on the JVM too and can access Java libraries in quite a native way. It’s kinda like Ruby plus OCaml plus Java minus the suck of Java.

I think it’s because the inner maths and type theory geek in me (the one who can never quite get over how awesome http://en.wikipedia.org/wiki/Curry-Howard_isomorphism is) really misses having a powerful type system - and Scala’s does seem to hit the sweet spot when it comes to a middle ground between the bafflingly powerful Hindley-Milner extensions of Haskell and OCaml, and more accessible Object-oriented type systems with subtyping.

liftweb (or ‘Scala with Sails’ - see what they did there?) seems like a pretty neat framework too. I’m just plugging it so that someone else will do (continue doing) the work of making it sufficiently ‘enterprise-ready’ for me to use in ‘the real world’. ;-)

Meet us at LRUG

This Monday I shall be giving a talk at the London Ruby Users Group. I’ll be giving a tour through our experiences building a modular, composable Widget UI framework on top of Ruby on Rails: some of the steps we took along the way, the problems we encountered, and a tour of the results.

There’s also a juicy debate to be had comparing the REST-driven web application architecture pushed by the Rails project, with our more modular widget-based approach, and in deciding what’s appropriate for your application. For those in the know, comparisons abound with the approach taken by Avi Bryant’s Seaside framework, and the Apotomo plugin already in development for Rails.

The framework comes with a client-side component too, and means of serializing Widgets to constructors for corresponding client-side javascript classes - so the talk may also interest those attempting to do Javascript in an unobtrusive, object-oriented way with Rails.

After that I believe (sincerely hope ;-) ) there are drinks.

The talk is open to all but the venue ask that you Register here; see LRUG for more details.

The more I have to hack ActiveRecord’s guts

to make it do something right - the more I’m tempted to rewrite it from scratch.

That whole “I never took (or never understood) a database theory course at university, so I’m just going to pretend it doesn’t matter, and that a relational database may be treated as nothing more than a glorified filestore for my Objects” attitude just doesn’t cut any ice with me, but seems sadly prevalent within the Rails community. Yes, good object-oriented design is really important - but you need a sophisticated relational approach in order to get a handle on the data model behind any kind of non-trivial inheritance & mixin hierarchy, and to persist it in a logically sound and efficiently-indexable fashion.

What’s more, I contend that your ORM tool needs to understand something of the relational algebra in order to represent what is going on in a sufficiently elegant, flexible way - otherwise you’ll always be piling hack on top of hack whenever you want to map the results of a moderately complex query over to the OO side. Joining SQL strings together is not the way forward - these things are syntax trees with structure!
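As a toy illustration of the "syntax trees" point, here is a sketch in which a query is built up as a tree of small objects and only rendered to SQL at the very end. Column, Equals, And, Select and sql_literal are all hypothetical names invented for this example; they are not part of ActiveRecord, SQLAlchemy or any other real library, and the quoting is deliberately simplistic.

```ruby
# Render a Ruby value as an SQL literal (numbers bare, everything else quoted).
def sql_literal(value)
  case value
  when Numeric then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'"
  end
end

Column = Struct.new(:table, :name) do
  def to_sql
    "#{table}.#{name}"
  end

  # Build a comparison node; nothing is rendered to a string yet.
  def eq(value)
    Equals.new(self, value)
  end
end

Equals = Struct.new(:left, :right) do
  def to_sql
    "#{left.to_sql} = #{sql_literal(right)}"
  end
end

And = Struct.new(:left, :right) do
  def to_sql
    "(#{left.to_sql} AND #{right.to_sql})"
  end
end

Select = Struct.new(:table, :columns, :condition) do
  # Returns a new Select whose condition tree includes the extra node.
  def where(extra)
    Select.new(table, columns, condition ? And.new(condition, extra) : extra)
  end

  def to_sql
    sql = "SELECT #{columns.map(&:to_sql).join(', ')} FROM #{table}"
    sql += " WHERE #{condition.to_sql}" if condition
    sql
  end
end

tracks = Select.new("tracks", [Column.new("tracks", "title")], nil)
query  = tracks
  .where(Column.new("tracks", "artist_id").eq(42))
  .where(Column.new("tracks", "genre").eq("dub"))
sql = query.to_sql
```

Here sql comes out as SELECT tracks.title FROM tracks WHERE (tracks.artist_id = 42 AND tracks.genre = 'dub'). Because the condition is a tree, not a string, it can be inspected, combined or rewritten before any SQL is generated.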

Ahem. Sorry if I sound exasperated. SQLAlchemy on the Python side gets this kind of thing ABSOLUTELY SPOT ON, and dare I say it, so do some of the Java ORM frameworks (shame about the XML config files and Java’s tendency towards boilerplate code and bloated syntax, but don’t throw the baby out with the bathwater, Rails-ers).

The problem with Rails’ ActiveRecord is that it’s neither here nor there - neither the kind of lightweight, simple ‘map objects to database rows and nothing much else’ approach originally implied by Fowler’s Active Record design pattern - nor the kind of powerful ORM tool which is capable of turning the kind of tricks that are increasingly demanded of it in anything like an elegant fashion.

It seems the Rails team’s solution to some of the endemic problems with ActiveRecord’s messy guts is to wrap them up in a huge plastic bag known as caching - an acceptable pragmatic approach, I accept, in many situations, but one which would not be nearly so necessary had a different approach been taken to ActiveRecord’s architecture.

I feel that superior approach needn’t have come at the cost of ActiveRecord’s ‘convention over configuration’ and ‘easy to get started with’ benefits either - it just would have required a little more forethought and a little humility in learning about the Relational Model before attempting a tool which maps complex data models to a Relational Database.

Crap, I’m starting to sound like Fabian Pascal now, aren’t I?

We moved to Git

I’ll admit it took a while to convince me of the merits of decentralised version control. But after a really nasty couple of merges, enough was enough. We’ve dropped SVN in favour of Git - which seems to be the biggest contender.

The main selling point for me was that it keeps track of all the metadata surrounding merges - information which SVN forgets about, requiring you to document your own merge metadata in commit messages, and scour the SVN logs trying to figure out which changesets have already been merged, when, whence, whither, by whom.

Other wins:

  • It’s really fast
  • Easy creation and painless switching between local ‘topic branches’, which you can create for each feature you’re working on and merge into each other easily
  • Easy to swap work-in-progress patches with other developers without having to commit to a centralised trunk
  • Easy to make lots of quick local / offline commits, which you can later crunch down into one when merging, if you want
  • Have the whole history available locally, and lots of backups of the repository

Some minuses:

  • The git-svnimport tool appears slightly buggy. Don’t count on it to import your SVN branches properly, especially if you moved them around at any point in the SVN history. In our case the branches it created only contained the files which had changed since the branch was created in svn - rather than fiddle around sorting them out, I just deleted them and re-created them from the git master branch by applying an svn diff. Which is OK if the history of your branches isn’t super important, but less than ideal otherwise. I also found that a small number of files were missing from the trunk, and had to be re-added manually - I suspect git-svnimport gets a bit lost when files have been moved around in non-trivial ways in the SVN history.
  • There is something of a learning curve with Git, especially when it comes to more complex merging, branching, tagging, cherry-picking tasks which were the reasons I first wanted to move to Git. I found this set of lessons learned helped a lot, on top of the ‘Git from SVN’ tutorial. Once you know what you’re doing though, it’s faster and a lot less fiddly at merging than SVN.

New Feature: Playlists - Rip. Mix. Burn.

One of the wake up calls for the music industry (which arguably is still fumbling sleepily with its alarm clock) was the Apple advertisement ‘Rip. Mix. Burn.’

Not only did the phrase lead to some heavy whinging from record labels - for irresponsibly encouraging piracy - it also spawned a swarm of academic papers and conference speeches right across the copyright reformist movement. More chilling for the industry however, it positioned Apple as a far more emotionally engaged intermediary between the artists and the fan than the labels themselves.

Apple’s press release is here:
http://www.apple.com/pr/library/2001/feb/22imac.html
and the TV ad is here:
http://www.theapplecollection.com/Collection/AppleMovies/mov/concert_144a.html

Yesterday we added our own contribution to remix culture. You can now make a Playlist in the Playlouder website, give it a title, and share it with other members. Just the basics are there right now, but this is a feature we think is very important so we shall be developing it over the coming months.

Let us know what you think.

‘Hijax’ - aren’t buzzwords great

Turns out, unbeknown to me, someone had already invented a buzzword for my “optional ajax navigation” technique.

“Hijax” - because it Hijacks clicks on anchor tags and form submissions, and takes them over with Ajax.

I wonder if his code deals with all the same tricky corner cases that mine does, though.

Turns out someone’s made a dynamic history library for jQuery too. It looks a lot more modern and less verbose than the dhtmlHistory code we’re using at the moment - any kindly souls fancy porting it to work with prototype.js?