Many Opinions, Not Much Information

I was asked recently to write a guest blog post for the website of the UK national newspaper the Telegraph. Here’s the result.

In it I predicted, perhaps rashly but we shall see, some kind of breakthrough in the uneasy relationship between ISPs and the music industry. It’s long overdue: as anyone can discover with a few internet searches, we have been working on building this bridge for the last five years. I was adding my voice to a small wave of comment about us, most of it ill-informed not only about our own business model, but also about the enabling technology and the respective positions of ISPs and record labels.

One of our favourite comments was that our enterprise is ‘naive, flawed and doomed to fail’, which we liked so much that we thought we’d make a t-shirt with it.

And one of the reasons I liked that comment so much was that in some way you can apply it to all human endeavour, including blogging itself. It seems that we have created a proposition that some people struggle to accept as possible, let alone probable or inevitable. Here’s Jupiter’s Mark Mulligan:

My take is that if they are close to announcing something, it will be significantly watered down from the proposition they’ve been trying to get labels to sign up to for years. (here)

and Paul Resnikoff:

the service represents a step down from current, free acquisition options (here)

You can’t please all the people all the time. I shall continue to predict a breakthrough, sooner rather than later; we shall continue to develop our platform (see the recent job advert on this blog); and we shall continue to make ‘no comment’ to public speculation about our business and partnerships.

Job ad: Music service developer

We’re hiring!

Playlouder MSP has been working with ISPs and the music industry to develop both an innovative business model for music consumption, and innovative user experiences around music and communication to complement ISPs’ offerings.

As a key addition to our small but growing development team, you will be critical in helping to refine, scale and roll out our application and service to white-label ISP clients.

Key requirements

  • Background in computer science, mathematics, software engineering or similar (degree or equivalent experience)
  • A solid technical all-rounder with software development experience on sizeable projects

The specifics

Our work involves all of the following, and you’ll be tackling problems involving many of them:

  • Dynamic languages (experience with Ruby, on which our current implementation is based, would be particularly desirable)
  • Rich web application development with a large Javascript-based client-side portion
  • A modular, widget-based user interface framework for the above
  • Databases (MySQL at present), data modeling and ORM tools
  • Unix-based deployment environments
  • Large volumes of media, media metadata and usage statistics
  • Web services APIs, and large-scale integration work
  • Server push technologies, and scaling applications with a live messaging component
  • Common agile software development tools, processes and techniques - source control, bug tracking, testing etc
  • A warehouse full of music geeks :)

Some other things which you might get to play with:

  • Other languages - Java, Python, C, possibly Erlang (see ‘messaging’ above), …
  • Messaging technologies like AMQP, ActiveMQ, XMPP
  • Music technology R&D projects
  • Lots more in a fast-growing company

In return we can offer challenging problems in an interesting domain, competitive pay, and a great work environment for music fans in our E2 warehouse!

Enquiries to matthew@playlouder.com

80k of client-side-only storage for javascript, without browser extensions

Thought I would share this hack.

The problem - you want to maintain some state on the client, but you don’t want to send this state on a pointless round-trip to the server with every request, as typically happens with cookies.

There is a way around this though!

  1. Add a hidden iframe to your page, with src="/client-side-cookie/blank.html"
  2. From this directory, you serve a static empty HTML file, and, crucially, you serve this with Expires headers way into the future (see this Yahoo tip for some info about this technique)
  3. This file will not (typically) be re-requested before the time given in your Expires header
  4. Set cookies for the document in the iframe from your javascript code, with path=/client-side-cookie/, and with whatever expiry time you like, e.g. iframe.contentDocument.cookie = 'test=long_data_which_we_dont_want_to_send_to_the_server; path=/client-side-cookie/'
  5. When you want the data back in future - again, create the iframe (the HTML file will NOT be loaded from the server because of the Expires header, and so no cookies will be sent to the server). Then inspect iframe.contentDocument.cookie to get the data.
  6. Because you have restricted the path of the cookie, it will never be sent with requests for files outside of your special /client-side-cookie/ directory.
  7. Profit!
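
For step 2, here is a rough sketch of what serving the blank page might look like as a Rack-style app in Ruby. It is only an illustration: the one-year lifetime, the cache-control header and the blank_page name are my assumptions, and any web server that can set an Expires header on a static file would do the same job.

```ruby
require 'time'  # for Time#httpdate

one_year = 365 * 24 * 60 * 60  # assumed lifetime; pick whatever suits you

# A Rack-style app is just an object that responds to call(env).
blank_page = lambda do |env|
  if env['PATH_INFO'] == '/client-side-cookie/blank.html'
    headers = {
      'content-type'  => 'text/html',
      # Far-future expiry so the browser keeps using its cached copy
      'expires'       => (Time.now + one_year).httpdate,
      # Assumption: also send the equivalent max-age for newer caches
      'cache-control' => "public, max-age=#{one_year}"
    }
    [200, headers, ['<html><body></body></html>']]
  else
    [404, { 'content-type' => 'text/plain' }, ['Not found']]
  end
end

# Call it directly, the way a Rack server would
status, headers, _body = blank_page.call('PATH_INFO' => '/client-side-cookie/blank.html')
puts status
puts headers['expires']
```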

Problems:

  • For security or privacy purposes, you can’t rely on this to keep the data away from the server. The user could purge their browser cache, do a hard refresh on the file, etc.
  • Even a far-future Expires header will expire eventually - and browsers may limit the length of Expires headers.
  • So you should be prepared for the event that this data might, albeit very infrequently, get sent to the server.
  • You are still limited to approx 4k per cookie (including key and value - google for detail on precisely what is supported cross-browser but it is at least very close to 4k)
  • You are limited to 20 cookies per domain (in older IE versions at least, others allow more)
  • So that caps it overall at about 80k, with some fiddling around to distribute the data between 20 separate cookies. Still, not to be sneezed at!

Mysterious Flash bug on change of background

Just in case anyone else runs into this and Googles.

If you have a Flash movie which mysteriously seems to reload itself during some fairly innocuous and unconnected Javascript execution - take note:

Dynamically setting document.body.style.background was the culprit for us. Don’t ask how I managed to identify this as the culprit; suffice to say it involved a lot of logging statements and patience. Doing this immediately caused the Flash movie to reload itself, causing havoc in our case as we use it to play music and connect to a socket server.

You may find, like us, that setting separate background properties, e.g. document.body.style.backgroundImage, works around the issue.

Anyone hazard a guess as to why the Flash runtime feels the need to implement this behaviour? Perhaps something related to wmode=transparent? (although we’re not using it)

Of MySQL/Ruby, EventMachine, and the need for non-blocking APIs

Part of the service we’re building is a socket server which uses Flash’s XMLSocket API to push updates to clients. Initially we developed this using the excellent Twisted library in Python, but as it grew, having to duplicate some of our data model code in another language started to hurt, and it made sense for us to port it to Ruby.

Luckily by that point, the EventMachine library had sprung up, offering something very similar to Twisted for Ruby, and we’ve been using that since.

While it’s well known that Ruby’s threading is non-native and not particularly speedy, event-based libraries don’t actually require much use of threading - one is encouraged to structure one’s code as small methods which are called asynchronously and return quickly, yielding back to the event loop. For those with client-side experience, this is quite comparable with Javascript runtimes, where there is no threading but a core event loop, the ability to register event handlers, call setTimeout, and asynchronous APIs for longer-running IO (AJAX anyone?).

For this to work well, it is essential that your event handlers do their business as quickly as possible, and yield back to the event loop - as everything else in the event queue is sat there waiting for you to finish. This is all very well, until you need to deal with IO - other things (pesky database servers and clients) have a nasty habit of taking a while to get back to you, and if the API you’re calling to communicate with them blocks you, then it’s blocking everything else in the event queue too.

One way to get around this (despite the concurrency paradigm being based around an event loop rather than pre-emptive threading) is to have some spare threads lying around to take care of blocking API calls, and fire off an event to the core event loop thread when they’re done. In effect, this turns a blocking API call into a non-blocking, asynchronous one. While Ruby’s threads aren’t native or very performant, this shouldn’t matter too much in this case, as the threads aren’t really being used to do very much - just to sit around waiting for IO.

While this doesn’t require an asynchronous API at the Ruby level, it does at least require that the API calls only block the current Ruby thread, and don’t require an interpreter-wide lock in order to go about their business.
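
To make the spare-threads idea concrete, here is a minimal sketch using only Ruby’s standard library. TinyLoop and its defer method are made-up names for illustration, not EventMachine’s API, and sleep stands in for a blocking call which (unlike the MySQL case described below) only blocks its own thread.

```ruby
require 'thread'

# A toy single-threaded event loop: callbacks are queued and run one at a time.
class TinyLoop
  def initialize
    @events = Queue.new
    @pending = 0
  end

  # Run a blocking operation on a helper thread, then hand its result
  # back to the loop thread by pushing a callback onto the event queue.
  def defer(operation, callback)
    @pending += 1
    Thread.new do
      result = operation.call
      @events << lambda { callback.call(result) }
    end
  end

  # Process events on the calling thread until every deferred
  # operation has reported back.
  def run
    while @pending > 0
      @events.pop.call
      @pending -= 1
    end
  end
end

results = []
event_loop = TinyLoop.new
event_loop.defer(lambda { sleep 0.2; :slow }, lambda { |r| results << r })
event_loop.defer(lambda { sleep 0.05; :fast }, lambda { |r| results << r })
event_loop.run
p results  # both results arrive; the callbacks themselves ran on the loop thread
```

The callbacks only ever run on the loop thread, so handler code doesn’t need locks of its own; the helper threads do nothing but wait.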

Unfortunately, it seems that many (most?) C-based Ruby libraries, including MySQL/Ruby (rather crucial to many), block the whole interpreter while waiting for IO, because they aren’t able to yield to Ruby’s “green” threading code while calling a blocking C API. This is hard to work around unless there’s a non-blocking C API available (which there isn’t, currently, for MySQL, but is for Postgres, hence the non-blocking postgres Ruby library). It may be possible for the C extension to use a separate OS-level thread for the blocking API calls, but as I understand it, one has to be very careful when using multiple OS-level threads in a process which embeds the Ruby interpreter, as the interpreter is not natively thread-safe in the least.

Anyway, the unfortunate upshot of all this is that you can have as many Ruby threads as you like, but only one MySQL query will ever happen at a time. If you don’t believe me, try firing off a Thread.new { connection.execute('SELECT SLEEP(10)') } and then see if you have any joy querying MySQL in the next 10 seconds. Even with a connection pool, you’re shit outta luck.

This kind of thing rather removes the whole point and usefulness of event-loop based libraries like EventMachine when used with MySQL, and makes ActiveRecord’s specially-thread-safe “allow_concurrency” option considerably less use when used with the MySQL adapter - if all the mysql query grunt work ends up serialized anyway, why bother using threads?

So, there’s a real need for non-blocking APIs, and for Ruby library writers, and (perhaps more critically) those working on the new round of Ruby implementations, to get serious about this if they want Ruby libraries to be able to get anything out of sub-process-level concurrency.

There’s also a real need for an asynchronous C API for MySQL which Ruby library authors could use. This project appears to have been trying - looking forward to progress!

An interesting Ruby hash semantics gotcha

Thought this might amuse or perplex some Rubyists (or be useful to know - it’s been the source of a couple of hard-to-track-down bugs in the past).

>> {{} => true}[{}]
=> nil

>> {{} => true, {} => true}
=> {{}=>true, {}=>true}

but yet,

>> {} == {}
=> true

What’s going on here?

Ruby’s Hashes behave very strangely when you try to use a Hash itself as a key of a Hash.

This acts as a subtle gotcha when you try to memoize a function which takes hash arguments - and so a tricky-to-address bug in libraries like this: http://raa.ruby-lang.org/project/memoize/

Why?

Ruby calls Object#hash on each key of a Hash, using that numeric hash (small h) to allocate that object to a bucket of the underlying hash table data structure. Equality, when it comes to Hash lookups and unique keys of a Hash, will only happen if the keys generate the same numeric hash as a result of their hash methods.

For most Ruby data structures, x.hash == y.hash is implied by x == y, and everything works fine.
But not for Hashes themselves!

(NB. this also affects data structures like Arrays which themselves contain a Hash, since Array#hash must call hash recursively on its contents).

(Interestingly, for things like 1.0 == 1, x.hash == y.hash also fails. Note, x.hash == y.hash is always implied by x.eql?(y), but this equality isn’t a desperately useful one, and seems to have been constructed artificially as an equality for use with Hash which is consistent with .hash)
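
A quick way to see the 1.0 versus 1 case for yourself (the lookup name is just for illustration):

```ruby
lookup = { 1 => :integer_key }

p 1 == 1.0       # true: numeric equality
p 1.eql?(1.0)    # false: eql? also requires the same class
p lookup[1]      # :integer_key
p lookup[1.0]    # nil: Hash lookups go by hash and eql?, not ==
```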

Why might it have been implemented this way?

Hashes are insensitive to the order of their keys - so, for example, we have:
{:a => true, :b => true} == {:b => true, :a => true}

When you’re actually being given two concrete objects to compare, you can just check that each key from the one has an equal corresponding value in the other, and vice versa.

But, when you’re asked to generate a numeric hash which is constant for the whole equivalence class, you’d have to do something to ensure the hashing isn’t order-sensitive. Like ordering the key/value pairs by their individual hashes before feeding into the hash function.

Some attempts at a fix in the form of a monkey-patched Hash#hash:

(yes, that’s pronounced ‘Hash hash hash’)

  1. Sort key/value pairs by the numeric hash of the pair first:
    class Hash
     def hash
       sort_by {|pair| pair.hash}.hash
     end
    
     def eql?(other)
       self == other
     end
    end
  2. Use an XOR of the hashes of the key/value pairs (XOR is order-insensitive, and should preserve entropy in the bits of the hash)
    class Hash
     def hash
       inject(0) {|hash,pair| hash ^ pair.hash}
     end
    
     def eql?(other)
       self == other
     end
    end

These then fix, eg:

>> {}.hash == {}.hash
=> true

>> {{} => true}[{}]
=> true

>> {{} => true, {} => false}
=> {{}=>false}

(Note, overriding eql? is required to make the last two work - it seems the Hash implementation uses eql? to do the equality comparison that follows the more approximate hash comparison)
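
If you’d rather experiment without patching Hash itself, the XOR idea can be tried on a small stand-alone class. This is only a sketch: PairSet is a made-up name, and it assumes the list of pairs contains no duplicates. (As far as I know, Ruby 1.9 and later already hash Hashes by content, so there the problem only shows up for classes of your own like this one.)

```ruby
# Hypothetical value class: an unordered collection of [key, value] pairs.
class PairSet
  attr_reader :pairs

  def initialize(pairs)
    @pairs = pairs
  end

  # Order-insensitive equality (assumes no duplicate pairs)
  def ==(other)
    other.is_a?(PairSet) &&
      pairs.size == other.pairs.size &&
      pairs.all? { |pair| other.pairs.include?(pair) }
  end
  alias eql? ==

  # XOR is commutative, so the order of the pairs cannot affect the result
  def hash
    pairs.inject(0) { |acc, pair| acc ^ pair.hash }
  end
end

a = PairSet.new([[:a, true], [:b, true]])
b = PairSet.new([[:b, true], [:a, true]])

p a.hash == b.hash       # true
p({ a => :found }[b])    # :found
```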

Now, I’m sure there’s a reason Matz didn’t do it this way - perhaps a performance reason, perhaps a gotcha that I haven’t noticed with my approach. Perhaps it’ll be fixed in 1.9.
But at any rate, it’s useful to be aware of the issue.

Visualising the tea-making process

We Playlouder developers are constantly working to improve our users’ experience of our product, and the regular activity of making hot drinks is often an unnecessary distraction from this. While we’ve certainly made great reductions in our refreshment-preparation time (I for instance discovered that the time it takes for a cup of tea to steep is the same as the time it takes to smoke a cigarette, so the two tasks can be run in parallel for greater efficiency), there’s still a lot of unnecessary to-and-fro-ing between the Playlouder office and the kitchen when one forgets a colleague’s hot drink preferences that could be factored out.

Therefore, in an attempt to maximise the amount of time available to us to bring you more exciting social-music-discovery tools (and in tribute to the wonderful Indexed), Matt and I both attempted to improve the efficiency of the tea-making process through the power of maths:

My attempt:

My attempt at a tea-making decision-tree

Matt’s Attempt: (Slightly simplified - as he points out, we really need a third dimension for ‘amount of tea’ as Matt doesn’t drink tea at all. He has an espresso machine on his desk, though, for the ultimate in caffeine-provision efficiency.)

Matt's attempt at a tea-making diagram

I’m drinking branded Lawyer beer

Called ‘Wiggin Wallop’.

We have trendy lawyers.

I’m also working on something called ‘Brix’, which may interest those who saw my rather hastily-prepared LRUG talk last year. It’s another Ruby web framework - I know, I know - why yet another? Here’s an idea of the philosophy:

  • Ruby needs a component-based web framework, to compete with the likes of Tapestry, Seaside and WebObjects
  • Separation, composability and loose coupling of components are more important for agile application development, than rigid separation of MVC layers
  • Components have lives on the client-side as well as the server side, and the server-side needs to handle javascript and css includes, and the instantiation of client-side javascript objects, with the minimum of hassle
  • Components take parameters. Components may be nested inside other components. Components may be requested on their own (an Ajax update?) or as part of a bigger component tree. This has big implications for the routing component of a framework.
  • Trying to adhere too religiously to what is typically a muddled interpretation of MVC, is often counter-productive
  • REST is wonderful for APIs, but it is the wrong paradigm for modular web application user interfaces in general. (It copes OK for a UI structure which is simple and closely coupled to the data model, but this isn’t typical of more dynamic web applications in my experience)
  • To rephrase: If your web application UI wants REST, give it a holiday. Preferably a long one. (DHH is allowed to be an opinionated jerk, why not me?)
  • XHTML is the wire protocol. We’re in the business of putting together DOM trees, not Strings.
  • Ever wanted to subclass HTMLDivElement ? Nope? Maybe just me?

The good news is that I’ve found that it’s actually been pretty easy to write, thanks to some of the great tools the Ruby community already has available. I’m only reinventing the parts of the wheel that were creaking badly - for the rest, I’m relying on:

  • Haml (take the hour’s time to learn this, it’s really clean, and especially well suited to programmatic generation of small chunks of DOM tree)
  • Rack on top of Mongrel
  • ActiveSupport
  • for the time being, ActiveRecord (this is due for the chop once I find something closer to the relational model, or heck, something which can do Class Table Inheritance in something resembling an elegant fashion)
  • Bits cheekily stolen from Merb. I was close to building this entirely on top of Merb, but I ran into some nasty segfaults on Leopard, it didn’t seem to play well with ActiveSupport, its Router would have needed replacing, and Merb’s controllers, while more lightweight than Rails, still got in the way of entirely component-based dispatch. But I’m still really impressed with Merb - it’s like Rails done right - leaner, faster, without the cruft, and with the benefit of hindsight. Easier to extend, too.

Minification

In case anyone noticed, we’ve done a bit of client-side optimization. Namely:

  • Javascript and CSS files are now ‘minified’ (I prefer ‘squished’) as part of our build process, using the handy YUI compressor - this shaves a good 40% of bloat off our Javascript and 20% off our CSS, and makes them ever so slightly quicker to parse at the other end too
  • Common Javascript and CSS includes are now packaged up into combined packages, which saves a lot of HTTP requests
  • Javascript includes have moved to the bottom of the page, meaning they won’t delay page rendering

Some stuff we were already doing:

  • Far-ahead Expires headers on all static resources, meaning they’ll be filled from the browser’s cache without any HTTP request where possible
  • Gzipping static files with lighttpd - this shaves a good 76% off our Javascript for example (and still 74% off the minified javascript). It also shaves 82% off CSS, and interestingly manages to shave even more (83%) off minified CSS - indicating that stripping syntactically-irrelevant information actually makes the remaining data more amenable to compression in this case.
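
If you want to try the minify-then-compress comparison on your own files, something like this rough sketch works. The sample CSS and the extremely naive ‘minifier’ are made up for illustration (a real tool like the YUI compressor is far more careful), and it uses zlib’s deflate from Ruby’s standard library, which is the same algorithm gzip uses minus the gzip framing. Your numbers will differ from ours.

```ruby
require 'zlib'

# Hypothetical sample stylesheet: one rule repeated to get a sensible size
rule = <<~CSS
  /* hypothetical widget styles */
  .widget .title {
      color: #333333;
      margin: 0 0 10px 0;
  }
CSS
css = rule * 50

# Very naive minifier: drop comments, collapse whitespace, trim around punctuation.
# Not a real CSS parser, so don't use it in production.
minified = css.gsub(%r{/\*.*?\*/}m, '')
              .gsub(/\s+/, ' ')
              .gsub(/\s*([{};:,])\s*/, '\1')
              .strip

# Print raw and compressed sizes, and return the compressed size
def report(label, text)
  compressed = Zlib::Deflate.deflate(text).bytesize
  saved = 100.0 * (text.bytesize - compressed) / text.bytesize
  puts format('%-9s %6d bytes raw, %6d compressed (%.0f%% saved)',
              label, text.bytesize, compressed, saved)
  compressed
end

compressed_original = report('original', css)
compressed_minified = report('minified', minified)
```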

Some more still to do

  • Convert PNG24s to PNG8s with alpha. Yes, PNG8s with more-than-just-1-bit alpha do exist, and are considerably less bulky than PNG24s. Sadly neither Photoshop nor ImageMagick can export them, but Fireworks or the PNGNQ utility can. (They don’t work too well in IE6, but then what does…)
  • Consider using CSS sprites and background-position hackery for some of our icons, where possible, to cut down on requests
  • Serve up static files from assets1.playlouder.com and assets2.playlouder.com, to increase the number of concurrent requests browsers make (they typically limit to 2 per hostname)
  • Optimize our javascript to improve page initialization times - browser-native getElementsByClassName may help here, as may selectively delaying some DOM lookups until they’re needed, and using more bubbled-up event handling to avoid the need for more specific DOM lookups
  • Optimize the crap out of the server-side (another topic for another day…)
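
For the assets1/assets2 idea, one thing worth getting right is that a given file should always be served from the same hostname, otherwise browsers end up caching it twice. A minimal sketch of one way to do that, hashing the path with a CRC32 (the asset_url helper and the http scheme are assumptions for illustration):

```ruby
require 'zlib'

ASSET_HOSTS = %w[assets1.playlouder.com assets2.playlouder.com].freeze

# Pick a host deterministically from the path, so the same file
# always maps to the same hostname and stays cacheable.
def asset_url(path)
  host = ASSET_HOSTS[Zlib.crc32(path) % ASSET_HOSTS.size]
  "http://#{host}#{path}"
end

puts asset_url('/javascripts/all.js')
puts asset_url('/stylesheets/all.css')
```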

Much as I love Ruby

And much as we bend Rails to our will, I am getting a bit jealous of these guys developing web apps with Scala - an elegant hybrid functional/object-oriented language with a powerful type-inferencing type system, Erlang-style Actors and other goodies. It compiles and runs fast on the JVM too and can access Java libraries in quite a native way. It’s kinda like Ruby plus OCaml plus Java minus the suck of Java.

I think it’s because the inner maths and type theory geek in me (the one who can never quite get over how awesome http://en.wikipedia.org/wiki/Curry-Howard_isomorphism is) really misses having a powerful type system - and Scala’s does seem to hit the sweet spot when it comes to a middle ground between the bafflingly powerful Hindley-Milner extensions of Haskell and OCaml, and more accessible Object-oriented type systems with subtyping.

liftweb (or ‘Scala with Sails’ - see what they did there?) seems like a pretty neat framework too. I’m just plugging it so that someone else will do (continue doing) the work of making it sufficiently ‘enterprise-ready’ for me to use in ‘the real world’. ;-)