Archive for the 'Geekery' Category

‘Hijax’ - aren’t buzzwords great

Turns out, unbeknown to me, someone had already invented a buzzword for my “optional ajax navigation” technique.

“Hijax” - because it hijacks clicks on anchor tags and form submissions, and takes them over with Ajax.

I wonder if his code deals with all the same tricky corner cases that mine does, though.

Turns out someone’s made a dynamic history library for jQuery too. It looks a lot more modern and less verbose than the dhtmlHistory code we’re using at the moment - any kindly souls fancy porting it to work with prototype.js?

Log rotation with Rails and cronolog

Some problems doing log rotation on Rails’ production.log:

  • The built-in rotation feature of the Logger class doesn’t play nice with concurrent fastcgi processes
  • Rails doesn’t appear to support the more advanced Log4r library, despite claiming to in a few places. In particular benchmarking seems to break it
  • Using logrotate proves annoying as you need to kill the fastcgi processes after rotating the logs

The solution appears to be relatively simple though - just use the standard logger, but pass it a pipe to cronolog rather than a filename. Don’t know why this wasn’t documented anywhere! So in environment.rb, or environments/production.rb say, something like:

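# Use cronolog for log rotation: log to a pipe into cronolog, which
# writes to a dated file and switches files itself as the date changes
cronolog_io = IO.popen(
    'cronolog /space/log/%Y/%m/%d/production.log', 'w')
cronolog_io.sync = true  # flush each write straight down the pipe
config.logger = Logger.new(cronolog_io)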

Bus waiting times

Lazy bastard and travelcard-holder that I am, I regularly hover around the bus stop for a while waiting to see if a 55 or a 243 will arrive and take me some of the ~10min walk from tube station to Playlouder MSP (see, relevance to work!).

Having a maths degree which I struggle to put to use, I find myself wondering about the following problem: How long do you wait for a bus before you start walking? What is the optimal strategy for this?

To simplify things you might presume, along with numerous textbook probability questions, that the gaps between buses follow an exponential distribution. In English: that buses are randomly scattered subject to an average number per time period. The answer is then quite clear-cut. There are only two valid strategies - a strategy in which you always wait indefinitely until a bus arrives, and a strategy where you always walk no matter what. If the average waiting time, plus the bus journey time, is greater than the walking time, then you always walk; otherwise you always wait.
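To see why nothing in between ever wins, here is a minimal sketch of the expected door-to-door time for “wait up to n minutes, then walk” under the exponential assumption. Integrating over the waiting time gives (mean wait + ride) + (walk - mean wait - ride) × e^(-n / mean wait), which only ever moves in one direction as n grows. The function name and the example numbers are my own assumptions, not measurements.

```ruby
# Expected door-to-door time (minutes) for the strategy
# "wait up to n minutes for a bus, then give up and walk",
# assuming exponentially distributed waits with the given mean.
def expected_trip_time(n, mean_wait, ride_time, walk_time)
  p_no_bus = Math.exp(-n.to_f / mean_wait) # chance no bus has come by minute n
  (mean_wait + ride_time) + (walk_time - mean_wait - ride_time) * p_no_bus
end

# Hypothetical numbers: a bus every 6 min on average, 3 min ride, 10 min walk.
[0, 2, 5, 10, 30].each do |n|
  time = expected_trip_time(n, 6.0, 3.0, 10.0)
  puts format('wait up to %2d min -> %.2f min expected', n, time)
end
```

With these numbers the walk is longer than the mean wait plus the ride, so every extra minute you are prepared to wait helps; shorten the walk enough and the trend flips, so that n = 0 is best.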

This is a bit counter-intuitive, though, and doesn’t satisfy my desire for a practical strategy. In practice buses don’t follow an exponential distribution: the waiting times are correlated, as buses are subject to bunching phenomena, service disruptions etc. So if you wait 3 hours and no bus arrives, you might well be able to infer extra information about subsequent waiting times based on more sophisticated distributional assumptions (perhaps a bus route suspension is a probable occurrence? perhaps you’re likely to have just missed a bunched grouping of buses?). And so with a real-life bus distribution, there is actually likely to be a valid strategy based on “Wait for n minutes; if no bus arrives then start walking”.

It turns out that there’s a lot of theory on this, and the M/G/1 queue is a better model to use than the simpler M/M/1 queue based on exponential waiting times. M/G/1 tells us that, if bus waiting times are correlated, you may actually expect to wait longer than the mean interval between buses! This is due to phenomena like bunching (‘why do buses always come in threes?’), and the extra expected time can be given in quite a general way, in terms of the mean and variance of the distribution of inter-arrival times.
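I’m assuming the mean-and-variance result meant here is the standard one for someone turning up at a random moment: expected wait = (mean² + variance) / (2 × mean) of the gaps between buses. A minimal sketch, with made-up gap data:

```ruby
# Mean wait for a passenger turning up at a random moment, given the mean and
# variance of the gaps between buses: (mean^2 + variance) / (2 * mean).
def expected_wait(mean_gap, variance)
  (mean_gap**2 + variance) / (2.0 * mean_gap)
end

def mean(xs)
  xs.sum.to_f / xs.size
end

def variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / xs.size
end

regular = [10, 10, 10, 10]     # hypothetical: perfectly even 10-minute gaps
bunched = [1, 1, 28, 1, 1, 28] # hypothetical: buses in threes, same mean gap

[regular, bunched].each do |gaps|
  puts format('mean gap %.1f min, expected wait %.1f min',
              mean(gaps), expected_wait(mean(gaps), variance(gaps)))
end
```

Both timetables run a bus every 10 minutes on average, but the bunched one leaves you waiting longer than 10 minutes on average, because you are far more likely to arrive during one of the long gaps.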

However the M/G/1 queuing model is a little too general to infer more detail, such as specific strategies on how long to wait before walking. To do this I’d need to assume a particular distribution for bus arrival times, and it seems at this point that most of the research turns to using empirical distributions (i.e. just going out and measuring loads of buses rather than trying to derive from a mathematical model), and simulations (simulate a bunch of buses on a computer and measure the arrival time distribution in different situations).
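The empirical route can at least be sketched by brute force: take a sample of observed waits, score every candidate cut-off against it, and keep the best. This is not a closed-form answer, just a minimal sketch; the sample data, the function names and the ride and walk times are all made up.

```ruby
# Average door-to-door time if you wait at most n minutes and then walk,
# estimated from a sample of observed waits (minutes until the next bus).
def average_trip_time(n, observed_waits, ride_time, walk_time)
  total = observed_waits.sum do |wait|
    wait <= n ? wait + ride_time : n + walk_time
  end
  total.to_f / observed_waits.size
end

# Try each candidate cut-off and keep the one with the lowest average.
def best_cutoff(candidates, observed_waits, ride_time, walk_time)
  candidates.min_by { |n| average_trip_time(n, observed_waits, ride_time, walk_time) }
end

# Made-up sample: usually a short wait, occasionally a very long one.
waits = [1, 2, 2, 3, 4, 4, 5, 25, 30, 40]
cutoff = best_cutoff((0..45).to_a, waits, 3, 10)
puts "best cut-off: #{cutoff} min"
puts "average trip: #{average_trip_time(cutoff, waits, 3, 10)} min"
```

A sample like this one, with a cluster of short waits and a few very long ones, is exactly the shape where a finite cut-off beats both “always wait” and “always walk”.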

And so I lack a simple answer: given sensible real-world assumptions about bus waiting times, how do I calculate a value for my “Wait n minutes and then walk” strategy?! Queuing theorists please respond kthx.

On with some MSP work ;)

Migrating from Trac to FogBugz

As we’re an ISP and need to do lots of customer-support stuff, we decided to migrate to FogBugz for both our customer support ticketing and our development bug and feature-tracking. It has some great features, although I do miss the wiki and the tighter source-control integration from Trac.

Anyhow nobody seems to have a decent migration script from Trac to FogBugz, so I thought I’d post my script. It’s obviously pretty specific to both our Trac setup and our FogBugz setup, with hard-coded IDs and so on, but hopefully it will be a decent starting point for making your own script.

Before you run this, you need to migrate your Trac database to MySQL if it isn’t already. Chances are it’s currently in SQLite. This isn’t entirely trivial - I followed the steps here.

Caveat: please don’t ask me to support or modify this script for you. You do get what you pay for here.


Woo: New Safari

Apple has a new public beta of Safari 3! Nice to see all that work on the WebKit engine reach a wider audience.

The reason I’m excited, though, is that the new Safari has some great web development tools built in which previously were limited to the (somewhat unstable) nightly WebKit builds. There’s a great DOM inspector, you can view the engine’s render tree, and most crucially, there’s support for a decent javascript debugger. Previously debugging javascript in Safari has been hell.

How to get this working:

  • Install the public beta, and reboot (sigh - stop making me reboot Apple!)
  • Download the latest nightly build of WebKit, and install Drosera from the dmg
  • Enter the following into your terminal of choice:

    defaults write com.apple.Safari WebKitScriptDebuggerEnabled -bool true
    defaults write com.apple.Safari IncludeDebugMenu 1
  • Fire up the new Safari, fire up Drosera and attach to Safari. Note the Debug menu and the Drosera goodness.

Note the new Safari supports custom CSS for form elements, although it seems to switch to using some slightly dubious non-Apple-looking custom widgets the moment you touch the CSS.

IPv6

wah-hey! buzzword!

I’ve had this vague notion in the back of my mind for a while now, that with IPv6’s multicast support, distributing streaming music to a membership would be incredibly slick and easy.. and as we all know, vague notions can sometimes turn into reality, and knock you on the back of the head when you are least expecting them.

also, I like to see what these people are doing with IPv6 too, even if it is tongue in cheek:

http://www.sixxs.net/misc/coolstuff/
The Great IPv6 Experiment

these guys have done it already:
http://www.blackcatnetworks.co.uk/

I’d be interested to see anyone else’s setups - don’t be shy.

P2P, data dissemination and the future of networking

While only indirectly relevant to the MSP - I found this recent Google tech talk has changed the way I think about networking a lot.

This is from one of the guys who helped build the whole paradigm of packet-switched networking, the internet and TCP/IP. As the dude points out, networking has moved on since then. Networking today is more about dissemination of named chunks of data, and less about end-to-end connections. Our networking protocols need rethinking with this in mind. P2P protocols like bittorrent are the first steps towards this new way of thinking, but are suboptimal in many ways. Go listen to the talk as he explains better than I could - although feel free to skip the first 1/3 or so if you don’t need the background on the history of packet-switched networking.

The relational model, RDF, and data modelling with loose semantics in ActiveRecord

Recently I’ve been attempting to model a variety of generic interfaces in our data layer: obvious ones like ‘Taggable’, but also ‘Discussable’, ‘Write-about-able’ (for want of a better word), ‘Nameable’ (useful for matching/splitting/merging records based on names and aliases), ‘Stashable’, ‘Buyable’ (currently ‘Product’). I’ve found some of these present tough challenges to good relational data modelling principles, and push the limits of ORM tools like ActiveRecord in a way that I felt like writing about.

Up until now I’ve resisted the temptation to use ActiveRecord’s ‘Polymorphic Associations’, which are the obvious candidate for implementing these sorts of generic relationships between objects. Why? Coming from a mathematics background I can be a bit of a purist about the Relational Model, and these sorts of dirty ORM tricks violate it in all sorts of assorted horrible ways that would doubtless have Edgar Codd turning in his grave. My main objections are to the ideas of storing foreign keys to different tables’ primary keys in the same column (making proper foreign key constraints impossible), and storing schema metadata in the data itself as strings (ActiveRecord’s ‘type’ columns).

It also brings up deep-seated philosophical unease about surrogate primary keys in general, but these are a necessary evil which I’ve grown to accept. That said, this plugin will help those who seek to minimize their use in ActiveRecord, which, unchecked, will sprinkle the things around like MSG in a Chinese takeaway.

I’ve struggled to find good alternatives to polymorphic associations, though. The problem is this: you have a large-ish variety of tightly-modelled entities (Artist, User, Release, Track, Event, Content, Location, …) and you want the ability to form loose, generic, semantic associations between these objects. There’s a parallel here, on the OO side of the ORM chasm, with Java’s Interfaces, or Ruby’s Mixins, or C++’s multiple abstract superclasses. You want the ability to link to objects using some kind of common superclass/module/interface. Different approaches:

  • Have a common ‘base’ table, used by every object in the database, which basically just holds a sequence of IDs which all the objects draw from for their primary keys. This is the approach Perl’s Tangram library takes, although it is a little tricky in ActiveRecord. Codd-turning-in-grave factor: medium roast, due to the semantically-useless surrogate key, and the loss of static assurances from foreign keys. A foreign key pointing at a global ‘base’ table is rather pointless.
  • Have a table for each generic interface, with its own sequence of IDs as primary key. Objects which implement the interface have an extra column (‘taggable_id’ say) with a unique key constraint and a foreign key pointing to the ID from (say) ‘taggable’. This has the advantage that the ‘taggable’ table can also store data common to all objects which implement the Taggable interface. Codd-turning-in-grave factor: slightly lower, since foreign keys can now point at specific interfaces. We’re still throwing surrogate keys around like confetti though. This approach parallels the Class Table Inheritance ORM pattern, and reveals the way in which the ‘Has a’ / ‘Is a’ distinction beloved of OOP is a rather artificial one at the level of relational data.
  • Create separate join tables for each type of association. So ‘artists_tags’, ‘releases_tags’, ‘tracks_tags’, etc. This is best from a strict relational modelling point of view, but if you want associations that are polymorphic on both ends, the number of join tables can explode as n^2 in the number of object types, and this doesn’t lead to elegant queries or happy programmers. There is a possibility of building a View on top of these separate tables which makes them all available in a combined table for ActiveRecord, but this I suspect would be highly inefficient. Codd-turning-in-grave factor: low.
  • ActiveRecord’s polymorphic associations. Codd-turning-in-grave factor: high, for the reasons I explained. But, it’s easy to implement with ActiveRecord and it works.
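To make the objection to the last option concrete, here is a plain-Ruby sketch (no ActiveRecord involved) of what a polymorphic reference boils down to: a type name stored as a string next to an ID. The ‘taggable_type’ / ‘taggable_id’ names follow ActiveRecord’s column convention; the Structs, the TABLES hash and the data are purely hypothetical stand-ins for real tables.

```ruby
# Plain-Ruby stand-ins for two tightly-modelled tables, keyed by surrogate id.
Artist = Struct.new(:id, :name)
Track  = Struct.new(:id, :title)

TABLES = {
  'Artist' => { 1 => Artist.new(1, 'Some Artist') },
  'Track'  => { 1 => Track.new(1, 'Some Track') }
}

# A polymorphic row: the "foreign key" is a (type string, id) pair, so no
# single constraint can tie taggable_id to any one table.
Tagging = Struct.new(:tag, :taggable_type, :taggable_id) do
  def taggable
    TABLES.fetch(taggable_type)[taggable_id] # nil if the target row is missing
  end
end

taggings = [
  Tagging.new('dubstep', 'Artist', 1),
  Tagging.new('dubstep', 'Track', 1),
  Tagging.new('dubstep', 'Track', 99) # dangling reference; nothing prevents it
]
taggings.each { |t| p t.taggable }
```

The third tagging is the Codd-bothering part: with a real foreign key the database would reject it, whereas here the lookup just quietly comes back empty.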

Why did I settle for the ugly solution (polymorphic associations) in the end?

Firstly, while ‘table for each generic interface’ was my preferred approach, doing this in ActiveRecord, while possible, isn’t terribly pretty - it’s hard to make it look like an ‘implements’ or ‘is a’ relationship on the OO side, rather than ‘has a’. It also means adding a column for each interface to each table which needs to support that interface, and hence more migration work.

Secondly, I realised that the Relational Model isn’t all it’s cracked up to be when it comes to loose semantic modelling. There’s a parallel with the distinction between dynamic typing (Ruby) and static typing (Java) - sometimes you just need that extra flexibility in your model, and are willing to give up the static assurances to get it.

Taking this to extremes, of course, is the alternative philosophy to data modelling espoused by RDF triple-stores and query languages like SPARQL. Sadly robust support for these technologies is somewhat lacking in common web development frameworks, and the database back-ends don’t appear nearly as highly-developed, scalable and robust as SQL databases.

There are also parallels with single-kinded vs multi-kinded predicate logic, and logic databases with languages like Datalog (cf Prolog). Sadly these databases seem to have fallen by the wayside too in favour of SQL.

Introducing the back button and bookmarking to AJAX

One challenge I’ve alluded to before concerns the embedded streaming music player in our sidebar. What happens when it’s playing a track and you navigate to a new page? The music Must Carry On - the music player must not suffer from a full page reload - but how? We considered three options here.

The Frameset method

We put the sidebar/music player in a separate frame which doesn’t get reloaded. Pros:

  • Well-worn technique, easy to implement
  • The back button still works, kinda

Cons:

  • Deprecated by the W3C
  • Bookmarking doesn’t work - the URL in the address bar stays fixed
  • Frames are generally a bit outdated and icky, and have some undesirable behaviours in terms of their UI

The popup method

We open a popup containing the music player, and subsequently command this popup to play things. Pros:

  • Back button and bookmarking unaffected
  • Reasonably easy to implement

Cons:

  • Popup blockers. Nuff said.
  • Users Just Don’t Like Popups. I don’t want to have an extra little browser window floating around or taking up space in the task bar and am likely to close it, either as a reflex reaction (scarred by ad popups?) or out of irritation.
  • The interface isn’t well-integrated with the main web page - the user can’t drag things into it, for example.
  • Have to switch to a separate window in order to change the music - a pain if you like to keep your main browser window maximized
  • It’s easy to forget where the music is actually coming from this way!

The naive AJAX approach

We just re-load the main content area with an AJAX request! AJAX is the way forwards, right! Simple as!

Well, not quite. Pros:

  • Feels very responsive and avoids the ugliness of frames
  • Quick to implement with prototype.js
  • Instantly improves your score at Web 2.0 buzzword bingo

Cons:

  • Like frames, the URL in the address bar stays unchanged
  • The back button doesn’t work
  • Unless you’re careful, search engines may never find the AJAX-loaded content

I like javascript though, I hate popups, and I don’t like to give up too easily - so I looked into how I could mitigate or eliminate these three rather severe Cons.

The sophisticated AJAX approach

Bookmarking and the back button can be fixed in one fell swoop with the help of this most excellent library: Really Simple History / dhtmlHistory.js (Make sure you apply the very helpful patch in this comment here, to get this working with IE7). This library lets you insert items programmatically into the browser’s history stack, and will let you intercept a ‘history event’ (back/forward button click) with a javascript callback. It will also update the address bar for you when the browser history state changes - although this is subject to the limitation that it can only change the anchor portion of the URL without doing a reload of the page. How does it do this cross-browser? Do you even want to know? It’s slightly hacky under the hood, with different techniques used in different browsers - an off-screen IFRAME for one. But it’s wrapped up in a nicely-abstracted library, which makes all the difference.

How did I address the search-engine problem? I make the AJAX navigation an entirely optional thing. Pages just use normal HTML links and forms - and if javascript is disabled (or you’re a search engine bot), you’ll follow the links without any cleverness, loading the whole layout each time.

If you have javascript on and are listening to some music though - we have global javascript event handlers which catch click events on anchor tags and submit events on forms - and convert them, behind the scenes, into Ajax requests. The web application is told, by means of an extra GET parameter, to render the page without the surrounding layout.
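The server side of that last step can be sketched in a few lines of plain Ruby. This is a minimal illustration rather than our actual code: the parameter name ‘ajax_nav’, the LAYOUT string and the render_page function are all hypothetical.

```ruby
# Stand-in for the full page layout, including the sidebar music player.
LAYOUT = '<html><body><div id="sidebar">player</div>' \
         '<div id="content">%s</div></body></html>'

# Hypothetical parameter name; all that matters is that the hijacked request
# carries some extra GET parameter which a normal page load does not.
def render_page(params, content)
  return content if params['ajax_nav'] == '1' # bare fragment for the Ajax handler
  format(LAYOUT, content)                     # full page for normal requests
end

puts render_page({ 'ajax_nav' => '1' }, '<h1>Releases</h1>')
puts render_page({}, '<h1>Releases</h1>')
```

Search engine bots and browsers without javascript never send the parameter, so they always get the second, fully laid-out form.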

There’s quite a bit of extra complexity which I’ve not mentioned here - there were challenges in having this framework cope with redirects, tell which type of events it should and shouldn’t intercept, maintain compatibility with existing HTML onclick/onsubmit attributes, display an hourglass during requests and so on. There were also significant challenges in dynamically loading the CSS and javascript includes which individual pages require, and handling initialization and finalization of javascript on individual pages without a full reload. But it’s doable, and, provided you’re willing to apply a bit of discipline to the way you structure your client-side code, it’s doable in a way that’s reasonably transparent to the developer.

So - do you think this is neat, do you think I’m insane, or both of the above? That is the question. :)

If there’s interest, I may wrap up this collection of techniques into a nicely-abstracted library and unleash it on the world. It’s not ideal of course - really it just demonstrates how much we’re in need of new standards for rich applications on the web - but there’s something satisfying about having this work.

On structured Javascript in Rails - first some criticisms

Much as I love Rails, and ubiquitous as its use is amongst the current crop of agile ‘Web 2.0’ start-ups, I feel its approach to Javascript and client-side code leaves a lot of room for improvement.

As the new Playlouder app does some quite clever things with Javascript, improve it is what we’ve done. So I thought I’d write a little about our approach and some of the issues we’ve had. Before this though, I’m going to outline some of my issues with the now-standard Rails approach, RJS templates. For the uninitiated, these consist of a domain-specific language embedded in Ruby which allows one to generate a range of javascript expressions. Expressions which, when evalled in response to an AJAX request, are rather handy for updating the DOM and running some of the neat range of effects available through the Scriptaculous library.
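For a feel of the general idea, here is a toy generator in plain Ruby. To be clear, this is not the real RJS API, just a hypothetical class of my own that records Ruby method calls as lines of javascript; the javascript it emits uses Prototype’s Element.update and Scriptaculous’s Effect.Highlight.

```ruby
require 'json'

# Toy stand-in for the idea behind RJS (NOT the real Rails API): Ruby method
# calls are recorded as lines of JavaScript source for the browser to eval.
class ToyJsGenerator
  def initialize
    @lines = []
  end

  # Emit a call that replaces an element's contents via Element.update.
  def replace_html(id, html)
    @lines << "Element.update(#{id.to_json}, #{html.to_json});"
    self
  end

  # Emit a Scriptaculous highlight effect on the element.
  def highlight(id)
    @lines << "new Effect.Highlight(#{id.to_json});"
    self
  end

  def to_s
    @lines.join("\n")
  end
end

page = ToyJsGenerator.new
page.replace_html('cart', '<b>3 items</b>').highlight('cart')
puts page
```

Note how little is really going on: each Ruby call maps onto one canned javascript statement, which is about the extent of the ‘compilation’.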

At first glance, these are wonderfully magical and clever, but on a deeper inspection they leave rather a bad taste in my mouth. Why?

  • The magic makes it look, at a first glance, as though one can genuinely compile Ruby to Javascript in a similar way to, say, Google’s Web Toolkit and its compilation of Java, or HaXe (an interesting platform which I’m keeping an eye on). This is not in fact the case, and the magic is limited to a fairly restrictive set of possibilities. I would prefer something a little more transparent to a clever Ruby DSL whose magic boasts more than it can realistically deliver.
  • They’re geared very much towards a server-centric model where client-side code is kept to a minimum, and DOM updates and effects are generally only pushed to the client in response to an AJAX request. If yours is just a matter of adding the now-requisite ‘sprinkle of AJAX’ to a traditionally-structured web application then this is fine - ideal even - but if you need to push things even a little beyond this then you’ll start to find it a real pain:
  • When you’re maintaining a library of ‘real’ client-side javascript code alongside the magically-generated kind, it can lead to duplication in two different languages (RJS and Javascript), and to confusing spaghetti code. Some code will be found in standalone little snippets of RJS in your view folders and controllers, whereas other code that runs alongside it will be in proper javascript classes - often the two will get out of sync, and bugs will arise from updating the one without remembering about the other.
  • While it’s true that one can make one’s own RJS helpers, the little snippets of RJS can be rather brittle and don’t exactly lead effortlessly into good object-oriented design and code re-use practices for the client-side code.
  • RJS encourages requests to the server (what should I do next? give me more javascript to run!) when they’re not really needed, when more sophisticated client-side code could easily cache results or figure out what to do without additional data from the server. I suspect this may be part of the reason some AJAX-heavy Rails-based apps have gotten a bad rep for performance.
  • Whatever happened, on the client side, to the MVC design pattern so beloved of Rails? It’s nice to have the client run its own controller code, and to have a little model/view/controller separation on the client side as well. It’s also nice to have structure in place for re-usable UI widgets - with their own controller code, views and data binding. RJS doesn’t really address this need or provide structure for it - although the helpers we have for some of the Scriptaculous controls are a step in the right direction, they too leave a lot of room for improvement.

In case this sounds overly critical, I have absolutely no beef with using RJS in applications which don’t require a significant client-side component, but which benefit from a little special AJAX sauce on some of their forms. Use the right tools for the right job and so on. But I would definitely warn against it to those setting out to write a moderately complex web application with significant client-side interactivity.

Next entry, find out which alternatives we chose, and some suggested best practices for structuring client-side code within a Rails application.