Recently I’ve been attempting to model a variety of generic interfaces in our data layer: obvious ones like ‘Taggable’, but also ‘Discussable’, ‘Write-about-able’ (for want of a better word), ‘Nameable’ (useful for matching/splitting/merging records based on names and aliases), ‘Stashable’, ‘Buyable’ (currently ‘Product’). I’ve found some of these present tough challenges to good relational data modelling principles, and push the limits of ORM tools like ActiveRecord in a way that I felt like writing about.
Up until now I’ve resisted the temptation to use ActiveRecord’s ‘Polymorphic Associations’, which are the obvious candidate for implementing these sorts of generic relationships between objects. Why? from a mathematics background I can be a bit of a purist about the Relational Model, and these sorts of dirty ORM tricks violate it in all sorts of assorted horrible ways that would doubtless have Edgar Codd turning in his grave. My main objections are to the ideas of storing foreign keys to different tables’ primary keys in the same column (making proper foreign key constraints impossible), and storing schema metadata in the schema itself as strings (ActiveRecord’s ‘type’ columns).
It also brings up deep-seated philosophical unease about surrogate primary keys in general, but these are a necessary evil which I’ve grown to accept. That said, this plugin will help those who seek to minimize their use in ActiveRecord, which, unchecked, will sprinkle the things around like MSG in a chinese takeaway.
I’ve struggled to find good alternatives to polymorphic associations, though. The problem is this: you have a large-ish variety of tightly-modelled entities (Artist, User, Release, Track, Event, Content, Location, …) and you want the ability to form loose, generic, semantic associations between these objects. There’s a parallel here, on the OO side of the ORM chasm, with Java’s Interfaces, or Ruby’s Mixins, or C++’s multiple abstract superclasses. You want the ability to link to objects using some kind of common superclass/module/interface. Different approaches:
- Have a common ‘base’ table, used by every object in the database, which is basically just holds a sequence of IDs which all the objects draw from for their primary keys. This is the approach Perl’s Tangram library takes, although is a little tricky in ActiveRecord. Codd-turning-in-grave factor: medium roast, due to the semantically-useless surrogate key, and the loss of static assurances from foreign keys. A foreign key pointing at a global ‘base’ table is rather pointless.
- Have a table for each generic interface, with its own sequence of IDs as primary key. Objects which implement the interface have an extra column (’taggable_id’ say) with a unique key constraint and a foreign key pointing to the ID from (say) ‘taggable’. This has the advantage that the ‘taggable’ table can also store data common to all objects which implement the Taggable interface. Codd-turning-in-grave factor: slightly lower, since foreign keys can now point at specific interfaces. We’re still throwing surrogate keys around like confetti though. This approach parallels with the Class Table Inheritance ORM pattern, and reveals the way in which the ‘Has a’ / ‘Is a’ distinction beloved of OOP is a rather artificial one at the level of relational data.
- Create separate join tables for each type of association. So ‘artists_tags’, ‘releases_tags’, ‘tracks_tags’, etc. This is best from a strict relational modelling point of view, but if you want associations that are polymorphic on both ends, the number of join tables can explode as n^2 in the number of object types, and this doesn’t lead to elegant queries or happy programmers. There is a possibility of building a View ontop of these separate tables which makes them all available in a combined table for ActiveRecord, but this I suspect would be highly inefficient. Codd-turning-in-grave factor: low.
- ActiveRecord’s polymorphic associations. Codd-turning-in-grave factor: high, for the reasons I explained. But, it’s easy to implement with ActiveRecord and it works.
Why did I settle for the ugly solution (polymorhpic associations) in the end?
Firstly, while ‘table for each generic interface’ was my preferred approach, doing this in ActiveRecord, while possible, isn’t terribly pretty - it’s hard to make it look like an ‘implements’ or ‘is a’ relationship on the OO side, rather than ‘has a’. It also means adding a column for each interface to each table which needs to support that interface, and hence more migration work.
Secondly, I realised that the Relational Model isn’t all it’s cracked up to be when it comes to loose semantic modeling. There’s a parallel with the distinction between dynamic typing (Ruby) and static (Java) - sometimes you just need that extra flexibility in your model, and are willing to give up the static assurances to get it.
Taking this to extremes, of course, is the alternative philosophy to data modelling espoused by RDF triple-stores and query languages like SPARQL. Sadly robust support for these technologies is somewhat lacking in common web development frameworks, and the database back-ends don’t appear nearly as highly-developed, scalable and robust as SQL databases.
There are also parallels with single-kinded vs multi-kinded predicate logic, and logic databases with languages like Datalog (cf Prolog). Sadly these databases seem to have fallen by the wayside too in favour of SQL.