the more things change …

I’ve been reading Jeni (Tennison)’s Musings about techie web stuff: XML, RDF, etc. Two articles particularly caught my eye. One was Versioning URIs, about URIs for real world and conceptual objects (schools, towns), and in particular how to deal with the fact that these change over time. The other was Working With Fragmented Overlapping Markup, all about managing multiple hierarchies of structure for the same underlying data.

In the past I’ve studied issues both of versioning and of multiple structures on the same data [1], and Jeni lays out the issues for both really clearly. However, both topics gave a sense of déjà vu, not just because of my own work, but because they reminded me of similar issues that go way back before the web was even thought of.

Versioning URIs and unique identifiers [2]

In my very first computing job (COBOL programming for Cumbria County Council) many many years ago, I read an article in Computer Weekly about choice of keys (I think for ISAM not even relational DBs). The article argued that keys should NEVER contain anything informational as it is bound to change. The author gave an example of standard maritime identifiers for a ship’s journey (rather like a flight number) that were based on destination port and supposed to never change … except when the ship maybe moved to a different route. There is always an ‘except’, so, the author argued, keys should be non-informational.

Just a short while after reading this I was working on a personnel system for the Education Dept. and was told emphatically that every teacher had a DES code given to them by government and that this code never changed. I believed them … they were my clients. However, sure enough, after several rounds of testing and demoing, when they were happy with everything, I tried a first mass import from the council’s main payroll file. Validations failed on a number of the DES numbers. It turned out that every teacher had a DES number except for new teachers, for whom the Education Dept. issued a sort of ‘pretend’ one … and of course the DES number never changed except when the real number came through. The uniqueness and permanence of the key was core to lots of the system … major rewrite :-/

The same issues occurred in many relational DBs where the spirit (rather like RDF triples) was that the record was defined by values, not by identity … but look at most SQL DBs today and everywhere you see unique but arbitrary identifying ids. DOIs, ISBNs, the BBC programme ids – we relearn the old lessons.
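To make the lesson concrete, here is a minimal sketch of the 'arbitrary id' approach, not the original COBOL system. The class name, method names and the DES number formats are all hypothetical. The record is keyed by a meaningless surrogate id, and the DES number is just an attribute that is allowed to change.

```python
import itertools


class TeacherRegistry:
    """Teachers keyed by an arbitrary surrogate id (hypothetical design).

    The DES number is stored as an ordinary, changeable attribute,
    with a secondary index so it can still be used for lookups.
    """

    def __init__(self):
        self._next_id = itertools.count(1)  # source of non-informational keys
        self._teachers = {}                 # surrogate id -> record
        self._by_des = {}                   # current DES number -> surrogate id

    def add(self, name, des_number):
        if des_number in self._by_des:
            raise ValueError(f"DES number already in use: {des_number}")
        teacher_id = next(self._next_id)
        self._teachers[teacher_id] = {"name": name, "des_number": des_number}
        self._by_des[des_number] = teacher_id
        return teacher_id

    def change_des_number(self, teacher_id, new_des_number):
        """Swap a 'pretend' number for the real one without touching the key."""
        if new_des_number in self._by_des:
            raise ValueError(f"DES number already in use: {new_des_number}")
        record = self._teachers[teacher_id]
        del self._by_des[record["des_number"]]
        record["des_number"] = new_des_number
        self._by_des[new_des_number] = teacher_id

    def find_by_des(self, des_number):
        """Return the surrogate id for a current DES number, or None."""
        return self._by_des.get(des_number)


# Usage: a new teacher starts with a made-up temporary number.
registry = TeacherRegistry()
teacher_id = registry.add("New Teacher", "TEMP-001")
registry.change_des_number(teacher_id, "DES-12345")
```

Anything that refers to the teacher by `teacher_id` is unaffected when the real number comes through; only the lookup index changes.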

Unfortunately, once one leaves the engineered world of databases or SemWeb, neither arbitrary ids nor versioned ones entirely solve things, as many real world entities tend to evolve rather than metamorphose. So for many purposes http://persons.org/2009/AlanDix is the same as http://persons.org/1969/AlanDix, but for others it is different: ‘nearly same as’ only has limited transitivity!
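A toy illustration of that limited transitivity, under loudly stated assumptions: the `nearly_same` function, the 20-year tolerance and the intermediate 1989 snapshot are all invented for the example. Real judgements of sameness depend on purpose, not just on elapsed time.

```python
def nearly_same(year_a: int, year_b: int, tolerance: int = 20) -> bool:
    """Hypothetical rule: two dated snapshots of an entity count as
    'nearly the same' if they are no more than `tolerance` years apart."""
    return abs(year_a - year_b) <= tolerance


# Each adjacent pair of snapshots is nearly the same ...
step_one = nearly_same(1969, 1989)   # True
step_two = nearly_same(1989, 2009)   # True

# ... but the relation does not chain across both steps.
end_to_end = nearly_same(1969, 2009)  # False

print(step_one, step_two, end_to_end)
```

Widening the tolerance for a given purpose makes the end points match again, which is the point: whether the 1969 and 2009 URIs denote the 'same' thing depends on what the comparison is for.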

  1. e.g. Modelling Versions in Collaborative Work and Collaboration on different document processing platforms; quite a few years ago now!
  2. edited version of comments I left on Jeni’s post