the more things change …

I’ve been reading Jeni (Tennison)’s Musings about techie web stuff XML, RDF, etc.  Two articles particularly caught my eye.  One was Versioning URIs about URIs for real world and conceptual objects (schools, towns), and in particular how to deal with the fact that these change over time.  The other was Working With Fragmented Overlapping Markup all about managing multiple hierarchies of structure for the same underlying data.

In the past I’ve studied issues both of versioning and of multiple structures on the same data1, and Jeni lays out the issues for both really clearly. However, both topics gave a sense of deja vu, not just because of my own work, but because they reminded me of similar issues that go way back before the web was even thought of.

Versioning URIs and unique identifiers2

In my very first computing job (COBOL programming for Cumbria County Council) many many years ago, I read an article in Computer Weekly about choice of keys (I think for ISAM not even relational DBs). The article argued that keys should NEVER contain anything informational as it is bound to change. The author gave an example of standard maritime identifiers for a ship’s journey (rather like a flight number) that were based on destination port and supposed to never change … except when the ship maybe moved to a different route. There is always an ‘except’, so, the author argued, keys should be non-informational.

Just a short while after reading this I was working on a personnel system for the Education Dept. and was told emphatically that every teacher had a DES code given to them by government and that this code never changed. I believed them … they were my clients. However, sure enough, after several rounds of testing and demoing when they were happy with everything I tried a first mass import from the council’s main payroll file. Validations failed on a number of the DES numbers. It turned out that every teacher had a DES number except for new teachers where the Education Dept. then issued a sort of ‘pretend’ one … and of course the DES number never changed except when the real number came through. Of course, the uniqueness of the key was core to lots of the system … major rewrite :-/

The same issues occurred in many relational DBs where the spirit (rather like RDF triples) was that the record was defined by values, not by identity … but look at most SQL DBs today and everywhere you see unique but arbitrary identifying ids. DOIs, ISBNs, the BBC programme ids – we relearn the old lessons.

Unfortunately, once one leaves the engineered world of databases or SemWeb, neither arbitrary ids nor versioned ones entirely solve things as many real world entities tend to evolve rather than metamorphose, so for many purposes http://persons.org/2009/AlanDix is the same as http://persons.org/1969/AlanDix, but for others different: ‘nearly same as’ only has limited transitivity!

  1. e.g. Modelling Versions in Collaborative Work and Collaboration on different document processing platforms; quite a few years ago now![back]
  2. edited version of comments I left on Jeni’s post[back]

databases as people think – dabble DB

I was just looking at Enrico Bertini‘s blog Visuale for the first time for ages. In particular at his December entry on DabbleDB & Magic/Replace. Dabble DB allows web-based databases and in some ways sits in similar ground with Freebase, Swivel or even Google docs spreadsheet, all ways to share data of different forms on/through the web.

The USP for Dabble DB amongst other online data sharing apps, is that it appears to really be a complete database solution online … and its USB amongst conventional databses is the way they seem to have really thought about real use.  This focus on real use by ordinary users includes dynamically altering the structure of the data as you gradually understand it more.  The model they have is that you start with plain table data from a spreadsheet or other document and gradually add structure as opposed to the “first analyse and then enter” model of traditional DBs.

As I read Enrico’s blog I remembered that he had mailed me about the ‘magic/replace‘ feature ages ago.  This lets you tidy up  data during import (but apparently not data already imported … wonder why?), using a ‘by example’ approach and is a really nice example of all that ‘programming by example‘ and related work that was so hot 15 years ago eventually finding its way into real products.

The downside to Dabble DB is that editing is via forms only … it is often so much easier to enter data in a spreadsheet view, the API is quite limited, and while they have a ‘Dabble DB Commons‘ for public data (rather like Swivel), there is no directory or other way to see what people have put up 🙁

I was particularly hoping the API was better as it would have been nice to link it into my web version of Query-by-Browsing. or even integrate with the Query-through-Drilldown approach for constructing complex table joins that Damon Oram implemented more recently.

In general, while the DB and (many) UI features are strong it is not really looking outwards to creating shared linked data (in the broadest sense of the term, not just pure SemWeb world linked data), … so still room there for the absolute killer shared data app!

practical RDF

I just came across D2RQ, a notation (plus implementation) for mapping relational databases to RDF, developed over the last four years by Chris Bizer, Richard Cyganiak and others at Freie Universität Berlin. In a previous post, “digging ourselves back from the Semantic Web mire“, I worried about the ghetto-like nature of RDF and the need for “abstractions that make non-triple structures more like the Semantic Web”, D2RQ is exactly the sort of thing, allowing existing relational databases to be accessed (but not updated) as if they were and RDF triple stores, including full SPARQL queries.

As D2RQ has clearly been around for years, I tried to do a bit of a web search to find things the other way around – more programmer-friendly layers on top of RDF (or XML) allowing it to be manipulated with IDL-like or other abstractions closer to ‘normal’ programming. ECMAScript for XML (E4X) seems to be just this allowing reasonably easy access to XML (but I guess RDF would be ‘flat’ in this). E4X has been around a few years (standard since 2005), but as far as I can see not yet in IE (surprise!). I guess for really practical XML it would be JSON, and there’s a nice discussion of different RDF in JSON representation issues on the n2 wiki “RDF JSON Brainstorming“. However, both E4X and RDF in JSON still are just accessing RDF nicely not adding higher level structure.

Going back to the beginning I was wondering about any tools that represent RDF as SQL / RDMS in order to make it available to ‘old technology’ … but then remembered that SPARQL creates tuples not triples so, I guess, one could say that is exactly what it does :-/