the more things change …

I’ve been reading Jeni (Tennison)’s Musings about techie web stuff: XML, RDF, etc. Two articles particularly caught my eye. One was Versioning URIs, about URIs for real-world and conceptual objects (schools, towns) and, in particular, how to deal with the fact that these change over time. The other was Working With Fragmented Overlapping Markup, all about managing multiple hierarchies of structure for the same underlying data.

In the past I’ve studied issues both of versioning and of multiple structures on the same data[1], and Jeni lays out the issues for both really clearly. However, both topics gave a sense of déjà vu, not just because of my own work, but because they reminded me of similar issues that go way back before the web was even thought of.

Versioning URIs and unique identifiers[2]

In my very first computing job (COBOL programming for Cumbria County Council) many many years ago, I read an article in Computer Weekly about choice of keys (I think for ISAM not even relational DBs). The article argued that keys should NEVER contain anything informational as it is bound to change. The author gave an example of standard maritime identifiers for a ship’s journey (rather like a flight number) that were based on destination port and supposed to never change … except when the ship maybe moved to a different route. There is always an ‘except’, so, the author argued, keys should be non-informational.

Just a short while after reading this I was working on a personnel system for the Education Dept. and was told emphatically that every teacher had a DES code given to them by government and that this code never changed. I believed them … they were my clients. However, sure enough, after several rounds of testing and demoing, when they were happy with everything, I tried a first mass import from the council’s main payroll file. Validations failed on a number of the DES numbers. It turned out that every teacher had a DES number except for new teachers, for whom the Education Dept. issued a sort of ‘pretend’ one … and of course the DES number never changed except when the real number came through. Of course, the uniqueness of the key was core to lots of the system … major rewrite :-/

The same issues occurred in many relational DBs where the spirit (rather like RDF triples) was that the record was defined by values, not by identity … but look at most SQL DBs today and everywhere you see unique but arbitrary identifying ids. DOIs, ISBNs, the BBC programme ids – we relearn the old lessons.
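As a minimal sketch of the ‘non-informational key’ idea (not the original COBOL system), the following uses Python’s built-in sqlite3 module. The table names, column names and DES number formats are all invented for illustration. The DES number is stored as an ordinary attribute that is allowed to change, while other records refer to the teacher only through an arbitrary id.

```python
import sqlite3

def build_demo():
    """Create an in-memory database with a surrogate-keyed teacher table."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("""CREATE TABLE teacher (
        id INTEGER PRIMARY KEY,   -- arbitrary surrogate key, never shown to users
        name TEXT NOT NULL,
        des_number TEXT UNIQUE    -- informational, so allowed to change
    )""")
    db.execute("""CREATE TABLE post (
        teacher_id INTEGER NOT NULL REFERENCES teacher(id),
        school TEXT NOT NULL
    )""")
    # A new teacher starts with a temporary ('pretend') number.
    cur = db.execute(
        "INSERT INTO teacher (name, des_number) VALUES (?, ?)",
        ("A. Teacher", "TEMP-0001"),
    )
    teacher_id = cur.lastrowid
    db.execute("INSERT INTO post VALUES (?, ?)", (teacher_id, "Example School"))
    return db, teacher_id

def replace_des_number(db, teacher_id, real_number):
    """Swap in the real number; nothing that references the teacher changes."""
    db.execute(
        "UPDATE teacher SET des_number = ? WHERE id = ?",
        (real_number, teacher_id),
    )

db, teacher_id = build_demo()
replace_des_number(db, teacher_id, "DES-123456")
row = db.execute(
    "SELECT t.des_number, p.school FROM post p "
    "JOIN teacher t ON t.id = p.teacher_id"
).fetchone()
print(row)
```

Had the DES number itself been the primary key, the same update would have had to ripple through every table that referred to the teacher.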

Unfortunately, once one leaves the engineered world of databases or SemWeb, neither arbitrary ids nor versioned ones entirely solve things as many real world entities tend to evolve rather than metamorphose, so for many purposes http://persons.org/2009/AlanDix is the same as http://persons.org/1969/AlanDix, but for others different: ‘nearly same as’ only has limited transitivity!

  1. e.g. Modelling Versions in Collaborative Work and Collaboration on different document processing platforms; quite a few years ago now!
  2. edited version of comments I left on Jeni’s post

fix for WordPress shortcode bug

I’m starting to use shortcodes heavily in WordPress[1] as we are using it internally on the DEPtH project to coordinate our new TouchIT book. There was a minor bug which meant that HTML tags came out unbalanced (e.g. “<p></div></p>”).

I’ve just been fixing it and posting a patch[2]. Interestingly, the bug was partly due to the fact that back-references in regular expressions count from the beginning of the regular expression, making it impossible to use them if the expression may be ‘glued’ into a larger one … lack of referential transparency!
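The back-reference problem is not specific to PHP. The following is a small illustration in Python’s re module, not the WordPress patch itself; the fragment and the prefix it is glued to are made up for the example. A numbered back-reference silently changes meaning once another group appears in front of it, whereas a named one does not.

```python
import re

# Fragment matching a doubled lowercase letter ("aa", "bb", ...).
# \1 is intended to refer to the fragment's own group.
DOUBLED_NUMBERED = r"([a-z])\1"

# The same fragment written with a named back-reference.
DOUBLED_NAMED = r"(?P<ch>[a-z])(?P=ch)"

# Hypothetical larger pattern that the fragment is glued onto.
PREFIX = r"(x|y)-"

def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Standalone, both fragments accept a doubled letter.
standalone_ok = matches(DOUBLED_NUMBERED, "aa") and matches(DOUBLED_NAMED, "aa")

# Glued on, \1 now counts from the start of the combined expression,
# so it refers to the (x|y) group rather than the letter.
glued_numbered_aa = matches(PREFIX + DOUBLED_NUMBERED, "x-aa")
glued_numbered_ax = matches(PREFIX + DOUBLED_NUMBERED, "x-ax")

# The named version still means "the same letter again".
glued_named_aa = matches(PREFIX + DOUBLED_NAMED, "x-aa")

print(standalone_ok, glued_numbered_aa, glued_numbered_ax, glued_named_aa)
```

Named groups are only a partial cure: in Python the same group name cannot be defined twice in one pattern, so a fragment that is glued in more than once still needs some renaming.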

For anyone having similar problems, full details and patch below (all WP and PHP techie stuff).

Continue reading

  1. see section “using dynamic binding” in What’s wrong with dynamic binding?
  2. TRAC ticket #10490

What’s wrong with dynamic binding?

Dynamic scoping/binding of variables has a bad name, rather like GOTO and other remnants of the Bad Old Days before Structured Programming saved us all[1]. But there are times when dynamic binding is useful, and looking around it is very common in web scripting languages, event propagation, meta-level programming, and document styles.

So is it really so bad?
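For readers who have not met the idea, here is a rough sketch of dynamic binding emulated in Python, which is itself lexically scoped. The helper names dynamic_let and dynamic_get are invented for this example. The point is that render_heading picks up ‘style’ from whichever caller is currently active, not from where the function was written.

```python
from contextlib import contextmanager

# name -> stack of values; the top of the stack is the current binding
_bindings = {}

@contextmanager
def dynamic_let(name, value):
    """Bind name to value for the duration of the with-block."""
    stack = _bindings.setdefault(name, [])
    stack.append(value)
    try:
        yield
    finally:
        stack.pop()  # restore the previous binding on exit

def dynamic_get(name):
    """Look up the innermost active binding of name."""
    return _bindings[name][-1]

def render_heading(text):
    # No 'style' parameter: the value comes from the calling context.
    return "<h1 class='%s'>%s</h1>" % (dynamic_get("style"), text)

def render_page():
    # Intermediate code does not need to know about 'style' at all.
    return render_heading("Hello")

with dynamic_let("style", "plain"):
    outer = render_page()
    with dynamic_let("style", "fancy"):
        inner = render_page()
    restored = render_page()

print(outer, inner, restored)
```

This is much the way document styles cascade: an inner context overrides a setting temporarily and the outer value comes back when the inner context ends.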

Continue reading

  1. Strangely also the days when major advances in substance seemed to be more important than minor advances in nomenclature

spam going up

I noticed the size of my spam folder seemed to be going up: 16.5 MB in the first 14 days of July. I checked back to see what it was previously (I am truly sad and archive my Trash folders, so can see what it was!). The 9 months October-June was 154 MB, so with July at 33 MB in one month that looks like near doubling in rate. Actually, looking back, the previous 19-month period was 88 MB, so again doubling.

I checked Eudora’s record of the numbers of emails arriving (right). This incoming mail is dominated by spam and shows it going up from about 3000 a week in January to over 5000 a week now, again consonant with a doubling about every 9 months. I’m not sure if this is because the server-side spam exclusion is letting more through or because the total volume is increasing. If the latter, the trend is worrying, not just for those of us personally trying to cope with the volume, but also for mail servers.

Moore’s law for disk capacity is about doubling in 18 months, and personally I’ve noticed that my actual document sizes tend to double about every 3 years (more media etc.), so basically disk space keeps ahead of disk need.  However, if the Spam volume is doubling every 9 months it is faster than disk size increases, so mail servers may find themselves struggling with the throughput, even if they can filter and remove it.
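A back-of-envelope version of this argument, using only the figures quoted in this post (154 MB over 9 months, 33 MB for July, and doubling times of 9 and 18 months). It assumes smooth exponential growth, which is of course a simplification, and the function names are just for the sketch.

```python
def growth_factor(months, doubling_months):
    """Multiplier after `months` of steady doubling every `doubling_months`."""
    return 2 ** (months / doubling_months)

# Figures from the post: 154 MB over the 9 months October-June, 33 MB for July.
previous_monthly_mb = 154 / 9
july_ratio = 33 / previous_monthly_mb  # a little under 2, i.e. 'near doubling'

def spam_vs_disk(months, spam_doubling=9, disk_doubling=18):
    """How much faster spam has grown than disk capacity after `months`."""
    return growth_factor(months, spam_doubling) / growth_factor(months, disk_doubling)

print("July vs previous monthly rate: %.2f" % july_ratio)
for years in (1, 3, 5):
    months = 12 * years
    print("after %d year(s): spam x%.1f, disk x%.1f, spam/disk x%.1f" % (
        years,
        growth_factor(months, 9),
        growth_factor(months, 18),
        spam_vs_disk(months),
    ))
```

With these doubling times the share of a disk taken up by spam itself doubles every 18 months, so after three years spam would have grown 16-fold against a 4-fold growth in disk capacity.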

Is it just me or are other people seeing a similar pattern?

grammer aint wot it used two be

Fiona @ lovefibre and I have often discussed the worrying decline of language used in many comments and postings on the web. Sometimes people are using compressed txtng language or even leetspeak; both of these are reasonable alternative codes to ‘proper’ English, and potentially part of the natural growth of the language. However, it is often clear that the cause is ignorance not choice. One of the reasons may be that many more people are getting a voice on the Internet; it is not just the journalists, academics and professional classes. If so, this could be a positive social sign indicating that a public voice is no longer restricted to university graduates, who, of course, know their grammar perfectly …

Earlier today I was using Google to look up the author of a book I was reading and one of the top links was a listing on ratemyprofessors.com.  For interest I clicked through and saw:

“He sucks.. hes mean and way to demanding if u wanan work your ass off for a C+ take his class”[1]

Hmm I wonder what this student’s course assignment looked like?

Continue reading

  1. In case you think I’m a complete pedant, personally, I am happy with both the slang ‘sucks’ and ‘ass’ (instead of ‘arse’!), and the compressed speech ‘u’. These could be well-considered choices in language. The mistyped ‘wanna’ is also just a slip. It is the slightly more proper “hes mean and way to demanding” that seems to show a general lack of understanding. Happily, the other comments were not as bad as this one, but I did find the student who wanted a “descent grade” amusing 🙂

a simple PHP record sorter class

Not for the first time I needed to sort arrays of arrays in PHP (structures like tiny DB tables). I have previously written little wrapper functions round usort, but decided this time to make a small class. It is a simple but generic utility, so I’m popping it up in case it is useful to anyone.
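To give a flavour of the problem, here is a minimal Python analogue (not the PHP class itself, which is on the page linked below). The function name sort_records and the sample data are invented for the example. It sorts a list of records on several fields, each ascending or descending, by relying on the fact that Python’s sort is stable.

```python
from operator import itemgetter

def sort_records(records, keys):
    """Return a sorted copy of records (a list of dicts).

    keys is a list of (field, descending) pairs, most significant first.
    """
    result = list(records)
    # The sort is stable, so apply the least significant key first;
    # ties on a later (more significant) key keep the earlier ordering.
    for field, descending in reversed(keys):
        result.sort(key=itemgetter(field), reverse=descending)
    return result

people = [
    {"name": "Cat", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Ann", "age": 30},
]

# Oldest first; within the same age, alphabetical by name.
by_age_then_name = sort_records(people, [("age", True), ("name", False)])
print([p["name"] for p in by_age_then_name])
```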

The rest of this post has moved to a permanent page at:

http://www.alandix.com/blog/code/sorter/

a new version of … on downgrades and preferences

I’m wondering why people break things when they create new versions.

Firefox used to open a discreet little window when you downloaded papers. Nowadays it opens a full-screen window, completely hiding the browser.

A minor issue, but makes me wonder about both new versions and also defaults and personalisation in general.

Continue reading

going SIOC (Semantically-Interlinked Online Communities)

I’ve just SIOC-enabled this blog using the SIOC Exporter for WordPress by Uldis Bojars. Quoting from the SIOC project web site:

The SIOC initiative (Semantically-Interlinked Online Communities) aims to enable the integration of online community information. SIOC provides a Semantic Web ontology for representing rich data from the Social Web in RDF.

This means you can explore the blog as an RDF Graph including this post.

<sioc:Post rdf:about="http://www.alandix.com/blog/?p=176">
    <sioc:link rdf:resource="http://www.alandix.com/blog/?p=176"/>
    <sioc:has_container rdf:resource="http://www.alandix.com/blog/index.php?sioc_type=site#weblog"/>
    <dc:title>going SIOC (Semantically-Interlinked Online Communities)</dc:title>
    <sioc:has_creator>
        <sioc:User rdf:about="http://www.alandix.com/blog/author/admin/" rdfs:label="alan">
            <rdfs:seeAlso rdf:resource="http://www.alandix.com/blog/index.php?sioc_type=user&amp;sioc_id=1"/>
        </sioc:User>
    </sioc:has_creator>
...
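As a rough sketch of what can be done with such an export, the following pulls the URI, title and creator out of a cut-down copy of the fragment above using only Python’s standard library. The wrapping rdf:RDF element and the namespace declarations are assumptions added to make the sample self-contained (they use the usual URIs for these prefixes), and the code treats the RDF/XML as plain XML rather than using a proper RDF parser.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for the prefixes used in the export.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "sioc": "http://rdfs.org/sioc/ns#",
}

# Cut-down sample based on the fragment above, closed off so that it parses.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:sioc="http://rdfs.org/sioc/ns#">
  <sioc:Post rdf:about="http://www.alandix.com/blog/?p=176">
    <dc:title>going SIOC (Semantically-Interlinked Online Communities)</dc:title>
    <sioc:has_creator>
      <sioc:User rdf:about="http://www.alandix.com/blog/author/admin/" rdfs:label="alan"/>
    </sioc:has_creator>
  </sioc:Post>
</rdf:RDF>"""

def post_summaries(rdf_xml):
    """List the URI, title and creator label of each sioc:Post element."""
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % NS["rdf"]
    label = "{%s}label" % NS["rdfs"]
    summaries = []
    for post in root.findall("sioc:Post", NS):
        user = post.find("sioc:has_creator/sioc:User", NS)
        summaries.append({
            "uri": post.get(about),
            "title": post.findtext("dc:title", namespaces=NS),
            "creator": user.get(label) if user is not None else None,
        })
    return summaries

posts = post_summaries(SAMPLE)
print(posts)
```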

Crash Report

You would think crash reporting would be made as seamless and helpful as possible; after all, your product has just failed in some way and you wish (a) to mollify the user; and (b) to solicit their assistance in obtaining a full report.

You would think …

In the following I will reflect on what goes wrong in Adobe’s crash reporting and some lessons we can learn from it.

Continue reading