Sandwich proofs and odd orders

Revisiting an old piece of work I reflect on the processes that led to it: intuition and formalism, incubation and insight, publish or perish, and a malaise at the heart of current computer science.

A couple of weeks ago I received an email requesting an old technical report, “Finding fixed points in non-trivial domains: proofs of pending analysis and related algorithms” [Dx88].  This report was from nearly 30 years ago, when I was at York and before the time when everything was digital and online. This was one of my all time favourite pieces of work, and one of the few times I’ve done ‘real maths’ in computer science.

As well as tackling a real problem, it required new theoretical concepts and methods of proof that were generally applicable. In addition it arose through an interesting story that exposes many of the changes in academia.

[Aside, for those of more formal bent.] This involved proving the correctness of an algorithm ‘Pending Analysis’ for efficiently finding fixed points over finite lattices, which had been developed for use when optimising functional programs. Doing this led me to perform proofs where some of the intermediate functions were not monotonic, and to develop forms of partial order that enabled reasoning over these. Of particular importance was the concept of a pseudo-monotonic functional, one that preserved an ordering between functions even if one of them is not itself monotonic. This then led to the ability to perform sandwich proofs, where a potentially non-monotonic function of interest is bracketed between two monotonic functions, which eventually converge to the same function sandwiching the function of interest between them as they go.

Oddly while it was one my favourite pieces of work, it was at the periphery of my main areas of work, so had never been published apart from as a York technical report. Also, this was in the days before research assessment, before publish-or-perish fever had ravaged academia, and when many of the most important pieces of work were ‘only’ in technical report series. Indeed, our Department library had complete sets of many of the major technical report series such as Xerox Parc, Bell Labs, and Digital Equipment Corporation Labs where so much work in programming languages was happening at the time.

My main area was, as it is now, human–computer interaction, and at the time principally the formal modelling of interaction. This was the topic of my PhD Thesis and of my first book “Formal Methods for Interactive Systems” [Dx91] (an edited version of the thesis).   Although I do less of this more formal work now-a-days, I’ve just been editing a book with Benjamin Weyers, Judy Bowen and Philippe Pallanque, “The Handbook of Formal Methods in Human-Computer Interaction” [WB17], which captures the current state of the art in the topic.

Moving from mathematics into computer science, the majority of formal work was far more broad, but far less deep than I had been used to. The main issues were definitional: finding ways to describe complex phenomena that both gave insight and enabled a level of formal tractability. This is not to say that there were no deep results: I recall the excitement of reading Sannella’s PhD Thesis [Sa82] on the application of category theory to formal specifications, or Luca Cardelli‘s work on complex type systems needed for more generic coding and understanding object oriented programing.

The reason for the difference in the kinds of mathematics was that computational formalism was addressing real problems, not simply puzzles interesting for themselves. Often these real world issues do not admit the kinds of neat solution that arise when you choose your own problem — the formal equivalent of Rittel’s wicked problems!

Crucially, where there were deep results and complex proofs these were also typically addressed at real issues. By this I do not mean the immediate industry needs of the day (although much of the most important theoretical work was at industrial labs); indeed functional programming, which has now found critical applications in big-data cloud computation and even JavaScript web programming, was at the time a fairly obscure field. However, there was a sense in which these things connected to a wider sphere of understanding in computing and that they could eventually have some connection to real coding and computer systems.

This was one of the things that I often found depressing during the REF2014 reading exercise in 2013. Over a thousand papers covering vast swathes of UK computer science, and so much that seemed to be in tiny sub-niches of sub-niches, obscure variants of inconsequential algebras, or reworking and tweaking of algorithms that appeared to be of no interest to anyone outside two or three other people in the field (I checked who was citing every output I read).

(Note the lists of outputs are all in the public domain, and links to where to find them can be found at my own REF micro-site.)

If this had been pure mathematics papers it is what I would have expected; after all mathematics is not funded in the way computer science is, so I would not expect to see the same kinds of connection to real world issues. Also I would have been disappointed if I had not seen some obscure work of this kind; you sometimes need to chase down rabbit holes to find Aladdin’s cave. It was the shear volume of this kind of work that shocked me.

Maybe in those early days, I self-selected work that was both practically and theoretically interesting, so I have a golden view of the past; maybe it was simply easier to do both before the low-hanging fruit had been gathered; or maybe just there has been a change in the social nature of the discipline. After all, most early mathematicians happily mixed pure and applied mathematics, with the areas only diverging seriously in the 20th century. However, as noted, mathematics is not funded so heavily as computer science, so it does seem to suggest a malaise, or at least loss of direction for computing as a discipline.

Anyway, roll back to the mid 1980s. A colleague of mine, David Wakeling, had been on a visit to a workshop in the States and heard there about Pending Analysis and Young and Hudak’s proof of its correctness . He wanted to use the algorithm in his own work, but there was something about the proof that he was unhappy about. It was not that he had spotted a flaw (indeed there was one, but obscure), but just that the presentation of it had left him uneasy. David was a practical computer scientist, not a mathematician, working on compilation and optimisation of lazy functional programming languages. However, he had some sixth sense that told him something was wrong.

Looking back, this intuition about formalism fascinates me. Again there may be self-selection going on, if David had had worries and they were unfounded, I would not be writing this. However, I think that there was something more than this. Hardy and Wright, the bible of number theory , listed a number of open problems in number theory (many now solved), but crucially for many gave an estimate on how likely it was that they were true or might eventually have a counter example. By definition, these were non-trivial hypotheses, and either true or not true, but Hardy and Wright felt able to offer an opinion.

For David I think it was more about the human interaction, the way the presenters did not convey confidence.  Maybe this was because they were aware there was a gap in the proof, but thought it did not matter, a minor irrelevant detail, or maybe the same slight lack of precision that let the flaw through was also evident in their demeanour.

In principle academia, certainly in mathematics and science, is about the work itself, but we can rarely check each statement, argument or line of proof so often it is the nature of the people that gives us confidence.

Quite quickly I found two flaws.

One was internal to the mathematics (math alert!) essentially forgetting that a ‘monotonic’ higher order function is usually only monotonic when the functions it is applied to are monotonic.

The other was external — the formulation of the theorem to be proved did not actually match the real-world computational problem. This is an issue that I used to refer to as the formality gap. Once you are in formal world of mathematics you can analyse, prove, and even automatically check some things. However, there is first something more complex needed to adequately and faithfully reflect the real world phenomenon you are trying to model.

I’m doing a statistics course at the CHI conference in May, and one of the reasons statistics is hard is that it also needs one foot on the world of maths, but one foot on the solid ground of the real world.

Finding the problem was relatively easy … solving it altogether harder! There followed a period when it was my pet side project: reams of paper with scribbles, thinking I’d solved it then finding more problems, proving special cases, or variants of the algorithm, generalising beyond the simple binary domains of the original algorithm. In the end I put it all into a technical report, but never had the full proof of the most general case.

Then, literally a week after the report was published, I had a notion, and found an elegant and reasonably short proof of the most general case, and in so doing also created a new technique, the sandwich proof.

Reflecting back, was this merely one of those things, or a form of incubation? I used to work with psychologists Tom Ormerod and Linden Ball at Lancaster including as part of the Desire EU network on creativity. One of the topics they studied was incubation, which is one of the four standard ‘stages’ in the theory of creativity. Some put this down to sub-conscious psychological processes, but it may be as much to do with getting out of patterns of thought and hence seeing a problem in a new light.

In this case, was it the fact that the problem had been ‘put to bed’, enabled fresh insight?

Anyway, now, 30 years on, I’ve made the report available electronically … after reanimating Troff on my Mac … but that is another story.


[Dx91] A. J. Dix (1991). Formal Methods for Interactive Systems. Academic Press.ISBN 0-12-218315-0

[Dx88] A. J. Dix (1988). Finding fixed points in non-trivial domains: proofs of pending analysis and related algorithms. YCS 107, Dept. of Computer Science, University of York.

[HW59] G.H. Hardy, E.M. Wright (1959). An Introduction to the Theory of Numbers – 4th Ed. Oxford University Press.

[Sa82] Don Sannella (1982). Semantics, Imlementation and Pragmatics of Clear, a Program Specification Language. PhD, University of Edinburgh.

[WB17] Weyers, B., Bowen, J., Dix, A., Palanque, P. (Eds.) (2017) The Handbook of Formal Methods in Human-Computer Interaction. Springer. ISBN 978-3-319-51838-1

[YH96] J. Young and P. Hudak (1986). Finding fixpoints on function spaces. YALEU/DCS/RR-505, Yale University, Department of Computer Science

JavaScript gotcha: var scope

I have been using JavaScript for more than 15 years with some projects running to several thousand lines.  But just discovered that for all these years I have misunderstood the scope rules for variables.  I had assumed they were block scoped, but in fact every variable is effectively declared at the beginning of the function.

So if you write:

function f() {
    for( var i=0; i<10; i++ ){
        var i_squared = i * i;
        // more stuff ...

This is treated as if you had written:

function f() {
    var i, i_squared
    for( i=0; i<10; i++ ){
         i_squared = i * i;
         // more stuff ...

The Mozilla Developer Network describes the basic principle in detail, however, does not include any examples with inner blocks like this.

So, there is effectively a single variable that gets reused every time round the loop.  Given you do the iterations one after another this is perfectly fine … until you need a closure.

I had a simple for loop:

function f(items)
    for( var ix in items ){
        var item = items[ix];
        var value = get_value(item)

This all worked well until I needed to get the value asynchronously (AJAX call) and so turned get_value into an asynchronous function:


which fetches the value and then calls callback(value) when it is ready.

The loop was then changed to

function f(items)
    for( var ix in items ){
        var item = items[ix];
        get_value_async( item, function(value) {
                          }; );

I had assumed that ‘item’ in each callback closure would be bound to the value for the particular iteration of the loop, but in fact the effective code is:

function f(items)
    var ix, item;
    for( ix in items ){
        item = items[ix];
        get_value_async( item, function(value) {
                          }; );

So all the callbacks point to the same ‘item’, which ends up as the one from the last iteration.  In this case the code is updating an onscreen menu, so only the last item got updated!

JavaScript 1.7 and ECMAScript 6 have a new ‘let’ keyword, which has precisely the semantics that I have always thought ‘var’ had, but does not seem to widely available yet in browsers.

As a workaround I have used the slightly hacky looking:

function f(items)
    for( var ix in items ){
        (function() {
            var item = items[ix];
            get_value_async( item, function(value) {
                              }; );

The anonymous function immediately inside the for loop is simply there to create scope for the item variable, and effectively means there is a fresh variable to be bound to the innermost function.

It works, but you do need to be confident with anonymous functions!

Offline HTML5, Chrome, and infinite regress

I am using HTML5’s offline mode as part of the Tiree Mobile Archive project.

This is, in principle, a lovely way of creating web sites that behave pretty much like native apps on mobile devices.  However, things, as you can guess, do not always go as smoothly as the press releases and blogs suggest!

PhotobucketSome time I must write at length on various useful lessons, but, for now, just one – the potential for an endless cycle of caches, rather like Jörmungandr, the Norse world serpent, that wraps around the world swallowing its own tail.

My problem started when I had a file (which I will call ‘shared.prob’ below, but was actually ‘place_data.js’), which I had updated on the web server, but kept showing an old version on Chrome no matter how many times I hit refresh and even after I went to the history settings and asked chrome to empty its cache.

I eventually got to the bottom of this and it turned out to be this Jörmungandr, cache-eats-cache, problem (browser bug!), but I should start at the beginning …

To make a web site work off-line in HTML5 you simply include a link to an application cache manifest file in the main file’s <html> tag.  The browser then pre-loads all of the files mentioned in the manifest to create the application cache (appCache for short). The site is then viewable off-line.  If this is combined with off-line storage using the built-in SQLite database, you can have highly functional applications, which can sync to central services using AJAX when connected.

Of course sometimes you have updated files in the site and you would like browsers to pick up the new version.  To do this you simply update the files, but then also update the manifest file in some way (often updating a version number or date in a comment).  The browser periodically checks the manifest file when it is next connected (or at least some browsers check themselves, for some you need to add Javascript code to do it), and then when it notices the manifest has changed it invalidates the appCache and rechecks all the files mentioned in the manifest, downloading the new versions.

Great, your web site becomes an off-line app and gets automatically updated 🙂

Of course as you work on your site you are likely to end up with different versions of it.  Each version has its own main html file and manifest giving a different appCache for each.  This is fine, you can update the versions separately, and then invalidate just the one you updated – particularly useful if you want a frozen release version and a development version.

Of course there may be some files, for example icons and images, that are relatively static between versions, so you end up having both manifest files mentioning the same file.  This is fine so long as the file never changes, but, if you ever do update that shared file, things get very odd indeed!

I will describe Chrome’s behaviour as it seems particularly ‘aggressive’ at caching, maybe because Google are trying to make their own web apps more efficient.

First you update the shared file (let’s call it shared.prob), then invalidate the two manifest files by updating them.

Next time you visit the site for appCache_1 Chrome notices that manifest_1 has been invalidated, so decides to check whether the files in the manifest need updating. When it gets to shared.prob it is about to go to the web to check it, then notices it is in appCache_2 – so uses that (old version).

Now it has the old version in appCache_1, but thinks it is up-to-date.

Next you visit the site associated with appCache_2, it notices manifest_2 is invalidated, checks files … and, you guessed it, when it gets to shared.prob, it takes the same old version from appCacche_1 🙁 🙁

They seem to keep playing catch like that for ever!

The only way out is to navigate to the pseudo-url ‘chrome://appcache-internals/’, which lets you remove caches entirely … wonderful.

But don’t know if there is an equivalent to this on Android browser as it certainly seems to have odd caching behaviour, but does seem to ‘sort itself out’ after a time!  Other browsers seem to temporarily have problems like this, but a few forced refreshes seems to work!

For future versions I plan to use some Apache ‘Rewrite’ rules to make it look to the browser that the shared files are in fact to completely different files:

RewriteRule  ^version_3/shared/(.*)$   /shared_place/$1 [L]

To be fair the cache cycle more of a problem during development rather than deployment, but still … so confusing.

Useful sites:

These are some sites I found useful for the application cache, but none sorted everything … and none mentioned Chrome’s infinite cache cycle!

    The W3C specification – of course this tell you how appCache is supposed to work, not necessarily what it does on actual browsers!
    It is called “A Beginner’s Guide to using the Application Cache”, but is actually pretty complete.
    Really useful quick reference, but:  “FACT: Any changes made to the manifest file will cause the browser to update the application cache.” – don’t you believe it!  For some browsers (Chrome, Android) you have to add your own checks in the code (See “Updating the cache” section in “A Beginner’s Guide …”).).
    Wonderful on-line manifest file validator checks both syntax and also whether all the referenced files download OK.  Of course it cannot tell whether you have included all the files you need to.

using the Public Suffix list

On a number of occasions I have wanted to decompose domain names, for example in the URL recogniser in Snip!t.  However, one problem has always been the bit at the end.  It is clear that ‘com’ and ‘’ are the principle suffixes of ‘’ and ‘’ respectively.  However, while I know that for UK domains it is the last two components that are important (second level domains), I never knew how to work this out in general for other countries.  Happily, Mozilla and other browser vendors have an initiative called the Public Suffix List , which provides a list of just these important critical second level (and deeper level) suffixes.

I recently found I needed this again as part of my Talis research.  There is a Ruby library and a Java sourceforge project for reading the Public Suffix list, and an implementation by the DKIM Reputation project, that transforms the list into generated tables for C, PHP and Perl.  However, nothing for easily and automatically maintaining access to the list.  So I have written a small PHP class to parse, store and access the Public Suffix list. There is an example in the public suffix section of the ‘code’ pages in this blog, and it also has its own microsite including more examples, documentation and a live demo to try.

Hierarchical grammars for more human-like compiler parsing

Nearly twenty years ago, back when I was in York, one of my student project suggestions was to try to make compiler parsers operate a little more like a human: scanning first for high-level structures like brackets and blocks and only moving on to finer level features later.  If I recall there were several reasons for this, including connections with ‘dynamic pointers’1, but most important to help error reporting, especially in cases of mismatched brackets or missing ‘;’ from line ends … still a big problem.

Looking back I can see that one MEng student considered it, but in the end didn’t do it, so it lay amongst that great pile of “things to do one day” and discuss occasionally over tea or beer. I recall too looking at grammar-to-grammar parsers … I guess now-a-days I might imagine using XSLT!

Today, 18 years on, while scanning David Unger’s publications I discover that he actually did this in the Java parser at Sun2.  I don’t know if this is actually used in the current Java implementations.  Their reasons for looking at the issue  were to do with making the parser easier to maintain, so it may actually be that this is being done under the hood, but the benefits for the Java programmer not being realised.

While I was originally thinking about programming languages, I have more recently found myself using the general methods in anger when doing data cleaning as often one approaches this in a pipeline fashion, creating elements of structure along the way that are picked up by future parsing/cleaning steps.

To my knowledge there are no general purpose tools for doing this.  So, if anyone is looking for a little project, here is my own original project suggestion from 1993 …

When compilers parse a computer program, they usually proceed in a sequential, left-to-right fashion. The computational requirement of limited lookahead means that the syntax of programming languages must usually be close to LL(1) or LR(1). Human readers use a very different strategy. They scan the text for significant features, building up an understanding of the text in a more top down fashion. The human reader thus looks at the syntax at multiple levels and we can think of this as a hierarchical grammar.

The purpose of this project is to build a parser based more closely on this human parsing strategy. The target language could be Pascal or C (ADA is probably a little complex!). The parser will operate in two or more passes. The first pass would identify the block structure, for example, in C this would be based on matching various brackets and delimiters `{};,()’. This would yield a partially sequential, partially tree-like structure. Mismatched brackets could be detected at this stage, avoiding the normally confusing error messages generated by this common error. Subsequent passes would `parse’ this tree eventually obtaining a standard syntax tree.

Depending on progress, the project can develop in various ways. One option is to use the more human-like parsing to improve error reporting, for example, the first pass could identify the likely sites for where brackets have been missed by analysing the indentation structure of the program. Another option would be to build a YACC-like tool to assist in the production of multi-level parsers.


1.  S. P. Robertson, E. F. Davis, K. Okabe and D. Fitz-Randolf, “Program comprehension beyond the line”3, pp. 959-963 in Proceedings of Interact’90, North-Holland, 1990.
2.  Recommended reading from compiler construction course
3.  YACC manual from UNIX manual set.

  1. For more on Dynamic Pointers see my first book “Formal Methods for Interactive Systems“, a CSCW journal paper “Dynamic pointers and threads“[back]
  2. Modular parser architecture with mini parsers. D M Ungar, US Patent 7,089,541, 2006[back]
  3. Incidentally, “Program comprehension beyond the line” is a fantastic paper both for its results and also methodologically. In the days when eye-tracking was still pretty complex (maybe still now!), they wanted to study program comprehension, so instead of following eye gaze, they forced experimental subjects to physically scroll through code using a  single-line browser.  [back]

Names, URIs and why the web discards 50 years of computing experience

Names and naming have always been a big issue both in computer science and philosophy, and a topic I have posted on before (see “names – a file by any other name“).

In computer science, and in particular programming languages, a whole vocabulary has arisen to talk about names: scope, binding, referential transparency. As in philosophy, it is typically the association between a name and its ‘meaning’ that is of interest. Names and words, whether in programming languages or day-to-day language, are, what philosophers call, ‘intentional‘: they refer to something else. In computer science the ‘something else’ is typically some data or code or a placeholder/variable containing data or code, and the key question of semantics or ‘meaning’ is about how to identify which variable, function or piece of data a name refers to in a particular context at a particular time.

The emphasis in computing has tended to be about:

(a) Making sure names have unambiguous meaning when looking locally inside code. Concerns such as referential transparency, avoiding dynamic binding and the deprecation of global variables are about this.

(b) Putting boundaries on where names can be seen/understood, both as a means to ensure (a) and also as part of encapsulation of semantics in object-based languages and abstract data types.

However, there has always been a tension between clarity of intention (in both the normal and philosophical sense) and abstraction/reuse. If names are totally unambiguous then it becomes impossible to say general things. Without a level of controlled ambiguity in language a legal statement such as “if a driver exceeds the speed limit they will be fined” would need to be stated separately for every citizen. Similarly in computing when we write:

function f(x) { return (x+1)*(x-1); }

The meaning of x is different when we use it in ‘f(2)’ or ‘f(3)’ and must be so to allow ‘f’ to be used generically. Crucially there is no internal ambiguity, the two ‘x’s refer to the same thing in a particular invocation of ‘f’, but the precise meaning of ‘x’ for each invocation is achieved by external binding (the argument list ‘(2)’).

Come the web and URLs and URIs.

Fiona@lovefibre was recently making a test copy of a website built using WordPress. In a pure html website, this is easy (so long as you have used relative or site-relative links within the site), you just copy the files and put them in the new location and they work 🙂 Occasionally a more dynamic site does need to know its global name (URL), for example if you want to send a link in an email, but this can usually be achieved using configuration file. For example, there is a development version of Snip!t at cardiff.snip! (rather then, and there is just one configuration file that needs to be changed between this test site and the live one.

Similarly in a pristine WordPress install there is just such a configuration file and one or two database entries. However, as soon as it has been used to create a site, the database content becomes filled with URLs. Some are in clear locations, but many are embedded within HTML fields or serialised plugin options. Copying and moving the database requires a series of SQL updates with string replacements matching the old site name and replacing it with the new — both tedious and needing extreme care not to corrupt the database in the process.

Is this just a case of WordPress being poorly engineered?

In fact I feel more a problem endemic in the web and driven largely by the URL.

Recently I was experimenting with Firefox extensions. Being a good 21st century programmer I simply found an existing extension that was roughly similar to what I was after and started to alter it. First of course I changed its name and then found I needed to make changes through pretty much every file in the extension as the knowledge of the extension name seemed to permeate to the lowest level of the code. To be fair XUL has mechanisms to achieve a level of encapsulation introducing local URIs through the ‘chrome:’ naming scheme and having been through the process once. I maybe understand a bit better how to design extensions to make them less reliant on the external name, and also which names need to be changed and which are more like the ‘x’ in the ‘f(x)’ example. However, despite this, the experience was so different to the levels of encapsulation I have learnt to take for granted in traditional programming.

Much of the trouble resides with the URL. Going back to the two issues of naming, the URL focuses strongly on (a) making the name unambiguous by having a single universal namespace;  URLs are a bit like saying “let’s not just refer to ‘Alan’, but ‘the person with UK National Insurance Number XXXX’ so we know precisely who we are talking about”. Of course this focus on uniqueness of naming has a consequential impact on generality and abstraction. There are many visitors on Tiree over the summer and maybe one day I meet one at the shop and then a few days later pass the same person out walking; I don’t need to know the persons NI number or URL in order to say it was the same person.

Back to Snip!t, over the summer I spent some time working on the XML-based extension mechanism. As soon as these became even slightly complex I found URLs sneaking in, just like the WordPress database 🙁 The use of namespaces in the XML file can reduce this by at least limiting full URLs to the XML header, but, still, embedded in every XML file are un-abstracted references … and my pride in keeping the test site and live site near identical was severely dented1.

In the years when the web was coming into being the Hypertext community had been reflecting on more than 30 years of practical experience, embodied particularly in the Dexter Model2. The Dexter model and some systems, such as Wendy Hall’s Microcosm3, incorporated external linkage; that is, the body of content had marked hot spots, but the association of these hot spots to other resources was in a separate external layer.

Sadly HTML opted for internal links in anchor and image tags in order to make html files self-contained, a pattern replicated across web technologies such as XML and RDF. At a practical level this is (i) why it is hard to have a single anchor link to multiple things, as was common in early Hypertext systems such as Intermedia, and (ii), as Fiona found, a real pain for maintenance!

  1. I actually resolved this by a nasty ‘hack’ of having internal functions alias the full site name when encountered and treating them as if they refer to the test site — very cludgy![back]
  2. Halasz, F. and Schwartz, M. 1994. The Dexter hypertext reference model. Commun. ACM 37, 2 (Feb. 1994), 30-39. DOI=[back]
  3. Hall, W., Davis, H., and Hutchings, G. 1996 Rethinking Hypermedia: the Microcosm Approach. Kluwer Academic Publishers.[back]

fix for toString error in PHPUnit

I was struggling to get PHPUnit to run under PHP 5.2.9. I’ve only used PHPUnit a little, so may have simply got something wrong, but I kept getting the error:

Catchable fatal error: Object of class AbcTest could not be converted to string in {dir}/PHPUnit/Framework/TestFailure.php on line 98

The error happens in the PHPUnit_Framework_TestFailure::toString method, which tries to implicitly convert a test case to a string.

The class AbcTest is my test case, which it is trying to display following a test failure.  PHPUnit test cases all extend PHPUnit_Framework_TestCase and while this has a toString method it does not have the ‘magic method__toString required by PHP 5.2 onwards.

To fix the problem I simply added the following method to the class PHPUnit_Framework_TestCase in PHPUnit/Framework/TestCase.php .

public function __toString()
 return $this->toString();

I am using PHPUnit 3.4.9, but peeking at 3.5.0beta it looks the same.  I’m guessing the PHPUnit_Framework_TestFailure::toString method is not used much so has got missed since the change to PHP 5.2.x.

PHPUnit is now in GITHub so I really ought to work out how to submit corrections to that … but another day I think.

Time Machine – when it goes wrong and how to fix it

Unfortunately only fixing Mac OS X backup, not the Tardis 🙁 … but, nonetheless, critical.

What bit of software do you really need to be reliable?  If anything else goes really wrong you have the backup — but if the backup fails you really are lost.

And Mac OS X Time Machine, while it does have a very pretty  interface, is inclined to get stuck sometimes.

This is my own story of how it goes wrong … and how to put it right.

… and throughout I’ve dropped in a few lessons for anyone implementing critical system software — maybe the odd Apple engineer is reading

how to tell when things are wrong

Occasionally Time Machine seems to be stuck, but isn’t really.  When you first do a backup, or when you haven’t backed up to a particular disk for ages (perhaps if you have been away on a trip), it can spend several hours ‘preparing’.  You can tell it is ‘preparing’ because when you open the Time Machine preferences there is the little barbers pole saying ‘preparing’ 😉

This is when it is running over the disk working out what it needs to backup, and always seems to be the lengthiest operation, actually backing up the disk is often quite fast, and yet, for some reason there is no indication of how far through the ‘preparing’ process it has got.

Lesson 1: make sure you include progress indicators for anything that can take a while, not just the obvious ‘slow’ things.

So, when you see ‘preparing’, just be patient!

However, at least half-a-dozen times over the last year, my Time Machine has got completely stuck.  I have seen this happen in three ways:

(i)  it is still saying ‘preparing’ after leaving it overnight!

(ii)  it starts to transfer to disk, but then gets stuck part way:

(iii)  if you look in the Time Machine preferences it says the backup has failed

This last time in fact the first sign was (iii), but it doesn’t actually tell you (if you don’t look) until it has failed for ten days, by which time I was travelling.  In the days before Time Machine I always did a manual backup before travelling as I knew that was when things were most likely to go wrong, but now-a-days I have got used to relying on it and forget to check it is working OK … so if you are paranoid about your data, do peek occasionally at Time Machine to check it is still working!

When I got home and told Time Machine to backup to the Time Capsule here rather than my office disk (why can’t it remember that I have two backup disks??).  Then (after being very very patient while to was ‘preparing’ for four hours), I saw it got stuck in step (ii) at 1.4 GB or 4.2 GB.  Of course progress indicators are never very good for very slow operations, when transferring several GB of data there may be several minutes before the bar even moves a pixel … but I was very very patient and it definitely did not move!

Lesson 2: for very long processes supplement the progress indicator with some other indicator to show things are still working, in this case perhaps amount transferred in last minute

At this point I did the normal things, turn Time Machine On/Off,  restart machine a couple of times, etc., but when it persists then you know something is deeply wrong.

so why does it go wrong?

In fact Fiona@lovefibre has found Time Machine flawless for her desktop machine backing up to exactly the same Time Capsule.  I am guessing the problem I have is because I use a laptop so possible reasons:

  • it may go to sleep occasionally, breaking connection to the Time Capsule
  • maybe the WiFi aerial on a laptop is not as good as the desktop

However, if every laptop failed as often surely Apple would have fixed it by now.  So guessing there is an additional factor:

  • my disk has 196 Gb of data, much of it in smaller document files (word docs, code files, etc.), not just a few giant movies.

The software will be designed to withstand a certain amount of external failure, especially when connecting to disks over WiFi as the Time Capsule is designed to do.  However, I imagine that there are places in the code where there are race conditions, or critical portions where external failure really makes a difference.  If the external connections are reliable and the backup is quite fast the likelihood of hitting one of the nasty spots in the code is low.  However, if you have a lot of data to check and then transfer and the external failures more frequent, then the likelihood of hitting one increases and things start to go wrong.

I see similar problems with other software, Dreamweaver in particular, which has got better, but still can crash if the Internet connection is poor (see also “Why software need never hang“).  What happens is that during testing, the test machines often have minimal data, little software (maybe just the operating system and what is being tested), and operate in perfect situations.  In such circumstances these hidden flaws never become apparent.

Lesson 3: make sure your test machine is fully loaded with data and applications, and operates in an unreliable environment, so that testing is realistic

However, this is not like Word crashing and losing your most recent edits to one document.  When Time Machine fails it seems to occasionally leave something corrupt in the backup disk so that subsequent attempts to backup also fail.  There is no excuse for this, the techniques for dealing with potential disk-writing failures are well established in both databases and low-level disk management.  For example, one can save a timestamp file at the end of successful operations so that, when  returning to the data, if the timestamp file is not there the software knows something went wrong last time.

Maybe Time Machine is trying to be too clever, picking up where it left off when, for example, connection to the disk is broken.  If so it clearly needs some additional mechanism to notice “I’ve tried this several times and it keeps going wrong, maybe I need to back off to the last successful state”.  Perhaps not something to worry about in less critical software, but not difficult to get right when it is really needed … as in backups!

Lesson 4: build critical software defensively in layers so that errors in one part do not affect the whole; and if saving to disk ensure there is some sort of atomic transaction

The aim during testing should be what I call “fail-fast programming” trying to make sure that failures happen during testing not real use!

One thing I found particularly disturbing about my most recent Time Machine hang is that when I looked at the system console it had regular spats of “unknown SIGSEGV” several times a minute … in the kernel!  If you don’t know UNIX internals the ‘kernel’ is the heart of the operating system of the Mac, where all the lowest level work is done and where if something goes wrong everything fails.  SIGSEGV means that some bit of software is trying to access a memory location that doesn’t exist.  In fact while this is caught it is not so bad, the greater worry is that if it is trying to access non-existent memory, then it may corrupt other memory … and the kernel has access to everything – not good.

Please, please Apple if you cannot get Time Machine to work properly, do not let it affect the kernel!

how to put it right

One might hope that even if Time Machine cannot notice itself there is something wrong at least there would be an option to say “restart yourself”.  One might hope, but there is not.  However, you can do it yourself by digging a little into the backup disk itself.

First problem is to stop the Time Machine backup if it has hung.

In the Time Machine control panel, you can simply slide the OFF-ON button to OFF.  The status should change to ‘stopping’ and after a while stop.  Then you can restart the machine and try to fix things.

This is the ideal thing to do, but I find that when Time Machine is really hung this rarely works.  I do turn it to OFF, but either it never changes to ‘stopping’ and stays ‘preparing’, or it changes to ‘stopping’, but never does.  If this happens the system restart typically doesn’t restart the system as Time Machine won’t stop running.  Then, always with much trepidation, I reach for the on/off button on the Mac itself :-/

After doing a hard on/off like this, I usually do anther restart from the Apple menu … not sure if this is necessary, but just to be on the safe side!

Occasionally I skip to the next step before the hard restart.

Then you can start to fix the problem properly.

Find the backup disk.  If it is not obvious in the Finder use the ‘Go’ menu and select “Computer”; it shows all the locally connected disks (or it may simply appear in the left hand favourites pane in each Finder window).

If you skipped the restart stage (or of you just peek now to see what it is like when it hasn’t gone wrong), you will see something like “Backup of Alan Dix’s MacBook Pro” (obviously for you it will not be “Alan Dix’s MacBook Pro”!).  This is the Time Machine backup.  However, if you have restarted the machine with Time Machine off you will have to find the actual disk that you chose as your backup disk and on it look for a file called something like “Alan Dix’s MacBook  Pro_0039fc56f8a2.sparsebundle”.  This is some form of compressed disk image.  In the older versions of Time Machine there was simply a folder with all the backups in it — I felt much more secure.  Now this is a single opaque file and I worry that if one day it gets corrupted :-/

Having found the ‘sparsebundle’ double click it and it will display a little pop-up window that says ‘checking volumes’.  I keep meaning to see if this ever stops, but I am not patient enough and press the button that says to skip this state and then (after a while) it mounts the disk image and the disk “Backup of Alan Dix’s MacBook Pro” appears.

Double click “Backup of Alan Dix’s MacBook Pro” and look inside and then inside the folder “Backups_backupd” and you find loads of dated folders, which are the actual backups of your system that you can browse if you prefer instead of using the Time Machine interface.  In addition there may be one file ending “.inProgress”, which is some sort of internal file created while it is in the middle of doing the backup.

Delete the “.inProgress” file.

In addition, I usually delete the last of the dated folders (sort by “Date Modified” to get the last one).  However, if you don’t want to lose the last backup you can try just deleting the “inProgress” file and only delete the last dated backup if Time Machine still gets stuck.

Important: only delete the latest of the dated backup folders (e.g. “2010-06-09-225547” in the screen shot above), NOT the entire “Alan Dix’s Macbook Pro” folder.  If you do that you lose all your backups!

I recall doing this all with extreme trepidation the first time, but had got to the point when I couldn’t do backups or access them anyway so had nothing to lose.  Actually it seems pretty OK getting in here and doing this sort of thing, the nice thing about Time Machine is that it uses ordinary folder structures that you can peek around in and see are there all secure.  I am much happier with this than the kind of backup where you only know if it is working the day you try to restore something!  At least half the times I have used such backups over the years I’ve found the backup is in some way corrupt or incomplete. So actually one up for Time Machine 🙂

Now reboot again (for luck).  Turn Time Machine back on in the control panel and wait … a long time … it will start ‘preparing’ as if for the first backup … and several hours later hopefully all will be well.

But do remember to set the power save options not to go to sleep in the middle!

In fact the above has always worked for me except for this last time when, for some reason (maybe I missed something on the way?), it hung again and I had to go through the whole process again.  This time I waited until yesterday evening before turning Time Machine back on so that I could leave it to do the long 4 hour ‘preparing’ stage without me doing anything else.

And then:


PHP syntax checker updated

Took a quick break today from Project Phoenix1.

I’ve had a PHP syntax checker on meandeviation for several years, but only checking PHP 4 as that is what is running on the server.  However, I had an email asking about PHP 5 , so now there is a PHP 5 version too 🙂

The syntax checker is a pretty simple layer over the PHP command line option “php -l” and also uses the PHP highlight_file function.  The main complication is parsing the HTML outputs of both as they change between versions of PHP!

There is also a download archive so you can also have it running locally on your own system.

  1. watch this space …[back]

What’s wrong with dynamic binding?

Dynamic scoping/binding of variables has a bad name, rather like GOTO and other remnants of the Bad Old Days before Structured Programming saved us all1.  But there are times when dynamic binding is useful and looking around it is very common in web scripting languages, event propagation, meta-level programming, and document styles.

So is it really so bad?

Continue reading

  1. Strangely also the days when major advances in substance seemed to be more important than minor advances in nomenclature[back]