Searle’s wall, computation and representation

Reading a bit more of Brain Cantwell Smith’s “On the Origin of Objects” and he refers (p.30-31) to Searle‘s wall that, according to Searle, can be interpreted as implementing a word processor. This all hinges on predicates introduced by Goodman such as ‘grue’, meaning “green is examined before time t or blue if examined after”:

grue(x) = if ( now() < t ) green(x)
          else blue(x)

The problem is that an emerald apparently changes state from grue to not grue at time t, without any work being done. Searle’s wall is just an extrapolation of this so that you can interpret the state of the wall at a time to be something arbitrarily complex, but without it ever changing at all.

This issue of the fundamental nature of computation has long seemed to me the ‘black hole’ at the heart of our discipline (I’ve alluded to this before in “What is Computing?“). Arguably we don’t understand information much either, but at least we can measure it – we have a unit, the bit; but with computation we cannot even measure except without reference to specific implementation architecture whether Turing machine or Intel Core. Common sense (or at least programmer’s common sense) tells us that any given computational device has only so much computational ‘power’ and that any problem has a minimum amount of computational effort needed to solve it, but we find it hard to quantify precisely. However, by Searle’s argument we can do arbitrary amounts of computation with a brick wall.

For me, a defining moment came about 10 years ago, I recall I was in Loughbrough for an examiner’s meeting and clearly looking through MSc scripts had lost it’s thrill as I was daydreaming about computation (as one does). I was thinking about the relationship between computation and representation and in particular the fast (I think fastest) way to do multiplication of very large numbers, the Schönhage–Strassen algorithm.

If you’ve not come across this, the algorithm hinges on the fact that multiplication is a form of convolution (sum of a[i] * b[n-i]) and a Fourier transform converts convolution into pointwise multiplication (simply a[i] * b[i]). The algorithm looks something like:

1. represent numbers, a and b, in base B (for suitable B)
2. perform FFT in a and b to give af and bf
3. perform pointwise multiplication on af and bf to give cf
4. perform inverse FFT on cf to give cfi
5. tidy up cfi a but doing carries etc. to give c
6. c is the answer (a*b) in base B

In this the heart of the computation is the pointwise multiplication at step 3, this is what ‘makes it’ multiplication. However, this is a particularly extreme case where the change of representation (steps 2 and 4) makes the computation easier. What had been a quadratic O(N²) convolution is now a linear O(N) number of pointwise multiplications (strictly O(n) where n = N/log(B) ). This change of representation is in fact so extreme, that now the ‘real work’ of the algorithm in step 3 takes significantly less time (O(n) multiplications) compared to the change in representation at steps 2 and 4 (FFT is O( n log(n) ) multiplications).

Forgetting the mathematics this means the majority of the computational time in doing this multiplication is taken up by the change of representation.

In fact, if the data had been presented for multiplication already in FFT form and result expected in FFT representation, then the computational ‘cost’ of multiplication would have been linear … or to be even more extreme if instead of ‘representing’ two numbers as a and b we instead ‘represent’ them as a*b and a/b, then multiplication is free. In general, computation lies as much in the complexity of putting something into a representation as it is in the manipulation of it once it is represented. Computation is change of representation.

In a letter to CACM in 1966 Knuth said¹:

When a scientist conducts an experiment in which he is measuring the value of some quantity, we have four things present, each of which is often called “information”: (a) The true value of the quantity being measured; (b) the approximation to this true value that is actually obtained by the measuring device; (c) the representation of the value (b) in some formal language; and (d) the concepts learned by the scientist from his study of the measurements. It would seem that the word “data” would be most appropriately applied to (c), and the word “information” when used in a technical sense should be further qualified by stating what kind of information is meant.

In these terms problems are about information, whereas algorithms are operating on data … but the ‘cost’ of computation has to also include the cost of turning information into data and back again.

Back to Searle’s wall and the Goodman’s emerald. The emerald ‘changes’ state from grue to not grue with no cost or work, but in order to ask the question “is this emerald grue?” the answer will involve computation (if (now()<t) …). Similarly if we have rules like this, but so complicated that Searle’s wall ‘implements’ a word processor, that is fine, but in order to work out what is on the word processor ‘screen’ based on the observation of the (unchanging) wall, the computation involved in making that observation would be equivalent to running the word processor.

At a theoretical computation level this reminds us that when we look at the computation in a Turing machine, vs. an Intel processor or lambda calculus, we need to consider the costs of change of representations between them. And at a practical level, we all know that 90% of the complexity of any program is in the I/O.

Donald Knuth, “Algorithm and Program; Information and Data”, Letters to the editor. Commun. ACM 9, 9, Sep. 1966, 653-654. DOI= http://doi.acm.org/10.1145/365813.858374 [back]