In explaining how to make sentences more readable (I know I am one to talk!), I frequently explain to students that language understanding is a combination of a schema-based syntactic structure with more sequential associative reading. Only recently did I realise that this is also the way we have been addressing the issue of task sequence inference in the TIM project, and that it is also related to the way we interpret action in the real world.
Reading language
“The parrot, which was continually watched by the big black cat, who sometimes sat on the mat until displaced by the bigger brown Doberman and sometimes on the garden wall in the hope of catching the sparrow, flew through the window.” (sentence 1)
Why is this difficult to read? It is grammatically well-formed and has commas in all the right places. However, the large subordinate clause (if I have got my grammatical terms right) means that by the time you have got to “flew through the window” you have ‘forgotten’ the parrot, which is what the sentence was all about.
In contrast, a computer parser has no such trouble. Take the fragment:
    if ( y > 10 ) {
        x1 = y+1;
        x2 = y+2;
        . . . // 97 more lines of code
        x100 = y+100
    } else {
        . . .
The compiler does not ‘forget’ about the ‘if’, because it is part of a relatively simple syntactic structure:
if ( condition ) { statement_block } else { statement_block }
In the English sentence about the parrot, like the program fragment, even the ‘depth’ of the parsing stack is not deep, yet still there is confusion.
Years ago, when I was a student in Cambridge, a friend who was studying Politics came to me with a passage in Nozick’s “Anarchy, State, and Utopia”. I read the paragraph in question several times. Each time it made sense, until I came to the last few words, which seemed to be a total non sequitur. Only after many readings did I realise that the paragraph was a single sentence (all 10 lines of it!), and that the sentence was of precisely the same form as the parrot one, except with nested sub-clauses. The words that appeared to come from nowhere were in fact the equivalent of “flew through the window” … indeed imagine if the description of the Doberman had extended to another seven lines!
Clearly our ability to hold on to relatively simple hierarchical parsing structures is quite limited, and is definitely smaller than the 7 ± 2 for working memory [1]. However, there is also another mechanism at work.
Consider the following alternative sentences:
“The policeman, which was continually watched by the big black cat, who sometimes sat on the mat until displaced by the bigger brown Doberman and sometimes on the garden wall in the hope of catching the sparrow, flew through the window.” (sentence 2)
“The parrot, which was continually watched by the big black cat, who sometimes sat on the mat until displaced by the bigger brown Doberman and sometimes on the garden wall in the hope of catching a snail, flew through the window.” (sentence 3)
It is hard to read these ‘fresh’ as if you hadn’t read the initial sentences, but notice that the first is far harder than the second. This is because we have an additional sequential associative mechanism at work as well as syntactic parsing.
Consider instead the much simpler ‘sentence’:
“bone dog eat”
We have little problem making sense of this even though the structure is not grammatical. The verb ‘eat’ is ‘looking for’ something that eats to act as its subject, and something that can be eaten to be its object. As there is only one eatable and one eater in the ‘sentence’, the parsing (or perhaps connecting) happens purely by association.
If there are several ‘matches’ for the words then the word combinations become ambiguous:
“policeman demonstrator hit”
Some languages, such as English, largely deal with this ambiguity by use of syntactic rules (subject verb object); others, such as Latin, by tagging the words with their role … and I don’t know my Latin well enough to give a proper example, but I think something like, in pidgin Latin: “policemanus demonstratori hit” vs “policemani demonstratorus hit”! I believe Latin has quite strong syntactic rules as well, but there are certainly languages where order is less significant. You can think of them as Lego block languages: you throw the words into a bag, shake them up, and they stick together where the words fit together.
When we are reading a sentence both forms of parsing are happening at once. There is a level of grammatical parsing, but also when a word, like ‘parrot’ in sentence 1, is encountered and does not get ‘bound’, it is ‘waiting’ for a verb to connect to. Later in the sentence, when we encounter ‘flew’, it wants something that can fly to be its subject; if ‘parrot’ is still sufficiently in mind then all is well and we get “The parrot flew through the window”. However, the last thing we read was ‘sparrow’, a thing that can fly. The sequential associative parsing says “sparrow flew”, and even if the syntactic parsing ‘wins’, the conflict still makes the sentence hard to grasp.
Sentence 2 was chosen so that the grammatical subject, ‘policeman’, is an unusual thing to bind to ‘flew’, which makes the sentence more confusing, whereas sentence 3 is less confusing as there is no obvious flying candidate except the parrot.
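The ‘bag of words’ binding can be sketched in a few lines of Python. This is only a toy illustration, not a model of real language processing: the `CAN_DO` lexicon and the `bindings` function are invented for the example, and the lists of who can eat or hit what are assumptions made up to fit the two ‘sentences’ above.

```python
from itertools import permutations

# Hypothetical toy lexicon (an assumption for this sketch): for each verb,
# the nouns that can plausibly act as its agent and as its patient.
CAN_DO = {
    "eat": {"agent": {"dog", "cat", "policeman", "demonstrator"},
            "patient": {"bone", "sparrow"}},
    "hit": {"agent": {"policeman", "demonstrator", "dog"},
            "patient": {"policeman", "demonstrator", "dog", "bone"}},
}

def bindings(words):
    """Return every (agent, verb, patient) reading the bag of words allows.

    Word order is ignored: nouns attach to the verb purely by whether
    they fit the role it is 'looking for'.
    """
    verbs = [w for w in words if w in CAN_DO]
    nouns = [w for w in words if w not in CAN_DO]
    readings = []
    for verb in verbs:
        roles = CAN_DO[verb]
        for agent, patient in permutations(nouns, 2):
            if agent in roles["agent"] and patient in roles["patient"]:
                readings.append((agent, verb, patient))
    return readings

print(bindings("bone dog eat".split()))               # one reading
print(bindings("policeman demonstrator hit".split())) # two readings
```

With this lexicon “bone dog eat” has a single reading whatever order the words come in, while “policeman demonstrator hit” has two, which is exactly the ambiguity that word order or case endings have to resolve.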
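The difference between the three sentences can be made concrete with a rough sketch. It assumes, purely for illustration, that the grammatical subject is the first noun, that the associative mechanism prefers the most recently read noun that fits the verb, and that `CAN_FLY` is all we know about the world; the function name and the labels it returns are hypothetical, and no claim is made about how people actually rank the sentences.

```python
# Hypothetical world knowledge for this sketch: nouns that can plausibly fly.
CAN_FLY = {"parrot", "sparrow"}

def subject_conflict(nouns_in_order, fits_verb):
    """Compare the grammatical subject with the associative candidates.

    nouns_in_order: nouns in reading order; the first is assumed to be
                    the grammatical subject of the final verb.
    fits_verb:      set of nouns that can plausibly be the verb's subject.
    """
    subject = nouns_in_order[0]
    # Associative candidates, most recently read first.
    candidates = [n for n in reversed(nouns_in_order) if n in fits_verb]
    if subject not in candidates:
        return "mismatch"      # syntax offers a subject that does not fit
    if candidates[0] != subject:
        return "competition"   # a more recent noun also fits the verb
    return "clear"             # syntax and association agree

sentence_1 = ["parrot", "cat", "mat", "Doberman", "wall", "sparrow"]
sentence_2 = ["policeman", "cat", "mat", "Doberman", "wall", "sparrow"]
sentence_3 = ["parrot", "cat", "mat", "Doberman", "wall", "snail"]

for nouns in (sentence_1, sentence_2, sentence_3):
    print(nouns[0], nouns[-1], subject_conflict(nouns, CAN_FLY))
```

Sentence 1 comes out as a competition between ‘parrot’ and ‘sparrow’, sentence 2 as a mismatch because a policeman is not an obvious flyer, and sentence 3 as clear, since only the parrot can fly.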
Poets make good use of these multiple mechanisms, as they allow them to bend grammar and yet have (sometimes) comprehensible sentences. For my students it is not usually poetry I am after in their PhD theses, but understanding the role of sequential parsing can help us to construct more comprehensible English.
Reading action
It was only quite recently that I realised there was a very close parallel to work we had been doing in the TIM project on tasks [2], that is, on action rather than language.
There are various algorithms used in intelligent user interface research to help predict the next action based on the previous one. Mostly these make use of some sort of Markov model or similar sliding window, where the next command is the one that has occurred most often after the preceding N commands. However, there have been various proposals for performing more hierarchical pattern formation, effectively building task structures as found in hierarchical task analysis.
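The sliding-window idea is simple enough to sketch. The class below is a minimal illustration of ‘most frequent follower of the preceding N commands’, not any particular published algorithm; the class name, its methods and the command names in the usage example are all invented for the sketch.

```python
from collections import Counter, defaultdict

class NextCommandPredictor:
    """Predict the next command from the preceding n commands.

    A minimal sliding-window (order-n Markov) sketch: it simply counts
    which command followed each window of n commands in the history.
    """

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # window -> Counter of followers

    def train(self, history):
        for i in range(len(history) - self.n):
            window = tuple(history[i:i + self.n])
            self.counts[window][history[i + self.n]] += 1

    def predict(self, recent):
        window = tuple(recent[-self.n:])
        followers = self.counts.get(window)
        if not followers:
            return None  # this window has never been seen
        return followers.most_common(1)[0][0]

# Hypothetical command log, made up for the example.
log = ["open", "edit", "save", "open", "edit", "save", "open", "edit", "close"]
predictor = NextCommandPredictor(n=2)
predictor.train(log)
print(predictor.predict(["open", "edit"]))  # "save" followed twice, "close" once
```

Because the window is purely chronological, any unrelated command that lands inside it changes the context being looked up.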
A fundamental block to both approaches is that human activity is not always focused on one task at a time, but we do a little of one, then a little of another, just as I might break off from writing to make a cup of tea.
In the TIM project we have been using a personal ontology for various purposes including helping to propose values for actions such as web forms. If there is a ‘name’ field, then the ontology includes names of friends so these can be proposed as potential options to the user.
Because of the ontology, we are also able to detect that the telephone number used in one web form is the telephone number of the person in the mail that has just been received. This semantic binding through the ontology means that we can effectively see whether or not individual actions are related, depending on whether they have semantic relationships through the ontology. That is, we can thread together the actions that belong to a single task sequence and pull them out of the chronological sequence that involves a mix of tasks and activities.
There are many differences between this and the sequential associative parsing, not least that the basic actions we are considering are typically parameterised (the completion of a web form) while words are singular, but there are also connections. Previously, the relationship between task analysis and traces of actions has been compared with that between grammar and sentences, basically treating task analysis as a way to ‘parse’ activity [3]. This parallels the conventional parsing approach. However, what I only belatedly realised was that the use of semantic relationships between actions, which we are adopting for task inference in the TIM project, also parallels the use of semantic connections between words in sequential associative reading.
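As a rough illustration of threading by semantic binding, the sketch below groups an interleaved log of actions by the ontology entity their values refer to. It is not the TIM implementation: the `ONTOLOGY` mapping, the action names and every value in it are invented for the example, and a real personal ontology would hold far richer relationships than this flat lookup.

```python
# Hypothetical personal ontology (an assumption for this sketch):
# maps raw values seen in mails and forms to the entity they belong to.
ONTOLOGY = {
    "alice@example.org": "Alice",
    "01234 567890": "Alice",
    "Alice Smith": "Alice",
    "Hotel Roma": "RomeTrip",
    "FCO": "RomeTrip",
}

def thread_actions(actions):
    """Pull task threads out of a chronological action log.

    actions: list of (action_name, values) pairs in the order they happened.
    Returns a dict mapping each ontology entity to the actions that
    mention it, keeping chronological order within each thread.
    """
    threads = {}
    for name, values in actions:
        entities = {ONTOLOGY[v] for v in values if v in ONTOLOGY}
        for entity in sorted(entities):
            threads.setdefault(entity, []).append(name)
    return threads

# Two tasks interleaved in time, as in everyday activity.
log = [
    ("read mail", ["alice@example.org"]),
    ("search flights", ["FCO"]),
    ("fill web form", ["Alice Smith", "01234 567890"]),
    ("book hotel", ["Hotel Roma"]),
]
print(thread_actions(log))
```

Here ‘read mail’ and ‘fill web form’ end up in one thread and the two travel actions in another, even though they alternate in time.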
Reading the world
The above is about the way a computer can make sense of human action. However, as humans we need to make sense of the things that happen around us in the world, not least other people.
When we sense the world we are confronted with myriad stimuli: as I write, the sound of an ambulance passing outside and the insistent hiss of the kettle on the stove all clamour against the words I type. Against this visual, aural and tactile cacophony, we have to unravel the threads of meaning. If my leg aches I do not ascribe it to the cup of tea I am drinking, or the ambulance that recently passed, but instead to the run round the Circo Massimo earlier this morning. However, equally I do not ascribe it to running on the beach a week ago. Temporal and semantic proximity both play their part in our reading of the signs of day-to-day life. It is not surprising that these same tools are at play as we listen to or read language.
We also perceive patterns of actions in the world, most significantly in the actions of other creatures. In a keynote at Tamodia in Pisa last year, I discussed the production of human action and the rich intertwining of sequential planned or proceduralised activity with more stimulus-driven reactions. Both could be explicit or tacit. The computer aid is trying to ‘read’ these patterns of intentional (and reactive) activity.
| | pre-planned | environment-driven |
|---|---|---|
| explicit | (a) following known plan of action | (b) means-end analysis |
| implicit | (c) proceduralised or routine actions | (d) stimulus-response reaction |
It is of course likely that our ability to produce and understand language builds on our ability to act within and to comprehend the world. However, the key difference is that language is about intentional communication. As well as expecting parts to link together in fragments, we also expect the whole to make sense. In a detective novel each incident may seem disparate, but we are sure that in the end they will come together in the dénouement. This is a large-scale application of Grice’s cooperative principle. For us interpreting the world and each other’s actions, or for the computer interpreting human action, no such overarching message is expected, with the exception of life as a whole: God, or death, depending on your beliefs, as the ultimate dénouement.
1. George A. Miller. The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. The Psychological Review, 1956, vol. 63, pp. 81-97.
2. The best account of this task inference at present is probably my keynote “Tasks = data + action + context: automated task assistance through data-oriented analysis” at Engineering Interactive Systems 2008 last September in Pisa.
3. Personally, I first used this ‘task analysis as grammar’ approach in teaching task analysis and it appeared in the task model slides for the third edition of the HCI book (but not in the book itself, maybe fourth edition). However, it has been elaborated in work with Stavros Asimakopoulos and Robert Fyldes, “Grammatically interpreted task analysis for supply chain forecasting”.