Wednesday, 1 August 2012

Lost 'lost ''lost' sign' sign' sign

A wonderful example of centre embedding from a ridiculously silly blog, via my friend Valdemar:

The image shows a lost sign, and the lost thing that it's advertising is another lost sign. And the thing that sign was advertising (before it got lost) was a lost sign... and so on. 

It's called centre embedding because, unsurprisingly, it means embedding a phrase in the centre of another one. By 'in the centre' we don't mean that it's precisely central, but rather that words from the higher-up phrase are on both sides of the embedded one. Here's an example of the more common type of embedding we find in English:
I really hate people [who don't think of others]
The bracketed part is a relative clause, which means that it tells you more about people, and it's embedded in the main clause. It's at the end, which is nice and easy to understand. We can go on for a surprisingly long time like this:
This is the farmer sowing his corn
That kept the cock that crowed in the morn
That waked the priest all shaven and shorn
That married the man all tattered and torn
That kissed the maiden all forlorn
That milked the cow with the crumpled horn
That tossed the dog that worried the cat
That killed the rat that ate the malt
That lay in the house that Jack built!
Every line in that rhyme is a new embedded clause, but we can keep track of it all and it's not terribly remarkable. We actually do it quite a lot in normal speech. This example, which inspired Language Log's Trent Reznor Prize for Tricky Embedding, contains a whole stack of embedded clauses and other stuff but is completely understandable, and was produced in natural speech in an interview:
"When I look at people that I would like to feel have been a mentor or an inspiring kind of archetype of what I'd love to see my career eventually be mentioned as a footnote for in the same paragraph, it would be, like, Bowie."
The thing with centre embedding is that it is totally grammatical (it does not break any of the rules of English, by which I mean the rules that speakers intuitively know and that cannot be broken, rather than the prescriptive rules that we all break in our everyday speech) but not acceptable (i.e. speakers don't say things like this and, if asked, don't think they are good sentences at all). This is very different from most other grammatical puzzles that we (linguists) have, which are far more often of the type 'this is ungrammatical in most dialects but some speakers produce it - why?' or 'this theory predicts this to be ungrammatical but it's not, because it occurs in language X - why?'.

It's really striking how quickly examples of centre embedding get impossible to parse (work out the grammar of). In the poster, we can of course easily understand the phrase with no embedding at all:
Lost sign
But then even just one layer of embedding, equivalent to I hate people who don't think of others, is a bit hard to work out:
Lost lost sign sign
And then when you get just one more, it's too hard:
Lost lost lost sign sign sign
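
For the programmers among you, the way the poster's phrase grows can be sketched as a tiny loop that wraps the previous phrase inside a new "lost ... sign" each time. This is only an illustrative sketch: the function name lost_sign and the use of single quotes to mark each layer are my own assumptions, not anything taken from the poster.

```python
def lost_sign(depth: int) -> str:
    """Return a 'lost sign' phrase with `depth` layers of centre embedding.

    Hypothetical helper for illustration only.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    phrase = "lost sign"
    for _ in range(depth):
        # Wrap the previous phrase so that words of the new outer phrase
        # ("lost" and "sign") end up on both sides of the embedded one.
        phrase = f"lost '{phrase}' sign"
    return phrase.capitalize()


# Print the phrase with zero, one and two layers of embedding.
for depth in range(3):
    print(depth, lost_sign(depth))
```
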
The quotation marks help a bit here, but not much, and that's no good in spoken language. This example is obviously designed for humour, and some examples are easier to work out than others. Wikipedia (yeah, I'm being lazy today - I've got a PhD to write) cites this example of double embedding, attributing it to De Roeck et al. (1982):
Isn't it true [that example-sentences [that people [that you know] produce] are more likely to be accepted]?
The double-embedded part that might cause trouble is the 'that people that you know produce' part, but here it's not too difficult, perhaps because we're used to hearing know+verb constructions. But the Wikipedia page also says (summarising Karlsson 2007) that three is the maximum degree of embedding in written language, and even two is vanishingly rare in spoken language. It gives this super-tricky pair of examples, where the first one (with one level of embedding, and not centre embedding) is fine, but adding just one centre-embedded clause makes it incredibly difficult to parse:
A man [that a woman loves]

A man [that a woman [that a child knows] loves]
It means a man who is loved by a woman, who in turn is known by a child. But you try working that out while you're in full conversational flow. The explanation is supposed to be basically just that, while we're super-good at keeping track of relations and actions, we're really really bad at keeping track of a whole load of subjects without linking them to their predicates (what they did).
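
To make that pile-up of subjects concrete, here is a rough sketch that builds the centre-embedded phrase from a list of nouns and verbs, and counts how many subjects are left waiting for a verb at the worst moment. The names centre_embed and max_pending_subjects are hypothetical, and the counter is a toy bookkeeping device of my own, not a real model of how people process sentences.

```python
def centre_embed(nouns, verbs):
    """Build a centre-embedded noun phrase.

    Assumes nouns[i + 1] is the subject of verbs[i], so all the nouns are
    said first and the verbs then come out in reverse order.
    """
    if len(nouns) != len(verbs) + 1:
        raise ValueError("need exactly one more noun than verbs")
    parts = [" that ".join(nouns)] + list(reversed(verbs))
    return " ".join(parts).capitalize()


def max_pending_subjects(tags):
    """Return the largest number of subjects waiting for a verb at once.

    `tags` is a sequence of 'S' (a subject is heard) and 'V' (a verb is
    heard, which resolves one waiting subject).
    """
    pending = peak = 0
    for tag in tags:
        if tag == "S":
            pending += 1
            peak = max(peak, pending)
        elif tag == "V":
            pending -= 1
    return peak


print(centre_embed(["a man", "a woman", "a child"], ["loves", "knows"]))
# Centre-embedded order: woman, child, knows, loves
print(max_pending_subjects("SSVV"))
# Same relations unpacked as 'a child that knows a woman that loves a man'
print(max_pending_subjects("SVSV"))
```

In this toy count the centre-embedded version leaves two subjects hanging at once, while the unpacked version never has more than one, which is at least in the spirit of the explanation above.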

Finally, this completely incomprehensible paragraph from SpecGram:

An apparently new speech disorder a linguistics department our correspondent visited was affected by has appeared. Those affected our correspondent a local grad student called could hardly understand apparently still speak fluently. The cause experts the LSA sent investigate remains elusive. Frighteningly, linguists linguists linguists sent examined are highly contagious. Physicians neurologists psychologists other linguists called for help called for help called for help didn’t help either. The disorder experts reporters SpecGram sent consulted investigated apparently is a case of pathological center embedding.


  1. Nice post! I'm fond of centre-embedding, and love Language Log's Trent Reznor Prize. Sometimes I play with the construction on Twitter, e.g., 'To say "To say 'To say X would be an understatement' would be a cliché" would be stating the obvious.'

    And '100 lists of 100 books that everyone should read that everyone should read', to which Kyle Jasmin replied: 'The grammar the tweet the man wrote contained entertained.'

  2. Thanks! Great examples too :)