Wednesday, 21 September 2011

Is that a fish in your ear?

There's a new book out which I haven't read yet. However, that never stopped anyone posting an Amazon review, so I'll throw my thoughts into the pot. It's called Is that a fish in your ear?: Translation and the meaning of everything, by David Bellos. His son Alex wrote a book called Alex's adventures in Numberland, which I also haven't read but which is always on Waterstone's featured displays.

I've got another book called The meaning of everything (which is excellent, by the way - by Simon Winchester, about the Oxford English Dictionary), so no points for the subtitle. Points for the title though, which references the Babel fish from Douglas Adams' Hitchhiker's guide to the galaxy.

There was an extract of this book featured in the Independent the other day, describing how Google Translate works. Google Translate is a much-mocked tool, and originally rightly so. It could be relied upon to give you absolute garbage, no matter what you put into it. Hours of fun could be had translating text from one language to another and back again, and sniggering at the Chinese whispers result. Even better fun if you put it through more than one language on the way. These days, however, Google Translate is disappointingly good. It gets translations largely right most of the time (NB It still should NOT be used to translate into a language you don't know - you cannot tell whether the output is utter nonsense).

The section featured in the Independent describes how it works. Here's an extract from the extract:
In fact, at bottom, it doesn't deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before.
The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments.
It uses vast computing power to scour the internet in the blink of an eye, looking for the expression in some text that exists alongside its paired translation. Drawing on the already established patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what's been submitted to it.
This is fascinating, and obviously a good way to do it. After all, people do speak and write in fairly formulaic chunks a lot of the time. It's an efficiency device, so that we don't have to create new expressions from scratch all the time. This is why you get annoying cliches like "at the end of the day" and "in any way, shape or form". It's also why you have standard greetings ("how's it going") and ways of expressing yourself like "I'm so sick of (X)".
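To make the "it has probably been said before" idea concrete, here is a minimal sketch in Python. It is not Google's actual method, just a toy illustration of picking the most frequent pairing from a set of already-translated phrases. The phrase pairs, the French renderings and the function name are all made up for the example.

```python
from collections import Counter

# Hypothetical toy "parallel corpus": (English phrase, French phrase) pairs.
# The French renderings are illustrative assumptions, not checked translations.
PAIRED_PHRASES = [
    ("at the end of the day", "en fin de compte"),
    ("at the end of the day", "en fin de compte"),
    ("at the end of the day", "à la fin de la journée"),
    ("how's it going", "ça va"),
    ("how's it going", "comment ça va"),
    ("how's it going", "ça va"),
]


def most_probable_translation(phrase, pairs):
    """Return (translation, relative frequency) for the commonest pairing,
    or None if the phrase never appears in the corpus."""
    counts = Counter(target for source, target in pairs if source == phrase)
    if not counts:
        return None
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())
```

Calling most_probable_translation("at the end of the day", PAIRED_PHRASES) picks "en fin de compte", because that pairing turns up in two of the three examples. A phrase that has never been said before gets None, which is exactly the problem with relying on what already exists.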

And as the author points out, human translators basically work this way too: they can often pre-empt the person they're translating and guess what will come next, based on frequently-used expressions. But this way of translating assumes that everything we say or write (or almost everything) has been said before. One of the first things we tell beginning linguistics students is that we can come up with a completely new sentence, that's never been uttered before, and any speaker of English can understand it. The standard practice is then to come up with some ridiculous sentence, like "All of my armadillos have been put through too hot a wash and have shrunk".

I suppose that, faced with this sentence, Google Translate would take its constituent parts and translate them. So, for instance, it might find the string "too hot a wash", or even "have been put through too hot a wash", paired with a translation, somewhere in its corpus.
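My guess about chunking could be sketched like this in Python. It assumes a greedy, longest-match-first strategy, which is my own simplification and not a description of what Google Translate really does. The set of known phrases is invented for the example and stands in for strings that happen to exist in the corpus with a paired translation.

```python
# Hypothetical set of English strings assumed to exist in the corpus
# alongside a paired translation.
KNOWN_PHRASES = {
    "all of my",
    "too hot a wash",
    "have been put through too hot a wash",
    "and",
    "have shrunk",
}


def segment(sentence, known_phrases):
    """Split a sentence into the longest known chunks, left to right.

    Returns a list of (chunk, found) pairs; found is False for single
    words that match nothing in known_phrases.
    """
    words = sentence.lower().split()
    chunks = []
    i = 0
    while i < len(words):
        # Try the longest candidate starting at word i first.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in known_phrases:
                chunks.append((candidate, True))
                i = j
                break
        else:
            # Nothing known starts here, so pass the word through unmatched.
            chunks.append((words[i], False))
            i += 1
    return chunks
```

With the armadillo sentence, this prefers the longer "have been put through too hot a wash" over the shorter "too hot a wash", and leaves "armadillos" flagged as unmatched, which is where a real system would have to fall back on something else.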

In fact, I just tried it and it didn't fare so well. I put it through an English-French-English process and it came back with this translation:
All my tattoos have been too hot to wash one and have narrowed
If you fiddle with the alternative translations you get there eventually, though I'm not sure how idiomatic the result is. Ah well. There's jobs for human translators yet.

If you're waiting for the paperback edition of this book, in the meantime I highly recommend Mouse or rat?: Translation as negotiation by Umberto Eco. I have read that one, and it's utterly engrossing.

