linguistlaura: vocabulary


Monday, 11 April 2022

At last, someone has written about Wordle!

I've held off on blogging about Wordle, because everyone else did it, and because I didn't have anything particular to say. People tend to assume that if you're a linguist, you like word games, but I don't think that's any more true for us than for normal people. Some of us do, others don't. I happen to love crosswords (because there is a quiz or a puzzle element) and dislike Scrabble (because I'm not good at anagrams). I do, as it happens, love Wordle. I love logic puzzles like sudoku, and this is basically just a logic puzzle with an added constraint.

There is, or used to be, a board game called Mastermind which was a pure logic version of Wordle. (If you don't know what Wordle is, by the way, I don't know where you've been. It's what we all spent the early part of 2022 doing.) There, the thing you had to guess was the sequence of coloured pegs. There were only a few colours, and only a sequence of four, so far fewer possibilities than the 26 letters and five slots that Wordle involves. And you needed like ten goes to get it, rather than the six that you get with Wordle. The rules were the same: you got told if you'd got one right and in the right place, or right but in the wrong place, or wrong. You weren't told which one, though, which did make it harder in that respect (otherwise it would have been incredibly easy). I loved this game and I'm not sure why I never had my own copy (maybe no one else liked playing it with me, or maybe I never mentioned that I liked it?) but I played it when I was at other people's houses.
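For anyone who likes to see rules written down, here is a minimal sketch of Mastermind-style feedback as described above: you get a count of pegs that are right and in the right place, and a count that are right but in the wrong place, without being told which pegs they are. The function name and the letter codes for colours are my own inventions for illustration, not anything from the game's rulebook.

```python
from collections import Counter

def mastermind_feedback(secret, guess):
    """Return (exact, misplaced) counts; positions are not revealed."""
    if len(secret) != len(guess):
        raise ValueError("secret and guess must be the same length")
    # Right colour in the right place.
    exact = sum(s == g for s, g in zip(secret, guess))
    # Colours shared between the two, ignoring position but respecting duplicates.
    common = sum((Counter(secret) & Counter(guess)).values())
    # Anything shared but not exact must be in the wrong place.
    return exact, common - exact

# Hypothetical colour codes: R(ed), G(reen), B(lue), Y(ellow), O(range)
print(mastermind_feedback("RGBY", "RBGO"))  # prints (1, 2)
```

Wordle differs only in that it gives you this information letter by letter, which is a big part of why six goes is enough.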

So yes, I do love Wordle, because of the logic puzzle aspect. The word part of it does add something interesting for me, though. I like the constraint it puts on the possible answers. It's not the case, as in Mastermind, that every combination is equally possible. Some just aren't, or are much less likely, and that's due to the rules of either languages in general, or English in particular. So an example of a language-in-general thing is that there are going to be some vowels in the word, and some consonants. An example of an English-in-particular thing is that the last letter is probably an 'e' or a consonant, because we don't have so many words that end in 'a', 'i', 'o' or 'u' (though we do have some, so it's not absolutely ruled out). Another English-in-particular thing is that if you know you've got an 'h' in there somewhere, it's possibly the first letter but if it's not, you've likely also got a 't' for 'th' or a 'g' for 'gh' in there. Not always; ahead would have stumped me in that case.

Screenshot of my Wordle stats showing a normal distribution with most words taking me four guesses to get.

I've been paying attention to how I solve them, and I usually get the answer on the fourth go. I imagine this is true for most people, as we'd expect a 'normal distribution' with very few right on the first or second go (that's a lucky guess) and few taking six (that's some bad luck or a word that has many very similar to it).

I'm not sharing any new insights on how to solve them – I just do the same as you all do and rule out the most common letters first until I can see what it's likely to be. But what interests me is how quickly you get to the point where it can only realistically be one word. This is normally where I am by guess four.
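That narrowing-down can be sketched in a few lines of code. This is only an illustration under my own assumptions: the word list is a tiny made-up sample rather than Wordle's real list, the function names are hypothetical, and the colouring follows the usual green/yellow/grey scheme as I understand it.

```python
from collections import Counter

def wordle_feedback(answer, guess):
    """Per-letter feedback: 'G' right place, 'Y' wrong place, '-' not in word."""
    result = ["-"] * len(guess)
    # Answer letters that were not matched exactly can still earn a yellow.
    remaining = Counter(a for a, g in zip(answer, guess) if a != g)
    for i, (a, g) in enumerate(zip(answer, guess)):
        if a == g:
            result[i] = "G"
        elif remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)

def consistent(candidates, history):
    """Keep only words that would have given the same feedback for every past guess."""
    return [w for w in candidates
            if all(wordle_feedback(w, g) == fb for g, fb in history)]

# Toy word list, purely illustrative.
words = ["lowly", "lolly", "jolly", "wooly", "godly", "newly", "epoxy"]
answer = "lowly"
history = [(g, wordle_feedback(answer, g)) for g in ["crane", "moist", "jolly"]]
print(consistent(words, history))  # prints ['lowly']
```

With a realistic word list the same filter shows how three fairly unlucky guesses can still leave a single candidate standing.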

Here are a couple of recent ones, where the answers were epoxy and lowly. Just coincidence that they both end in a 'y', I think. I vary my starting words but always try to include some common letters. Sometimes I just use things I see nearby like the dogs' names. In both these cases, by the time I'd had three guesses I didn't have many right, but I had ruled out nearly all the possibilities, and there was only one possible word that I could think of in each case that could fit what I knew.

Screenshot of Wordle with the word 'lowly', correct on the fourth go with few correct letters on the previous three. The previous image shows the same but for 'epoxy', but I can't edit the alt text for some reason.

This is the most satisfying way of playing the game, I think. If you end up with only one letter to get and several possibilities, it becomes chance and annoying, and if you get it right with some lucky guesses you don't feel like you earnt it, whereas this way you feel happy that you worked it out.

I also saw Lesley Jeffries talking about doing it in other languages, and noting that her guess distribution was much more spread, presumably because her vocabulary is not as large in those languages and so she is likely to need more goes to get it right than the average speaker of that language would (and she noted that she is relying on phonotactics, which is those rules of the language that I mentioned earlier).

Monday, 14 February 2022

An ilk of that ilk

[Please note that this is a scheduled post, and I am taking part in the ongoing industrial action by my union, UCU.]

You should always learn something at a pub quiz, and I did recently: I learnt what ilk means, as in of that ilk. The question asked us about a phrase that is used generically to mean 'of that type' and specifically in Scottish English and Scots to mean 'of the same name or place'. I had no idea and was coming up with all sorts of nonsense like autochthonous. But of that ilk it was, and once we knew, it was so obvious! An example from the OED of it being used in this way is Wemyss of that ilk, meaning Wemyss of Wemyss.

Reading the OED entry is really interesting because it was used to mean family or class, and you can see how that's related to the meaning above. But its origin is in a pronoun, it seems, which has come down to Modern English as each or which (Scots is descended from the same predecessor as Modern English is) and that meant same or alike. You could use it like that for a while, as in the OED's example from 1648, During this ilk time....

It still seems to be pretty widely used in informal contexts today, as a quick twitter search turns up plenty of examples. You'll be pleased to know, I'm sure, that there's also the occasional sighting of the eggcorn version of that elk.

Monday, 13 September 2021

To the kitchen!

I walked past a board advertising a new housing development and it boasted that the new flats would have integrated appliances to kitchen. This, I felt, is a nice example of Estate Agentese, a minority variety of English with a few unusual distinctive features. This variety seems to be acquired in adulthood, rather than as a native variety, when one becomes an estate agent and acquires the norms of the community. As such, it's likely to exhibit a lot of variation.

It's characterised by unnecessary verbosity, but at the same time greater levels of omission than are usual in standard English. So here, we have a redundant specification that the appliances are in the kitchen (where else would they be in a small flat without a utility room?), and a bare definite noun with no determiner (to kitchen rather than to the kitchen), which is also a feature of headlinese and other reduced written registers. We might also find high register vocabulary: they talk of aspect and premises and being well-appointed.

It was the choice of preposition that caught my eye here, though. This is a curious feature of this variety. Which preposition you use in any given sentence is notoriously difficult. You can rationalise all you like about the meaning of the word, but it is still a bit random at times. It's totally normal, for instance, to mention the dangerous creature either to your left or on your left. However, in standard English, you would expect a description of where the appliances are to be in the kitchen, not to the kitchen, which we would associate more with direction of movement. This might be a generalisation of its use in phrases like to the rear of the house. Or maybe it's an extension of the verb form of integrate, where you might integrate the appliances into the kitchen.

If you're a speaker of this variety of English, do chip in with your thoughts!

Tuesday, 19 November 2019

The universal taxi

I was listening to the radio this morning and they were talking about linguistics. I feel very conflicted about this because I love hearing real, proper linguistics on the radio! It's so rare! But the linguist in question has expressed anti-trans attitudes in the recent past and so I can't call myself a fan. But there we go; they at least weren't discussing such issues so they didn't express them in the course of this conversation.

They were talking about demonstratives, this and that. It was a nice discussion, lots of information, and fun facts about proximal and distal demonstratives (this here vs that there) and how these change in context, and also how many languages have a three-way system with a medium-distance and a far-distance one.

And finally they talked about how pronouns are universal: all languages have them. This is indeed super cool, and basically indicates that they are a Very Important and Useful language feature. Any language that didn't have them for some reason would likely innovate them pretty sharpish, and home signs include them, for instance.

Then someone wrote in to say that 'Interestingly, one word that is universal is 'taxi''. This is interesting, but it's not the same thing. Taxi is 'universal' because it's been borrowed into lots of languages. I'd actually be surprised if it's truly universal in the sense that every single language has this word. There are presumably a few where the concept hasn't been needed and lots more where in fact there's a different word meaning 'private car and driver hired for single journeys', or whatever it is that taxi means. A quick google suggests that in quite a lot of languages familiar to English speakers, the word for 'taxi' is something like taxi, which does give an impression of universality from this vantage point. And the 'universal' part is the specific form of the word ('taxi'), with a certain meaning.

Pronouns are universal in a different way. The words are not the same in every language (me, your, we, etc). They haven't been borrowed. There is a lot of historical relatedness, to be sure; it's no coincidence that we looks like German wir. But the very fact that there are pronouns is what's universal. This is fundamentally more interesting than that a particular word has been borrowed a lot. (By the way, here is a study that says that the most universal word is 'huh'.)

Monday, 21 March 2016

Mandarins and oranges, tortoises and turtles, rolls and sandwiches

Recently, a story appeared in the news about some plastic-wrapped peeled mandarins for sale in Whole Foods. Whole Foods swiftly removed them and said 'our mistake'.

Here's the tweet that the BBC story used in its report:

Nathalie uses the term 'oranges' to refer to these fruits, which the story refers to as 'mandarins'. In my own native dialect, orange refers to something different from mandarin as well, with oranges being bigger, harder to peel, full of pips and generally a nuisance to eat. Clementines and satsumas are smaller but similar tasting, easier to peel and a much more pleasant experience. Mandarins are something I hardly ever eat, but they have a sharper, almost sour taste which is quite nice but very different again.

Many of the dialect differences I've experienced come from the time when we moved from Shrewsbury to Newcastle when I was 11, and this is one of them, although I don't think it's really a regional difference: I think that it just emerged through mixing with a different peer group. Plenty of my friends did call all these orange citrus fruits oranges, and I assimilated (though I still do make the distinction myself).

This kind of variation in the semantic coverage of a term is one that often causes great debate. A surprising one was tortoise/turtle. To my mind, it's easy: tortoises live on land and turtles live in the water. Americans (I thought) simply call all of them turtles. It turns out that not only is my classification of chelonians not quite accurate, neither is my classification of Americans (they vary! who knew?). I'm yet to work this one out fully, but it sparked a full-on twitter row last time I tried.

The most bitterly-fought battle is probably the one over what different kinds of bread should be called (buns, rolls, baps, etc.) but a related one is what counts as a sandwich. An effect of moving south a couple of years ago was that I would sometimes order a bacon sandwich in a takeaway place and get bacon between two slices of bread. Now stay with me, this is complicated. At home, or in a place where I'm sitting down to order a bacon sandwich, I expect this. But in a takeaway place, I expect it to come in a bun (bap, roll, whatever). In places round here, it seems that sandwich is more restricted in meaning, and covers only those made with sliced bread. You can have a roll, but you have to ask for it specifically. A bacon roll is a taxonomic sister of a bacon sandwich, not a hyponym of it (in other words, bacon rolls and sandwiches are two different examples of bacon-in-bread items, rather than a bacon roll being a sub-type of bacon sandwich).

Wednesday, 6 January 2016

Ne translatez pas les languages

A friend posted this photo on facebook the other day:

A number of highly intelligent people then missed the point of the joke and began to comment on how bad the translations are. I think they're pretty good imitations of the languages in question, with one big error: the 'German' looks (unmistakably) more like Dutch in the second half of the sentence.

These are not intended to be translations, of course, or rather they're deliberately not accurate. They're just meant to amuse the English-speaking audience by including funny words to compound the humour of the warning in English ('avoid pouring on crotch area'). After all, we love nothing more than when a foreign language does something in a funny way (cf. Welsh popty-ping for 'microwave' or German Handy for 'mobile phone') so it's nice to imagine that these might be for real.

Let's begin with the French. The grammar is fine, as far as I know: you'd make an imperative in French with the negative and the 2nd person plural inflection, just as it's done there. The phrase dans l'area seems OK to me, too. The vocabulary used just isn't French, that's all. The verb for 'pour' should, I think (Google Translate helped) be verser so you'd have ne versez pas. I don't know if you'd also need a pronoun in there (don't pour it) or not. And, of course, no French person ever says ooh-la-la, but it's stereotypically French and referring to one's crotch that way goes nicely with the French reputation for romance.

Next, the 'German'. This one caused a bit more controversy in the comment thread, because it very obviously isn't German. My German is less good than my French, but I'm pretty certain the word order here is wrong as well as the vocabulary and morphology. I think you would say literally 'drop you not...' rather than 'not drop...', which is what we appear to have here: nein is German for 'no'. Then the verb is obviously just English again, with droppen instead of pourez. I think they've simply selected words that have a combination of letters that resemble the language in question (so French has a word pour, for instance, but it doesn't look very Germanic).

Next, though, I think we've got a nice case of representing an accent in words that look like (or indeed exist in) the language. So ze haut kaffe does not mean 'the hot coffee' in German (haut apparently means 'skin', for instance, and the German for 'coffee' is in fact kaffee, which they could have used instead), but it looks German-ish and sounds like someone saying the hot coffee in (some kind of representation of) a German accent. It reminds me of this a bit. Notice also that the 'German' has an overt object with article, while the French has nothing at all ('don't drop _ on the crotch area' vs 'don't drop the hot coffee on the crotch area').

Then it most definitely switches to Dutch-looking words, if we hadn't already. Dutch word order is a little bit more familiar to English speakers as well, I think, although I know even less Dutch than I do German. Anyway, here we've got a dead giveaway for Dutch: that word oont. It just has a Dutch-like feel to it, though I don't know why (double 'o'? final 't'?). And then finally the lovely phrase, ze knakkers. Again, this is definitely Germanic and in fact Google Translate does give 'knackers' as the translation of this as a German word (it gives 'frankfurter cherry' if you tell it knakkers is Dutch!).

Good work, coffee-cup-humour-producing-person!

Friday, 13 November 2015

Professional toilet paper

Normally, ‘professional’ quality means better quality. Artists’ paints, for instance, come in ‘student’ and ‘professional’ grades, and the professional ones are made with real pigment instead of synthesised stuff and are correspondingly more expensive for the ones made of precious things. A professional bricklayer will do the job better than some bloke who does it in his spare time (in theory, anyway). A professional musician plays music for a living and can be assumed to be pretty good at it.

Olympic athletes are not professionals, though: they’re amateurs. It’s in the rules. If you ‘turn pro’ in boxing you can’t compete in the Olympics any more. Here, ‘professional’ means ‘does it for money’.

And the toilet paper they use in my workplace is ‘professional quality’, which in this context means ‘not the good stuff that you buy for yourself’.

I was going to photograph the actual packaging but it's been thrown away, so here's The Professionals instead.

Sunday, 13 September 2015

Gezellig

The Morris group that I dance with did some workshops for some Dutch children this week. One of the girls said that the evening was gezellig - and she asked me did we have this word? Well, we don't, of course. I tried to kind of mutate it into English and got this far:

-ig is an adjective suffix, meaning -y or equivalent.
ge- is a kind of verbal past tense thing, I believe, so this is an adjective formed from a verb

But then I got stuck (it also turns out I was wrong about the ge- bit anyway).

Just from the way it sounded, I suggested 'cosy', even though it didn't seem right in context. I looked it up when I got home and 'cosy' is one of the things it can mean, but the internet also tells me that this is an 'untranslatable' word.

Untranslatable words, it seems to me, come in two or three flavours. There's one kind where a language happens to have a word for a very specific concept. This is not untranslatable; it's just that language X encodes something in one single word that language Y does with a phrase. See, for example, German schadenfreude or Japanese origami (I have no idea how much Germans or Japanese people actually use these words). In this case, as with many others, the way we get round not having a word for this concept is to just borrow it. We also do this with foreign things like food (risotto, wasabi, pak choi, tea...).

There's also words where the translation isn't exact, although there's a bunch of words with similar meanings. See the Language Log entry on 'accountability', for instance. Prepositions are also a problem - they never seem to translate quite right from one language to another, partly because they don't have 'meaning', as such, but rather a grammatical function. These must be annoying for translators and make learning languages a little bit harder/more interesting, but we can learn what the nuances are.

Then there's the kind that seem somehow exotic because they refer to some concept that we hadn't thought about before. These have a great appeal on the internet. I suspect this is because they tend to refer to highly emotional states of mind. Nostalgia would be a good example of this in English, and saudade in Portuguese. They are often claimed to say something about the temperament of the nation that uses that word, so Portuguese or Brazilians are typically melancholic or nostalgic. We know the fallacy of attributing a characteristic to a whole nation, but nevertheless we like to do it because it helps us to label people.

Gezellig is said on Wikipedia to 'encompass the heart of Dutch culture', so it's a good example of one of these 'untranslatable words'. Wiktionary says it means 'companionable, having company with a pleasant, friendly atmosphere, cosy atmosphere or an upbeat feeling about the surroundings'.

It also says it comes from gezel, which means 'companion'. So much for my etymological analysis.

Wednesday, 26 August 2015

Esperanto 2: Warning, contains meat

Esperanto has certain suffixes for various grammatical purposes, and others that add some extra meaning. One of the latter is -aĵ, which you add to the name of an animal in order to get the word for its meat. Some examples:

Chicken (the bird) - koko, chicken (meat) - kokaĵo

Cow/bull - bovo, beef - bovaĵo
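The rule behind those two examples can be written as a tiny function. This is a sketch that assumes the pattern is simply 'take off the noun ending -o, add -aĵ, put the -o back'; I've only checked it against the two words above, and the function name is my own.

```python
def meat_of(animal_noun):
    """Derive the 'meat' noun from an Esperanto animal noun ending in -o."""
    if not animal_noun.endswith("o"):
        raise ValueError("expected a singular noun ending in -o")
    root = animal_noun[:-1]      # koko -> kok
    return root + "aĵ" + "o"     # kok + aĵ + o -> kokaĵo

for animal in ["koko", "bovo"]:
    print(animal, "->", meat_of(animal))
# prints:
# koko -> kokaĵo
# bovo -> bovaĵo
```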

One of the sentences I had to translate was kokaĵo estas viando, which means 'chicken is meat'. Now obviously the word for 'chicken' in that sentence has the 'meat' suffix already in it, so there's a certain redundancy here. It's a bit like saying chicken meat is meat in English. (Incidentally, I don't know if any other language has a suffix specifically for 'meat', and I don't know if it can be extended to fruits, for instance, as in the flesh of a peach, which I'm sure does exist in other languages.)

I was thinking about this redundancy and its counterparts in English. We don't have exactly the same thing, of course, as our words for meat are either the same (chicken, fish) or a different word entirely (beef, pork). So I suppose what we have is a kind of semantic redundancy: 'meat' is part of the meaning of beef. In other words, beef is a hyponym of meat. But someone might not know that beef is a meat (say they were learning English and you were explaining what the word meant, for instance). That wouldn't happen with Esperanto because the meaning is right there in the word if you know what the parts mean. It's 'compositional'.

That said, people are not always that conscious of the grammatical parts of words, especially if the word is common. It's pretty usual for me to discover that many of my second and third year students can't correctly identify clauses as past or present tense, for instance. (Sorry students, if you're reading, but it's true.) They know as a native speaker what it means, but it's subconscious knowledge.

And we have comparable redundancy in English. Imagine if you said I've been hurt in the past. Well, I've been hurt is past tense so in the past isn't necessary. It is possible that it might remove the 'immediate past' meaning that we would normally understand from the perfect tense if it's uttered out of the blue, but in context it is definitely redundant and still perfectly fine to say. Similarly, a little duckling doesn't normally mean a duckling that is particularly small compared to other ducklings, and the -ling tells us it's little anyway.

I might need to find a fluent Esperantist to give me some 'native' speaker judgements on whether the sentence I had to translate has the 'explaining the meaning of the word' interpretation or not.

Incidentally, Esperanto is literally the only language that uses the character ĵ, which means it's not on my computer's keyboard and is hard to type and that's annoying.

Thursday, 29 May 2014

Never smoker

This was tweeted to Doctor Christian:

.@DoctorChristian Never smoker on BBCR2 says she used 0 nic flavoured ecigs to help with sweet cravings. Lost 1+ stone in 4 weeks. Thoughts?
— Mark Shaw (@F3zzer) May 29, 2014

I was intrigued by the use of 'never smoker' - it fills a lacuna where we need a word to designate 'someone who not only doesn't smoke now (=non-smoker) but never has done'. It's not usually necessary, which is probably why it's not very common, but here it was required, and 'never smoker' does the trick. Sounds a little odd to me, but it apparently has a specific meaning (this is the Wiktionary link but it's the US Centers for Disease Control and Prevention's definition). It's a pretty loose use of the word 'never' if you ask me (someone who has literally never smoked a cigarette), but whatever.

Wednesday, 14 May 2014

Gwynne's at it again

The odious Nevile Gwynne is at it again, publishing books. This time, he's written a Latin book. He's so pompous, my immediate instinct is to disagree with anything he says, so when I saw an advert for it in the paper that said learning Latin would improve your English, I refuted it loudly and firmly to anyone who would listen.

I love Latin, and I think everyone should learn it. A friend who was subjected to the refutation pointed out to me several ways in which learning Latin can improve a person, and he actually mentioned things that most people never think of, such as scientific analysis (I think he said this, anyway - he said biology, so I suppose he may have meant that you'd understand binomial classifications better, which is true, but it would also help you with doing anything that requires careful, logical, rigorous analysis). This friend also agreed with Gwynne that Latin would improve your English, however, and until today I thought that I heartily disagreed with this point of view.

Latin grammar is sometimes held up to be 'better' than English, or alternatively the basis of English grammar. Although Latin grammar is a beautiful thing, it is not better and nor is it the root of our language. Gwynne says that Latin is the source of 'well over half' of English. This is sheer nonsense. I think he must be counting words, because I can accept that half of English vocabulary comes from Latin - we did borrow a lot when we were conquered by the Romans and then the French (I say 'we' - I've no idea whether I'm one of the conquerors or the conquered, or even if there's any way to tell at this distance). But even then, if you count tokens rather than types, it's nowhere near half. What that means is, rather than count the number of Latin-based words in the dictionary, you count a word each time it appears in use rather than only once. More common words tend to be Germanic rather than Latin, so the number of Latin-based words in any text is not likely to be half. Take my first paragraph: based on my instincts regarding the words' origins, only 10 of the 61 words are Latin-based. Hardly half. And then, words are so far from being the whole of language it's simply preposterous to say such a thing when our grammar is basically not Latin in any way whatsoever. So this is why I disagreed with the notion that knowing Latin improves one's English: a) There aren't really any similarities and b) it assumes that some people speak 'bad' English, and as a linguist I have to be a little bit forceful about making sure people know that there is no such thing (only inappropriate styles).
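If the tokens-versus-types distinction is new to you, here is a rough sketch of the two ways of counting. It assumes you have already decided by hand which words count as Latin-based (the code can't do etymology for you); the sample sentence, the word set and the function name are all made up for illustration.

```python
import re

def latin_share(text, latin_words):
    """Return (share of tokens, share of types) that are in latin_words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("no words found in text")
    types = set(tokens)  # each distinct word counted once, dictionary-style
    token_share = sum(t in latin_words for t in tokens) / len(tokens)
    type_share = len(types & latin_words) / len(types)
    return token_share, type_share

# Invented sample and hand-labelled set, not real data.
sample = "the odious man is at it again and the man is pompous"
latin = {"odious", "pompous"}

token_share, type_share = latin_share(sample, latin)
print(f"tokens: {token_share:.2f}, types: {type_share:.2f}")
# prints: tokens: 0.17, types: 0.22
```

Because the common Germanic words (the, is, man) repeat and the Latin-based ones don't, the token share comes out lower than the type share, which is the effect described above.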

So, anyway, today I went on good old Amazon's 'look inside' thingy. And whaddaya know, I agreed with a lot of it (except that part about the influence of Latin on English). Let's look at what he says are the benefits of learning Latin:

'To know the source of any word is to understand it better'. Well, in some ways this is not true. Take a word like 'foliage'. Does it help you to know that it comes from the French for 'leaf'? I'm not sure it adds much. What about a word that's changed a lot over time, like 'nice'? It doesn't really help you use it any more accurately if you know that it comes from the Latin word for 'ignorant, unaware', although it is interesting and telling people this kind of thing in detail has the nice side effect of getting rid of unwanted company at parties. And what about 'enormity', which is so misused, if you believe the pedants? Well, it has the meaning of a terrible crime, but is supposedly used wrongly by people who don't know any better to mean 'hugeness'. Do you know where it comes from? The Latin word for 'hugeness'. Clearly, we cannot base modern usage on etymology. But on the other hand, it might make your vocabulary more nuanced if you know the origin of some words. Gwynne gives the example 'radically', which comes from the Latin radix, 'root'. 'Radically' therefore means 'from the roots', which doesn't help me use it any better but might help me to think of it when I need a word that means that.
Translation from English to Latin requires you to reorganise the sentence, for which you need to understand how the sentence was put together. This is absolutely true and I have no quibble with it - it is good to understand how the mechanics of grammar work, and it does help you to write better sentences.
You must revise your translations thoroughly so they read well. Also true. This might make your writing more elegant, as you will be practising writing elegant translations and editing prose. However, the Latin is merely a means of doing this, and is not essential to the process. You might just as well translate any language, or even simply rewrite English passages.
The meaning of the Latin words and phrases we use in English will be 'very much clearer than if you were to rely solely on what a dictionary says as to their meaning'. Possibly. We don't use them all literally, so it won't always help, but I imagine it helps to remember what they mean if you recognise them, rather than simply having to memorise them. For instance, there's a mnemonic to get 'eg' and 'ie' the right way round, but I can never remember the mnemonic. I do know, however, that 'eg' stands for exempli gratia 'for example' and 'ie' stands for id est 'that is', so I get them right. (Side note: never follow 'eg' examples with 'etc' if you're writing an essay for me.)

One benefit I'd add which Gwynne doesn't mention is spelling. I think if you know the root, you are more likely to spell certain words right. For instance, 'separate' is commonly misspelt 'seperate', but it comes from the Latin parare. Since I learnt that, I've never spelt it wrong.

So, unexpectedly, I'm broadly in agreement with Gwynne for once. Everyone should learn Latin right away.

We can't finish on such a positive note though. People will think I'm going soft. Here, have some criticism: Gwynne makes a number of mistakes regarding linguistic facts. Even without looking for them, I spotted two glaring inaccuracies, both, I think, a result of paraphrasing Wikipedia without understanding it properly and therefore introducing error. Here's the first:

Latin is the direct ancestor of, between them, the five so-called Romance languages (Italian, French, Spanish, Portuguese and Romanian) of the largest European language group, and of both of the official South American languages (Spanish and Portuguese).

Even ignoring the fact that it's odd to list Spanish and Portuguese twice (we could give him the benefit of the doubt as there are many differences between the European and South American varieties), this is a strange sentence. Those five are the Romance languages with the most speakers, but he implies that they comprise an exhaustive list of the Romance languages, which is not true. There are lots more. He also implies a contrast between 'the largest European language group' and the South American languages, but the language group he refers to is presumably Indo-European, and this is a classification based on relatedness, not geography, so Spanish and Portuguese are part of it regardless of where they are spoken. And perhaps the biggest inaccuracy of all here is his use of the phrase 'both of the official South American languages', which can only mean that there are two official South American languages, and those two are Spanish and Portuguese. This is wildly wrong: English, Dutch and French are all official languages in South American countries, and oh yes, there are lots and lots of indigenous languages, which Gwynne, in the grand tradition of the superior white man, overlooks. A brief glance at Wikipedia lists Quechua, Aymara and Tupi Guarani in Bolivia, Guarani in Paraguay, more than 60 indigenous languages with official status in Colombia, and so on. None of this is relevant to Gwynne's point, but getting the facts so wrong doesn't fill the reader with confidence about his linguistic expertise.

Here's the second, in a footnote:

The language that Latin replaced [in Britain] was the Celtic language.

No. It wasn't. There is no such thing, and there wasn't when Latin was around either. There are several Celtic languages, and although they all derive from a common ancestor, that was a heck of a long time before Roman Britain. I mean, really, the fact that I'm only giving Wikipedia references here tells you how easy it would have been to check a few of these claims.

[Update: I noticed that later in the book, he says that 'no modern language comes close to approaching Latin in difficulty'. I can't even be bothered to explain why this is idiotic, so I'll just invite readers to share their contenders for 'a language more difficult than Latin' (for an English speaker, presumably, as difficulty of learning varies depending on your native language).]

Thursday, 24 October 2013

Ket... carrion or sweeties?

On Twitter today, Richard Osman bemoaned the fact that British English lacks a word that covers both 'chocolate' and 'sweets'. I remembered this northeast word that I think has that meaning: ket. It's not in my personal vocabulary, and possibly not so common these days, but it does seem to mean this, according to Wiktionary:

But just look at the etymology! It's from the word meaning 'flesh' in Icelandic/Swedish/Danish, and in other parts of northern England it means 'carrion'. Eew, and also how?

Wiktionary has two theories: either it comes via the term 'sweetmeats' (I don't know if they mean in the sense 'sweet treats' or 'testicles') or it could be that the word was used to put kids off eating too many sweets!

Saturday, 13 April 2013

Dawkins on 'not English'

You know, Richard Dawkins is a clever chap, I think he writes well and I've enjoyed many of his books. But sometimes he says such stupid things, it's like he wants people to misunderstand him.

Take this tweet:

'Grade' as in '7th grade' is not part of the English language

I mean, honestly, what a thing to say. He's making a perfectly good point, namely that if you're writing for an international audience it's more useful to state the age of a child in years rather than using a US-specific system. (In fact, because in the US children can skip or repeat a year in school, it is sometimes relevant to refer to the stage in their education that the child has reached. However, for the majority of children, this is not the case.) He clarified this, along with the statement that for a US audience, he has no problem with referring to grades. None of this is controversial.

So why on earth did he put it in such a stupid, guaranteed-to-cause-a-row way? Does he like getting into fights so much that even innocuous opinions must be stated in a controversial manner?

The bit I'm referring to, if it's not blindingly obvious, is the bit where he says that 'grade' is 'not part of the English language'. It... but... it... well, it obviously is. How can it not be? American English, or the collection of dialects of English spoken in that part of the world, is most definitely English. And if 'grade' (in this sense) is part of at least some of those dialects, then it is part of English.

It's possible that Dawkins was using some rather unusual definition of English. If you take any English speaker, let's say me, then we can agree that I speak English, I hope. And if you take another English speaker, let's say Richard Dawkins, then we can also agree that he speaks English. And so on and so on for any English speaker in the world. But our two Englishes are not precisely the same. In the case of me and Dawk, they're not far removed from each other. We're both speakers of British English dialects, although his is a bit more old-fashioned than mine. But if you compared Dawk and a teenage speaker of English from, say, California or South India or Grenada or Kiribati, then you're going to find a few more differences between the dialects. One might, then, wish to say that something is only 'English' if it is found in all dialects of English. This is a silly way to define English because it leaves you with about three words and a smattering of grammar and no sounds with which to pronounce them (I exaggerate, of course, but not much).

You can say the opposite, and say that something counts as 'English' if it is found in at least one English-speaker's dialect. But then you run into trouble defining English, as it can get a bit circular. You could also say that there is no such thing as English, merely a collection of idiolects (personal dialects) which converge with each other to a greater or lesser extent, some of which are mutually intelligible and some of which are not. Some people do say this, I think, but it's a somewhat extreme position to take. It's largely a philosophical problem, and for practical purposes one usually needs to define English in some partially arbitrary way. On any of those definitions, American English still counts as English and Dawkins is being a numpty.

Friday, 22 March 2013

When a word becomes unacceptable

There's been a spate of ableist language in my life this week (that is, language that discriminates against people with disabilities).

No word for wool in Tagalog

I was listening to The Unbelievable Truth the other day, which is a BBC radio programme in which contestants have to deliver a lecture made up of mostly lies, but with some truths slipped in. The trick is to hide the truths so that your opponents think they are lies (and meanwhile they're trying to spot the truths). It's quite enlightening.

In the episode I was listening to, one 'fact' offered was that there is no word for 'wool' in Tagalog (spoken in the Philippines). The other contestants all twitched, wondering if they should go for their buzzers. No word for wool! It could so easily be true! There's no way of knowing, of course, if you don't speak Tagalog. Do they have sheep in the Philippines? If not, they might not have wool, and therefore no word for it! Or maybe they just don't have one word that just means wool, they might have a word that encompasses wool and cotton! Or maybe it's a phrase, not a word, like sheep's hair!

No, it was a lie, I'm afraid. Tagalog for 'wool' is lana.

Saturday, 11 February 2012

Correlations in linguistic data

Geoff Pullum at Language Log recently reluctantly (because it's not yet published) commented on a paper by a Yale economist, Keith Chen. In this paper, Chen argues that if your language has a grammatical future tense marker, you are less likely to save money, live healthily etc because the future seems like some other time, not to be worried about now. If your language uses present tense to refer to the future, you treat it as an extension of the present and you'll be much more sensible about it. Pullum is guardedly sceptical about these claims, for reasons which you can read about yourself.

He is also sceptical about this kind of claim (made based on correlations found in large amounts of data) because

I also worry that it is too easy to find correlations of this kind, and we don't have any idea just how easy until a concerted effort has been made to show that the spurious ones are not supportable. For example, if we took "has (vs. does not have) pharyngeal consonants", or "uses (vs. does not use) close front rounded vowels", would we find correlations there too? I have some colleagues here at the University of Edinburgh, within Simon Kirby's research group, who have run some informal experiments on the data Chen uses to see if dredging up spurious correlations of this kind is easy or hard, and so far they have found it jaw-droppingly easy.

He doesn't comment further on these experiments, but it reminded me of the talk Martin Haspelmath gave when our university's linguistics research centre opened a few years ago, and he told us about the World Atlas of Language Structures (WALS). After telling us what a wonderful, useful tool it is (and it is, I've found it invaluable), he ended on a note of caution. It's easy, he said, to find false correlations. For example, you can show a map of languages which have a different word for hand and arm or use the same word for both. That map shows that the languages that don't distinguish are, broadly speaking, around the warmer areas of the globe (yellow dots) and the ones that do distinguish are in colder areas (red dots):

(Map from WALS, feature 129A)

Now might one not hypothesise, asked Haspelmath, that this language fact is due to the climate? In colder countries the distinction is important, in that one wears items of clothing that cover only the hands (gloves), or sleeves that come down to the wrist. In warm countries, sleeves are not so long and gloves are not worn, so a separate word for hands never becomes necessary. A far-fetched example, but a lesson in not putting too much faith in correlations.
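To get a feel for how easy it is to dredge up correlations like this, here is a minimal toy sketch in Python. It is not the experiment Pullum mentions and it uses no WALS data: everything in it is invented, the function names (`phi`, `count_spurious`) are made up for illustration, and the cut-off of 0.2 is an arbitrary choice. It gives each made-up language a coin-flip "outcome" (saves money or doesn't, say) and then a pile of coin-flip "features" that by construction have nothing to do with it, and counts how many features nevertheless look correlated with the outcome.

```python
import math
import random


def phi(xs, ys):
    """Pearson correlation between two equal-length lists of 0s and 1s."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A feature that never varies cannot correlate with anything.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_languages=100, n_features=200, threshold=0.2, seed=0):
    """Count random features whose correlation with a random outcome
    reaches the (arbitrary, assumed) threshold in absolute value."""
    rng = random.Random(seed)
    # Invented outcome: 1 = "speakers save money", 0 = "they don't".
    outcome = [rng.randint(0, 1) for _ in range(n_languages)]
    hits = 0
    for _ in range(n_features):
        # Invented feature, e.g. "has pharyngeal consonants", assigned by coin flip.
        feature = [rng.randint(0, 1) for _ in range(n_languages)]
        if abs(phi(feature, outcome)) >= threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_spurious(), "of 200 meaningless features look 'correlated'")
```

With 100 languages, pure noise should clear a cut-off of 0.2 roughly one time in twenty, so trying a couple of hundred features will usually produce a handful of 'findings' with nothing behind them. Real languages are not independent coin flips, of course, so this only illustrates the multiple-comparisons part of the problem.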

Sunday, 1 January 2012

Swedish words of the year

Why Swedish? Well, why not. They're so much better than our boring old candidates for this honour, anyway.

One I like is åsiktstaliban. It means 'opinion taliban'. Perhaps related is attitydinkontinens, which if you read it carefully says 'attitude incontinence', or an inability to keep your opinions to yourself.

As proof (if we needed it) that knowing the etymology doesn't affect your use of a word, they have juholtare, which describes 'a situation when someone says something hastily and then has to take it back', and terja, which is to manipulate a photograph. Both handy words, and both come from the names of people you've probably never heard of if you're not Swedish. Håkan Juholt is the leader of the Social Democratic party and apparently keeps making rash and incorrect statements. Terje Hellesø is a nature photographer who confessed to having messed about with his photos (which won an award). This also makes him unusual in having a word based on his first name rather than his surname.

appa, which means to solve a problem by using an app, is another good word which we simply don't have in English and I think might be usefully borrowed. We could just use app as a verb, as in:

What's the weather going to be like tomorrow? Hold on, I'll just app it.

Some of the words are frankly bizarre. Take Säpojogg ('Säpo jog'),

a term describing a run or race emulating how security service agents jogged in suits and ties behind a vehicle, such as at Crown Princess Victoria's wedding.

Why would you need such a word? Why would there be a race like that?

And what about mobildagis ('mobile phone daycare')? It means

a place for the collective storage of multiple mobile phones

What? What is such a place? Like the place in the house where everyone's phone lives? These crazy Swedes.

Thursday, 22 September 2011

Words I have learnt

I always learn some new vocabulary when I'm at LAGB. Sometimes from the language tutorial (every year there's an in-depth look at an unfamiliar language), sometimes just from examples in the papers. Last year, for instance, I learnt that Swahili for lion is simba, presumably where the lovable yet headstrong (and somewhat dim) character in The Lion King gets his name.

This year I learnt that Turkish for man is adam. It's also the same in Hebrew, I think, as the name of Adam (the Biblical one) is supposed to be from Hebrew. People seem to disagree over what it means though - it also means red, like the earth (which Adam was supposedly made from). However, this new word is not surprising, only mildly interesting.

I also learnt the Tundra Nenets word for bread, which is na'an. Tundra Nenets is a Samoyedic Uralic language spoken in Russia. We get the word naan from Urdu (or Persian, according to the OED), which is a whole lot different from Nenets. I don't think there's been a lot of contact between northern Russian peoples and Urdu speakers, so what the heck is going on here? Could it be just coincidence?

Friday, 26 August 2011

Vocabulary quiz

I found this vocabulary quiz. It's by Merriam-Webster Online and quite good fun, though a bit too easy. Points for getting it right, and more points for getting it right quicker.