linguistlaura: phonotactic constraints

Monday, 11 April 2022

At last, someone has written about Wordle!

I've held off on blogging about Wordle, because everyone else did it, and because I didn't have anything particular to say. People tend to assume that if you're a linguist, you like word games, but I don't think that's any more true for us than for normal people. Some of us do, others don't. I happen to love crosswords (because there is a quiz or a puzzle element) and dislike Scrabble (because I'm not good at anagrams). I do, as it happens, love Wordle. I love logic puzzles like sudoku, and this is basically just a logic puzzle with an added constraint.

There is, or used to be, a board game called Mastermind which was a pure logic version of Wordle. (If you don't know what Wordle is, by the way, I don't know where you've been. It's what we all spent the early part of 2022 doing.) There, the thing you had to guess was the sequence of coloured pegs. There were only a few colours, and only a sequence of four, so far fewer possibilities than with the 26 letters and five slots that Wordle involves. And you needed like ten goes to get it, rather than the six that you get with Wordle. The rules were the same: you got told if you'd got one right and in the right place, or right but in the wrong place, or wrong. You weren't told which peg each piece of feedback referred to, though, which did make it harder in that respect (otherwise it would have been incredibly easy). I loved this game and I'm not sure why I never had my own copy (maybe no one else liked playing it with me, or maybe I never mentioned that I liked it?) but I played it when I was at other people's houses.
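For anyone who likes to see the rules written down, here is a minimal sketch of the feedback both games give. The function names and the G/Y/- marks are my own invention for illustration, and the handling of repeated letters is the way I understand Wordle to do it, so treat it as an assumption rather than the official rules.

```python
from collections import Counter


def wordle_feedback(guess: str, answer: str) -> str:
    """One mark per position: G = right and in the right place,
    Y = right but in the wrong place, - = wrong."""
    if len(guess) != len(answer):
        raise ValueError("guess and answer must be the same length")
    marks = ["-"] * len(guess)
    # Answer letters that were not matched exactly can still earn a Y.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "G"
    for i, g in enumerate(guess):
        if marks[i] == "-" and unmatched[g] > 0:
            marks[i] = "Y"
            unmatched[g] -= 1  # each answer letter can be used only once
    return "".join(marks)


def mastermind_feedback(guess: str, code: str) -> tuple:
    """Mastermind only gives totals (exact, misplaced), not positions."""
    marks = wordle_feedback(guess, code)
    return marks.count("G"), marks.count("Y")


print(wordle_feedback("lolly", "lowly"))
print(mastermind_feedback("RGBY", "RBGG"))
```

The only difference between the two games here is that Mastermind throws away the positions and keeps the counts, which is exactly what made it harder.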

So yes, I do love Wordle, because of the logic puzzle aspect. The word part of it does add something interesting for me, though. I like the constraint it puts on the possible answers. It's not the case, as in Mastermind, that every combination is equally possible. Some just aren't, or are much less likely, and that's due to the rules of either languages in general, or English in particular. So an example of a language-in-general thing is that there are going to be some vowels in the word, and some consonants. An example of an English-in-particular thing is that the last letter is probably an 'e' or a consonant, because we don't have so many words that end in 'a', 'i', 'o' or 'u' (though we do have some, so it's not absolutely ruled out). Another English-in-particular thing is that if you know you've got an 'h' in there somewhere, it's possibly the first letter but if it's not, you've likely also got a 't' for 'th' or a 'g' for 'gh' in there. Not always; ahead would have stumped me in that case.

Screenshot of my Wordle stats showing a normal distribution with most words taking me four guesses to get.

I've been paying attention to how I solve them, and I usually get the answer on the fourth go. I imagine this is true for most people, as we'd expect a 'normal distribution' with very few right on the first or second go (that's a lucky guess) and few taking six (that's some bad luck or a word that has many others very similar to it).

I'm not sharing any new insights on how to solve them – I just do the same as you all do and rule out the most common letters first until I can see what it's likely to be. But what interests me is how quickly you get to the point where it can only realistically be one word. This is normally where I am by guess four.

Here are a couple of recent ones, where the answers were epoxy and lowly. Just coincidence that they both end in a 'y', I think. I vary my starting words but always try to include some common letters. Sometimes I just use things I see nearby like the dogs' names. In both these cases, by the time I'd had three guesses I didn't have many right, but I had ruled out nearly all the possibilities, and there was only one possible word that I could think of in each case that could fit what I knew.
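To show how the ruling-out works, here is a rough sketch that keeps only the words still consistent with the feedback so far. Everything in it is made up for illustration: the ten-word list stands in for a real vocabulary, and the three guesses are hypothetical ones, not the ones I actually played.

```python
from collections import Counter


def feedback(guess, answer):
    """G = right place, Y = in the word but elsewhere, - = not in the word."""
    marks = ["-"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "G"
    for i, g in enumerate(guess):
        if marks[i] == "-" and unmatched[g] > 0:
            marks[i] = "Y"
            unmatched[g] -= 1
    return "".join(marks)


def still_possible(candidates, guess, marks):
    """Keep the words that would have produced the same marks for this guess."""
    return [w for w in candidates if feedback(guess, w) == marks]


def narrow(words, answer, guesses):
    """Return (guess, remaining candidates) after each guess in turn."""
    remaining = list(words)
    history = []
    for guess in guesses:
        remaining = still_possible(remaining, guess, feedback(guess, answer))
        history.append((guess, remaining))
    return history


# Hypothetical toy vocabulary, far smaller than anyone's real one.
WORDS = ["lowly", "lolly", "folly", "holly", "godly",
         "funny", "bulky", "bowls", "epoxy", "newly"]

for guess, remaining in narrow(WORDS, "lowly", ["raise", "mount", "whelp"]):
    print(guess, len(remaining), remaining)
```

With this toy list, three guesses that hit very few letters still leave only one candidate, which is the situation I keep finding myself in by guess four.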

Screenshot of Wordle with the word 'lowly', correct on the fourth go with few correct letters on the previous three. The previous image shows the same but for 'epoxy', but I can't edit the alt text for some reason.

This is the most satisfying way of playing the game, I think. If you end up with only one letter to get and several possibilities, it becomes chance and annoying, and if you get it right with some lucky guesses you don't feel like you earnt it, whereas this way you feel happy that you worked it out.

I also saw Lesley Jeffries talking about doing it in other languages, and noting that her guess distribution was much more spread, presumably because her vocabulary is not as large in those languages and so she is likely to need more goes to get it right than the average speaker of that language would (and she noted that she is relying on phonotactics, which are those rules of the language that I mentioned earlier).

Monday, 23 April 2012

Baboons can read!

Or rather, baboons can learn to recognise words and then learn the frequency of particular letters in them to predict if a new item is a word or not.

What they did was let baboons play with a computer in return for treats (so they could choose not to play at all if they didn't want to, or play a lot if they were greedy buggers). They were presented with a whole load of words, all four letters long, all with a vowel and three consonants (as far as I can tell - I don't have access to the original paper). Some of the words were real, and some were non-words like virt or dran. The non-words were, I think, possible words (that is, they conform to the rules of the phonology of the language, its phonotactic constraints). I don't know which language - the study was done in France but the phonotactic constraints of French and English are not all that different. The baboons had to press a button if they thought it was a real word, and another if they thought it wasn't. I don't think any of them did any better than chance at first, but they were given a treat if they guessed right until they had seen the words enough times to remember them and guess right 80% of the time.

This is Dan the Baboon, class swot. He remembered 300 words:

Dan the Baboon

Note that this isn't reading, which means identifying a word as a word (connecting it with a sound and/or meaning). This is just object recognition, which I think is the authors' point - that reading isn't a linguistic skill, but rather object recognition and then we have other linguistic processes going on to make the next step, which baboons don't have. Fortunately, otherwise they'd start writing us notes and making demands for better treats or more fun computer games.

Once the baboons knew a load of words, the researchers gave them new ones to look at. Now, they could guess when a word was a real word at slightly better than chance, or 60% of the time. Apparently, they were recognising letter combinations like 'th' and assuming that new words with that letter combination were also words. (Note that Language Log has comprehensively dissected and basically trashed this claim, but I don't have access to the paper so I'm going to have to take the conclusions as they say.)

So the baboons seem to be able to break the words down into their letters, or at least smaller parts. I don't know if this is a new finding. The authors say it's like how you can recognise a table you've never seen before, because you know how a table basically looks, with legs and a top and the relations between the parts.

I don't think these baboons are so clever though. I want to know how they'd do with a non-word that had the letter combination 'th' - presumably they'd wrongly think that it was a word.
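Here is a toy sketch of what I mean. It is emphatically not the researchers' method or analysis (I haven't seen the paper): it just assumes a guesser that says "word" whenever an item contains a letter pair it has seen in several known words. The word list, the threshold of three and the non-word thob are all my own made-up examples.

```python
from collections import Counter


def bigrams(word):
    """All pairs of adjacent letters in the word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def learn_frequent_bigrams(known_words, min_count=3):
    """Letter pairs that turn up in at least min_count of the known words."""
    counts = Counter()
    for word in known_words:
        counts.update(bigrams(word))
    return {pair for pair, n in counts.items() if n >= min_count}


def guess_is_word(item, frequent):
    """Say 'word' if the item contains any familiar letter pair."""
    return any(pair in frequent for pair in bigrams(item))


# Hypothetical training words, chosen so that 'th' is the familiar pair.
KNOWN = ["them", "then", "this", "that", "with", "bath", "kind", "wasp"]
frequent = learn_frequent_bigrams(KNOWN)

print(guess_is_word("moth", frequent))  # a real word it hasn't seen
print(guess_is_word("thob", frequent))  # a made-up non-word with 'th'
print(guess_is_word("dran", frequent))  # a non-word without it
```

A guesser like this accepts thob just as happily as moth, which is the kind of mistake I'd expect.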

They would also (I'm fairly sure) do badly if they were asked to identify possible vs impossible non-words. This is what I mentioned before: impossible words violate the phonotactic constraints of the particular language. So blod is a possible (but not real) English word, but bkod is not. The baboons would not know this (of course, as they aren't associating the words with sounds and they don't speak English (they're French baboons) and they don't have any concept of phonotactics anyway). I don't know if they could learn to recognise them either, though - I think it might be too complex and the rules just too arbitrary. Even for Dan.
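As a very rough illustration of the possible vs impossible distinction, here is a sketch that only looks at the consonants at the start of a word. It is a big simplification: it works on spelling rather than sounds, the list of allowed clusters is my own partial one, and it ignores the ends of words entirely.

```python
VOWELS = set("aeiou")

# A partial, hand-picked list of two-letter word beginnings that English
# allows. This is an assumption for illustration, not a complete inventory.
LEGAL_ONSETS = {
    "bl", "br", "ch", "cl", "cr", "dr", "fl", "fr", "gl", "gr",
    "pl", "pr", "sc", "sh", "sk", "sl", "sm", "sn", "sp", "st",
    "sw", "th", "tr", "tw", "wh",
}


def onset(word):
    """The letters before the first vowel (the whole word if there is none)."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[:i]
    return word


def is_possible(word):
    """True if the word starts with at most one consonant or a listed cluster."""
    start = onset(word.lower())
    return len(start) <= 1 or start in LEGAL_ONSETS


for item in ["blod", "bkod", "dran", "virt"]:
    print(item, is_possible(item))
```

Even this crude check separates blod from bkod, but only because I wrote the rules in by hand, which is rather the point: someone or something has to know them.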