linguistlaura: WALS


Thursday, 6 December 2018

Economist finds WALS; plays with features; gets publication

Update: Since I wrote this, there's been more discussion on Twitter, with some people thinking that linguists just don't want other academics stepping on their toes. That's definitely not the case! Other suggestions are that linguists have been trained to have a knee-jerk reaction against anything Sapir-Whorfian. That probably is true for a lot of people (including me, actually, which is why I try to be aware of that), but I hope it's clear below that that isn't my problem with this article.
There has also been an open letter started and signed by many linguists asking for the paper to be retracted. I think this is a mistake; there is nothing fraudulent or ethically wrong about the paper. It's just not a very good paper, in a not very good journal. Asking for it to be withdrawn sounds a bit like censorship to me. The review process is where crappy work should be stopped, and that process failed here, so it's worth bringing that to the journal's attention, but seeing as this publicity will have brought them many more readers I doubt they're too bothered about fixing it for the future.

Anyway, here's the post as I wrote it before I thought about all these things.

===
Linguists love it when economists do linguistics. Linguist Twitter was a super fun place to be when this article came out. It argues that languages that can leave out pronoun subjects (like Spanish) are spoken by people who have lower levels of education due to their more collectivist culture. I know, right?

No need for me to point out all the ways in which it is wrong and incorrect and foolish; Joe McVeigh spent a happy while with a whisky or two doing that.

The thing is, it's actually not unreasonable to write articles like this. There are a lot of very credible papers that show that our attitudes are easily influenced by factors like what our language encodes. There's ones about noun gender affecting how elegant or sturdy we think bridges are, directionality affecting our ability to orient ourselves in our environment, and so on. And psychological experiments seem to show that it only takes a bit of a reminder that we're female to make us do worse on maths tests, etc. Some of these studies maybe are not as robust as they look, and I don't know about the reliability of the psychology ones, but the point is they are by linguists and psychologists and they look credible. So why wouldn't you write an article showing how some facet of language influences your behaviour?

Well, perhaps if you're not only not a linguist, but you also don't know anything about language and don't ask anyone who does, and you don't do it very well.

I don't know the economics dataset that the author uses, but I do know the linguistics dataset very well. It's the World Atlas of Language Structures, which I love very much. This author, Feldmann, uses it because it "provides the most authoritative information on a large number of languages". It does indeed cover a large number of languages, but there is no reason to say it is "the most authoritative". It's compiled from published grammars. Many of those are careful, detailed, accurate descriptions of the language; others are a hundred years old, written by someone who didn't necessarily have much linguistic training. You have to be careful and check those sources out. His only reason for saying it's "authoritative" is that an economics reference says it is, using that same word, and then he cites them with a glaring error in a Spanish example ("yo ablo").

Another thing is that it doesn't control for languages being closely related unless you ask it to, and to do that you need the CD-ROM version, not the free online version. There's no indication of the method the author used, so we don't know if he did that. He just says that he looked at 103 languages. 711 have this information in the free version; I can no longer use my CD-ROM copy as I don't have a CD drive in my computer any more :( so I don't know what subset he took. For example, if you take all the languages spoken in Northern Europe, it's not so surprising if most of them require pronominal subjects, because they're all related. It's better to take one language per genus to avoid skewing your results. Maybe he did this; we don't know.
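To make the genus point concrete, here is a minimal sketch of sampling one language per genus before counting feature values. The rows are hand-typed toy data rather than an extract from WALS, the "required"/"optional" labels are a simplification of the real feature, and `sample_one_per_genus` is a hypothetical helper. It is not the method the paper used, which we don't know.

```python
import random
from collections import Counter, defaultdict

# Toy rows: (language, genus, pronominal subject).
# Illustrative only: these are NOT WALS codes, just a simplified label.
ROWS = [
    ("English", "Germanic", "required"),
    ("German", "Germanic", "required"),
    ("Dutch", "Germanic", "required"),
    ("Swedish", "Germanic", "required"),
    ("Spanish", "Romance", "optional"),
    ("Italian", "Romance", "optional"),
    ("Japanese", "Japanese", "optional"),
]


def sample_one_per_genus(rows, seed=0):
    """Pick one language at random from each genus (hypothetical helper)."""
    rng = random.Random(seed)
    by_genus = defaultdict(list)
    for row in rows:
        by_genus[row[1]].append(row)
    return [rng.choice(group) for group in by_genus.values()]


# Counting every language lets the big Germanic group dominate.
raw_counts = Counter(value for _, _, value in ROWS)

# Counting one language per genus gives each lineage a single vote.
sampled = sample_one_per_genus(ROWS)
genus_counts = Counter(value for _, _, value in sampled)

print("all languages:", dict(raw_counts))
print("one per genus:", dict(genus_counts))
```

In the raw list, "required" looks like the majority pattern only because four of the seven languages are Germanic; counted by genus it is one out of three.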

His citation is poor; his linguistic sources are old or eccentric or missing or simply odd choices. They look like the citations of someone who hasn't read the linguistics literature or asked anyone who has. He doesn't give any sources at all for his claims about collectivist cultures not wanting girls to be educated, which is a big claim and one that really needs backing up.

Go ahead; make claims about culture based on linguistics. They don't tend to stand up to much scrutiny, but maybe yours will. But don't exoticise those people because of it, and don't base those claims on superficial data with no referencing or linguistic research.

Thursday, 7 June 2012

Talking to real people

Today I gave a talk to some real people. Actual non-academic real people. Normally, we only ever have to explain our research to people who already have a good specialist (or at least basic) knowledge of our topic and the background and framework that underpin it all. Everyone understands the technical terms you use, and you can make certain assumptions and everyone will go along with them.

But it does one good sometimes to step outside the warm bath of academia and see what impression you can make on people who don't have a rigorous grounding in the niceties of Minimalist syntax (for example). How are you going to explain what the Final-Over-Final Constraint is to an audience if they don't know what a head is? Or a VP? Or what 'final' means?

I was participating in the Explore programme run by the Centre for Lifelong Learning. They ran a training day for postgraduates at universities in the region aimed at helping us to present our research to a public audience. This is something that I'm interested in, because linguistics is so hopelessly unknown and misunderstood in the popular media, and yet everyone is interested in it. Programmes and newspaper articles about language go down an absolute storm, but the only linguist anyone's heard of is David Crystal. Yes, they've heard of Chomsky too, but not for his linguistics. Your average person knows more about how the Large Hadron Collider works than about their own language.

The interest that people have in language was illustrated today, after my talk, when the audience had a chance to ask questions and make comments. They all had something to say, offering interesting facts about other languages, or making observations about the way language is used now or might be at some other time. From their feedback it seems that they found my topic interesting. Wow. Some readers of this blog will be wondering how on earth I made my dry, dull, theoretical syntax PhD into a talk that didn't send them to sleep. It is of course all due to my captivating speaking persona.

But seriously, these are people that learn for fun, so they're willing to put a bit of effort in (although they don't want to feel like they're back at school). However, I knew that for an audience with zero assumed knowledge of syntax, I had to lose a lot of the technicalities but not lose the quality. From the practice run we did at the training day, I found that some people panic at the sight of anything vaguely technical-looking. They pretty much switched off when they saw trees and abbreviations, even though I did try to explain them clearly. For that reason, I did away completely with tree diagrams and replaced them with an analogy of a mobile. I've used that analogy for years, after pinching it from a lecturer, and it works well. Then I entirely removed labels like VP and TP and just did without them. It's convenient for linguists to use them but it turns out you don't need them. Then I filled the talk with cats. People like cats.

I followed the principle we'd learnt at the training day, that rather than start with the general background, it's good to dive straight in to the interesting fact and show some kind of visual (or otherwise memorable) prop. I showed maps from WALS illustrating different patterns of word orders, and how question particles don't look the way they're meant to. It's not the most fascinating graphic in the world, but it's better than a lot of text. I also tried to end on something that they could engage with, and compared my analysis to spurious claims about languages lacking some characteristic or other. I thought that was something they would likely have read about, and have thoughts about, and they did - that sparked some nice discussion. I wish I'd thought to use the Hopi example that came up in the questions, though.

(Image: Marcus du Sautoy looking all mathsy)

I think that in the middle, some people still got a bit lost. Maybe I didn't explain everything as fully as I should have, or maybe I tried to cover too much and could have sloughed off a bit more syntax. But overall, I was pleased with how it went. I'll work on those things for next time someone's fool enough to put me in front of humans again, and this time next year I'll be the Marcus du Sautoy of linguistics.

Just nobody, not ever, suggest I host this gameshow (I would derail it with anti-prescriptivism):

Saturday, 11 February 2012

Correlations in linguistic data

Geoff Pullum at Language Log recently, and reluctantly (because it's not yet published), commented on a paper by a Yale economist, Keith Chen. In this paper, Chen argues that if your language has a grammatical future tense marker, you are less likely to save money, live healthily, etc., because the future seems like some other time, not to be worried about now. If your language uses present tense to refer to the future, you treat it as an extension of the present and you'll be much more sensible about it. Pullum is guardedly sceptical about these claims, for reasons which you can read about yourself.

He is also sceptical about this kind of claim (made on the basis of correlations found in large amounts of data). In his words:

I also worry that it is too easy to find correlations of this kind, and we don't have any idea just how easy until a concerted effort has been made to show that the spurious ones are not supportable. For example, if we took "has (vs. does not have) pharyngeal consonants", or "uses (vs. does not use) close front rounded vowels", would we find correlations there too? I have some colleagues here at the University of Edinburgh, within Simon Kirby's research group, who have run some informal experiments on the data Chen uses to see if dredging up spurious correlations of this kind is easy or hard, and so far they have found it jaw-droppingly easy.

He doesn't comment further on these experiments, but it reminded me of the talk Martin Haspelmath gave when our university's linguistics research centre opened a few years ago, in which he told us about the World Atlas of Language Structures (WALS). After telling us what a wonderful, useful tool it is (and it is, I've found it invaluable), he ended on a note of caution. It's easy, he said, to find false correlations. For example, you can show a map of which languages have different words for 'hand' and 'arm' and which use the same word for both. That map shows that the languages that don't distinguish are, broadly speaking, around the warmer areas of the globe (yellow dots) and the ones that do distinguish are in colder areas (red dots):

(Map from WALS, feature 129A)

Now might one not hypothesise, asked Haspelmath, that this language fact is due to the climate? In colder countries the distinction is important, in that one wears items of clothing that cover only the hands (gloves), or sleeves that come down to the wrist. In warm countries, sleeves are not so long and gloves are not worn, so a separate word for hands never becomes necessary. A far-fetched example, but a lesson in not putting too much faith in correlations.
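To get a feel for how easy it is to dredge up correlations like this, here is a minimal sketch using purely random numbers. It is not the Edinburgh group's experiment and uses no WALS or economic data. The function `count_spurious`, the sample sizes, and the critical value of 1.98 (an approximate two-tailed 5% cut-off for about 98 degrees of freedom) are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either side is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def count_spurious(n_languages=100, n_features=1000, seed=0, t_crit=1.98):
    """Count random yes/no 'features' that look significantly
    correlated with a random 'outcome' (hypothetical setup)."""
    rng = random.Random(seed)
    # A made-up outcome per language, e.g. a savings rate: pure noise.
    outcome = [rng.gauss(0, 1) for _ in range(n_languages)]
    hits = 0
    for _ in range(n_features):
        # A made-up binary feature: a coin flip per language.
        feature = [rng.randint(0, 1) for _ in range(n_languages)]
        r = pearson(feature, outcome)
        if abs(r) >= 1.0:
            hits += 1
            continue
        # Standard t statistic for testing whether r differs from zero.
        t = abs(r) * math.sqrt((n_languages - 2) / (1 - r * r))
        if t > t_crit:
            hits += 1
    return hits


print(count_spurious(), "of 1000 random features look 'significant'")
```

By construction none of these features has anything to do with the outcome, yet around one in twenty should clear the usual 5% bar. Real typological data is worse than this toy case, because related and neighbouring languages are not independent data points.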