Well, here’s the article in the Economist, and the scientific paper.

The short version is, a professor at the University of Auckland in New Zealand has taken about 500 modern languages and applied the theory of the founder effect to them, positing that the most recently diverged languages will have fewer phonemes than older languages. The finding is that language originated somewhere in Africa, big surprise. This sounds an awful lot like another attempt at “mass comparison” to me, so I thought I’d ask what a real internet linguist thinks of the issue. Do phonemes even exhibit the founder effect? I thought that was mainly something that happened with entire words, not fundamentals of words.


Unfortunately the Science article is subscriber-only, so I’m relying on the Economist summary.  And from that, it looks barmy to me.  But for all I know, the real article addresses my concerns.

Take this bit:

It has been known for a while that the less widely spoken a language is, the fewer the phonemes it has.

How firmly do we ‘know’ this?  The largest phoneme inventories are found in some Khoisan languages, which have very small numbers of speakers; the Caucasian languages are also notoriously consonant-happy.  Languages vary so much that sampling is a real issue.  Atkinson is taking 504 languages; that’s about 1/10 of the total.  Is that a random sample?  Almost certainly not; as I’ve found in researching my numbers list, getting information about all languages is a very difficult project.  So very likely he’s using the most readily available sources, which are going to be biased toward the most widely spoken languages.  That’s pretty much guaranteed to screw up any look at the correlation between number of phonemes and number of speakers, as several thousand less-studied languages with few speakers are left out.
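
To make the sampling worry concrete, here’s a toy sketch in Python.  The numbers are invented purely for illustration (they don’t come from Atkinson, UPSID, or any real language), and the names `population`, `well_studied` and `pearson` are my own.  All it shows is that if you keep only the widely spoken languages, you can see a strong correlation that isn’t there in the full set.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Entirely made-up data: (log10 of number of speakers, number of phonemes).
population = [
    # small languages with wildly varied inventories
    (2, 45), (2, 15), (3, 50), (3, 20), (4, 40), (4, 18),
    # big, well-documented languages
    (5, 25), (6, 30), (7, 35),
]

# Hypothetical bias: only languages with at least 100,000 speakers
# (log10 >= 5) make it into the sample.
well_studied = [(s, p) for s, p in population if s >= 5]

full_r = pearson(*zip(*population))
biased_r = pearson(*zip(*well_studied))

print(f"all languages:     r = {full_r:.2f}")
print(f"well-studied only: r = {biased_r:.2f}")
```

With these invented numbers the full set shows essentially no correlation, while the well-studied subset shows a strong positive one.  Real data won’t be this tidy, of course; the point is only that which languages get left out matters.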

Here’s a paper that discusses the supposed correlation more closely.  Note that the researcher uses the UPSID database, which will be subject to the same most-studied problem, and that the actual scatterplots are very loose.  The correlation may be statistically significant, but number of speakers is obviously not the most important factor.

The Economist article continues:

So, as groups of people ventured ever farther from their African homeland, their phonemic repertoires should have dwindled, just as their genetic ones did.

But that doesn’t follow at all.  It’s certainly not the case that populations out of Africa are smaller than those within it, nor is it guaranteed that African population groups were stable.  (In fact it’s known that they weren’t: at least half of Niger-Congo languages are Bantu, and the Bantu languages spread into most of southern Africa in historical times, nearly erasing whatever linguistic diversity existed there from prehistoric times.)

OK, forget population size then.  It looks like all he did was plot # phonemes vs. distance from Africa anyway. But his thesis depends on the idea that number of phonemes decreases over time.  First, how could this be tested by looking at contemporary languages anyway?  He has to be asserting that somehow African languages preserve their phonemes more… why would they?  Do the phonemes get lost in transit?

Do languages really lose phonemes over time?  To evaluate that we have to look at languages over time, not over geographic areas.  As a couple of data points, Latin had about 24 phonemes; French has about 38.  Old English had about 32, modern English around 39.  I’d be really surprised if there were actually a strong tendency to lose phonemes over time.  We’ve been speaking for 50,000 years; if there were such a tendency we should all have languages with minimal inventories, like Rotokas.  I would expect there to be countervailing tendencies that restore the number of phonemes.