It's a fundamental tenet of linguistics -- that language is defined as "arbitrary symbolic communication." Arbitrary because there is no special connection between the sound of a word and its meaning, with the exception of the handful of words that are onomatopoeic (such as boom, buzz, splash, and splat). Otherwise, the phonemes that make up the word for a concept would be expected to have nothing to do with the concept itself, and therefore would vary randomly from language to language (the word bird is no more fundamentally birdy than the French word oiseau is fundamentally oiseauesque).
That idea may have to be revised. Damián E. Blasi (of the University of Zurich), Søren Wichmann (of the University of Leiden), Harald Hammarström and Peter F. Stadler (of the Max Planck Institute), and Morten H. Christiansen (of Cornell University) did an exhaustive statistical study, using dozens of basic vocabulary words representing 62% of the world's six thousand languages and 85% of its linguistic lineages and language families. And what they found was that there are some striking patterns when you look at the phonemes represented in a variety of linguistic morphemes, patterns that held true even with completely unrelated languages. Here are a few of the correspondences they found:
- The word for ‘nose’ is likely to include the sounds ‘neh’ or the ‘oo’ sound, as in ‘ooze.’
- The word for ‘tongue’ is likely to have ‘l’ or ‘u.’
- ‘Leaf’ is likely to include the sounds ‘b,’ ‘p’ or ‘l.’
- ‘Sand’ will probably use the sound ‘s.’
- The words for ‘red’ and ‘round’ often appear with ‘r.’
- The word for ‘small’ often contains the sound ‘i.’
- The word for ‘I’ is unlikely to include sounds involving u, p, b, t, s, r and l.
- ‘You’ is unlikely to include sounds involving u, o, p, t, d, q, s, r and l.
One possibility is that these correspondences are actually not arbitrary at all, but are leftovers from (extremely) ancient history -- fossils of the earliest spoken language, which all of today's languages, however distantly related, descend from. The authors write:
From a historical perspective, it has been suggested that sound–meaning associations might be evolutionarily preserved features of spoken language, potentially hindering regular sound change. Furthermore, it has been claimed that widespread sound–meaning associations might be vestiges of one or more large-scale prehistoric protolanguages. Tellingly, some of the signals found here feature prominently in reconstructed “global etymologies” that have been used for deep phylogeny inference. If signals are inherited from an ancestral language spoken in remote prehistory, we might expect them to be distributed similarly to inherited, cognate words; that is, their distribution should to a large extent be congruent with the nodes defining their linguistic phylogeny.
But this point remains to be tested. And there's an argument against it; if these similarities come from common ancestry, you'd expect not only the sounds, but their positions in words, to have been conserved (such as in the English/German cognate pair laugh and lachen). In fact, that is not the case. The sounds are similar, but their positions in the word show no discernible pattern. The authors write:
We have demonstrated that a substantial proportion of words in the basic vocabulary are biased to carry or to avoid specific sound segments, both across continents and linguistic lineages. Given that our analyses suggest that phylogenetic persistence or areal dispersal are unlikely to explain the widespread presence of these signals, we are left with the alternative that the signals are due to factors common to our species, such as sound symbolism, iconicity, communicative pressures, or synesthesia... [A]lthough it is possible that the presence of signals in some families are symptomatic of a particularly pervasive cognate set, this is not the usual case. Hence, the explanation for the observed prevalence of sound–meaning associations across the world has to be found elsewhere.
Which I think is both astonishing and fascinating. What possible reason could there be that the English word tree is composed of the three phonemes it contains? The arbitrariness of the sound/meaning relationship seemed so obvious to me when I first learned about it that I didn't even stop to question how we know it's true.
Generally a dangerous position for a skeptic to be in.
I hope that the research on this topic is moving forward, because it certainly would be cool to find out what's actually going on here. I'll have to keep an eye out for any follow-ups. But now I'm going to go get a cup of coffee, which I think we can all agree is a nice, warm, comforting-sounding word.
The Skeptophilia book recommendation of the week is a must-read for anyone interested in languages -- The Last Speakers by linguist K. David Harrison. Harrison set himself the task of visiting places where endangered languages are spoken, such as small communities in Siberia, the Outback of Australia, and Central America (where he met a pair of elderly gentlemen who are the last two speakers of an indigenous language -- but they have hated each other for years and neither will say a word to the other).
It's a fascinating, and often elegiac, tribute to the world's linguistic diversity, and tells us a lot about how our mental representation of the world is connected to the language we speak. Brilliant reading from start to finish.