Skeptophilia (skep-to-fil-i-a) (n.) - the love of logical thought, skepticism, and thinking critically. Being an exploration of the applications of skeptical thinking to the world at large, with periodic excursions into linguistics, music, politics, cryptozoology, and why people keep seeing the face of Jesus on grilled cheese sandwiches.

Wednesday, March 20, 2013

The voices of the ancestors

One of the (many) reasons I love science is that as a process, it opens up avenues to knowledge that were previously thought closed.  Couple that with the vast improvements in technological tools, and you have a powerful combination for exploring realms that once were not considered "science" at all.

Take, for example, historical linguistics, the discipline that studies the languages spoken by our ancestors.  It is a particular fascination of mine -- in fact, it is the field I studied for my MA.  (Yes, I know I teach biology.  It's a long story.)  I can attest to the fact that it's a hard enough subject, even when you have a plethora of written records to work with, as I did (my thesis was on the effects of the Viking invasions on Old English and Old Gaelic).  When records are scanty, or worse yet, non-existent, the whole thing turns into a highly frustrating, and highly speculative, topic.

This is the field of "reconstructive linguistics" -- trying to infer the characteristics of the languages spoken by our distant ancestors, for the majority of which we have not a single written remnant.  If you look in an etymological dictionary, you will see a number of words that have starred ancestral root words, such as *tark, an inferred verb stem from Proto-Indo-European that means "to twist."  (A descendant word that has survived until today is torque.)  The asterisk means that the word is "unattested" -- i.e., there's no proof that this is what the word actually was, in the original ancestor language, because there are no written records of Proto-Indo-European.  And therein, of course, lies the problem.  Because it's an unattested word, no one can ever be sure if it's correct.  The inferred word comes not from any hard evidence, but from the application of one of the most fundamental rules of linguistics: Phonetic changes are regular.


As a quick illustration of this -- and believe me, I could write about this stuff all day -- we have Grimm's Law, which describes how voiceless stops in Proto-Indo-European became fricatives in the Germanic languages but remained stops in the other surviving (non-Germanic) Indo-European languages.  One example is the shift of /p/ to /f/, which is why we have foot (English), fot (Norwegian), Fuss (German), fótur (Icelandic), and so on, but poús (Greek), pes (Latin), peda (Lithuanian), etc.  These sorts of sound correspondences allowed us to make guesses about what the original word sounded like.
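For the programmers in the audience, here is a rough sketch of what "checking a sound correspondence" looks like in Python.  It only uses some of the "foot" words listed above, and it cheats by treating the first letter of the spelling as a stand-in for the first sound, which is an assumption that works for this example but not in general.  The names COGNATES and initial_correspondence are mine, made up for illustration.

```python
# Cognates for "foot" quoted in the post.
# Each entry: language -> (word, is the language Germanic?)
# Assumption: the first letter of the spelling stands in for the first sound.
COGNATES = {
    "English": ("foot", True),
    "German": ("Fuss", True),
    "Icelandic": ("fótur", True),
    "Greek": ("poús", False),
    "Latin": ("pes", False),
    "Lithuanian": ("peda", False),
}


def initial_correspondence(cognates):
    """Collect the word-initial letters for Germanic and non-Germanic words."""
    germanic, other = set(), set()
    for word, is_germanic in cognates.values():
        first = word[0].lower()
        if is_germanic:
            germanic.add(first)
        else:
            other.add(first)
    return germanic, other


germanic_initials, other_initials = initial_correspondence(COGNATES)
print("Germanic:", germanic_initials)
print("Non-Germanic:", other_initials)
```

If the correspondence is regular, each group should contain exactly one initial consonant, which is what you get here.  Real comparative work operates on phonetic transcriptions across hundreds of word sets, not on six spellings.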

Note the use of the past tense in the previous sentence.  Because now linguists have a tool that will take a bit of the guesswork out of reconstructive linguistics -- and that shows promise of bringing it into the realm of a true science.

According to an article in Science World Report entitled "Ancient Languages Reconstructed by Linguistic Computer Program," a team of researchers at the University of British Columbia and the University of California - Berkeley has developed software that uses inputted lexicons to reconstruct languages.  (Read their original paper here.)  This tool automates a process that once took huge amounts of painstaking research, and even this first version has had tremendous success -- the first run of the program, using data from 637 Austronesian languages currently spoken in Asia and the South Pacific, generated proto-Austronesian roots of which 85% matched the roots derived by experts in that language family to within one phoneme.
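To make "matched to within one phoneme" concrete, here is a minimal Python sketch of one way such a score could be computed.  I am assuming plain edit distance (Levenshtein distance) over lists of phonemes; I don't know that this is exactly how the researchers scored their results.  The function names and the word pairs are invented toy data, not real reconstructions and not anything from the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phonemes."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y (or keep it)
            ))
        prev = curr
    return prev[-1]


def share_within(pairs, max_distance=1):
    """Fraction of (program, expert) pairs within max_distance edits."""
    hits = sum(edit_distance(p, e) <= max_distance for p, e in pairs)
    return hits / len(pairs)


# Hypothetical toy pairs: (program's guess, expert's reconstruction).
TOY_PAIRS = [
    (["t", "a", "k"], ["t", "a", "k"]),  # identical
    (["t", "a", "k"], ["t", "a", "g"]),  # one substitution
    (["p", "a"], ["p", "a", "t"]),       # one insertion
    (["l", "u", "g"], ["w", "u", "k"]),  # two substitutions
]

print(share_within(TOY_PAIRS))
```

Three of the four toy pairs differ by one phoneme or less, so the score is 0.75.  Working on lists of phonemes rather than raw strings matters, since a single sound is often spelled with more than one character.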

What I'm curious about, of course, is how good the software is at deriving root words for which we do have written records.  In other words, checking its results against something other than the unverifiable speculation that historical linguists were already doing.  For example, would the software be able to take lexicons from Spanish, French, Portuguese, Italian, Catalan, Provençal, and so on, and correctly infer the Latin stems?  To me, that would be the true test; to see what the shortcomings were, you have to have something real to check its results against.  (And for any historical linguists in my readership whose hackles got raised by my use of the words "unverifiable speculation" -- c'mon, you have to admit that what you're doing does have the inherent upside of being unfalsifiable.  If you think a particular Proto-Indo-European root reconstructs as *lug and your colleague thinks it's *wuk, you can argue about it till next Sunday and you still will never be certain who's right, as there are very few Proto-Indo-Europeans around these days who could tell you for sure.)

But even so, it's a pretty nifty new tool.  Just the idea that we can make some guesses at what language our ancestors spoke six-thousand-odd years ago is stunning, and the fact that someone has written software that reduces the effort to accomplish this is cool enough to set my little Language Nerd Heart fluttering.  It is nice to see reconstructive linguistics using the tools of science, thus bringing together two of my favorite things.  Why, exactly, I find it so exciting to know that *swey may have meant "to whistle" to someone six millennia ago, I'm not sure.  But the fact that we now have a computer program that can check our guesses is pretty damn cool.

1 comment:

  1. I think there's a Proto-Indo-European living next door to me, actually, but his vocabulary might not be large enough to be useful unless you're studying the terminology of football or roof construction.
